blob: 61919dcf2dd784a170480b77b1be4551fabdc55c [file] [log] [blame] [view]
Jon Brandveina2e50842017-02-23 20:40:21 +00001---
2layout: documentation
3title: Depsets
4---
5
6# Depsets
7
laurentlb377fbd02017-11-29 09:43:34 -08008[Depsets](lib/depset.html) are a specialized data structure for efficiently
9collecting data across a targets transitive dependencies. Since this use case
10concerns the [analysis phase](concepts.md#evaluation-model), depsets are useful
11for authors of rules and aspects, but probably not macros.
Jon Brandveina2e50842017-02-23 20:40:21 +000012
13The main feature of depsets is that they support a time- and space-efficient
14merge operation, whose cost is independent of the size of the existing contents.
15Depsets also have well-defined ordering semantics.
16
17Example uses of depsets include:
18
19* storing the paths of all object files for a programs libraries, which can
20 then be passed to a linker action
21
22* for an interpreted language, storing the transitive source files that will
23 be included in an executable's runfiles
24
laurentlb377fbd02017-11-29 09:43:34 -080025If you don't need the merge operation, consider using another type, such as
26[list](lib/list.html) or [dict](lib/dict.html).
27
dmartingf8480fa2017-09-15 13:15:43 +020028<!-- [TOC] -->
Jon Brandveina2e50842017-02-23 20:40:21 +000029
30## Full example
31
Dmitry Lomov9ee40732017-11-24 03:48:44 -080032
laurentlb0e8af492018-01-25 09:06:08 -080033This example is avalable at
34[https://github.com/bazelbuild/examples/tree/master/rules/depsets](https://github.com/bazelbuild/examples/tree/master/rules/depsets).
Dmitry Lomov9ee40732017-11-24 03:48:44 -080035
Jon Brandveina2e50842017-02-23 20:40:21 +000036Suppose we have a hypothetical interpreted language Foo. In order to build each
laurentlb377fbd02017-11-29 09:43:34 -080037`foo_binary` we need to know all the `*.foo` files that it directly or indirectly
Jon Brandveina2e50842017-02-23 20:40:21 +000038depends on.
39
40```python
Dmitry Lomov9ee40732017-11-24 03:48:44 -080041# //depsets:BUILD
Jon Brandveina2e50842017-02-23 20:40:21 +000042
43load(":foo.bzl", "foo_library", "foo_binary")
44
45# Our hypothetical Foo compiler.
46py_binary(
47 name = "foocc",
48 srcs = ["foocc.py"],
49)
50
51foo_library(
52 name = "a",
53 srcs = ["a.foo", "a_impl.foo"],
54)
55
56foo_library(
57 name = "b",
58 srcs = ["b.foo", "b_impl.foo"],
59 deps = [":a"],
60)
61
62foo_library(
63 name = "c",
64 srcs = ["c.foo", "c_impl.foo"],
65 deps = [":a"],
66)
67
68foo_binary(
69 name = "d",
70 srcs = ["d.foo"],
71 deps = [":b", ":c"],
72)
73```
74
75```python
Dmitry Lomov9ee40732017-11-24 03:48:44 -080076# //depsets:foocc.py
Jon Brandveina2e50842017-02-23 20:40:21 +000077
78# "Foo compiler" that just concatenates its inputs to form its output.
79import sys
80
81if __name__ == "__main__":
82 assert len(sys.argv) >= 1
83 output = open(sys.argv[1], "wt")
84 for path in sys.argv[2:]:
85 input = open(path, "rt")
86 output.write(input.read())
87```
88
laurentlb377fbd02017-11-29 09:43:34 -080089Here, the transitive sources of the binary `d` are all of the `*.foo` files in
Jon Brandveina2e50842017-02-23 20:40:21 +000090the `srcs` fields of `a`, `b`, `c`, and `d`. In order for the `foo_binary`
91target to know about any file besides `d.foo`, the `foo_library` targets need to
92pass them along in a provider. Each library receives the providers from its own
93dependencies, adds its own immediate sources, and passes on a new provider with
94the augmented contents. The `foo_binary` rule does the same, except that instead
95of returning a provider, it uses the complete list of sources to construct a
96command line for an action.
97
98Heres a complete implementation of the `foo_library` and `foo_binary` rules.
99
100```python
Dmitry Lomov9ee40732017-11-24 03:48:44 -0800101# //depsets/foo.bzl
Jon Brandveina2e50842017-02-23 20:40:21 +0000102
103# A provider with one field, transitive_sources.
laurentlb0e8af492018-01-25 09:06:08 -0800104FooFiles = provider(fields = ["transitive_sources"])
Jon Brandveina2e50842017-02-23 20:40:21 +0000105
106def get_transitive_srcs(srcs, deps):
107 """Obtain the source files for a target and its transitive dependencies.
108
109 Args:
110 srcs: a list of source files
111 deps: a list of targets that are direct dependencies
112 Returns:
113 a collection of the transitive sources
114 """
Dmitry Lomov9ee40732017-11-24 03:48:44 -0800115 return depset(
116 srcs,
117 transitive = [dep[FooFiles].transitive_sources for dep in deps])
Jon Brandveina2e50842017-02-23 20:40:21 +0000118
119def _foo_library_impl(ctx):
120 trans_srcs = get_transitive_srcs(ctx.files.srcs, ctx.attr.deps)
121 return [FooFiles(transitive_sources=trans_srcs)]
122
123foo_library = rule(
124 implementation = _foo_library_impl,
125 attrs = {
126 "srcs": attr.label_list(allow_files=True),
127 "deps": attr.label_list(),
128 },
129)
130
131def _foo_binary_impl(ctx):
132 foocc = ctx.executable._foocc
133 out = ctx.outputs.out
134 trans_srcs = get_transitive_srcs(ctx.files.srcs, ctx.attr.deps)
135 srcs_list = trans_srcs.to_list()
Dmitry Lomov9ee40732017-11-24 03:48:44 -0800136 ctx.actions.run(executable = foocc,
137 arguments = [out.path] + [src.path for src in srcs_list],
138 inputs = srcs_list + [foocc],
139 outputs = [out])
Jon Brandveina2e50842017-02-23 20:40:21 +0000140
141foo_binary = rule(
142 implementation = _foo_binary_impl,
143 attrs = {
144 "srcs": attr.label_list(allow_files=True),
145 "deps": attr.label_list(),
Dmitry Lomov9ee40732017-11-24 03:48:44 -0800146 "_foocc": attr.label(default=Label("//depsets:foocc"),
Jon Brandveina2e50842017-02-23 20:40:21 +0000147 allow_files=True, executable=True, cfg="host")
148 },
149 outputs = {"out": "%{name}.out"},
150)
151```
152
153You can test this by copying these files into a fresh package, renaming the
laurentlb377fbd02017-11-29 09:43:34 -0800154labels appropriately, creating the source `*.foo` files with dummy content, and
Jon Brandveina2e50842017-02-23 20:40:21 +0000155building the `d` target.
156
157## Description and operations
158
159Conceptually, a depset is a directed acyclic graph (DAG) that typically looks
160similar to the target graph. It is constructed from the leaves up to the root.
161Each target in a dependency chain can add its own contents on top of the
162previous without having to read or copy them.
163
164Each node in the DAG holds a list of direct elements and a list of child nodes.
165The contents of the depset are the transitive elements, i.e. the direct elements
Dmitry Lomov9ee40732017-11-24 03:48:44 -0800166of all the nodes. A new depset can be created using the
167[depset](lib/globals.html#depset) constructor: it accepts a list of direct
168elemens and another list of child nodes.
Jon Brandvein25aa0332017-02-24 16:25:15 +0000169
170```python
171s = depset(["a", "b", "c"])
Dmitry Lomov9ee40732017-11-24 03:48:44 -0800172t = depset(["d", "e"], transitive = [s])
Jon Brandvein25aa0332017-02-24 16:25:15 +0000173
Dmitry Lomov9ee40732017-11-24 03:48:44 -0800174print(s) # depset(["a", "b", "c"])
175print(t) # depset(["d", "e", "a", "b", "c"])
Jon Brandvein25aa0332017-02-24 16:25:15 +0000176```
Jon Brandveina2e50842017-02-23 20:40:21 +0000177
178To retrieve the contents of a depset, use the
179[to_list()](lib/depset.html#to_list) method. It returns a list of all transitive
180elements, not including duplicates. There is no way to directly inspect the
181precise structure of the DAG, although this structure does affect the order in
182which the elements are returned.
183
Jon Brandvein25aa0332017-02-24 16:25:15 +0000184```python
185s = depset(["a", "b", "c"])
186
187print("c" in s.to_list()) # True
188print(s.to_list() == ["a", "b", "c"]) # True
189```
190
191The allowed items in a depset are restricted, just as the allowed keys in
192dictionaries are restricted. In particular, depset contents may not be mutable.
193
194Depsets use reference equality: a depset is equal to itself, but unequal to any
195other depset, even if they have the same contents and same internal structure.
196
197```python
198s = depset(["a", "b", "c"])
199t = s
200print(s == t) # True
201
202t = depset(["a", "b", "c"])
203print(s == t) # False
204
Jon Brandvein25aa0332017-02-24 16:25:15 +0000205d = {}
206d[s] = None
207d[t] = None
208print(len(d)) # 2
209```
210
211To compare depsets by their contents, convert them to sorted lists.
212
213```python
214s = depset(["a", "b", "c"])
215t = depset(["c", "b", "a"])
216print(sorted(s.to_list()) == sorted(t.to_list())) # True
217```
218
219There is no ability to remove elements from a depset. If this is needed, you
220must read out the entire contents of the depset, filter the elements you want to
221remove, and reconstruct a new depset. This is not particularly efficient.
222
223```python
224s = depset(["a", "b", "c"])
225t = depset(["b", "c"])
226
227# Compute set difference s - t. Precompute t.to_list() so it's not done
228# in a loop, and convert it to a dictionary for fast membership tests.
229t_items = {e: None for e in t.to_list()}
230diff_items = [x for x in s.to_list() if x not in t_items]
231# Convert back to depset if it's still going to be used for merge operations.
232s = depset(diff_items)
233print(s) # depset(["a"])
234```
235
Jon Brandveina2e50842017-02-23 20:40:21 +0000236## Order
237
238The `to_list` operation performs a traversal over the DAG. The kind of traversal
239depends on the *order* that was specified at the time the depset was
240constructed. It is useful for Bazel to support multiple orders because sometimes
241tools care about the order of their inputs. For example, a linker action may
242need to ensure that if `B` depends on `A`, then `A.o` comes before `B.o` on the
243linkers command line. Other tools might have the opposite requirement.
244
245Three traversal orders are supported: `postorder`, `preorder`, and
246`topological`. The first two work exactly like [tree
247traversals](https://en.wikipedia.org/wiki/Tree_traversal#Depth-first_search)
248except that they operate on DAGs and skip already visited nodes. The third order
249works as a topological sort from root to leaves, essentially the same as
250preorder except that shared children are listed only after all of their parents.
Jon Brandvein25aa0332017-02-24 16:25:15 +0000251Preorder and postorder operate as left-to-right traversals, but note that within
252each node direct elements have no order relative to children. For topological
Jon Brandveina2e50842017-02-23 20:40:21 +0000253order, there is no left-to-right guarantee, and even the
254all-parents-before-child guarantee does not apply in the case that there are
255duplicate elements in different nodes of the DAG.
256
Jon Brandvein25aa0332017-02-24 16:25:15 +0000257```python
Dmitry Lomov9ee40732017-11-24 03:48:44 -0800258# This demonstrates different traversal orders.
Jon Brandvein25aa0332017-02-24 16:25:15 +0000259
260def create(order):
Dmitry Lomov9ee40732017-11-24 03:48:44 -0800261 cd = depset(["c", "d"], order = order)
262 gh = depset(["g", "h"], order = order)
263 return depset(["a", "b", "e", "f"], transitive = [cd, gh])
Jon Brandvein25aa0332017-02-24 16:25:15 +0000264
265print(create("postorder").to_list()) # ["c", "d", "g", "h", "a", "b", "e", "f"]
266print(create("preorder").to_list()) # ["a", "b", "e", "f", "c", "d", "g", "h"]
267```
268
269```python
270# This demonstrates different orders on a diamond graph.
271
272def create(order):
273 a = depset(["a"], order=order)
Dmitry Lomov9ee40732017-11-24 03:48:44 -0800274 b = depset(["b"], transitive = [a], order = order)
275 c = depset(["c"], transitive = [a], order = order)
276 d = depset(["d"], transtive = [b, c], order = order)
Jon Brandvein25aa0332017-02-24 16:25:15 +0000277 return d
278
279print(create("postorder").to_list()) # ["a", "b", "c", "d"]
280print(create("preorder").to_list()) # ["d", "b", "a", "c"]
281print(create("topological").to_list()) # ["d", "b", "c", "a"]
282```
283
Jon Brandveina2e50842017-02-23 20:40:21 +0000284Due to how traversals are implemented, the order must be specified at the time
285the depset is created with the constructors `order` keyword argument. If this
286argument is omitted, the depset has the special `default` order, in which case
brandjoncb68a962018-06-25 09:34:55 -0700287there are no guarantees about the order of any of its elements (except that it
288is deterministic).
Jon Brandvein25aa0332017-02-24 16:25:15 +0000289
290For safety, depsets with different orders cannot be merged with the `+` operator
291unless one of them uses the default order; the resulting depsets order is the
292same as the left operand. Note that when two depsets of different order are
293merged in this way, the child may appear to have had its elements rearranged
Googlera1a1e612018-01-22 06:06:00 -0800294when it is traversed via the parent. **The `+` operator is deprecated, anyway;
295use the `transitive` argument instead.**
Jon Brandveina2e50842017-02-23 20:40:21 +0000296
297## Performance
298
299To see the motivation for using depsets, consider what would have happened if we
300had implemented `get_transitive_srcs()` without them. A naive way of writing
301this function would be to collect the sources in a list.
302
303```python
304def get_transitive_srcs(srcs, deps):
305 trans_srcs = []
306 for dep in deps:
307 trans_srcs += dep[FooFiles].transitive_sources
308 trans_srcs += srcs
309 return trans_srcs
310```
311
312However, this does not take into account duplicates, so the source files for `a`
313will appear twice on the command line and twice in the contents of the output
314file.
315
316The next alternative is using a general set, which can be simulated by a
Googler9594d5e2018-06-21 10:58:15 -0700317dictionary where the keys are the elements and all the keys map to `True`.
Jon Brandveina2e50842017-02-23 20:40:21 +0000318
319```python
320def get_transitive_srcs(srcs, deps):
321 trans_srcs = {}
322 for dep in deps:
323 for file in dep[FooFiles].transitive_sources:
Googler9594d5e2018-06-21 10:58:15 -0700324 trans_srcs[file] = True
Jon Brandveina2e50842017-02-23 20:40:21 +0000325 for file in srcs:
Googler9594d5e2018-06-21 10:58:15 -0700326 trans_srcs[file] = True
Jon Brandveina2e50842017-02-23 20:40:21 +0000327 return trans_srcs
328```
329
330This gets rid of the duplicates, but it makes the order of the command line
331arguments (and therefore the contents of the files) unspecified, although still
332deterministic.
333
Jon Brandvein49fc3052017-03-07 15:50:31 +0000334Moreover, both this approach and the list-based one are asymptotically worse
335than the depset-based approach. Consider the case where there is a long chain of
Jon Brandveina2e50842017-02-23 20:40:21 +0000336dependencies on Foo libraries. Processing every rule requires copying all of the
337transitive sources that came before it into a new data structure. This means
338that the time and space cost for analyzing an individual library or binary
339target is proportional to its own height in the chain. For a chain of length n,
340foolib_1 foolib_2 foolib_n, the overall cost is effectively the
341[triangle sum](https://en.wikipedia.org/wiki/Triangular_number) 1 + 2 + … + n,
342which is O(n^2). This cost is wasteful because the library rules behavior is
343not actually affected by the transitive sources.
344
345Generally speaking, depsets should be used whenever you are accumulating more
346and more information through your transitive dependencies. This helps ensure
347that your build scales well as your target graph grows deeper. The exact
348advantage will depend on how deep the target graph is and how many elements per
349target are added.
350
351To actually get the performance advantage, its important to not retrieve the
352contents of the depset unnecessarily in library rules. One call to `to_list()`
353at the end in a binary rule is fine, since the overall cost is just O(n). Its
354when many non-terminal targets try to call `to_list()` that we start to get into
355quadratic behavior.
356
357## Upcoming changes
358
359The API for depsets is being updated to be more consistent. Here are some recent
360and/or upcoming changes.
361
brandjonbe02b512018-08-28 08:15:18 -0700362* When it's necessary to retrieve a depset's contents, this should be done by
363 explicitly converting the depset to a list via its `to_list()` method. Do
364 not iterate directly over the depset itself; direct iteration is deprecated
365 and will be removed. For example, don't use `list(...)`, `sorted(...)`, or
366 other functions expecting an iterable, on depsets. The rationale of this
367 change is that iterating over depsets is generally expensive, and expensive
368 operations should be made obvious in code.
Jon Brandveina2e50842017-02-23 20:40:21 +0000369
Jon Brandvein25aa0332017-02-24 16:25:15 +0000370* Depset elements currently must have the same type, e.g. all ints or all
371 strings. This restriction will be lifted.
372
Googlera1a1e612018-01-22 06:06:00 -0800373* A merge operation should be done by using the `transitive` argument in the
374 depset constructor. All other methods (`|` and `+` operators, and the
375 `union` method) are deprecated and will be going away.