# The Bazel Code Base

This document is a description of the code base and how Bazel is structured. It
is intended for people willing to contribute to Bazel, not for end-users.

## Introduction

The code base of Bazel is large (~350 KLOC production code and ~260 KLOC test
code) and no one is familiar with the whole landscape: everyone knows their
particular valley very well, but few know what lies over the hills in every
direction.

In order for people midway upon the journey not to find themselves within a
forest dark with the straightforward pathway being lost, this document tries to
give an overview of the code base so that it's easier to get started with
working on it.

The public version of the source code of Bazel lives on GitHub at
http://github.com/bazelbuild/bazel . This is not the “source of truth”; it’s
derived from a Google-internal source tree that contains additional
functionality that is not useful outside Google. The long term goal is to make
GitHub the source of truth.

Contributions are accepted through the regular GitHub pull request mechanism,
and manually imported by a Googler into the internal source tree, then
re-exported back out to GitHub.

## Client/server architecture

The bulk of Bazel resides in a server process that stays in RAM between builds.
This allows Bazel to maintain state between builds.

This is why the Bazel command line has two kinds of options: startup and
command. In a command line like this:

```
    bazel --host_jvm_args=-Xmx8G build -c opt //foo:bar
```

Some options (`--host_jvm_args=`) are before the name of the command to be run
and some are after (`-c opt`); the former kind is called a "startup option" and
affects the server process as a whole, whereas the latter kind, the "command
option", only affects a single command.
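The split between the two kinds of options can be illustrated with a small Python sketch. This is not how the C++ client actually parses its arguments: it assumes every startup option is written as a single `--name=value` token, so the first argument that does not start with `-` is taken to be the command. The function name `split_command_line` is made up for this example.

```python
def split_command_line(args):
    """Split Bazel-style arguments into (startup options, command, command arguments).

    Simplifying assumption: startup options never take their value as a
    separate token, so the first non-dash argument is the command name.
    """
    for index, arg in enumerate(args):
        if not arg.startswith("-"):
            return list(args[:index]), arg, list(args[index + 1:])
    # No command name found: everything is a startup option.
    return list(args), None, []


startup, command, rest = split_command_line(
    ["--host_jvm_args=-Xmx8G", "build", "-c", "opt", "//foo:bar"])
print(startup, command, rest)
```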

Each server instance has a single associated source tree ("workspace") and each
workspace usually has a single active server instance. This can be circumvented
by specifying a custom output base (see the "Directory layout" section for more
information).

Bazel is distributed as a single ELF executable that is also a valid .zip file.
When you type `bazel`, the above ELF executable implemented in C++ (the
"client") gets control. It sets up an appropriate server process using the
following steps:

1. Checks whether it has already extracted itself. If not, it does that. This
   is where the implementation of the server comes from.
2. Checks whether there is an active server instance that works: it is running,
   it has the right startup options and uses the right workspace directory. It
   finds the running server by looking at the directory `$OUTPUT_BASE/server`
   where there is a lock file with the port the server is listening on.
3. If needed, kills the old server process.
4. If needed, starts up a new server process.
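The four steps above can be sketched as a small decision function. This is a toy Python model, not the client's C++ code: `ServerState` and `plan_client_steps` are hypothetical names, and the check for "a server instance that works" is reduced to comparing startup options and workspace.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class ServerState:
    """What this sketch compares to decide whether a server can be reused."""
    startup_options: Tuple[str, ...]
    workspace: str


def plan_client_steps(is_extracted: bool,
                      running: Optional[ServerState],
                      wanted: ServerState) -> List[str]:
    """Return the actions the client would take before sending the command."""
    steps = []
    if not is_extracted:
        steps.append("extract")           # step 1
    if running == wanted:                 # step 2: a suitable server exists
        steps.append("reuse server")
        return steps
    if running is not None:
        steps.append("kill old server")   # step 3
    steps.append("start new server")      # step 4
    return steps
```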

After a suitable server process is ready, the command that needs to be run is
communicated to it over a gRPC interface, then the output of Bazel is piped back
to the terminal. Only one command can be running at the same time. This is
implemented using an elaborate locking mechanism with parts in C++ and parts in
Java. There is some infrastructure for running multiple commands in parallel,
since the inability to run e.g. `bazel version` in parallel with another command
is somewhat embarrassing. The main blocker is the life cycle of `BlazeModule`s
and some state in `BlazeRuntime`.

At the end of a command, the Bazel server transmits the exit code the client
should return. An interesting wrinkle is the implementation of `bazel run`: the
job of this command is to run something Bazel just built, but it can't do that
from the server process because it doesn't have a terminal. So instead it tells
the client what binary it should `exec()` and with what arguments.

When one presses Ctrl-C, the client translates it to a Cancel call on the gRPC
connection, which tries to terminate the command as soon as possible. After the
third Ctrl-C, the client sends a SIGKILL to the server instead.

The source code of the client is under `src/main/cpp` and the protocol used to
communicate with the server is in `src/main/protobuf/command_server.proto`.

The main entry point of the server is `BlazeRuntime.main()` and the gRPC calls
from the client are handled by `GrpcServerImpl.run()`.

## Directory layout

Bazel creates a somewhat complicated set of directories during a build. A full
description is available
[here](https://bazel.build/docs/output_directories).

The "workspace" is the source tree Bazel is run in. It usually corresponds to
something you checked out from source control.

Bazel puts all of its data under the "output user root". This is usually
`$HOME/.cache/bazel/_bazel_${USER}`, but can be overridden using the
`--output_user_root` startup option.

The "install base" is where Bazel is extracted to. This is done automatically
and each Bazel version gets a subdirectory based on its checksum under the
install base. It's at `$OUTPUT_USER_ROOT/install` by default and can be changed
using the `--install_base` startup option.

The "output base" is the place where the Bazel instance attached to a specific
workspace writes to. Each output base has at most one Bazel server instance
running at any time. It's usually at
`$OUTPUT_USER_ROOT/<checksum of the path to the workspace>`. It can be changed
using the `--output_base` startup option, which is, among other things, useful
for getting around the limitation that only one Bazel instance can be running in
any workspace at any given time.
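As a rough illustration of how a default output base is derived from the workspace path, here is a minimal Python sketch. The choice of MD5 as the checksum and the function name `default_output_base` are assumptions made for this example; the text above only says that a checksum of the workspace path is used.

```python
import hashlib
import posixpath


def default_output_base(output_user_root: str, workspace: str) -> str:
    """Return <output user root>/<checksum of the workspace path>."""
    # Assumption: MD5 stands in for "the checksum" purely for illustration.
    checksum = hashlib.md5(workspace.encode("utf-8")).hexdigest()
    return posixpath.join(output_user_root, checksum)


root = "/home/alice/.cache/bazel/_bazel_alice"
# Two workspaces map to two different output bases, hence two servers.
print(default_output_base(root, "/src/project_a"))
print(default_output_base(root, "/src/project_b"))
```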

The output base contains, among other things:

* The fetched external repositories at `$OUTPUT_BASE/external`.
* The exec root, i.e. a directory that contains symlinks to all the source
  code for the current build. It's located at `$OUTPUT_BASE/execroot`. During
  the build, the working directory is `$EXECROOT/<name of main repository>`.
  We are planning to change this to `$EXECROOT`, although it's a long term
  plan because it's a very incompatible change.
* Files built during the build.

## The process of executing a command

Once the Bazel server gets control and is informed about a command it needs to
execute, the following sequence of events happens:

1. `BlazeCommandDispatcher` is informed about the new request. It decides
   whether the command needs a workspace to run in (almost every command except
   for ones that don't have anything to do with source code, e.g. `version` or
   `help`) and whether another command is running.

2. The right command is found. Each command must implement the interface
   `BlazeCommand` and must have the `@Command` annotation (this is a bit of an
   antipattern; it would be nice if all the metadata a command needs was
   described by methods on `BlazeCommand`).

3. The command line options are parsed. Each command has different command line
   options, which are described in the `@Command` annotation.

4. An event bus is created. The event bus is a stream for events that happen
   during the build. Some of these are exported to outside of Bazel under the
   aegis of the Build Event Protocol in order to tell the world how the build
   goes.

5. The command gets control. The most interesting commands are those that run a
   build: build, test, run, coverage and so on. This functionality is
   implemented by `BuildTool`.

6. The set of target patterns on the command line is parsed and wildcards like
   `//pkg:all` and `//pkg/...` are resolved. This is implemented in
   `AnalysisPhaseRunner.evaluateTargetPatterns()` and reified in Skyframe as
   `TargetPatternPhaseValue`.

7. The loading/analysis phase is run to produce the action graph (a directed
   acyclic graph of commands that need to be executed for the build).

8. The execution phase is run. This means running every action required to
   build the requested top-level targets.
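The first two steps (finding the command and checking whether it needs a workspace) can be modelled with a decorator that plays the role of the `@Command` annotation. This is a toy Python analogy, not Bazel's Java code: the `command` decorator, the `COMMANDS` registry, the `dispatch` function and the exit code `2` are all invented for this sketch.

```python
COMMANDS = {}


def command(name, needs_workspace=True):
    """Hypothetical stand-in for `@Command`: metadata lives next to the class."""
    def register(cls):
        cls.command_name = name
        cls.needs_workspace = needs_workspace
        COMMANDS[name] = cls
        return cls
    return register


@command("version", needs_workspace=False)
class VersionCommand:
    def exec(self, args):
        return 0  # exit code


@command("build")
class BuildCommand:
    def exec(self, args):
        return 0  # a real build would run loading, analysis and execution


def dispatch(name, args, workspace):
    """Find the command, check its workspace requirement, then run it."""
    cls = COMMANDS.get(name)
    if cls is None:
        return 2  # arbitrary error code chosen for this sketch
    if cls.needs_workspace and workspace is None:
        return 2
    return cls().exec(args)
```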

## Command line options

The command line options for a Bazel invocation are described in an
`OptionsParsingResult` object, which in turn contains a map from "option
classes" to the values of the options. An "option class" is a subclass of
`OptionsBase` and groups command line options together that are related to each
other. For example:

1. Options related to a programming language (`CppOptions` or `JavaOptions`).
   These should be a subclass of `FragmentOptions` and are eventually wrapped
   into a `BuildOptions` object.
2. Options related to the way Bazel executes actions (`ExecutionOptions`).

These options are designed to be consumed in the analysis phase (either
through `RuleContext.getFragment()` in Java or `ctx.fragments` in Starlark).
Some of them (for example, whether to do C++ include scanning or not) are read
in the execution phase, but that always requires explicit plumbing since
`BuildConfiguration` is not available then. For more information, see the
section “Configurations”.

**WARNING:** We like to pretend that `OptionsBase` instances are immutable and
use them that way (e.g. as part of `SkyKey`s). This is not the case and
modifying them is a really good way to break Bazel in subtle ways that are hard
to debug. Unfortunately, making them actually immutable is a large endeavor.
(Modifying a `FragmentOptions` immediately after construction before anyone else
gets a chance to keep a reference to it and before `equals()` or `hashCode()` is
called on it is okay.)

Bazel learns about option classes in the following ways:

1. Some are hard-wired into Bazel (`CommonCommandOptions`).
2. From the `@Command` annotation on each Bazel command.
3. From `ConfiguredRuleClassProvider` (these are command line options related
   to individual programming languages).
4. Starlark rules can also define their own options (see
   [here](https://bazel.build/rules/config)).

Each option (excluding Starlark-defined options) is a member variable of a
`FragmentOptions` subclass that has the `@Option` annotation, which specifies
the name and the type of the command line option along with some help text.

The Java type of the value of a command line option is usually something simple
(a string, an integer, a Boolean, a label, etc.). However, we also support
options of more complicated types; in this case, the job of converting from the
command line string to the data type falls to an implementation of
`com.google.devtools.common.options.Converter`.
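The idea of a converter, a function from the command line string to a richer value, can be sketched in a few lines of Python. This is only an analogy for the Java `Converter` interface: the option names `jobs` and `labels`, the `OPTIONS` table and both functions are hypothetical.

```python
def convert_assignments(text):
    """Hypothetical converter: 'os=linux,cpu=k8' -> {'os': 'linux', 'cpu': 'k8'}."""
    result = {}
    for item in text.split(","):
        key, sep, value = item.partition("=")
        if not sep or not key:
            raise ValueError("expected key=value, got: " + repr(item))
        result[key] = value
    return result


# Option name -> (converter, default value, help text), loosely mirroring
# what an `@Option` annotation records about a field.
OPTIONS = {
    "jobs": (int, 1, "number of concurrent jobs"),
    "labels": (convert_assignments, None, "comma-separated key=value pairs"),
}


def parse_options(args):
    """Parse '--name=value' arguments using the converter of each option."""
    values = {name: spec[1] for name, spec in OPTIONS.items()}
    for arg in args:
        name, sep, raw = arg[2:].partition("=")
        if not arg.startswith("--") or not sep or name not in OPTIONS:
            raise ValueError("unsupported argument: " + arg)
        values[name] = OPTIONS[name][0](raw)
    return values
```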

## The source tree, as seen by Bazel

Bazel is in the business of building software, which happens by reading and
interpreting the source code. The totality of the source code Bazel operates on
is called "the workspace" and it is structured into repositories, packages and
rules. A description of these concepts for the users of Bazel is available
[here](https://bazel.build/concepts/build-ref).

### Repositories

A "repository" is a source tree on which a developer works; it usually
represents a single project. Bazel's ancestor, Blaze, operated on a monorepo,
i.e. a single source tree that contains all source code used to run the build.
Bazel, in contrast, supports projects whose source code spans multiple
repositories. The repository from which Bazel is invoked is called the “main
repository”, the others are called “external repositories”.

A repository is marked by a file called `WORKSPACE` (or `WORKSPACE.bazel`) in
its root directory. This file contains information that is "global" to the whole
build, for example, the set of available external repositories. It works like a
regular Starlark file, which means that one can `load()` other Starlark files.
This is commonly used to pull in repositories that are needed by a repository
that's explicitly referenced (we call this the "`deps.bzl` pattern").

Code of external repositories is symlinked or downloaded under
`$OUTPUT_BASE/external`.

When running the build, the whole source tree needs to be pieced together; this
is done by `SymlinkForest`, which symlinks every package in the main repository
to `$EXECROOT` and every external repository to either `$EXECROOT/external` or
`$EXECROOT/..` (the former of course makes it impossible to have a package
called `external` in the main repository; that's why we are migrating away from
it).

### Packages

Every repository is composed of packages, i.e. a collection of related files and
a specification of the dependencies. These are specified by a file called
`BUILD` or `BUILD.bazel`. If both exist, Bazel prefers `BUILD.bazel`; the reason
why BUILD files are still accepted is that Bazel’s ancestor, Blaze, used this
file name. However, it turned out to be a commonly used path segment, especially
on Windows, where file names are case-insensitive.

Packages are independent of each other: changes to the BUILD file of a package
cannot cause other packages to change. The addition or removal of BUILD files
_can_ change other packages, since recursive globs stop at package boundaries
and thus the presence of a BUILD file stops the recursion.

The evaluation of a BUILD file is called "package loading". It's implemented in
the class `PackageFactory`, works by calling the Starlark interpreter and
requires knowledge of the set of available rule classes. The result of package
loading is a `Package` object. It's mostly a map from a string (the name of a
target) to the target itself.

A large chunk of complexity during package loading is globbing: Bazel does not
require every source file to be explicitly listed and instead can run globs
(e.g. `glob(["**/*.java"])`). Unlike the shell, it supports recursive globs that
descend into subdirectories (but not into subpackages). This requires access to
the file system and since that can be slow, we implement all sorts of tricks to
make it run in parallel and as efficiently as possible.
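The rule that recursive globs descend into subdirectories but stop at package boundaries can be sketched in Python. This is a simplified, sequential model rather than Bazel's globber: `package_glob` only matches file base names (roughly what `**/<pattern>` does), and both it and the `demo` helper are names made up for this example.

```python
import fnmatch
import os
import tempfile

BUILD_FILE_NAMES = ("BUILD", "BUILD.bazel")


def is_package(directory):
    """A directory is a package if it contains a BUILD or BUILD.bazel file."""
    return any(os.path.isfile(os.path.join(directory, name))
               for name in BUILD_FILE_NAMES)


def package_glob(package_dir, name_pattern):
    """Recursively match files by base name, without entering subpackages."""
    matches = []
    for dirpath, dirnames, filenames in os.walk(package_dir):
        # Pruning `dirnames` in place stops os.walk from descending further.
        dirnames[:] = [d for d in dirnames
                       if not is_package(os.path.join(dirpath, d))]
        for name in filenames:
            if fnmatch.fnmatchcase(name, name_pattern):
                rel = os.path.relpath(os.path.join(dirpath, name), package_dir)
                matches.append(rel.replace(os.sep, "/"))
    return sorted(matches)


def demo():
    """Build a throwaway tree with one subpackage and glob it."""
    files = ["BUILD", "A.java", "sub/B.java", "sub/notes.txt",
             "subpkg/BUILD", "subpkg/C.java"]
    with tempfile.TemporaryDirectory() as root:
        for rel in files:
            path = os.path.join(root, *rel.split("/"))
            os.makedirs(os.path.dirname(path), exist_ok=True)
            open(path, "w").close()
        return package_glob(root, "*.java")
```

In `demo()`, `subpkg/C.java` is not matched because `subpkg` has its own BUILD file; deleting that file would make the same glob in the parent package pick it up, which is how adding or removing BUILD files can change other packages.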

Globbing is implemented in the following classes:

* `LegacyGlobber`, a fast and blissfully Skyframe-unaware globber
* `SkyframeHybridGlobber`, a version that uses Skyframe and falls back to
  the legacy globber in order to avoid “Skyframe restarts” (described below)

The `Package` class itself contains some members that are exclusively used to
parse the WORKSPACE file and which do not make sense for real packages. This is
a design flaw because objects describing regular packages should not contain
fields that describe something else. These include:

* The repository mappings
* The registered toolchains
* The registered execution platforms

Ideally, there would be more separation between parsing the WORKSPACE file and
parsing regular packages so that `Package` does not need to cater for the needs
of both. This is unfortunately difficult to do because the two are intertwined
quite deeply.

### Labels, Targets and Rules

Packages are composed of targets, which have the following types:

1. **Files:** things that are either the input or the output of the build. In
   Bazel parlance, we call them _artifacts_ (discussed elsewhere). Not all
   files created during the build are targets; it’s common for an output of
   Bazel not to have an associated label.
2. **Rules:** these describe steps to derive their outputs from their inputs.
   They are generally associated with a programming language (e.g.
   `cc_library`, `java_library` or `py_library`), but there are some
   language-agnostic ones (e.g. `genrule` or `filegroup`).
3. **Package groups:** discussed in the [Visibility](#visibility) section.

The name of a target is called a _Label_. The syntax of labels is
`@repo//pac/kage:name`, where `repo` is the name of the repository the Label is
in, `pac/kage` is the directory its BUILD file is in and `name` is the path of
the file (if the label refers to a source file) relative to the directory of the
package. When referring to a target on the command line, some parts of the label
can be omitted:

1. If the repository is omitted, the label is taken to be in the main
   repository.
2. If the package part is omitted (e.g. `name` or `:name`), the label is taken
   to be in the package of the current working directory (relative paths
   containing uplevel references (`..`) are not allowed).
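The label syntax and the two omission rules can be made concrete with a small parser. This is a Python sketch covering only the forms described above, not Bazel's label parser: `parse_label` is a made-up name, the main repository is represented by an empty string, and labels without an explicit `:name` after a `//package` part are rejected for simplicity.

```python
def parse_label(label, current_package):
    """Return (repository, package, name) for a label given on the command line."""
    repo = ""  # empty string stands for the main repository in this sketch
    rest = label
    if rest.startswith("@"):
        repo, sep, rest = rest[1:].partition("//")
        if not sep:
            raise ValueError("expected '//' after the repository name: " + label)
        rest = "//" + rest
    if rest.startswith("//"):
        package, sep, name = rest[2:].partition(":")
        if not sep:
            raise ValueError("this sketch requires an explicit ':name': " + label)
    else:
        # Package part omitted: `name` or `:name` in the current package.
        package = current_package
        name = rest[1:] if rest.startswith(":") else rest
    if ".." in name.split("/"):
        raise ValueError("uplevel references are not allowed: " + label)
    return repo, package, name


print(parse_label("@repo//pac/kage:name", "some/pkg"))
print(parse_label(":name", "some/pkg"))
```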

A kind of a rule (e.g. "C++ library") is called a "rule class". Rule classes may
be implemented either in Starlark (the `rule()` function) or in Java (so called
“native rules”, type `RuleClass`). In the long term, every language-specific
rule will be implemented in Starlark, but some legacy rule families (e.g. Java
or C++) are still in Java for the time being.

Starlark rule classes need to be imported at the beginning of BUILD files using
the `load()` statement, whereas Java rule classes are "innately" known by Bazel,
by virtue of being registered with the `ConfiguredRuleClassProvider`.

Rule classes contain information such as:

1. Their attributes (e.g., `srcs`, `deps`): their types, default values,
   constraints, etc.
2. The configuration transitions and aspects attached to each attribute, if any
3. The implementation of the rule
4. The transitive info providers the rule "usually" creates

**Terminology note:** In the code base, we often use “Rule” to mean the target
created by a rule class. But in Starlark and in user-facing documentation,
“Rule” should be used exclusively to refer to the rule class itself; the target
is just a “target”. Also note that despite `RuleClass` having “class” in its
name, there is no Java inheritance relationship between a rule class and targets
of that type.

## Skyframe

The evaluation framework underlying Bazel is called Skyframe. Its model is that
everything that needs to be built during a build is organized into a directed
acyclic graph with edges pointing from any piece of data to its dependencies,
that is, other pieces of data that need to be known to construct it.

The nodes in the graph are called `SkyValue`s and their names are called
`SkyKey`s. Both are deeply immutable, i.e. only immutable objects should be
reachable from them. This invariant almost always holds, and in case it doesn't
(e.g. for the individual options classes in `BuildOptions`, which is a member of
`BuildConfigurationValue` and its `SkyKey`) we try really hard not to change
them or to change them only in ways that are not observable from the outside.
From this it follows that everything that is computed within Skyframe (e.g.
configured targets) must also be immutable.

The most convenient way to observe the Skyframe graph is to run
`bazel dump --skyframe=detailed`, which dumps the graph, one `SkyValue` per
line. It's best to do it for tiny builds, since it can get pretty large.

Skyframe lives in the `com.google.devtools.build.skyframe` package. The
similarly-named package `com.google.devtools.build.lib.skyframe` contains the
implementation of Bazel on top of Skyframe. More information about Skyframe is
available [here](https://bazel.build/designs/skyframe.html).

Generating a new `SkyValue` involves the following steps:

1. Running the associated `SkyFunction`
2. Declaring the dependencies (i.e. `SkyValue`s) that the `SkyFunction` needs
   to do its job. This is done by calling the various overloads of
   `SkyFunction.Environment.getValue()`.
3. If a dependency is not available, Skyframe signals that by returning null
   from `getValue()`. In this case, the `SkyFunction` is expected to yield
   control to Skyframe by returning null, then Skyframe evaluates the
   dependencies that haven't been evaluated yet and calls the `SkyFunction`
   again, thus going back to (1).
4. Constructing the resulting `SkyValue`
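The restart protocol in steps (1) to (4) can be modelled with a toy single-threaded evaluator. This Python sketch is not Skyframe: `Environment`, `evaluate` and the two example functions are simplified stand-ins, with `None` playing the role of Java's `null`. It also counts restarts, which shows why requesting several dependencies before yielding is cheaper than requesting them one at a time.

```python
class Environment:
    """Toy stand-in for `SkyFunction.Environment`."""

    def __init__(self, done):
        self._done = done
        self.missing = []

    def get_value(self, key):
        if key in self._done:
            return self._done[key]
        self.missing.append(key)  # remember what to evaluate before restarting
        return None


def evaluate(key, functions, done, restarts):
    """Evaluate `key`, re-running its function after each batch of missing deps."""
    while key not in done:
        env = Environment(done)
        value = functions[key](env)
        if value is not None:
            done[key] = value
            continue
        if not env.missing:
            raise RuntimeError("function returned None without requesting a dep")
        restarts[key] = restarts.get(key, 0) + 1
        for dep in env.missing:
            evaluate(dep, functions, done, restarts)
    return done[key]


def sum_one_by_one(env):
    a = env.get_value("a")
    if a is None:
        return None  # yield; restarted once "a" is known
    b = env.get_value("b")
    if b is None:
        return None  # yield again; restarted once "b" is known
    return a + b


def sum_grouped(env):
    a, b = env.get_value("a"), env.get_value("b")  # request both before yielding
    if a is None or b is None:
        return None
    return a + b
```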

A consequence of this is that if not all dependencies are available in (3), the
function needs to be completely restarted and thus computation needs to be
redone, which is obviously inefficient. `SkyFunction.Environment.getState()`
lets us directly work around this issue by having Skyframe maintain the
`SkyKeyComputeState` instance between calls to `SkyFunction.compute` for the
same `SkyKey`. Check out the example in the javadoc for
`SkyFunction.Environment.getState()`, as well as real usages in the Bazel
codebase.

Other indirect workarounds:

1. Declaring dependencies of `SkyFunction`s in groups so that if a function
   has, say, 10 dependencies, it only needs to restart once instead of ten
   times.
2. Splitting `SkyFunction`s so that one function does not need to be restarted
   many times. This has the side effect of interning data into Skyframe that
   may be internal to the `SkyFunction`, thus increasing memory use.

These are all just workarounds for the limitations of Skyframe, which are
mostly a consequence of the fact that Java doesn't support lightweight
threads and that we routinely have hundreds of thousands of in-flight Skyframe
nodes.

## Starlark

Starlark is the domain-specific language people use to configure and extend
Bazel. It's conceived as a restricted subset of Python that has far fewer types,
more restrictions on control flow, and most importantly, strong immutability
guarantees to enable concurrent reads. It is not Turing-complete, which
discourages some (but not all) users from trying to accomplish general
programming tasks within the language.

Starlark is implemented in the `com.google.devtools.build.lib.syntax` package.
It also has an independent Go implementation
[here](https://github.com/google/starlark-go). The Java implementation used in
Bazel is currently an interpreter.

Starlark is used in four contexts:

1. **The BUILD language.** This is where targets are declared by instantiating
   rules. Starlark code running in this context only has access to the contents
   of the BUILD file itself and Starlark files loaded by it.
2. **Rule definitions.** This is how new rules (e.g. support for a new
   language) are defined. Starlark code running in this context has access to
   the configuration and data provided by its direct dependencies (more on this
   later).
3. **The WORKSPACE file.** This is where external repositories (code that's not
   in the main source tree) are defined.
4. **Repository rule definitions.** This is where new external repository types
   are defined. Starlark code running in this context can run arbitrary code on
   the machine where Bazel is running, and reach outside the workspace.

The dialects available for BUILD and .bzl files are slightly different because
they express different things. A list of differences is available
[here](https://bazel.build/rules/language#differences-between-build-and-bzl-files).

More information about Starlark is available
[here](https://bazel.build/rules/language).

## The loading/analysis phase

The loading/analysis phase is where Bazel determines what actions are needed to
build a particular rule. Its basic unit is a "configured target", which is,
quite sensibly, a (target, configuration) pair.

It's called the "loading/analysis phase" because it can be split into two
distinct parts, which used to be serialized, but they can now overlap in time:

1. Loading packages, that is, turning BUILD files into the `Package` objects
   that represent them
2. Analyzing configured targets, that is, running the implementation of the
   rules to produce the action graph

Each configured target in the transitive closure of the configured targets
requested on the command line must be analyzed bottom-up, i.e. leaf nodes first,
then up to the ones on the command line. The inputs to the analysis of a single
configured target are:

1. **The configuration** ("how" to build that rule; for example, the target
   platform but also things like command line options the user wants to be
   passed to the C++ compiler).
2. **The direct dependencies.** Their transitive info providers are available
   to the rule being analyzed. They are called like that because they provide a
   "roll-up" of the information in the transitive closure of the configured
   target (e.g. all the .jar files on the classpath or all the .o files that
   need to be linked into a C++ binary).
3. **The target itself.** This is the result of loading the package the target
   is in. For rules, this includes its attributes, which is usually what
   matters.
4. **The implementation of the configured target.** For rules, this can either
   be in Starlark or in Java. All non-rule configured targets are implemented
   in Java.

The output of analyzing a configured target is:

1. The transitive info providers that configured targets that depend on it can
   access
2. The artifacts it can create and the actions that produce them.

The API offered to Java rules is `RuleContext`, which is the equivalent of the
`ctx` argument of Starlark rules. Its API is more powerful, but at the same
time, it's easier to do Bad Things™, for example to write code whose time or
space complexity is quadratic (or worse), to make the Bazel server crash with a
Java exception or to violate invariants (e.g. by inadvertently modifying an
`Options` instance or by making a configured target mutable).

The algorithm that determines the direct dependencies of a configured target
lives in `DependencyResolver.dependentNodeMap()`.

### Configurations

Configurations are the "how" of building a target: for what platform, with what
command line options, etc.

The same target can be built for multiple configurations in the same build. This
is useful, for example, when the same code is used for a tool that's run during
the build and for the target code and we are cross-compiling or when we are
building a fat Android app (one that contains native code for multiple CPU
architectures).

Conceptually, the configuration is a `BuildOptions` instance. However, in
practice, `BuildOptions` is wrapped by `BuildConfiguration` that provides
additional sundry pieces of functionality. It propagates from the top of the
dependency graph to the bottom. If it changes, the build needs to be
re-analyzed.

This results in anomalies like having to re-analyze the whole build if e.g. the
number of requested test runs changes, even though that only affects test
targets (we have plans to "trim" configurations so that this is not the case,
but it's not ready yet).

When a rule implementation needs part of the configuration, it needs to declare
it in its definition using
`RuleClass.Builder.requiresConfigurationFragments()`. This is both to avoid
mistakes (e.g. Python rules using the Java fragment) and to facilitate
configuration trimming so that e.g. if Python options change, C++ targets don't
need to be re-analyzed.

The configuration of a rule is not necessarily the same as that of its "parent"
rule. The process of changing the configuration in a dependency edge is called a
"configuration transition". It can happen in two places:

1. On a dependency edge. These transitions are specified in
   `Attribute.Builder.cfg()` and are functions from a `Rule` (where the
   transition happens) and a `BuildOptions` (the original configuration) to one
   or more `BuildOptions` (the output configuration).
2. On any incoming edge to a configured target. These are specified in
   `RuleClass.Builder.cfg()`.

The relevant classes are `TransitionFactory` and `ConfigurationTransition`.

Configuration transitions are used, for example:

1. To declare that a particular dependency is used during the build and it
   should thus be built in the execution architecture
2. To declare that a particular dependency must be built for multiple
   architectures (e.g. for native code in fat Android APKs)

If a configuration transition results in multiple configurations, it's called a
_split transition._
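Viewed abstractly, a transition is a function from one set of options to one or more new sets of options. The following Python sketch models that idea with plain dictionaries; it is not the `ConfigurationTransition` API, and the option keys (`cpu`, `exec_cpu`, `fat_cpus`) and function names are hypothetical.

```python
def exec_transition(options):
    """1:1 transition: build the dependency for the execution architecture."""
    changed = dict(options)  # copy, so the input configuration is not mutated
    changed["cpu"] = options["exec_cpu"]
    return [changed]


def fat_binary_split(options):
    """Split transition: one output configuration per requested CPU."""
    return [dict(options, cpu=cpu) for cpu in options["fat_cpus"]]


def is_split(configurations):
    """A transition is a split transition if it yields several configurations."""
    return len(configurations) > 1


base = {"cpu": "arm64", "exec_cpu": "x86_64", "fat_cpus": ("arm64", "x86")}
print(exec_transition(base))
print(fat_binary_split(base))
```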

Configuration transitions can also be implemented in Starlark (documentation
[here](https://bazel.build/rules/config)).

### Transitive info providers

Transitive info providers are a way (and the _only_ way) for configured targets
to tell things about themselves to the configured targets that depend on them.
The reason why "transitive" is in their name is that this is usually some sort
of roll-up of the transitive closure of a configured target.

There is generally a 1:1 correspondence between Java transitive info providers
and Starlark ones (the exception is `DefaultInfo` which is an amalgamation of
`FileProvider`, `FilesToRunProvider` and `RunfilesProvider` because that API was
deemed to be more Starlark-ish than a direct transliteration of the Java one).
Their key is one of the following things:

1. A Java Class object. This is only available for providers that are not
   accessible from Starlark. These providers are subclasses of
   `TransitiveInfoProvider`.
2. A string. This is legacy and heavily discouraged since it's susceptible to
   name clashes. Such transitive info providers are direct subclasses of
   `build.lib.packages.Info`.
3. A provider symbol. This can be created from Starlark using the `provider()`
   function and is the recommended way to create new providers. The symbol is
   represented by a `Provider.Key` instance in Java.

New providers implemented in Java should be implemented using `BuiltinProvider`.
`NativeProvider` is deprecated (we haven't had time to remove it yet) and
`TransitiveInfoProvider` subclasses cannot be accessed from Starlark.

### Configured targets

Configured targets are implemented as `RuleConfiguredTargetFactory`. There is a
subclass for each rule class implemented in Java. Starlark configured targets
are created through `StarlarkRuleConfiguredTargetUtil.buildRule()`.

Configured target factories should use `RuleConfiguredTargetBuilder` to
construct their return value. It consists of the following things:

1. Their `filesToBuild`, i.e. the hazy concept of "the set of files this rule
   represents". These are the files that get built when the configured target
   is on the command line or in the `srcs` of a genrule.
2. Their runfiles, regular and data.
3. Their output groups. These are various "other sets of files" the rule can
   build. They can be accessed using the `output_group` attribute of the
   filegroup rule in BUILD and using the `OutputGroupInfo` provider in Java.
586
587### Runfiles
588
589Some binaries need data files to run. A prominent example is tests that need
590input files. This is represented in Bazel by the concept of "runfiles". A
591"runfiles tree" is a directory tree of the data files for a particular binary.
592It is created in the file system as a symlink tree with individual symlinks
593pointing to the files in the source of output trees.
594
595A set of runfiles is represented as a `Runfiles` instance. It is conceptually a
596map from the path of a file in the runfiles tree to the `Artifact` instance that
597represents it. It's a little more complicated than a single `Map` for two
598reasons:
599
600* Most of the time, the runfiles path of a file is the same as its execpath.
601 We use this to save some RAM.
602* There are various legacy kinds of entries in runfiles trees, which also need
603 to be represented.
604
605Runfiles are collected using `RunfilesProvider`: an instance of this class
606represents the runfiles a configured target (e.g. a library) and its transitive
607closure needs and they are gathered like a nested set (in fact, they are
608implemented using nested sets under the cover): each target unions the runfiles
609of its dependencies, adds some of its own, then sends the resulting set upwards
610in the dependency graph. A `RunfilesProvider` instance contains two `Runfiles`
611instances, one for when the rule is depended on through the "data" attribute and
612one for every other kind of incoming dependency. This is because a target
613sometimes presents different runfiles when depended on through a data attribute
614than otherwise. This is undesired legacy behavior that we haven't gotten around
615removing yet.
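
The roll-up described above can be sketched in Python as follows. This is a
simplified model: `Runfiles` here is a plain dictionary wrapper rather than a
nested set, and the `merge` method, the `collect` helper and the
`default_runfiles` / `data_runfiles` field names are assumptions made for this
example.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable


@dataclass(frozen=True)
class Runfiles:
    """Maps a path in the runfiles tree to the exec path of the file behind it."""
    entries: Dict[str, str] = field(default_factory=dict)

    def merge(self, others: Iterable["Runfiles"]) -> "Runfiles":
        merged = dict(self.entries)
        for other in others:
            merged.update(other.entries)
        return Runfiles(merged)


@dataclass(frozen=True)
class RunfilesProvider:
    default_runfiles: Runfiles  # used for every attribute except "data"
    data_runfiles: Runfiles     # used when depended on through "data"


def collect(own: Runfiles, deps: Iterable[RunfilesProvider]) -> Runfiles:
    """Unions the runfiles of the dependencies, then adds the target's own."""
    return Runfiles().merge([d.default_runfiles for d in deps]).merge([own])


lib = RunfilesProvider(
    default_runfiles=Runfiles({"pkg/lib.so": "bazel-out/k8/bin/pkg/lib.so"}),
    data_runfiles=Runfiles())
binary = collect(Runfiles({"pkg/bin": "bazel-out/k8/bin/pkg/bin"}), [lib])
```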

Runfiles of binaries are represented as an instance of `RunfilesSupport`. This
is different from `Runfiles` because `RunfilesSupport` has the capability of
actually being built (unlike `Runfiles`, which is just a mapping). This
necessitates the following additional components:

* **The input runfiles manifest.** This is a serialized description of the
  runfiles tree. It is used as a proxy for the contents of the runfiles tree
  and Bazel assumes that the runfiles tree changes if and only if the contents
  of the manifest change.
* **The output runfiles manifest.** This is used by runtime libraries that
  handle runfiles trees, notably on Windows, which sometimes doesn't support
  symbolic links.
* **The runfiles middleman.** In order for a runfiles tree to exist, one needs
  to build the symlink tree and the artifacts the symlinks point to. In order
  to decrease the number of dependency edges, the runfiles middleman can be
  used to represent all these.
* **Command line arguments** for running the binary whose runfiles the
  `RunfilesSupport` object represents.

### Aspects

Aspects are a way to "propagate computation down the dependency graph". They are
described for users of Bazel
[here](https://bazel.build/rules/aspects). A good
motivating example is protocol buffers: a `proto_library` rule should not know
about any particular language, but building the implementation of a protocol
buffer message (the “basic unit” of protocol buffers) in any programming
language should be coupled to the `proto_library` rule so that if two targets in
the same language depend on the same protocol buffer, it gets built only once.

Just like configured targets, they are represented in Skyframe as a `SkyValue`
and the way they are constructed is very similar to how configured targets are
built: they have a factory class called `ConfiguredAspectFactory` that has
access to a `RuleContext`, but unlike configured target factories, it also knows
about the configured target it is attached to and its providers.

The set of aspects propagated down the dependency graph is specified for each
attribute using the `Attribute.Builder.aspects()` function. There are a few
confusingly-named classes that participate in the process:

1. `AspectClass` is the implementation of the aspect. It can be either in Java
   (in which case it's a subclass) or in Starlark (in which case it's an
   instance of `StarlarkAspectClass`). It's analogous to
   `RuleConfiguredTargetFactory`.
2. `AspectDefinition` is the definition of the aspect; it includes the
   providers it requires, the providers it provides and contains a reference to
   its implementation, i.e. the appropriate `AspectClass` instance. It's
   analogous to `RuleClass`.
3. `AspectParameters` is a way to parametrize an aspect that is propagated down
   the dependency graph. It's currently a string to string map. A good example
   of why it's useful is protocol buffers: if a language has multiple APIs, the
   information as to which API the protocol buffers should be built for should
   be propagated down the dependency graph.
4. `Aspect` represents all the data that's needed to compute an aspect that
   propagates down the dependency graph. It consists of the aspect class, its
   definition and its parameters.
5. `RuleAspect` is the function that determines which aspects a particular rule
   should propagate. It's a `Rule` -> `Aspect` function.

A somewhat unexpected complication is that aspects can attach to other aspects;
for example, an aspect collecting the classpath for a Java IDE will probably
want to know about all the .jar files on the classpath, but some of them are
protocol buffers. In that case, the IDE aspect will want to attach to the
(`proto_library` rule + Java proto aspect) pair.

The complexity of aspects on aspects is captured in the class
`AspectCollection`.

### Platforms and toolchains

Bazel supports multi-platform builds, that is, builds where there may be
multiple architectures where build actions run and multiple architectures for
which code is built. These architectures are referred to as _platforms_ in Bazel
parlance (full documentation
[here](https://bazel.build/docs/platforms)).

A platform is described by a key-value mapping from _constraint settings_ (e.g.
the concept of "CPU architecture") to _constraint values_ (e.g. a particular CPU
like x86\_64). We have a "dictionary" of the most commonly used constraint
settings and values in the `@platforms` repository.

The concept of _toolchain_ comes from the fact that depending on what platforms
the build is running on and what platforms are targeted, one may need to use
different compilers; for example, a particular C++ toolchain may run on a
specific OS and be able to target some other OSes. Bazel must determine the C++
compiler that is used based on the configured execution and target platforms
(documentation for toolchains
[here](https://bazel.build/docs/toolchains)).

In order to do this, toolchains are annotated with the set of execution and
target platform constraints they support. To make this possible, the definition
of a toolchain is split into two parts:

1. A `toolchain()` rule that describes the set of execution and target
   constraints a toolchain supports and tells what kind (e.g. C++ or Java) of
   toolchain it is (the latter is represented by the `toolchain_type()` rule).
2. A language-specific rule that describes the actual toolchain (e.g.
   `cc_toolchain()`).

This is done in this way because we need to know the constraints for every
toolchain in order to do toolchain resolution and language-specific
`*_toolchain()` rules contain much more information than that, so they take more
time to load.

Execution platforms are specified in one of the following ways:

1. In the WORKSPACE file using the `register_execution_platforms()` function
2. On the command line using the `--extra_execution_platforms` command line
   option

The set of available execution platforms is computed in
`RegisteredExecutionPlatformsFunction`.

The target platform for a configured target is determined by
`PlatformOptions.computeTargetPlatform()`. It's a list of platforms because we
eventually want to support multiple target platforms, but it's not implemented
yet.

The set of toolchains to be used for a configured target is determined by
`ToolchainResolutionFunction`. It is a function of:

* The set of registered toolchains (in the WORKSPACE file and the
  configuration)
* The desired execution and target platforms (in the configuration)
* The set of toolchain types that are required by the configured target (in
  `UnloadedToolchainContextKey`)
* The set of execution platform constraints of the configured target (the
  `exec_compatible_with` attribute) and the configuration
  (`--experimental_add_exec_constraints_to_targets`), in
  `UnloadedToolchainContextKey`

Its result is an `UnloadedToolchainContext`, which is essentially a map from
toolchain type (represented as a `ToolchainTypeInfo` instance) to the label of
the selected toolchain. It's called "unloaded" because it does not contain the
toolchains themselves, only their labels.

Then the toolchains are actually loaded using `ResolvedToolchainContext.load()`
and used by the implementation of the configured target that requested them.
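
The resolution step can be pictured with the following Python sketch. It is a
deliberately simplified model and not the algorithm in
`ToolchainResolutionFunction`: a platform is a dictionary from constraint
setting to constraint value, and `Toolchain`, `satisfies` and `resolve` are
hypothetical names. The first registered toolchain whose constraints match wins
(an assumption of this sketch), and only its label is returned, mirroring the
"unloaded" result.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

Platform = Dict[str, str]  # constraint setting -> constraint value


@dataclass(frozen=True)
class Toolchain:
    label: str
    toolchain_type: str
    exec_constraints: Platform
    target_constraints: Platform


def satisfies(platform: Platform, constraints: Platform) -> bool:
    """True if the platform has the required value for every constraint."""
    return all(platform.get(k) == v for k, v in constraints.items())


def resolve(registered: List[Toolchain], required_types: List[str],
            exec_platform: Platform, target_platform: Platform
            ) -> Dict[str, Optional[str]]:
    """Maps each required toolchain type to the label of the first match."""
    result: Dict[str, Optional[str]] = {}
    for toolchain_type in required_types:
        result[toolchain_type] = next(
            (t.label for t in registered
             if t.toolchain_type == toolchain_type
             and satisfies(exec_platform, t.exec_constraints)
             and satisfies(target_platform, t.target_constraints)),
            None)  # None: no registered toolchain is compatible
    return result


registered = [
    Toolchain("//tc:cc_linux_arm", "cc", {"os": "linux"}, {"cpu": "arm64"}),
    Toolchain("//tc:cc_linux_x86", "cc", {"os": "linux"}, {"cpu": "x86_64"}),
]
selected = resolve(registered, ["cc"],
                   exec_platform={"os": "linux", "cpu": "x86_64"},
                   target_platform={"os": "linux", "cpu": "arm64"})
```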

We also have a legacy system that relies on there being one single "host"
configuration and target configurations being represented by various
configuration flags, e.g. `--cpu`. We are gradually transitioning to the above
system. In order to handle cases where people rely on the legacy configuration
values, we have implemented
"[platform mappings](https://docs.google.com/document/d/1Vg_tPgiZbSrvXcJ403vZVAGlsWhH9BUDrAxMOYnO0Ls)"
to translate between the legacy flags and the new-style platform constraints.
Their code is in `PlatformMappingFunction` and uses a non-Starlark "little
language".

### Constraints

Sometimes one wants to designate a target as being compatible with only a few
platforms. Bazel has (unfortunately) multiple mechanisms to achieve this end:

* Rule-specific constraints
* `environment_group()` / `environment()`
* Platform constraints

Rule-specific constraints are mostly used within Google for Java rules; they are
on their way out and they are not available in Bazel, but the source code may
contain references to them. The attribute that governs this is called
`constraints=`.

#### environment_group() and environment()

These rules are a legacy mechanism and are not widely used.

All build rules can declare which "environments" they can be built for, where an
"environment" is an instance of the `environment()` rule.

There are various ways supported environments can be specified for a rule:

1. Through the `restricted_to=` attribute. This is the most direct form of
   specification; it declares the exact set of environments the rule supports
   for this group.
2. Through the `compatible_with=` attribute. This declares environments a rule
   supports in addition to "standard" environments that are supported by
   default.
3. Through the package-level attributes `default_restricted_to=` and
   `default_compatible_with=`.
4. Through default specifications in `environment_group()` rules. Every
   environment belongs to a group of thematically related peers (e.g. "CPU
   architectures", "JDK versions" or "mobile operating systems"). The
   definition of an environment group includes which of these environments
   should be supported by "default" if not otherwise specified by the
   `restricted_to=` / `compatible_with=` attributes. A rule with no such
   attributes inherits all defaults.
5. Through a rule class default. This overrides global defaults for all
   instances of the given rule class. This can be used, for example, to make
   all `*_test` rules testable without each instance having to explicitly
   declare this capability.

`environment()` is implemented as a regular rule whereas `environment_group()`
is both a subclass of `Target` that is not a `Rule` (`EnvironmentGroup`) and a
function that is available by default from Starlark
(`StarlarkLibrary.environmentGroup()`) which eventually creates an eponymous
target. This is to avoid a cyclic dependency that would arise because each
environment needs to declare the environment group it belongs to and each
environment group needs to declare its default environments.

A build can be restricted to a certain environment with the
`--target_environment` command line option.

The implementation of the constraint check is in
`RuleContextConstraintSemantics` and `TopLevelConstraintSemantics`.

#### Platform constraints

The current "official" way to describe what platforms a target is compatible
with is by using the same constraints used to describe toolchains and platforms.
It's under review in pull request
[#10945](https://github.com/bazelbuild/bazel/pull/10945).

### Visibility

If you work on a large codebase with a lot of developers (like at Google), you
don't necessarily want everyone else to be able to depend on your code so that
you retain the liberty to change things that you deem to be implementation
details (otherwise, as per [Hyrum's law](https://www.hyrumslaw.com/), people
_will_ come to depend on all parts of your code).

Bazel supports this by the mechanism called _visibility_: you can declare which
packages are allowed to depend on a particular rule using the `visibility`
attribute (documentation
[here](https://bazel.build/reference/be/common-definitions#common-attributes)).
This attribute is a little special because unlike every other attribute, the set
of dependencies it generates is not simply the set of labels listed (yes, this
is a design flaw).

This is implemented in the following places:

* The `RuleVisibility` interface represents a visibility declaration. It can
  be either a constant (fully public or fully private) or a list of labels.
* Labels can refer to either package groups (predefined list of packages), to
  packages directly (`//pkg:__pkg__`) or subtrees of packages
  (`//pkg:__subpackages__`). This is different from the command line syntax,
  which uses `//pkg:*` or `//pkg/...`.
* Package groups are implemented as their own target and configured target
  types (`PackageGroup` and `PackageGroupConfiguredTarget`). We could probably
  replace these with simple rules if we wanted to.
* The conversion from visibility label lists to dependencies is done in
  `DependencyResolver.visitTargetVisibility` and a few other miscellaneous
  places.
* The actual check is done in
  `CommonPrerequisiteValidator.validateDirectPrerequisiteVisibility()`.
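
As a rough sketch of what the check amounts to, the following Python function
decides whether a package may depend on a target given the target's visibility
labels. It only models the `__pkg__` and `__subpackages__` forms plus the public
and private constants; package groups are left out, and the function name
`is_visible` is made up for this example.

```python
from typing import List


def is_visible(visibility: List[str], target_pkg: str, from_pkg: str) -> bool:
    """Returns True if package `from_pkg` may depend on a target in
    `target_pkg` that declares the given visibility labels.

    Assumes every label is absolute, i.e. starts with "//".
    """
    if from_pkg == target_pkg:
        return True  # targets in the same package always see each other
    for label in visibility:
        if label == "//visibility:public":
            return True
        if label == "//visibility:private":
            continue
        pkg, _, name = label[2:].partition(":")
        if name == "__pkg__" and from_pkg == pkg:
            return True
        if name == "__subpackages__" and (
                pkg == "" or from_pkg == pkg or from_pkg.startswith(pkg + "/")):
            return True
    return False
```

With this model, `//foo:__pkg__` admits package `foo` but not `foo/bar`, while
`//foo:__subpackages__` admits both and still rejects the unrelated package
`foobar`.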

### Nested sets

Oftentimes, a configured target aggregates a set of files from its dependencies,
adds its own, and wraps the aggregate set into a transitive info provider so
that configured targets that depend on it can do the same. Examples:

* The C++ header files used for a build
* The object files that represent the transitive closure of a `cc_library`
* The set of .jar files that need to be on the classpath for a Java rule to
  compile or run
* The set of Python files in the transitive closure of a Python rule

If we did this the naive way by using e.g. `List` or `Set`, we'd end up with
quadratic memory usage: if there is a chain of N rules and each rule adds a
file, we'd have 1+2+...+N collection members.

In order to get around this problem, we came up with the concept of a
`NestedSet`. It's a data structure that is composed of other `NestedSet`
instances and some members of its own, thereby forming a directed acyclic graph
of sets. They are immutable and their members can be iterated over. We define
multiple iteration orders (`NestedSet.Order`): preorder, postorder, topological
(a node always comes after its ancestors) and "don't care, but it should be the
same each time".

The same data structure is called `depset` in Starlark.
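
A minimal Python sketch of the idea follows. It is not Bazel's `NestedSet`
implementation: the class below supports only a single postorder-style traversal
(transitive sets first, then direct members), and the names `direct`,
`transitive` and `to_list` are chosen for this example. The point is that each
set stores only references to the sets of its dependencies, so a chain of N
rules needs memory proportional to N rather than 1+2+...+N.

```python
from typing import Generic, Hashable, List, Sequence, Set, Tuple, TypeVar

T = TypeVar("T", bound=Hashable)


class NestedSet(Generic[T]):
    """Immutable set built from direct members and other nested sets."""

    def __init__(self, direct: Sequence[T] = (),
                 transitive: Sequence["NestedSet[T]"] = ()) -> None:
        self._direct: Tuple[T, ...] = tuple(direct)
        self._transitive: Tuple["NestedSet[T]", ...] = tuple(transitive)

    def to_list(self) -> List[T]:
        """Flattens the DAG, visiting every node and member only once."""
        result: List[T] = []
        seen_members: Set[T] = set()
        seen_nodes: Set[int] = set()

        def visit(node: "NestedSet[T]") -> None:
            if id(node) in seen_nodes:
                return
            seen_nodes.add(id(node))
            for child in node._transitive:
                visit(child)
            for member in node._direct:
                if member not in seen_members:
                    seen_members.add(member)
                    result.append(member)

        visit(self)
        return result


# A chain of three rules, each adding one file on top of its dependencies.
a = NestedSet(["a.h"])
b = NestedSet(["b.h"], transitive=[a])
c = NestedSet(["c.h"], transitive=[b, a])
```

Flattening is deferred until someone calls `to_list()`; here `c` holds one
direct member and two references, not a copy of all three files. The recursive
traversal is kept for brevity and would need to be iterative for very deep
graphs.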

### Artifacts and Actions

The actual build consists of a set of commands that need to be run to produce
the output the user wants. The commands are represented as instances of the
class `Action` and the files are represented as instances of the class
`Artifact`. They are arranged in a bipartite, directed, acyclic graph called the
"action graph".

Artifacts come in two kinds: source artifacts (i.e. ones that are available
before Bazel starts executing) and derived artifacts (ones that need to be
built). Derived artifacts can themselves be of multiple kinds:

1. **Regular artifacts.** These are checked for up-to-dateness by computing
   their checksum, with mtime as a shortcut; we don't checksum the file if its
   ctime hasn't changed.
2. **Unresolved symlink artifacts.** These are checked for up-to-dateness by
   calling readlink(). Unlike regular artifacts, these can be dangling
   symlinks. Usually used in cases where one then packs up some files into an
   archive of some sort.
3. **Tree artifacts.** These are not single files, but directory trees. They
   are checked for up-to-dateness by checking the set of files in them and
   their contents. They are represented as a `TreeArtifact`.
4. **Constant metadata artifacts.** Changes to these artifacts don't trigger a
   rebuild. This is used exclusively for build stamp information: we don't want
   to do a rebuild just because the current time changed.

There is no fundamental reason why source artifacts cannot be tree artifacts or
unresolved symlink artifacts; it's just that we haven't implemented it yet (we
should, though -- referencing a source directory in a BUILD file is one of the
few known long-standing incorrectness issues with Bazel; we have an
implementation that kind of works which is enabled by the
`BAZEL_TRACK_SOURCE_DIRECTORIES=1` JVM property).

Middlemen are a notable kind of `Artifact`. They are indicated by `Artifact`
instances that are the outputs of `MiddlemanAction`. They are used to
special-case some things:

* Aggregating middlemen are used to group artifacts together. This is so that
  if a lot of actions use the same large set of inputs, we don't have N\*M
  dependency edges, only N+M (they are being replaced with nested sets).
* Scheduling dependency middlemen ensure that an action runs before another.
  They are mostly used for linting but also for C++ compilation (see
  `CcCompilationContext.createMiddleman()` for an explanation).
* Runfiles middlemen are used to ensure the presence of a runfiles tree so
  that one does not separately need to depend on the output manifest and every
  single artifact referenced by the runfiles tree.

Actions are best understood as a command that needs to be run, the environment
it needs and the set of outputs it produces. The following things are the main
components of the description of an action:

* The command line that needs to be run
* The input artifacts it needs
* The environment variables that need to be set
* Annotations that describe the environment (e.g. platform) it needs to run in

There are also a few other special cases, like writing a file whose content is
known to Bazel. They are subclasses of `AbstractAction`. Most of the actions are
a `SpawnAction` or a `StarlarkAction` (these do the same thing and arguably
should not be separate classes), although Java and C++ have their own action
types (`JavaCompileAction`, `CppCompileAction` and `CppLinkAction`).

We eventually want to move everything to `SpawnAction`; `JavaCompileAction` is
pretty close, but C++ is a bit of a special-case due to .d file parsing and
include scanning.

The action graph is mostly "embedded" into the Skyframe graph: conceptually, the
execution of an action is represented as an invocation of
`ActionExecutionFunction`. The mapping from an action graph dependency edge to a
Skyframe dependency edge is described in
`ActionExecutionFunction.getInputDeps()` and `Artifact.key()` and has a few
optimizations in order to keep the number of Skyframe edges low:

* Derived artifacts do not have their own `SkyValue`s. Instead,
  `Artifact.getGeneratingActionKey()` is used to find out the key for the
  action that generates it.
* Nested sets have their own Skyframe key.

### Shared actions

Some actions are generated by multiple configured targets; Starlark rules are
more limited since they are only allowed to put their derived artifacts into a
directory determined by their configuration and their package (but even so,
rules in the same package can conflict), but rules implemented in Java can put
derived artifacts anywhere.

This is considered to be a misfeature, but getting rid of it is really hard
because it produces significant savings in execution time when e.g. a source
file needs to be processed somehow and that file is referenced by multiple rules
(handwave-handwave). This comes at the cost of some RAM: each instance of a
shared action needs to be stored in memory separately.

If two actions generate the same output file, they must be exactly the same:
have the same inputs, the same outputs and run the same command line. This
equivalence relation is implemented in `Actions.canBeShared()` and it is
verified between the analysis and execution phases by looking at every Action.
This is implemented in `SkyframeActionExecutor.findAndStoreArtifactConflicts()`
and is one of the few places in Bazel that requires a "global" view of the
build.
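
The check can be sketched in Python as below. This is an illustration of the
idea rather than the code in `Actions.canBeShared()`: the `Action` record, its
fields and the `find_conflicts` helper are hypothetical. Two actions that write
the same output are tolerated only if they agree on inputs, outputs and command
line.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Action:
    owner: str                     # label of the configured target
    inputs: Tuple[str, ...]
    outputs: Tuple[str, ...]
    command_line: Tuple[str, ...]


def can_be_shared(a: Action, b: Action) -> bool:
    """Equivalence of two actions, ignoring which target created them."""
    return (a.inputs, a.outputs, a.command_line) == (
        b.inputs, b.outputs, b.command_line)


def find_conflicts(actions: List[Action]) -> List[Tuple[Action, Action]]:
    """Needs a global view: every action is compared with the first
    generator seen for each of its outputs."""
    generator: Dict[str, Action] = {}
    conflicts: List[Tuple[Action, Action]] = []
    for action in actions:
        for output in action.outputs:
            previous = generator.setdefault(output, action)
            if previous is not action and not can_be_shared(previous, action):
                conflicts.append((previous, action))
    return conflicts


x = Action("//a:x", ("in.txt",), ("out/gen.h",), ("gen", "in.txt"))
y = Action("//a:y", ("in.txt",), ("out/gen.h",), ("gen", "in.txt"))
z = Action("//a:z", ("in.txt",), ("out/gen.h",), ("gen", "--fast", "in.txt"))
conflicts = find_conflicts([x, y, z])
```

Here `x` and `y` are shared actions, while `z` conflicts with `x` because it
produces the same file with a different command line.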

## The execution phase

This is when Bazel actually starts running build actions, i.e. commands that
produce outputs.

The first thing Bazel does after the analysis phase is to determine what
Artifacts need to be built. The logic for this is encoded in
`TopLevelArtifactHelper`; roughly speaking, it's the `filesToBuild` of the
configured targets on the command line and the contents of a special output
group for the explicit purpose of expressing "if this target is on the command
line, build these artifacts".

The next step is creating the execution root. Since Bazel has the option to read
source packages from different locations in the file system (`--package_path`),
it needs to provide locally executed actions with a full source tree. This is
handled by the class `SymlinkForest` and works by taking note of every target
used in the analysis phase and building up a single directory tree that symlinks
every package with a used target from its actual location. An alternative would
be to pass the correct paths to commands (taking `--package_path` into account).
This is undesirable because:

* It changes action command lines when a package is moved from a package path
  entry to another (used to be a common occurrence)
* It results in different command lines if an action is run remotely than if
  it's run locally
* It requires a command line transformation specific to the tool in use
  (consider the difference between e.g. Java classpaths and C++ include paths)
* Changing the command line of an action invalidates its action cache entry
* `--package_path` is slowly and steadily being deprecated

Then, Bazel starts traversing the action graph (the bipartite, directed graph
composed of actions and their input and output artifacts) and running actions.
The execution of each action is represented by an instance of the `SkyValue`
class `ActionExecutionValue`.

Since running an action is expensive, we have a few layers of caching that can
be hit behind Skyframe:

* `ActionExecutionFunction.stateMap` contains data to make Skyframe restarts
  of `ActionExecutionFunction` cheap
* The local action cache contains data about the state of the file system
* Remote execution systems usually also contain their own cache

### The local action cache

This cache is another layer that sits behind Skyframe; even if an action is
re-executed in Skyframe, it can still be a hit in the local action cache. It
represents the state of the local file system and it's serialized to disk which
means that when one starts up a new Bazel server, one can get local action cache
hits even though the Skyframe graph is empty.

This cache is checked for hits using the method
`ActionCacheChecker.getTokenIfNeedToExecute()`.

Contrary to its name, it's a map from the path of a derived artifact to the
action that emitted it. The action is described as:

1. The set of its input and output files and their checksum
2. Its "action key", which is usually the command line that was executed, but
   in general, represents everything that's not captured by the checksum of the
   input files (e.g. for `FileWriteAction`, it's the checksum of the data
   that's written)
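
A toy version of this lookup in Python might look as follows. It is an
assumption-laden model, not the logic of `ActionCacheChecker`: the cache is an
in-memory dictionary keyed by output path, file checksums are passed in as a
plain dictionary, and `needs_execution` and `record` are hypothetical helpers.

```python
import hashlib
from typing import Dict, List, Tuple

# output path -> (action key, digest over input and output file checksums)
Cache = Dict[str, Tuple[str, str]]


def _files_digest(paths: List[str], checksums: Dict[str, str]) -> str:
    h = hashlib.sha256()
    for path in sorted(paths):
        h.update(f"{path}={checksums.get(path, '<missing>')}\n".encode())
    return h.hexdigest()


def needs_execution(cache: Cache, action_key: str, inputs: List[str],
                    outputs: List[str], checksums: Dict[str, str]) -> bool:
    """True unless every output has an entry matching the key and files."""
    expected = (action_key, _files_digest(inputs + outputs, checksums))
    return any(cache.get(output) != expected for output in outputs)


def record(cache: Cache, action_key: str, inputs: List[str],
           outputs: List[str], checksums: Dict[str, str]) -> None:
    """Stores the state of the file system after a successful execution."""
    entry = (action_key, _files_digest(inputs + outputs, checksums))
    for output in outputs:
        cache[output] = entry


cache: Cache = {}
sums = {"a.c": "1", "a.o": "9"}
miss_before = needs_execution(cache, "cc -c a.c", ["a.c"], ["a.o"], sums)
record(cache, "cc -c a.c", ["a.c"], ["a.o"], sums)
miss_after = needs_execution(cache, "cc -c a.c", ["a.c"], ["a.o"], sums)
```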

There is also a highly experimental “top-down action cache” that is still under
development, which uses transitive hashes to avoid going to the cache as many
times.

### Input discovery and input pruning

Some actions are more complicated than just having a set of inputs. Changes to
the set of inputs of an action come in two forms:

* An action may discover new inputs before its execution or decide that some
  of its inputs are not actually necessary. The canonical example is C++,
  where it's better to make an educated guess about what header files a C++
  file uses from its transitive closure so that we don't need to send every
  file to remote executors; therefore, we have an option not to register every
  header file as an "input", but scan the source file for transitively
  included headers and only mark those header files as inputs that are
  mentioned in `#include` statements (we overestimate so that we don't need to
  implement a full C preprocessor). This option is currently hard-wired to
  "false" in Bazel and is only used at Google.
* An action may realize that some files were not used during its execution. In
  C++, this is called ".d files": the compiler tells which header files were
  used after the fact, and in order to avoid the embarrassment of having worse
  incrementality than Make, Bazel makes use of this fact. This offers a better
  estimate than the include scanner because it relies on the compiler.

These are implemented using methods on Action:

1. `Action.discoverInputs()` is called. It should return a nested set of
   Artifacts that are determined to be required. These must be source artifacts
   so that there are no dependency edges in the action graph that don't have an
   equivalent in the configured target graph.
2. The action is executed by calling `Action.execute()`.
3. At the end of `Action.execute()`, the action can call
   `Action.updateInputs()` to tell Bazel that not all of its inputs were
   needed. This can result in incorrect incremental builds if a used input is
   reported as unused.

When an action cache returns a hit on a fresh Action instance (e.g. created
after a server restart), Bazel calls `updateInputs()` itself so that the set of
inputs reflects the result of input discovery and pruning done before.
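
The two mechanisms can be contrasted with a small Python sketch. It is purely
illustrative: `scan_includes` is a naive regular-expression scanner standing in
for the include scanner, `parse_d_file` reads a Make-style dependency file of
the kind compilers emit, and neither reflects the actual code in Bazel.

```python
import re
from typing import Dict, List, Set

_INCLUDE = re.compile(r'^\s*#\s*include\s*"([^"]+)"', re.MULTILINE)


def scan_includes(source: str, files: Dict[str, str]) -> Set[str]:
    """Input discovery: transitively collects quoted #include targets.
    Overestimates, since includes inside false #if branches are also kept."""
    found: Set[str] = set()
    todo: List[str] = [source]
    while todo:
        for header in _INCLUDE.findall(files.get(todo.pop(), "")):
            if header not in found:
                found.add(header)
                todo.append(header)
    return found


def parse_d_file(text: str) -> Set[str]:
    """Input pruning: the prerequisites the compiler says it actually read."""
    _, _, deps = text.replace("\\\n", " ").partition(":")
    return set(deps.split())


files = {
    "a.cc": '#include "a.h"\n#if 0\n#include "unused.h"\n#endif\n',
    "a.h": '#include "b.h"\n',
    "b.h": "",
    "unused.h": "",
}
discovered = scan_includes("a.cc", files)         # includes unused.h
used = parse_d_file("a.o: a.cc a.h \\\n b.h\n")   # does not
```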

Starlark actions can make use of the facility to declare some inputs as unused
using the `unused_inputs_list=` argument of
[`ctx.actions.run()`](https://bazel.build/rules/lib/actions#run).

### Various ways to run actions: Strategies/ActionContexts

Some actions can be run in different ways. For example, a command line can be
executed locally, locally but in various kinds of sandboxes, or remotely. The
concept that embodies this is called an `ActionContext` (or `Strategy`, since we
successfully went only halfway with a rename...).

The life cycle of an action context is as follows:

1. When the execution phase is started, `BlazeModule` instances are asked what
   action contexts they have. This happens in the constructor of
   `ExecutionTool`. Action context types are identified by a Java `Class`
   instance that refers to a sub-interface of `ActionContext` and which
   interface the action context must implement.
2. The appropriate action context is selected from the available ones and is
   forwarded to `ActionExecutionContext` and `BlazeExecutor`.
3. Actions request contexts using `ActionExecutionContext.getContext()` and
   `BlazeExecutor.getStrategy()` (there should really be only one way to do
   it…).

Strategies are free to call other strategies to do their jobs; this is used, for
example, in the dynamic strategy that starts actions both locally and remotely,
then uses whichever finishes first.
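
The racing behavior of the dynamic strategy can be sketched with Python's
`concurrent.futures`. This is a simplification under stated assumptions:
`run_locally` and `run_remotely` are hypothetical stand-ins that merely sleep,
and the losing branch is not cancelled here, whereas a real implementation has
to stop it and discard its outputs.

```python
import time
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait
from typing import Callable, Sequence


def run_locally(command: str) -> str:
    time.sleep(0.5)   # pretend the local machine is slow
    return f"local: {command}"


def run_remotely(command: str) -> str:
    time.sleep(0.01)  # pretend the remote executor is fast
    return f"remote: {command}"


def run_dynamically(command: str,
                    strategies: Sequence[Callable[[str], str]]) -> str:
    """Starts every strategy and returns the result of the first to finish."""
    pool = ThreadPoolExecutor(max_workers=len(strategies))
    futures = [pool.submit(strategy, command) for strategy in strategies]
    done, _ = wait(futures, return_when=FIRST_COMPLETED)
    pool.shutdown(wait=False)  # do not block on the slower strategy
    return next(iter(done)).result()


winner = run_dynamically("javac Foo.java", [run_locally, run_remotely])
```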
1121
1122One notable strategy is the one that implements persistent worker processes
1123(`WorkerSpawnStrategy`). The idea is that some tools have a long startup time
1124and should therefore be reused between actions instead of starting one anew for
1125every action (This does represent a potential correctness issue, since Bazel
1126relies on the promise of the worker process that it doesn't carry observable
1127state between individual requests)
1128
1129If the tool changes, the worker process needs to be restarted. Whether a worker
1130can be reused is determined by computing a checksum for the tool used using
1131`WorkerFilesHash`. It relies on knowing which inputs of the action represent
1132part of the tool and which represent inputs; this is determined by the creator
1133of the Action: `Spawn.getToolFiles()` and the runfiles of the `Spawn` are
1134counted as parts of the tool.

More information about strategies (or action contexts!):

* Information about various strategies for running actions is available
  [here](https://jmmv.dev/2019/12/bazel-strategies.html).
* Information about the dynamic strategy, in which an action is run both
  locally and remotely and whichever finishes first is used, is available
  [here](https://jmmv.dev/series.html#Bazel%20dynamic%20execution).
* Information about the intricacies of executing actions locally is available
  [here](https://jmmv.dev/2019/11/bazel-process-wrapper.html).

### The local resource manager

Bazel _can_ run many actions in parallel. The number of local actions that
_should_ be run in parallel differs from action to action: the more resources an
action requires, the fewer instances should be running at the same time to avoid
overloading the local machine.

This is implemented in the class `ResourceManager`: each action has to be
annotated with an estimate of the local resources it requires in the form of a
`ResourceSet` instance (CPU and RAM). Then when action contexts do something
that requires local resources, they call `ResourceManager.acquireResources()`
and are blocked until the required resources are available.
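
The blocking behavior can be sketched with a small monitor that tracks free CPU and RAM. This is a simplified model under stated assumptions: `Manager`, `acquire()`, `tryAcquire()` and `release()` are hypothetical names, and the real `ResourceManager` has more bookkeeping than shown here.

```java
/** Hypothetical sketch of blocking until enough local resources are free. */
class ResourceManagerSketch {
  /** An estimate of what one action needs. */
  static final class ResourceSet {
    final double cpu;
    final double memoryMb;

    ResourceSet(double cpu, double memoryMb) {
      this.cpu = cpu;
      this.memoryMb = memoryMb;
    }
  }

  static final class Manager {
    private double freeCpu;
    private double freeMemoryMb;

    Manager(double totalCpu, double totalMemoryMb) {
      this.freeCpu = totalCpu;
      this.freeMemoryMb = totalMemoryMb;
    }

    private boolean fits(ResourceSet request) {
      return request.cpu <= freeCpu && request.memoryMb <= freeMemoryMb;
    }

    private void take(ResourceSet request) {
      freeCpu -= request.cpu;
      freeMemoryMb -= request.memoryMb;
    }

    /** Blocks until the request fits. A request above the machine total would wait forever. */
    synchronized void acquire(ResourceSet request) throws InterruptedException {
      while (!fits(request)) {
        wait();
      }
      take(request);
    }

    /** Non-blocking variant: takes the resources only if they are free right now. */
    synchronized boolean tryAcquire(ResourceSet request) {
      if (!fits(request)) {
        return false;
      }
      take(request);
      return true;
    }

    /** Returns resources and wakes up every waiting thread so it can re-check its request. */
    synchronized void release(ResourceSet request) {
      freeCpu += request.cpu;
      freeMemoryMb += request.memoryMb;
      notifyAll();
    }
  }

  public static void main(String[] args) throws InterruptedException {
    Manager manager = new Manager(4.0, 8192.0);
    ResourceSet javac = new ResourceSet(1.0, 3072.0);
    ResourceSet linker = new ResourceSet(4.0, 6144.0);
    manager.acquire(javac);
    System.out.println("linker fits alongside javac: " + manager.tryAcquire(linker));
    manager.release(javac);
    System.out.println("linker fits alone: " + manager.tryAcquire(linker));
  }
}
```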

A more detailed description of local resource management is available
[here](https://jmmv.dev/2019/12/bazel-local-resources.html).

### The structure of the output directory

Each action requires a separate place in the output directory where it places
its outputs. The location of derived artifacts is usually as follows:

```
$EXECROOT/bazel-out/<configuration>/bin/<package>/<artifact name>
```

How is the name of the directory that is associated with a particular
configuration determined? There are two conflicting desirable properties:

1. If two configurations can occur in the same build, they should have
   different directories so that both can have their own version of the same
   action; otherwise, if the two configurations disagree about e.g. the command
   line of an action producing the same output file, Bazel doesn't know which
   action to choose (an "action conflict").
2. If two configurations represent "roughly" the same thing, they should have
   the same name so that actions executed in one can be reused for the other if
   the command lines match: for example, changes to the command line options to
   the Java compiler should not result in C++ compile actions being re-run.

So far, we have not come up with a principled way of solving this problem, which
has similarities to the problem of configuration trimming. A longer discussion
of options is available
[here](https://docs.google.com/document/d/1fZI7wHoaS-vJvZy9SBxaHPitIzXE_nL9v4sS4mErrG4/edit).
The main problematic areas are Starlark rules (whose authors usually aren't
intimately familiar with Bazel) and aspects, which add another dimension to the
space of things that can produce the "same" output file.

The current approach is that the path segment for the configuration is
`<CPU>-<compilation mode>` with various suffixes added so that configuration
transitions implemented in Java don't result in action conflicts. In addition, a
checksum of the set of Starlark configuration transitions is added so that users
can't cause action conflicts. It is far from perfect. This is implemented in
`OutputDirectories.buildMnemonic()` and relies on each configuration fragment
adding its own part to the name of the output directory.

## Tests

Bazel has rich support for running tests. It supports:

* Running tests remotely (if a remote execution backend is available)
* Running tests multiple times in parallel (for deflaking or gathering timing
  data)
* Sharding tests (splitting test cases in the same test over multiple processes
  for speed)
* Re-running flaky tests
* Grouping tests into test suites

Tests are regular configured targets that have a `TestProvider`, which describes
how the test should be run:

* The artifacts whose building results in the test being run. This is a "cache
  status" file that contains a serialized `TestResultData` message
* The number of times the test should be run
* The number of shards the test should be split into
* Some parameters about how the test should be run (e.g. the test timeout)

### Determining which tests to run

Determining which tests are run is an elaborate process.

First, during target pattern parsing, test suites are recursively expanded. The
expansion is implemented in `TestsForTargetPatternFunction`. A somewhat
surprising wrinkle is that if a test suite declares no tests, it refers to
_every_ test in its package. This is implemented in `Package.beforeBuild()` by
adding an implicit attribute called `$implicit_tests` to test suite rules.

Then, tests are filtered for size, tags, timeout and language according to the
command line options. This is implemented in `TestFilter`, which is called from
`TargetPatternPhaseFunction.determineTests()` during target parsing; the
result is put into `TargetPatternPhaseValue.getTestsToRunLabels()`. Rule
attributes that can be filtered on are not configurable because this filtering
happens before the analysis phase, when the configuration is not yet
available.

This is then processed further in `BuildView.createResult()`: targets whose
analysis failed are filtered out and tests are split into exclusive and
non-exclusive tests. It's then put into `AnalysisResult`, which is how
`ExecutionTool` knows which tests to run.

In order to lend some transparency to this elaborate process, the `tests()`
query operator (implemented in `TestsFunction`) is available to tell which tests
are run when a particular target is specified on the command line. It's
unfortunately a reimplementation, so it probably deviates from the above in
multiple subtle ways.

### Running tests

The way the tests are run is by requesting cache status artifacts. This then
results in the execution of a `TestRunnerAction`, which eventually calls the
`TestActionContext` chosen by the `--test_strategy` command line option that
runs the test in the requested way.

Tests are run according to an elaborate protocol that uses environment variables
to tell tests what's expected from them. A detailed description of what Bazel
expects from tests and what tests can expect from Bazel is available
[here](https://bazel.build/reference/test-encyclopedia). At the
simplest, an exit code of 0 means success, anything else means failure.
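
To make the protocol more concrete, here is a sketch of how a test runner might honor sharding. The `TEST_TOTAL_SHARDS` and `TEST_SHARD_INDEX` environment variables are part of the documented protocol; everything else (the class, the test case names and the round-robin assignment of cases to shards) is an assumption made for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Hypothetical test runner fragment that picks the cases belonging to its shard. */
class ShardSelectionSketch {
  /** Assigns cases to shards round-robin; this policy is an illustrative choice. */
  static List<String> casesForShard(List<String> allCases, int shardIndex, int totalShards) {
    List<String> selected = new ArrayList<>();
    for (int i = 0; i < allCases.size(); i++) {
      if (i % totalShards == shardIndex) {
        selected.add(allCases.get(i));
      }
    }
    return selected;
  }

  /** Reads an integer from the environment, falling back when the variable is unset. */
  static int intFromEnv(Map<String, String> env, String name, int fallback) {
    String value = env.get(name);
    return value == null ? fallback : Integer.parseInt(value);
  }

  public static void main(String[] args) {
    Map<String, String> env = System.getenv();
    int totalShards = intFromEnv(env, "TEST_TOTAL_SHARDS", 1);
    int shardIndex = intFromEnv(env, "TEST_SHARD_INDEX", 0);
    List<String> cases = List.of("testParse", "testResolve", "testEmit", "testLink", "testRun");
    for (String name : casesForShard(cases, shardIndex, totalShards)) {
      System.out.println("running " + name);
    }
    // A real runner would exit with a non-zero code here if any case had failed.
  }
}
```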

In addition to the cache status file, each test process emits a number of other
files. They are put in the "test log directory", which is the subdirectory
called `testlogs` of the output directory of the target configuration:

* `test.xml`, a JUnit-style XML file detailing the individual test cases in
  the test shard
* `test.log`, the console output of the test. stdout and stderr are not
  separated.
* `test.outputs`, the "undeclared outputs directory"; this is used by tests
  that want to output files in addition to what they print to the terminal.

There are two things that can happen during test execution that cannot during
building regular targets: exclusive test execution and output streaming.

Some tests need to be executed in exclusive mode, i.e. not in parallel with
other tests. This can be elicited either by adding `tags=["exclusive"]` to the
test rule or running the test with `--test_strategy=exclusive`. Each exclusive
test is run by a separate Skyframe invocation requesting the execution of the
test after the "main" build. This is implemented in
`SkyframeExecutor.runExclusiveTest()`.

Unlike regular actions, whose terminal output is dumped when the action
finishes, the user can request the output of tests to be streamed so that they
get informed about the progress of a long-running test. This is specified by the
`--test_output=streamed` command line option and implies exclusive test
execution so that outputs of different tests are not interspersed.

This is implemented in the aptly-named `StreamedTestOutput` class and works by
polling changes to the `test.log` file of the test in question and dumping new
bytes to the terminal where Bazel rules.
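
The polling approach can be sketched as a reader that remembers how far into the file it has read. This is a minimal illustration, assuming a hypothetical `Tailer` class that is polled by hand; it is not the code of `StreamedTestOutput`, which would poll on its own schedule.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

/** Hypothetical sketch of streaming a growing log file by polling it. */
class LogTailSketch {
  static final class Tailer {
    private final Path file;
    private long offset = 0; // number of bytes already handed to the caller

    Tailer(Path file) {
      this.file = file;
    }

    /** Returns whatever was appended to the file since the previous call. */
    String readNew() {
      try (RandomAccessFile log = new RandomAccessFile(file.toFile(), "r")) {
        long length = log.length();
        if (length <= offset) {
          return "";
        }
        byte[] fresh = new byte[(int) (length - offset)];
        log.seek(offset);
        log.readFully(fresh);
        offset = length;
        return new String(fresh, StandardCharsets.UTF_8);
      } catch (IOException e) {
        throw new UncheckedIOException(e);
      }
    }
  }

  /** Simulates a test writing its log in two steps with a poll after each step. */
  static String demo() {
    try {
      Path log = Files.createTempFile("test", ".log");
      Tailer tailer = new Tailer(log);
      Files.writeString(log, "first\n");
      String firstPoll = tailer.readNew();
      Files.writeString(log, "second\n", StandardOpenOption.APPEND);
      String secondPoll = tailer.readNew();
      Files.delete(log);
      return firstPoll + "|" + secondPoll;
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  public static void main(String[] args) {
    System.out.print(demo());
  }
}
```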

Results of the executed tests are available on the event bus by observing
various events (e.g. `TestAttempt`, `TestResult` or `TestingCompleteEvent`).
They are dumped to the Build Event Protocol and they are emitted to the console
by `AggregatingTestListener`.

### Coverage collection

Coverage is reported by the tests in LCOV format in the files
`bazel-testlogs/$PACKAGE/$TARGET/coverage.dat`.

To collect coverage, each test execution is wrapped in a script called
`collect_coverage.sh`.

This script sets up the environment of the test to enable coverage collection
and determines where the coverage files are written by the coverage runtime(s).
It then runs the test. A test may itself run multiple subprocesses and consist
of parts written in multiple different programming languages (with separate
coverage collection runtimes). The wrapper script is responsible for converting
the resulting files to LCOV format if necessary, and for merging them into a
single file.
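
The merging step can be illustrated with a heavily simplified LCOV merger. The sketch below understands only the `SF:`, `DA:` and `end_of_record` lines of the LCOV tracefile format and adds up execution counts per source line; function and branch records are ignored. The class is hypothetical and says nothing about how `collect_coverage.sh` or Bazel's own tooling performs the merge.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical, simplified merger for the line-coverage part of LCOV reports. */
class LcovMergeSketch {
  /** Sums the DA (line, execution count) entries of all reports, grouped by source file. */
  static String merge(List<String> reports) {
    Map<String, Map<Integer, Long>> hitsByFile = new TreeMap<>();
    for (String report : reports) {
      String currentFile = null;
      for (String rawLine : report.split("\n")) {
        String line = rawLine.trim();
        if (line.startsWith("SF:")) {
          currentFile = line.substring(3);
          hitsByFile.computeIfAbsent(currentFile, file -> new TreeMap<>());
        } else if (line.startsWith("DA:") && currentFile != null) {
          String[] fields = line.substring(3).split(",");
          hitsByFile
              .get(currentFile)
              .merge(Integer.parseInt(fields[0]), Long.parseLong(fields[1]), Long::sum);
        } else if (line.equals("end_of_record")) {
          currentFile = null;
        }
      }
    }
    StringBuilder merged = new StringBuilder();
    for (Map.Entry<String, Map<Integer, Long>> file : hitsByFile.entrySet()) {
      merged.append("SF:").append(file.getKey()).append('\n');
      for (Map.Entry<Integer, Long> hit : file.getValue().entrySet()) {
        merged.append("DA:").append(hit.getKey()).append(',').append(hit.getValue()).append('\n');
      }
      merged.append("end_of_record\n");
    }
    return merged.toString();
  }

  public static void main(String[] args) {
    String fromJava = "SF:a.cc\nDA:1,1\nDA:2,0\nend_of_record\n";
    String fromSubprocess = "SF:a.cc\nDA:1,2\nDA:2,0\nend_of_record\n";
    System.out.print(merge(List.of(fromJava, fromSubprocess)));
  }
}
```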

The interposition of `collect_coverage.sh` is done by the test strategies and
requires `collect_coverage.sh` to be on the inputs of the test. This is
accomplished by the implicit attribute `:coverage_support` which is resolved to
the value of the configuration flag `--coverage_support` (see
`TestConfiguration.TestOptions.coverageSupport`).

Some languages do offline instrumentation, meaning that the coverage
instrumentation is added at compile time (e.g. C++) and others do online
instrumentation, meaning that coverage instrumentation is added at execution
time.

Another core concept is _baseline coverage_. This is the coverage of a library,
binary, or test if no code in it was run. The problem it solves is that if you
want to compute the test coverage for a binary, it is not enough to merge the
coverage of all of the tests because there may be code in the binary that is not
linked into any test. Therefore, we emit a coverage file for every binary which
contains only the files we collect coverage for with no covered lines. The
baseline coverage file for a target is at
`bazel-testlogs/$PACKAGE/$TARGET/baseline_coverage.dat`. It is also generated
for binaries and libraries in addition to tests if you pass the
`--nobuild_tests_only` flag to Bazel.

Baseline coverage is currently broken.

We track two groups of files for coverage collection for each rule: the set of
instrumented files and the set of instrumentation metadata files.

The set of instrumented files is just that, a set of files to instrument. For
online coverage runtimes, this can be used at runtime to decide which files to
instrument. It is also used to implement baseline coverage.

The set of instrumentation metadata files is the set of extra files a test needs
to generate the LCOV files Bazel requires from it. In practice, this consists of
runtime-specific files; for example, gcc emits `.gcno` files during compilation.
These are added to the set of inputs of test actions if coverage mode is
enabled.

Whether or not coverage is being collected is stored in the
`BuildConfiguration`. This is handy because it is an easy way to change the test
action and the action graph depending on this bit, but it also means that if
this bit is flipped, all targets need to be re-analyzed (some languages, e.g.
C++, require different compiler options to emit code that can collect coverage,
which mitigates this issue somewhat, since then a re-analysis is needed anyway).

The coverage support files are depended on through labels in an implicit
dependency so that they can be overridden by the invocation policy, which allows
them to differ between the different versions of Bazel. Ideally, these
differences would be removed and we would standardize on one of them.

We also generate a "coverage report" which merges the coverage collected for
every test in a Bazel invocation. This is handled by
`CoverageReportActionFactory` and is called from `BuildView.createResult()`. It
gets access to the tools it needs by looking at the `:coverage_report_generator`
attribute of the first test that is executed.

## The query engine

Bazel has a
[little language](https://bazel.build/docs/query-how-to)
used to ask it various things about various graphs. The following query kinds
are provided:

* `bazel query` is used to investigate the target graph
* `bazel cquery` is used to investigate the configured target graph
* `bazel aquery` is used to investigate the action graph

Each of these is implemented by subclassing `AbstractBlazeQueryEnvironment`.
Additional query functions can be implemented by subclassing `QueryFunction`.
In order to allow streaming query results, instead of collecting them to some
data structure, a `query2.engine.Callback` is passed to `QueryFunction`, which
calls it for results it wants to return.
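
The streaming design can be sketched as follows. The `Callback` and `QueryFunction` interfaces below are simplified, hypothetical versions written for this example (the real ones have different signatures), and the toy graph is a plain map from a target to its direct dependencies.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical sketch of a query function that streams results through a callback. */
class StreamingQuerySketch {
  /** Receives results in batches as soon as they are known. */
  interface Callback<T> {
    void process(List<T> partialResult);
  }

  /** Simplified stand-in for a query function over a target graph. */
  interface QueryFunction {
    void eval(Map<String, List<String>> graph, String argument, Callback<String> callback);
  }

  /** Emits the transitive dependencies of a target, one batch per visited target. */
  static final class DepsFunction implements QueryFunction {
    @Override
    public void eval(Map<String, List<String>> graph, String root, Callback<String> callback) {
      Set<String> seen = new HashSet<>();
      Deque<String> queue = new ArrayDeque<>();
      seen.add(root);
      queue.add(root);
      while (!queue.isEmpty()) {
        String target = queue.poll();
        // The result is handed over immediately instead of being collected first.
        callback.process(List.of(target));
        for (String dep : graph.getOrDefault(target, List.<String>of())) {
          if (seen.add(dep)) {
            queue.add(dep);
          }
        }
      }
    }
  }

  public static void main(String[] args) {
    Map<String, List<String>> graph =
        Map.of("//a", List.of("//b", "//c"), "//b", List.of("//c"), "//c", List.<String>of());
    new DepsFunction().eval(graph, "//a", batch -> batch.forEach(System.out::println));
  }
}
```

Because results leave the function as they are found, an output formatter can start printing before the traversal has finished.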

The result of a query can be emitted in various ways: labels, labels and rule
classes, XML, protobuf and so on. These are implemented as subclasses of
`OutputFormatter`.

A subtle requirement of some query output formats (proto, definitely) is that
Bazel needs to emit _all_ the information that package loading provides so that
one can diff the output and determine whether a particular target has changed.
As a consequence, attribute values need to be serializable, which is why there
are only a few attribute types and none of them can hold complex Starlark
values. The usual workaround is to use a label, and attach the complex
information to the rule with that label. It's not a very satisfying workaround
and it would be very nice to lift this requirement.

## The module system

Bazel can be extended by adding modules to it. Each module must subclass
`BlazeModule` (the name is a relic of the history of Bazel when it used to be
called Blaze) and gets information about various events during the execution of
a command.

They are mostly used to implement various pieces of "non-core" functionality
that only some versions of Bazel (e.g. the one we use at Google) need:

* Interfaces to remote execution systems
* New commands

The set of extension points `BlazeModule` offers is somewhat haphazard. Don't
use it as an example of good design principles.

## The event bus

The main way `BlazeModule`s communicate with the rest of Bazel is by an event
bus (`EventBus`): a new instance is created for every build, various parts of
Bazel can post events to it and modules can register listeners for the events
they are interested in. For example, the following things are represented as
events:

* The list of build targets to be built has been determined
  (`TargetParsingCompleteEvent`)
* The top-level configurations have been determined
  (`BuildConfigurationEvent`)
* A target was built, successfully or not (`TargetCompleteEvent`)
* A test was run (`TestAttempt`, `TestSummary`)
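
The post and listen pattern can be sketched without any library. `SimpleEventBus` and `TargetParsingComplete` below are hypothetical stand-ins; in particular, this sketch dispatches only on the exact class of the event, which is a simplification compared to a full event bus implementation.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

/** Hypothetical sketch of a per-build event bus with typed listeners. */
class EventBusSketch {
  static final class SimpleEventBus {
    private final Map<Class<?>, List<Consumer<Object>>> listeners = new HashMap<>();

    /** Registers a listener for events whose class is exactly eventType. */
    <E> void register(Class<E> eventType, Consumer<? super E> listener) {
      listeners
          .computeIfAbsent(eventType, type -> new ArrayList<>())
          .add(event -> listener.accept(eventType.cast(event)));
    }

    /** Delivers the event to every listener registered for its class. */
    void post(Object event) {
      for (Consumer<Object> listener : listeners.getOrDefault(event.getClass(), List.of())) {
        listener.accept(event);
      }
    }
  }

  /** Stand-in event carrying the labels that were determined during target parsing. */
  static final class TargetParsingComplete {
    final List<String> targets;

    TargetParsingComplete(List<String> targets) {
      this.targets = targets;
    }
  }

  public static void main(String[] args) {
    SimpleEventBus bus = new SimpleEventBus(); // one instance per build
    bus.register(
        TargetParsingComplete.class, event -> System.out.println("targets: " + event.targets));
    bus.post(new TargetParsingComplete(List.of("//foo:bar", "//foo:baz")));
  }
}
```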

Some of these events are represented outside of Bazel in the
[Build Event Protocol](https://bazel.build/docs/build-event-protocol)
(they are `BuildEvent`s). This allows not only `BlazeModule`s, but also things
outside the Bazel process to observe the build. They are accessible either as a
file that contains protocol messages or as a stream that Bazel sends to a server
(called the Build Event Service).

This is implemented in the `build.lib.buildeventservice` and
`build.lib.buildeventstream` Java packages.

## External repositories

Whereas Bazel was originally designed to be used in a monorepo (a single source
tree containing everything one needs to build), Bazel lives in a world where
this is not necessarily true. "External repositories" are an abstraction used to
bridge these two worlds: they represent code that is necessary for the build but
is not in the main source tree.

### The WORKSPACE file

The set of external repositories is determined by parsing the WORKSPACE file.
For example, a declaration like this:

```
    local_repository(name="foo", path="/foo/bar")
```

results in the repository called `@foo` being available. Where this gets
complicated is that one can define new repository rules in Starlark files, which
can then be used to load new Starlark code, which can be used to define new
repository rules and so on…

To handle this case, the parsing of the WORKSPACE file (in
`WorkspaceFileFunction`) is split up into chunks delineated by `load()`
statements. The chunk index is indicated by `WorkspaceFileKey.getIndex()` and
computing `WorkspaceFileFunction` until index X means evaluating it until the
Xth `load()` statement.

### Fetching repositories

Before the code of the repository is available to Bazel, it needs to be
_fetched_. This results in Bazel creating a directory under
`$OUTPUT_BASE/external/<repository name>`.

Fetching the repository happens in the following steps:

1. `PackageLookupFunction` realizes that it needs a repository and creates a
   `RepositoryName` as a `SkyKey`, which invokes `RepositoryLoaderFunction`
2. `RepositoryLoaderFunction` forwards the request to
   `RepositoryDelegatorFunction` for unclear reasons (the code says it's to
   avoid re-downloading things in case of Skyframe restarts, but that's not
   very solid reasoning)
3. `RepositoryDelegatorFunction` finds out the repository rule it's asked to
   fetch by iterating over the chunks of the WORKSPACE file until the requested
   repository is found
4. The appropriate `RepositoryFunction` that implements the repository fetching
   is found; it's either the Starlark implementation of the repository or an
   entry in a hard-coded map for repositories that are implemented in Java.

There are various layers of caching since fetching a repository can be very
expensive:

1. There is a cache for downloaded files that is keyed by their checksum
   (`RepositoryCache`). This requires the checksum to be available in the
   WORKSPACE file, but that's good for hermeticity anyway. This is shared by
   every Bazel server instance on the same workstation, regardless of which
   workspace or output base they are running in.
2. A "marker file" is written for each repository under `$OUTPUT_BASE/external`
   that contains a checksum of the rule that was used to fetch it. If the Bazel
   server restarts but the checksum does not change, it's not re-fetched. This
   is implemented in `RepositoryDelegatorFunction.DigestWriter`.
3. The `--distdir` command line option designates another cache that is used to
   look up artifacts to be downloaded. This is useful in enterprise settings
   where Bazel should not fetch random things from the Internet. This is
   implemented by `DownloadManager`.
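
The first caching layer, a cache keyed by content checksum, can be sketched as below. This is an in-memory illustration under stated assumptions: the `Cache` class, its `fetch()` method and the use of SHA-256 are hypothetical choices for the example, whereas the cache described above lives on disk and is shared between server instances.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

/** Hypothetical in-memory sketch of a download cache keyed by content checksum. */
class DownloadCacheSketch {
  static String sha256Hex(byte[] data) {
    try {
      StringBuilder hex = new StringBuilder();
      for (byte b : MessageDigest.getInstance("SHA-256").digest(data)) {
        hex.append(String.format("%02x", b));
      }
      return hex.toString();
    } catch (NoSuchAlgorithmException e) {
      throw new IllegalStateException(e);
    }
  }

  static final class Cache {
    private final Map<String, byte[]> contentBySha256 = new HashMap<>();
    int downloads = 0; // counts how often the network had to be used

    /** Returns the cached content for the checksum, downloading it only on a miss. */
    byte[] fetch(String expectedSha256, Supplier<byte[]> downloader) {
      byte[] cached = contentBySha256.get(expectedSha256);
      if (cached != null) {
        return cached;
      }
      byte[] data = downloader.get();
      downloads++;
      if (!sha256Hex(data).equals(expectedSha256)) {
        throw new IllegalStateException("checksum mismatch for downloaded file");
      }
      contentBySha256.put(expectedSha256, data);
      return data;
    }
  }

  public static void main(String[] args) {
    byte[] archive = "pretend this is guava.jar".getBytes(StandardCharsets.UTF_8);
    String checksum = sha256Hex(archive);
    Cache cache = new Cache();
    cache.fetch(checksum, () -> archive);
    cache.fetch(checksum, () -> archive);
    System.out.println("downloads: " + cache.downloads);
  }
}
```

Keying by checksum is what makes the cache safe to share: the declared checksum fully determines the expected bytes, so the URL the file came from does not matter.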

Once a repository is downloaded, the artifacts in it are treated as source
artifacts. This poses a problem because Bazel usually checks for up-to-dateness
of source artifacts by calling `stat()` on them, and these artifacts are also
invalidated when the definition of the repository they are in changes. Thus,
`FileStateValue`s for an artifact in an external repository need to depend on
their external repository. This is handled by `ExternalFilesHelper`.

### Managed directories

Sometimes, external repositories need to modify files under the workspace root
(e.g. a package manager that houses the downloaded packages in a subdirectory of
the source tree). This is at odds with two assumptions Bazel makes: that source
files are only modified by the user and not by Bazel itself, and that packages
may refer to every directory under the workspace root. In order to make this
kind of external repository work, Bazel does two things:

1. It allows the user to specify subdirectories of the workspace Bazel is not
   allowed to reach into. They are listed in a file called `.bazelignore` and
   the functionality is implemented in `BlacklistedPackagePrefixesFunction`.
2. It encodes the mapping from the subdirectory of the workspace to the external
   repository it is handled by into `ManagedDirectoriesKnowledge` and handles
   `FileStateValue`s referring to them in the same way as those for regular
   external repositories.

### Repository mappings

It can happen that multiple repositories want to depend on the same repository,
but in different versions (this is an instance of the "diamond dependency
problem"). For example, if two binaries in separate repositories in the build
want to depend on Guava, they will presumably both refer to Guava with labels
starting with `@guava//` and expect that to mean different versions of it.

Therefore, Bazel allows one to re-map external repository labels so that the
string `@guava//` can refer to one Guava repository (e.g. `@guava1//`) in the
repository of one binary and another Guava repository (e.g. `@guava2//`) in the
repository of the other.

Alternatively, this can also be used to **join** diamonds. If a repository
depends on `@guava1//`, and another depends on `@guava2//`, repository mapping
allows one to re-map both repositories to use a canonical `@guava//` repository.
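
Both uses boil down to rewriting the repository part of a label through a per-repository map. The helper below is a hypothetical sketch of that rewrite on plain strings; Bazel itself works on parsed label objects rather than on strings.

```java
import java.util.Map;

/** Hypothetical sketch of applying a repository mapping to a label string. */
class RepoMappingSketch {
  /** Rewrites the repository part of a label such as "@guava//:lib" using the mapping. */
  static String remap(String label, Map<String, String> mapping) {
    int slashes = label.indexOf("//");
    if (!label.startsWith("@") || slashes < 0) {
      return label; // labels without a repository part are left alone
    }
    String repository = label.substring(0, slashes);
    return mapping.getOrDefault(repository, repository) + label.substring(slashes);
  }

  public static void main(String[] args) {
    // Splitting a diamond: each depending repository sees its own Guava.
    System.out.println(remap("@guava//:lib", Map.of("@guava", "@guava1")));
    System.out.println(remap("@guava//:lib", Map.of("@guava", "@guava2")));
    // Joining a diamond: two differently named repositories collapse into one.
    System.out.println(remap("@guava1//:lib", Map.of("@guava1", "@guava")));
  }
}
```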

The mapping is specified in the WORKSPACE file as the `repo_mapping` attribute
of individual repository definitions. It then appears in Skyframe as a member of
`WorkspaceFileValue`, where it is plumbed to:

* `Package.Builder.repositoryMapping` which is used to transform label-valued
  attributes of rules in the package by
  `RuleClass.populateRuleAttributeValues()`
* `Package.repositoryMapping` which is used in the analysis phase (for
  resolving things like `$(location)` which are not parsed in the loading
  phase)
* `BzlLoadFunction` for resolving labels in `load()` statements

## JNI bits

The server of Bazel is _mostly_ written in Java. The exception is the parts that
Java cannot do by itself or couldn't do by itself when we implemented it. This
is mostly limited to interaction with the file system, process control and
various other low-level things.

The C++ code lives under `src/main/native` and the Java classes with native
methods are:

* `NativePosixFiles` and `NativePosixFileSystem`
* `ProcessUtils`
* `WindowsFileOperations` and `WindowsFileProcesses`
* `com.google.devtools.build.lib.platform`

## Console output

Emitting console output seems like a simple thing, but the confluence of running
multiple processes (sometimes remotely), fine-grained caching, the desire to
have a nice and colorful terminal output and having a long-running server makes
it non-trivial.

Right after the RPC call comes in from the client, two `RpcOutputStream`
instances are created (for stdout and stderr) that forward the data printed into
them to the client. These are then wrapped in an `OutErr` (an (stdout, stderr)
pair). Anything that needs to be printed on the console goes through these
streams. Then these streams are handed over to
`BlazeCommandDispatcher.execExclusively()`.

Output is by default printed with ANSI escape sequences. When these are not
desired (`--color=no`), they are stripped by an `AnsiStrippingOutputStream`. In
addition, `System.out` and `System.err` are redirected to these output streams.
This is so that debugging information can be printed using
`System.err.println()` and still end up in the terminal output of the client
(which is different from that of the server). Care is taken that if a process
produces binary output (e.g. `bazel query --output=proto`), no munging of stdout
takes place.
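
Stripping escape sequences can be illustrated on plain strings with a regular expression that matches ANSI CSI sequences (ESC, `[`, parameters, a final byte). This sketch is an assumption about one reasonable way to do it; it is not the implementation of `AnsiStrippingOutputStream`, which operates on a byte stream.

```java
import java.util.regex.Pattern;

/** Hypothetical sketch of removing ANSI color and cursor sequences from text. */
class AnsiStripSketch {
  // ESC '[' followed by parameter bytes, intermediate bytes and one final byte.
  private static final Pattern CSI_SEQUENCE = Pattern.compile("\u001B\\[[0-?]*[ -/]*[@-~]");

  static String strip(String text) {
    return CSI_SEQUENCE.matcher(text).replaceAll("");
  }

  public static void main(String[] args) {
    String colored = "\u001B[32mINFO:\u001B[0m Build completed successfully";
    System.out.println(strip(colored));
  }
}
```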

Short messages (errors, warnings and the like) are expressed through the
`EventHandler` interface. Notably, these are different from what one posts to
the `EventBus` (this is confusing). Each `Event` has an `EventKind` (error,
warning, info, and a few others) and they may have a `Location` (the place in
the source code that caused the event to happen).

Some `EventHandler` implementations store the events they received. This is used
to replay information to the UI caused by various kinds of cached processing,
for example, the warnings emitted by a cached configured target.

Some `EventHandler`s also allow posting events that eventually find their way to
the event bus (regular `Event`s do _not_ appear there). These are
implementations of `ExtendedEventHandler` and their main use is to replay cached
`EventBus` events. These `EventBus` events all implement `Postable`, but not
everything that is posted to `EventBus` necessarily implements this interface;
only those that are cached by an `ExtendedEventHandler` do (it would be nice if
everything did, and most things do; it's not enforced, though).

Terminal output is _mostly_ emitted through `UiEventHandler`, which is
responsible for all the fancy output formatting and progress reporting Bazel
does. It has two inputs:

* The event bus
* The event stream piped into it through `Reporter`

The only direct connection the command execution machinery (i.e. the rest of
Bazel) has to the RPC stream to the client is through `Reporter.getOutErr()`,
which allows direct access to these streams. It's only used when a command needs
to dump large amounts of possibly binary data (e.g. `bazel query`).

## Profiling Bazel

Bazel is fast. Bazel is also slow, because builds tend to grow until just the
edge of what's bearable. For this reason, Bazel includes a profiler which can be
used to profile builds and Bazel itself. It's implemented in a class that's
aptly named `Profiler`. It's turned on by default, although it records only
abridged data so that its overhead is tolerable; the command line option
`--record_full_profiler_data` makes it record everything it can.

It emits a profile in the Chrome profiler format; it's best viewed in Chrome.
Its data model is that of task stacks: one can start tasks and end tasks and
they are supposed to be neatly nested within each other. Each Java thread gets
its own task stack. **TODO:** How does this work with actions and
continuation-passing style?

The profiler is started and stopped in `BlazeRuntime.initProfiler()` and
`BlazeRuntime.afterCommand()` respectively and attempts to be live for as long
as possible so that we can profile everything. To add something to the profile,
call `Profiler.instance().profile()`. It returns a `Closeable`, whose closure
represents the end of the task. It's best used with try-with-resources
statements.
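
The task stack model and the try-with-resources idiom can be sketched with a toy profiler. `TaskProfiler` and `Task` are hypothetical names for this example; the sketch keeps a single stack and records a text log, whereas the real `Profiler` keeps one task stack per thread and writes a Chrome-format profile.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

/** Hypothetical sketch of nested profiler tasks closed by try-with-resources. */
class ProfilerSketch {
  /** A running task; closing it marks the end of the task. */
  interface Task extends AutoCloseable {
    @Override
    void close();
  }

  static final class TaskProfiler {
    private final Deque<String> stack = new ArrayDeque<>();
    final List<String> log = new ArrayList<>();

    /** Starts a task nested inside whatever task is currently open. */
    Task profile(String description) {
      log.add("  ".repeat(stack.size()) + "start " + description);
      stack.push(description);
      return () -> {
        if (!description.equals(stack.peek())) {
          throw new IllegalStateException("tasks must be closed innermost first");
        }
        stack.pop();
        log.add("  ".repeat(stack.size()) + "end " + description);
      };
    }
  }

  public static void main(String[] args) {
    TaskProfiler profiler = new TaskProfiler();
    try (Task build = profiler.profile("build")) {
      try (Task analysis = profiler.profile("analysis")) {
        // Work attributed to the nested task would happen here.
      }
    }
    profiler.log.forEach(System.out::println);
  }
}
```

Using try-with-resources guarantees that a task is ended even when the profiled code throws, which keeps the stack properly nested.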

We also do rudimentary memory profiling in `MemoryProfiler`. It's also always on
and it mostly records maximum heap sizes and GC behavior.

## Testing Bazel

Bazel has two main kinds of tests: ones that observe Bazel as a "black box" and
ones that only run the analysis phase. We call the former "integration tests"
and the latter "unit tests", although they are more like integration tests that
are, well, less integrated. We also have some actual unit tests, where they are
necessary.

Of integration tests, we have two kinds:

1. Ones implemented using a very elaborate bash test framework under
   `src/test/shell`
2. Ones implemented in Java. These are implemented as subclasses of
   `BuildIntegrationTestCase`

`BuildIntegrationTestCase` is the preferred integration testing framework as it
is well-equipped for most testing scenarios. As it is a Java framework, it
provides debuggability and seamless integration with many common development
tools. There are many examples of `BuildIntegrationTestCase` classes in the
Bazel repository.

Analysis tests are implemented as subclasses of `BuildViewTestCase`. There is a
scratch file system you can use to write BUILD files, then various helper
methods can request configured targets, change the configuration and assert
various things about the result of the analysis.