# The Bazel Code Base

This document is a description of the code base and how Bazel is structured. It
is intended for people willing to contribute to Bazel, not for end-users.

## Introduction

The code base of Bazel is large (~350 KLOC production code and ~260 KLOC test
code) and no one is familiar with the whole landscape: everyone knows their
particular valley very well, but few know what lies over the hills in every
direction.

In order for people midway upon the journey not to find themselves within a
forest dark with the straightforward pathway being lost, this document tries to
give an overview of the code base so that it's easier to get started with
working on it.

The public version of the source code of Bazel lives on GitHub at
http://github.com/bazelbuild/bazel . This is not the “source of truth”; it’s
derived from a Google-internal source tree that contains additional
functionality that is not useful outside Google. The long-term goal is to make
GitHub the source of truth.

Contributions are accepted through the regular GitHub pull request mechanism,
and manually imported by a Googler into the internal source tree, then
re-exported back out to GitHub.

## Client/server architecture

The bulk of Bazel resides in a server process that stays in RAM between builds.
This allows Bazel to maintain state between builds.

This is why the Bazel command line has two kinds of options: startup and
command. In a command line like this:

```
bazel --host_jvm_args=-Xmx8G build -c opt //foo:bar
```

some options (`--host_jvm_args=`) are before the name of the command to be run
and some are after (`-c opt`); the former kind is called a "startup option" and
affects the server process as a whole, whereas the latter kind, the "command
option", only affects a single command.
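The split between the two kinds of options can be illustrated with a minimal Python sketch. It is not the real client (which is written in C++); the function name `split_command_line` is hypothetical, and the sketch assumes every startup option is a single token of the form `--name=value`.

```python
def split_command_line(argv):
    """Split Bazel-style arguments into (startup options, command, command args).

    Assumption for this sketch: every startup option is one token starting
    with "-", so the first token without a leading dash is the command name.
    """
    for index, arg in enumerate(argv):
        if not arg.startswith("-"):
            return argv[:index], arg, argv[index + 1:]
    # No command was given at all.
    return list(argv), None, []


startup, command, rest = split_command_line(
    ["--host_jvm_args=-Xmx8G", "build", "-c", "opt", "//foo:bar"])
print(startup)  # ['--host_jvm_args=-Xmx8G']
print(command)  # build
print(rest)     # ['-c', 'opt', '//foo:bar']
```

Everything in `startup` would affect the server process as a whole; everything in `rest` only affects the single `build` command.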

Each server instance has a single associated source tree ("workspace") and each
workspace usually has a single active server instance. This can be circumvented
by specifying a custom output base (see the "Directory layout" section for more
information).

Bazel is distributed as a single ELF executable that is also a valid .zip file.
When you type `bazel`, the above ELF executable implemented in C++ (the
"client") gets control. It sets up an appropriate server process using the
following steps:

1.  Checks whether it has already extracted itself. If not, it does that. This
    is where the implementation of the server comes from.
2.  Checks whether there is an active server instance that works: it is running,
    it has the right startup options and uses the right workspace directory. It
    finds the running server by looking at the directory `$OUTPUT_BASE/server`
    where there is a lock file with the port the server is listening on.
3.  If needed, kills the old server process.
4.  If needed, starts up a new server process.
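The decision logic in the steps above can be modeled roughly as follows. This is a toy Python sketch, not the C++ client: the function name and the dictionary keys (`running`, `startup_options`, `workspace`) are hypothetical stand-ins for whatever the client actually reads from `$OUTPUT_BASE/server`.

```python
def plan_client_actions(extracted, server, startup_options, workspace):
    """Return the ordered list of steps the client would take.

    `server` is None when no server is recorded for the output base;
    otherwise it is a dict with the hypothetical keys "running",
    "startup_options" and "workspace".
    """
    actions = []
    if not extracted:
        actions.append("extract")                    # step 1
    usable = (server is not None                     # step 2
              and server["running"]
              and server["startup_options"] == startup_options
              and server["workspace"] == workspace)
    if not usable:
        if server is not None and server["running"]:
            actions.append("kill_old_server")        # step 3
        actions.append("start_server")               # step 4
    return actions
```

Under this model, changing a startup option between two invocations yields `["kill_old_server", "start_server"]`, which is why changing startup options discards the state the server kept in RAM.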

After a suitable server process is ready, the command that needs to be run is
communicated to it over a gRPC interface, then the output of Bazel is piped back
to the terminal. Only one command can be running at the same time. This is
implemented using an elaborate locking mechanism with parts in C++ and parts in
Java. There is some infrastructure for running multiple commands in parallel,
since the inability to run e.g. `bazel version` in parallel with another command
is somewhat embarrassing. The main blocker is the life cycle of `BlazeModule`s
and some state in `BlazeRuntime`.

At the end of a command, the Bazel server transmits the exit code the client
should return. An interesting wrinkle is the implementation of `bazel run`: the
job of this command is to run something Bazel just built, but it can't do that
from the server process because it doesn't have a terminal. So instead it tells
the client what binary it should `exec()` and with what arguments.

When one presses Ctrl-C, the client translates it to a Cancel call on the gRPC
connection, which tries to terminate the command as soon as possible. After the
third Ctrl-C, the client sends a SIGKILL to the server instead.

The source code of the client is under `src/main/cpp` and the protocol used to
communicate with the server is in `src/main/protobuf/command_server.proto`.

The main entry point of the server is `BlazeRuntime.main()` and the gRPC calls
from the client are handled by `GrpcServerImpl.run()`.

## Directory layout

Bazel creates a somewhat complicated set of directories during a build. A full
description is available
[here](https://bazel.build/docs/output_directories).

The "workspace" is the source tree Bazel is run in. It usually corresponds to
something you checked out from source control.

Bazel puts all of its data under the "output user root". This is usually
`$HOME/.cache/bazel/_bazel_${USER}`, but can be overridden using the
`--output_user_root` startup option.

The "install base" is where Bazel is extracted to. This is done automatically
and each Bazel version gets a subdirectory based on its checksum under the
install base. It's at `$OUTPUT_USER_ROOT/install` by default and can be changed
using the `--install_base` startup option.

The "output base" is the place where the Bazel instance attached to a specific
workspace writes to. Each output base has at most one Bazel server instance
running at any time. It's usually at `$OUTPUT_USER_ROOT/<checksum of the path
to the workspace>`. It can be changed using the `--output_base` startup option,
which is, among other things, useful for getting around the limitation that only
one Bazel instance can be running in any workspace at any given time.

The output base contains, among other things:

*   The fetched external repositories at `$OUTPUT_BASE/external`.
*   The exec root, i.e. a directory that contains symlinks to all the source
    code for the current build. It's located at `$OUTPUT_BASE/execroot`. During
    the build, the working directory is `$EXECROOT/<name of main repository>`.
    We are planning to change this to `$EXECROOT`, although it's a long-term
    plan because it's a very incompatible change.
*   Files built during the build.
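How the default directories relate to each other can be sketched in a few lines of Python. This is an illustration only: the function names are hypothetical, and the use of MD5 is an assumption, since the text above only says "checksum of the path to the workspace".

```python
import hashlib
import posixpath


def output_base(output_user_root, workspace_path):
    """Return the default output base for a workspace.

    MD5 is an assumption made for this sketch; the text only says
    "checksum of the path to the workspace".
    """
    checksum = hashlib.md5(workspace_path.encode("utf-8")).hexdigest()
    return posixpath.join(output_user_root, checksum)


def layout(output_user_root, workspace_path):
    """Return the default locations described in this section."""
    base = output_base(output_user_root, workspace_path)
    return {
        "install_base": posixpath.join(output_user_root, "install"),
        "output_base": base,
        "external": posixpath.join(base, "external"),
        "execroot": posixpath.join(base, "execroot"),
    }
```

Because the output base is derived from the workspace path, two checkouts in different directories get different output bases, and therefore different servers, without any extra flags.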

## The process of executing a command

Once the Bazel server gets control and is informed about a command it needs to
execute, the following sequence of events happens:

1.  `BlazeCommandDispatcher` is informed about the new request. It decides
    whether the command needs a workspace to run in (almost every command except
    for ones that don't have anything to do with source code, e.g. version or
    help) and whether another command is running.

2.  The right command is found. Each command must implement the interface
    `BlazeCommand` and must have the `@Command` annotation (this is a bit of an
    antipattern, it would be nice if all the metadata a command needs was
    described by methods on `BlazeCommand`).

3.  The command line options are parsed. Each command has different command line
    options, which are described in the `@Command` annotation.

4.  An event bus is created. The event bus is a stream for events that happen
    during the build. Some of these are exported to outside of Bazel under the
    aegis of the Build Event Protocol in order to tell the world how the build
    goes.

5.  The command gets control. The most interesting commands are those that run a
    build: build, test, run, coverage and so on: this functionality is
    implemented by `BuildTool`.

6.  The set of target patterns on the command line is parsed and wildcards like
    `//pkg:all` and `//pkg/...` are resolved. This is implemented in
    `AnalysisPhaseRunner.evaluateTargetPatterns()` and reified in Skyframe as
    `TargetPatternPhaseValue`.

7.  The loading/analysis phase is run to produce the action graph (a directed
    acyclic graph of commands that need to be executed for the build).

8.  The execution phase is run. This means that every action required to build
    the requested top-level targets is run.

## Command line options

The command line options for a Bazel invocation are described in an
`OptionsParsingResult` object, which in turn contains a map from "option
classes" to the values of the options. An "option class" is a subclass of
`OptionsBase` and groups command line options together that are related to each
other. For example:

1.  Options related to a programming language (`CppOptions` or `JavaOptions`).
    These should be a subclass of `FragmentOptions` and are eventually wrapped
    into a `BuildOptions` object.
2.  Options related to the way Bazel executes actions (`ExecutionOptions`).

These options are designed to be consumed in the analysis phase (either
through `RuleContext.getFragment()` in Java or `ctx.fragments` in Starlark).
Some of them (for example, whether to do C++ include scanning or not) are read
in the execution phase, but that always requires explicit plumbing since
`BuildConfiguration` is not available then. For more information, see the
section “Configurations”.

**WARNING:** We like to pretend that `OptionsBase` instances are immutable and
use them that way (e.g. as part of `SkyKey`s). This is not the case and
modifying them is a really good way to break Bazel in subtle ways that are hard
to debug. Unfortunately, making them actually immutable is a large endeavor.
(Modifying a `FragmentOptions` immediately after construction before anyone else
gets a chance to keep a reference to it and before `equals()` or `hashCode()` is
called on it is okay.)
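The kind of breakage the warning refers to can be reproduced in miniature with any hash-based map. The following Python sketch uses a hypothetical `ToyOptions` class as a stand-in for an options object with `equals()`/`hashCode()`; it is an analogy, not Bazel code.

```python
class ToyOptions:
    """Stand-in for an options object that defines equals()/hashCode()."""

    def __init__(self, compilation_mode):
        self.compilation_mode = compilation_mode

    def __eq__(self, other):
        return (isinstance(other, ToyOptions)
                and self.compilation_mode == other.compilation_mode)

    def __hash__(self):
        return hash(self.compilation_mode)


cache = {}
options = ToyOptions("fastbuild")
cache[options] = "result computed for fastbuild"

# Mutating the key after it has been hashed leaves the entry stranded.
options.compilation_mode = "opt"

# The stored hash still matches "fastbuild", but equality now fails.
found_by_old_value = ToyOptions("fastbuild") in cache
# The new value hashes differently from the hash stored with the entry.
found_by_new_value = ToyOptions("opt") in cache
print(len(cache), found_by_old_value, found_by_new_value)  # 1 False False
```

The entry is still in the map, yet no lookup by value finds it. A cache keyed on mutated options would behave the same way: silently wrong rather than loudly broken.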

Bazel learns about option classes in the following ways:

1.  Some are hard-wired into Bazel (`CommonCommandOptions`).
2.  From the `@Command` annotation on each Bazel command.
3.  From `ConfiguredRuleClassProvider` (these are command line options related
    to individual programming languages).
4.  Starlark rules can also define their own options (see
    [here](https://bazel.build/rules/config)).

Each option (excluding Starlark-defined options) is a member variable of a
`FragmentOptions` subclass that has the `@Option` annotation, which specifies
the name and the type of the command line option along with some help text.

The Java type of the value of a command line option is usually something simple
(a string, an integer, a Boolean, a label, etc.). However, we also support
options of more complicated types; in this case, the job of converting from the
command line string to the data type falls to an implementation of
`com.google.devtools.common.options.Converter`.

## The source tree, as seen by Bazel

Bazel is in the business of building software, which happens by reading and
interpreting the source code. The totality of the source code Bazel operates on
is called "the workspace" and it is structured into repositories, packages and
rules. A description of these concepts for the users of Bazel is available
[here](https://bazel.build/concepts/build-ref).

### Repositories

A "repository" is a source tree on which a developer works; it usually
represents a single project. Bazel's ancestor, Blaze, operated on a monorepo,
i.e. a single source tree that contains all source code used to run the build.
Bazel, in contrast, supports projects whose source code spans multiple
repositories. The repository from which Bazel is invoked is called the “main
repository”, the others are called “external repositories”.

A repository is marked by a file called `WORKSPACE` (or `WORKSPACE.bazel`) in
its root directory. This file contains information that is "global" to the whole
build, for example, the set of available external repositories. It works like a
regular Starlark file, which means that one can `load()` other Starlark files.
This is commonly used to pull in repositories that are needed by a repository
that's explicitly referenced (we call this the "`deps.bzl` pattern").

Code of external repositories is symlinked or downloaded under
`$OUTPUT_BASE/external`.

When running the build, the whole source tree needs to be pieced together; this
is done by `SymlinkForest`, which symlinks every package in the main repository
to `$EXECROOT` and every external repository to either `$EXECROOT/external` or
`$EXECROOT/..` (the former of course makes it impossible to have a package
called `external` in the main repository; that's why we are migrating away from
it).

### Packages

Every repository is composed of packages, i.e. a collection of related files and
a specification of the dependencies. These are specified by a file called
`BUILD` or `BUILD.bazel`. If both exist, Bazel prefers `BUILD.bazel`; the reason
why `BUILD` files are still accepted is that Bazel’s ancestor, Blaze, used this
file name. However, it turned out to be a commonly used path segment, especially
on Windows, where file names are case-insensitive.

Packages are independent of each other: changes to the BUILD file of a package
cannot cause other packages to change. The addition or removal of BUILD files
_can_ change other packages, since recursive globs stop at package boundaries
and thus the presence of a BUILD file stops the recursion.

The evaluation of a BUILD file is called "package loading". It's implemented in
the class `PackageFactory`, works by calling the Starlark interpreter and
requires knowledge of the set of available rule classes. The result of package
loading is a `Package` object. It's mostly a map from a string (the name of a
target) to the target itself.

A large chunk of complexity during package loading is globbing: Bazel does not
require every source file to be explicitly listed and instead can run globs
(e.g. `glob(["**/*.java"])`). Unlike the shell, it supports recursive globs that
descend into subdirectories (but not into subpackages). This requires access to
the file system and since that can be slow, we implement all sorts of tricks to
make it run in parallel and as efficiently as possible.
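The rule that recursive globs stop at package boundaries can be illustrated with a small Python model. It is a sketch, not Bazel's globber: `recursive_glob` and `package_dirs` are hypothetical names, only suffix patterns are supported, and an in-memory list of paths stands in for the file system.

```python
import posixpath

BUILD_FILE_NAMES = ("BUILD", "BUILD.bazel")


def package_dirs(files):
    """Directories that contain a BUILD file, i.e. package roots."""
    return {posixpath.dirname(f) for f in files
            if posixpath.basename(f) in BUILD_FILE_NAMES}


def recursive_glob(package, suffix, files):
    """Toy equivalent of glob(["**/*<suffix>"]) evaluated in `package`.

    `files` is an in-memory list of repository-relative paths that stands
    in for the file system.
    """
    packages = package_dirs(files)
    prefix = package + "/" if package else ""
    matches = []
    for path in sorted(files):
        if not path.startswith(prefix) or not path.endswith(suffix):
            continue
        # Walk up from the file's directory; stop if we hit a subpackage.
        directory = posixpath.dirname(path)
        in_subpackage = False
        while directory != package:
            if directory in packages:
                in_subpackage = True
                break
            directory = posixpath.dirname(directory)
        if not in_subpackage:
            matches.append(path[len(prefix):])
    return matches


files = ["foo/BUILD", "foo/A.java", "foo/sub/B.java", "foo/sub/deep/C.java"]
print(recursive_glob("foo", ".java", files))
# ['A.java', 'sub/B.java', 'sub/deep/C.java']
print(recursive_glob("foo", ".java", files + ["foo/sub/BUILD"]))
# ['A.java']
```

The second call shows why adding a BUILD file can change another package: once `foo/sub` becomes a package of its own, its files drop out of the glob evaluated in `foo`.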

Globbing is implemented in the following classes:

*   `LegacyGlobber`, a fast and blissfully Skyframe-unaware globber
*   `SkyframeHybridGlobber`, a version that uses Skyframe and falls back to
    the legacy globber in order to avoid “Skyframe restarts” (described below)

The `Package` class itself contains some members that are exclusively used to
parse the WORKSPACE file and which do not make sense for real packages. This is
a design flaw because objects describing regular packages should not contain
fields that describe something else. These include:

*   The repository mappings
*   The registered toolchains
*   The registered execution platforms

Ideally, there would be more separation between parsing the WORKSPACE file and
parsing regular packages so that `Package` does not need to cater for the needs
of both. This is unfortunately difficult to do because the two are intertwined
quite deeply.

### Labels, Targets and Rules

Packages are composed of targets, which have the following types:

1.  **Files:** things that are either the input or the output of the build. In
    Bazel parlance, we call them _artifacts_ (discussed elsewhere). Not all
    files created during the build are targets; it’s common for an output of
    Bazel not to have an associated label.
2.  **Rules:** these describe steps to derive their outputs from their inputs.
    They are generally associated with a programming language (e.g.
    `cc_library`, `java_library` or `py_library`), but there are some
    language-agnostic ones (e.g. `genrule` or `filegroup`).
3.  **Package groups:** discussed in the [Visibility](#visibility) section.

The name of a target is called a _Label_. The syntax of labels is
`@repo//pac/kage:name`, where `repo` is the name of the repository the Label is
in, `pac/kage` is the directory its BUILD file is in and `name` is the path of
the file (if the label refers to a source file) relative to the directory of the
package. When referring to a target on the command line, some parts of the label
can be omitted:

1.  If the repository is omitted, the label is taken to be in the main
    repository.
2.  If the package part is omitted (e.g. `name` or `:name`), the label is taken
    to be in the package of the current working directory. Relative paths
    containing uplevel references (`..`) are not allowed.
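The label syntax and the two omission rules can be captured in a short parser. This Python sketch is illustrative rather than a port of Bazel's `Label` class: `parse_label` is a hypothetical name, the main repository is represented by an empty string as a choice made here, and the `//foo/bar` shorthand for `//foo/bar:bar` is included even though the text above does not discuss it.

```python
from collections import namedtuple

Label = namedtuple("Label", ["repo", "package", "name"])


def parse_label(text, current_package=""):
    """Parse "@repo//pac/kage:name", filling in omitted parts.

    The main repository is represented by the empty string here, which is
    a choice made for this sketch.
    """
    repo = ""
    if text.startswith("@"):
        repo, separator, text = text[1:].partition("//")
        if not separator:
            raise ValueError("expected '//' after the repository name")
        text = "//" + text
    if text.startswith("//"):
        package, colon, name = text[2:].partition(":")
        if not colon:
            # "//foo/bar" is shorthand for "//foo/bar:bar".
            name = package.rsplit("/", 1)[-1]
    else:
        # Package part omitted: use the package of the working directory.
        package = current_package
        name = text[1:] if text.startswith(":") else text
    if ".." in name.split("/"):
        raise ValueError("uplevel references are not allowed: " + name)
    return Label(repo, package, name)
```

For example, `parse_label(":name", current_package="pac/kage")` and `parse_label("//pac/kage:name")` produce the same `Label`, which is the behavior rule 2 describes for the command line.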

A kind of a rule (e.g. "C++ library") is called a "rule class". Rule classes may
be implemented either in Starlark (the `rule()` function) or in Java (so-called
“native rules”, type `RuleClass`). In the long term, every language-specific
rule will be implemented in Starlark, but some legacy rule families (e.g. Java
or C++) are still in Java for the time being.

Starlark rule classes need to be imported at the beginning of BUILD files using
the `load()` statement, whereas Java rule classes are "innately" known by Bazel,
by virtue of being registered with the `ConfiguredRuleClassProvider`.

Rule classes contain information such as:

1.  Their attributes (e.g., `srcs`, `deps`): their types, default values,
    constraints, etc.
2.  The configuration transitions and aspects attached to each attribute, if any
3.  The implementation of the rule
4.  The transitive info providers the rule "usually" creates

**Terminology note:** In the code base, we often use “Rule” to mean the target
created by a rule class. But in Starlark and in user-facing documentation,
“Rule” should be used exclusively to refer to the rule class itself; the target
is just a “target”. Also note that despite `RuleClass` having “class” in its
name, there is no Java inheritance relationship between a rule class and targets
of that type.

## Skyframe

The evaluation framework underlying Bazel is called Skyframe. Its model is that
everything that needs to be built during a build is organized into a directed
acyclic graph with edges pointing from any piece of data to its dependencies,
that is, other pieces of data that need to be known to construct it.

The nodes in the graph are called `SkyValue`s and their names are called
`SkyKey`s. Both are deeply immutable, i.e. only immutable objects should be
reachable from them. This invariant almost always holds, and in case it doesn't
(e.g. for the individual options classes in `BuildOptions`, which is a member of
`BuildConfigurationValue` and its `SkyKey`) we try really hard not to change
them or to change them only in ways that are not observable from the outside.
From this it follows that everything that is computed within Skyframe (e.g.
configured targets) must also be immutable.

The most convenient way to observe the Skyframe graph is to run
`bazel dump --skyframe=detailed`, which dumps the graph, one `SkyValue` per
line. It's best to do it for tiny builds, since it can get pretty large.

Skyframe lives in the `com.google.devtools.build.skyframe` package. The
similarly-named package `com.google.devtools.build.lib.skyframe` contains the
implementation of Bazel on top of Skyframe. More information about Skyframe is
available [here](https://bazel.build/designs/skyframe.html).

Generating a new `SkyValue` involves the following steps:

1.  Running the associated `SkyFunction`
2.  Declaring the dependencies (i.e. `SkyValue`s) that the `SkyFunction` needs
    to do its job. This is done by calling the various overloads of
    `SkyFunction.Environment.getValue()`.
3.  If a dependency is not available, Skyframe signals that by returning null
    from `getValue()`. In this case, the `SkyFunction` is expected to yield
    control to Skyframe by returning null, then Skyframe evaluates the
    dependencies that haven't been evaluated yet and calls the `SkyFunction`
    again, thus going back to (1).
4.  Constructing the resulting `SkyValue`

A consequence of this is that if not all dependencies are available in (3), the
function needs to be completely restarted and thus computation needs to be
re-done, which is obviously inefficient. `SkyFunction.Environment.getState()`
lets us directly work around this issue by having Skyframe maintain the
`SkyKeyComputeState` instance between calls to `SkyFunction.compute` for the
same `SkyKey`. Check out the example in the javadoc for
`SkyFunction.Environment.getState()`, as well as real usages in the Bazel
codebase.

Other indirect workarounds:

1.  Declaring dependencies of `SkyFunction`s in groups so that if a function
    has, say, 10 dependencies, it only needs to restart once instead of ten
    times.
2.  Splitting `SkyFunction`s so that one function does not need to be restarted
    many times. This has the side effect of interning data into Skyframe that
    may be internal to the `SkyFunction`, thus increasing memory use.

These are all just workarounds for the limitations of Skyframe, which are
mostly a consequence of the fact that Java doesn't support lightweight threads
and that we routinely have hundreds of thousands of in-flight Skyframe nodes.
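The restart protocol, and the effect of declaring dependencies in groups, can be reproduced with a toy evaluator. The Python below is a single-threaded model written for this document; the names (`Environment.get_value`, `evaluate`, the `sum_*` functions) are hypothetical and only mirror the shape of `SkyFunction.Environment.getValue()`. It also assumes that `None` is never a legitimate value.

```python
class Environment:
    """Gives a function access to values that are already in the graph."""

    def __init__(self, graph):
        self._graph = graph
        self.missing = []

    def get_value(self, key):
        if key in self._graph:
            return self._graph[key]
        self.missing.append(key)  # remember what has to be evaluated first
        return None


def evaluate(key, functions, graph, invocations):
    """Evaluate `key`, re-running its function after each restart."""
    while key not in graph:
        env = Environment(graph)
        invocations[key] = invocations.get(key, 0) + 1
        value = functions[key](env)
        if env.missing:
            # The function yielded; evaluate its missing deps, then restart.
            for dep in env.missing:
                evaluate(dep, functions, graph, invocations)
        else:
            graph[key] = value
    return graph[key]


def sum_one_by_one(env):
    a = env.get_value("a")
    if a is None:
        return None  # restart once "a" is available
    b = env.get_value("b")
    if b is None:
        return None  # restart again once "b" is available
    return a + b


def sum_grouped(env):
    a, b = env.get_value("a"), env.get_value("b")  # request both up front
    if a is None or b is None:
        return None  # a single restart covers both dependencies
    return a + b


functions = {"a": lambda env: 1, "b": lambda env: 2,
             "one_by_one": sum_one_by_one, "grouped": sum_grouped}

calls = {}
evaluate("one_by_one", functions, {}, calls)  # fresh, empty graph
evaluate("grouped", functions, {}, calls)     # another fresh graph
print(calls["one_by_one"], calls["grouped"])  # 3 2
```

Both functions compute the same value, but the one that asks for its dependencies one at a time is invoked three times, while the grouped variant is invoked twice. With ten dependencies the gap would be eleven invocations versus two.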

## Starlark

Starlark is the domain-specific language people use to configure and extend
Bazel. It's conceived as a restricted subset of Python that has far fewer types,
more restrictions on control flow, and most importantly, strong immutability
guarantees to enable concurrent reads. It is not Turing-complete, which
discourages some (but not all) users from trying to accomplish general
programming tasks within the language.

Starlark is implemented in the `com.google.devtools.build.lib.syntax` package.
It also has an independent Go implementation
[here](https://github.com/google/starlark-go). The Java implementation used in
Bazel is currently an interpreter.

Starlark is used in four contexts:

1.  **The BUILD language.** This is where new build targets are declared.
    Starlark code running in this context only has access to the contents of the
    BUILD file itself and Starlark files loaded by it.
2.  **Rule definitions.** This is how new rules (e.g. support for a new
    language) are defined. Starlark code running in this context has access to
    the configuration and data provided by its direct dependencies (more on this
    later).
3.  **The WORKSPACE file.** This is where external repositories (code that's not
    in the main source tree) are defined.
4.  **Repository rule definitions.** This is where new external repository types
    are defined. Starlark code running in this context can run arbitrary code on
    the machine where Bazel is running, and reach outside the workspace.

The dialects available for BUILD and .bzl files are slightly different because
they express different things. A list of differences is available
[here](https://bazel.build/rules/language#differences-between-build-and-bzl-files).

More information about Starlark is available
[here](https://bazel.build/rules/language).

## The loading/analysis phase

The loading/analysis phase is where Bazel determines what actions are needed to
build a particular rule. Its basic unit is a "configured target", which is,
quite sensibly, a (target, configuration) pair.

It's called the "loading/analysis phase" because it can be split into two
distinct parts, which used to be serialized, but they can now overlap in time:

1.  Loading packages, that is, turning BUILD files into the `Package` objects
    that represent them
2.  Analyzing configured targets, that is, running the implementation of the
    rules to produce the action graph

Each configured target in the transitive closure of the configured targets
requested on the command line must be analyzed bottom-up, i.e. leaf nodes first,
then up to the ones on the command line. The inputs to the analysis of a single
configured target are:

1.  **The configuration.** This is "how" to build that rule: for example, the
    target platform, but also things like command line options the user wants to
    be passed to the C++ compiler.
2.  **The direct dependencies.** Their transitive info providers are available
    to the rule being analyzed. They are called that because they provide a
    "roll-up" of the information in the transitive closure of the configured
    target, e.g. all the .jar files on the classpath or all the .o files that
    need to be linked into a C++ binary.
3.  **The target itself**. This is the result of loading the package the target
    is in. For rules, this includes its attributes, which is usually what
    matters.
4.  **The implementation of the configured target.** For rules, this can either
    be in Starlark or in Java. All non-rule configured targets are implemented
    in Java.

The output of analyzing a configured target is:

1.  The transitive info providers that configured targets that depend on it can
    access
2.  The artifacts it can create and the actions that produce them.
| 479 | The API offered to Java rules is `RuleContext`, which is the equivalent of the |
| 480 | `ctx` argument of Starlark rules. Its API is more powerful, but at the same |
| 481 | time, it's easier to do Bad Things™, for example to write code whose time or |
| 482 | space complexity is quadratic (or worse), to make the Bazel server crash with a |
| 483 | Java exception or to violate invariants (e.g. by inadvertently modifying an |
| 484 | `Options` instance or by making a configured target mutable) |
| 485 | |
| 486 | The algorithm that determines the direct dependencies of a configured target |
| 487 | lives in `DependencyResolver.dependentNodeMap()`. |
| 488 | |
| 489 | ### Configurations |
| 490 | |
| 491 | Configurations are the "how" of building a target: for what platform, with what |
| 492 | command line options, etc. |
| 493 | |
| 494 | The same target can be built for multiple configurations in the same build. This |
| 495 | is useful, for example, when the same code is used for a tool that's run during |
| 496 | the build and for the target code and we are cross-compiling or when we are |
| 497 | building a fat Android app (one that contains native code for multiple CPU |
| 498 | architectures) |
| 499 | |
| 500 | Conceptually, the configuration is a `BuildOptions` instance. However, in |
| 501 | practice, `BuildOptions` is wrapped by `BuildConfiguration`, which provides |
| 502 | additional sundry pieces of functionality. The configuration propagates from |
| 503 | the top of the dependency graph to the bottom. If it changes, the build needs |
| 504 | to be re-analyzed. |
| 505 | |
| 506 | This results in anomalies like having to re-analyze the whole build if, for |
| 507 | example, the number of requested test runs changes, even though that only |
| 508 | affects test targets (we have plans to "trim" configurations so that this is |
| 509 | not the case, but it's not ready yet). |
| 510 | |
| 511 | When a rule implementation needs part of the configuration, it needs to declare |
| 512 | it in its definition using `RuleClass.Builder.requiresConfigurationFragments()`. |
| 513 | This is both to avoid mistakes (e.g. Python rules using the Java fragment) and |
| 514 | to facilitate configuration trimming so that e.g. if Python options change, C++ |
| 515 | targets don't need to be re-analyzed. |
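
The benefit of declaring fragments can be illustrated with a small toy model. This is a sketch in Python under stated assumptions, not Bazel's implementation: the `trim` function and the representation of a configuration as a dictionary of fragments are hypothetical and exist only for illustration.

```python
# Toy model: a configuration is a mapping from fragment name to that
# fragment's options. All names here are illustrative, not Bazel APIs.
from typing import Dict, FrozenSet, Tuple

Options = Tuple[Tuple[str, str], ...]   # immutable (flag, value) pairs
Config = Dict[str, Options]             # fragment name -> options


def trim(config: Config, required: FrozenSet[str]) -> Tuple[Tuple[str, Options], ...]:
    """Return a hashable key that contains only the declared fragments."""
    missing = required - set(config)
    if missing:
        raise KeyError(f"unknown fragments: {sorted(missing)}")
    return tuple((name, config[name]) for name in sorted(required))


before: Config = {
    "cpp": (("copt", "-O2"),),
    "python": (("python_version", "PY3"),),
}
# Only a Python option changes between the two builds.
after: Config = dict(before, python=(("python_version", "PY2"),))

cc_key_before = trim(before, frozenset({"cpp"}))
cc_key_after = trim(after, frozenset({"cpp"}))
py_key_before = trim(before, frozenset({"python"}))
py_key_after = trim(after, frozenset({"python"}))
```

In this model, the key of a C++ target is identical before and after the Python option changes, so a cache keyed on the trimmed configuration would not need to re-analyze it, whereas the key of a Python target does change.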
| 516 | |
| 517 | The configuration of a rule is not necessarily the same as that of its "parent" |
| 518 | rule. The process of changing the configuration in a dependency edge is called a |
| 519 | "configuration transition". It can happen in two places: |
| 520 | |
| 521 | 1. On a dependency edge. These transitions are specified in |
| 522 | `Attribute.Builder.cfg()` and are functions from a `Rule` (where the |
| 523 | transition happens) and a `BuildOptions` (the original configuration) to one |
| 524 | or more `BuildOptions` (the output configuration). |
| 525 | 2. On any incoming edge to a configured target. These are specified in |
| 526 | `RuleClass.Builder.cfg()`. |
| 527 | |
| 528 | The relevant classes are `TransitionFactory` and `ConfigurationTransition`. |
| 529 | |
| 530 | Configuration transitions are used, for example: |
| 531 | |
| 532 | 1. To declare that a particular dependency is used during the build and it |
| 533 | should thus be built for the execution architecture. |
| 534 | 2. To declare that a particular dependency must be built for multiple |
| 535 | architectures (e.g. for native code in fat Android APKs). |
| 536 | |
| 537 | If a configuration transition results in multiple configurations, it's called a |
| 538 | _split transition_. |
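
A minimal sketch of the idea that a transition is a function from one configuration to one or more configurations. It is a toy model: the dictionary keys (`cpu`, `exec_cpu`, `fat_apk_cpu`) and the function names are assumptions made for illustration, and the `Rule` argument that real transitions also receive is omitted.

```python
# Toy model of configuration transitions; names are illustrative only.
from typing import Callable, Dict, List

BuildOptions = Dict[str, str]
# A transition maps one input configuration to one or more outputs.
Transition = Callable[[BuildOptions], List[BuildOptions]]


def exec_transition(options: BuildOptions) -> List[BuildOptions]:
    """Build the dependency for the machine that runs build actions."""
    return [dict(options, cpu=options["exec_cpu"])]


def fat_apk_split(options: BuildOptions) -> List[BuildOptions]:
    """Split transition: one output configuration per requested CPU."""
    cpus = options["fat_apk_cpu"].split(",")
    return [dict(options, cpu=cpu) for cpu in cpus]


def is_split(outputs: List[BuildOptions]) -> bool:
    """A transition result with more than one configuration is a split."""
    return len(outputs) > 1


top = {"cpu": "arm64", "exec_cpu": "x86_64", "fat_apk_cpu": "arm64,x86_64"}
tool_configs = exec_transition(top)
native_configs = fat_apk_split(top)
```

Both functions return new dictionaries instead of modifying their input, which mirrors the requirement that a configuration is never mutated once it exists.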
| 539 | |
| 540 | Configuration transitions can also be implemented in Starlark (documentation |
| 541 | [here](https://bazel.build/rules/config)). |
| 542 | |
| 543 | ### Transitive info providers |
| 544 | |
| 545 | Transitive info providers are a way (and the _only_ way) for configured targets |
| 546 | to tell things about themselves to other configured targets that depend on |
| 547 | them. The reason why "transitive" is in their name is that this is usually some |
| 548 | sort of roll-up of the transitive closure of a configured target. |
| 549 | |
| 550 | There is generally a 1:1 correspondence between Java transitive info providers |
| 551 | and Starlark ones (the exception is `DefaultInfo` which is an amalgamation of |
| 552 | `FileProvider`, `FilesToRunProvider` and `RunfilesProvider` because that API was |
| 553 | deemed to be more Starlark-ish than a direct transliteration of the Java one). |
| 554 | Their key is one of the following things: |
| 555 | |
| 556 | 1. A Java Class object. This is only available for providers that are not |
| 557 | accessible from Starlark. These providers are subclasses of |
| 558 | `TransitiveInfoProvider`. |
| 559 | 2. A string. This is legacy and heavily discouraged since it's susceptible to |
| 560 | name clashes. Such transitive info providers are direct subclasses of |
| 561 | `build.lib.packages.Info`. |
| 562 | 3. A provider symbol. This can be created from Starlark using the `provider()` |
| 563 | function and is the recommended way to create new providers. The symbol is |
| 564 | represented by a `Provider.Key` instance in Java. |
| 565 | |
| 566 | New providers implemented in Java should be implemented using `BuiltinProvider`. |
| 567 | `NativeProvider` is deprecated (we haven't had time to remove it yet) and |
| 568 | `TransitiveInfoProvider` subclasses cannot be accessed from Starlark. |
| 569 | |
| 570 | ### Configured targets |
| 571 | |
| 572 | Configured targets are implemented as `RuleConfiguredTargetFactory`. There is a |
| 573 | subclass for each rule class implemented in Java. Starlark configured targets |
| 574 | are created through `StarlarkRuleConfiguredTargetUtil.buildRule()`. |
| 575 | |
| 576 | Configured target factories should use `RuleConfiguredTargetBuilder` to |
| 577 | construct their return value. It consists of the following things: |
| 578 | |
| 579 | 1. Their `filesToBuild`, i.e. the hazy concept of "the set of files this rule |
| 580 | represents". These are the files that get built when the configured target |
| 581 | is on the command line or in the srcs of a genrule. |
| 582 | 2. Their runfiles, regular and data. |
| 583 | 3. Their output groups. These are various "other sets of files" the rule can |
| 584 | build. They can be accessed using the `output_group` attribute of the |
| 585 | `filegroup` rule in BUILD and using the `OutputGroupInfo` provider in Java. |
| 586 | |
| 587 | ### Runfiles |
| 588 | |
| 589 | Some binaries need data files to run. A prominent example is tests that need |
| 590 | input files. This is represented in Bazel by the concept of "runfiles". A |
| 591 | "runfiles tree" is a directory tree of the data files for a particular binary. |
| 592 | It is created in the file system as a symlink tree with individual symlinks |
| 593 | pointing to the files in the source or output trees. |
| 594 | |
| 595 | A set of runfiles is represented as a `Runfiles` instance. It is conceptually a |
| 596 | map from the path of a file in the runfiles tree to the `Artifact` instance that |
| 597 | represents it. It's a little more complicated than a single `Map` for two |
| 598 | reasons: |
| 599 | |
| 600 | * Most of the time, the runfiles path of a file is the same as its execpath. |
| 601 | We use this to save some RAM. |
| 602 | * There are various legacy kinds of entries in runfiles trees, which also need |
| 603 | to be represented. |
| 604 | |
| 605 | Runfiles are collected using `RunfilesProvider`: an instance of this class |
| 606 | represents the runfiles a configured target (e.g. a library) and its transitive |
| 607 | closure need, and they are gathered like a nested set (in fact, they are |
| 608 | implemented using nested sets under the covers): each target unions the runfiles |
| 609 | of its dependencies, adds some of its own, then sends the resulting set upwards |
| 610 | in the dependency graph. A `RunfilesProvider` instance contains two `Runfiles` |
| 611 | instances, one for when the rule is depended on through the "data" attribute and |
| 612 | one for every other kind of incoming dependency. This is because a target |
| 613 | sometimes presents different runfiles when depended on through a data attribute |
| 614 | than otherwise. This is undesired legacy behavior that we haven't gotten around |
| 615 | to removing yet. |
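
The "union the dependencies, add your own, pass it upwards" pattern can be sketched as follows. This is a toy model under stated assumptions: `ToyRunfiles` and its fields are hypothetical names, and the real `Runfiles` class has more kinds of entries than shown here.

```python
# Toy model of runfiles collection; class and field names are illustrative.
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass(frozen=True)
class ToyRunfiles:
    own: Tuple[str, ...] = ()                     # execpaths this target adds
    transitive: Tuple["ToyRunfiles", ...] = ()    # runfiles of dependencies
    # Only entries whose runfiles path differs from the execpath are stored
    # explicitly, which mirrors the RAM-saving idea described above.
    renamed: Tuple[Tuple[str, str], ...] = ()     # (runfiles path, execpath)

    def to_map(self) -> Dict[str, str]:
        """Flatten to a map from runfiles path to execpath."""
        result: Dict[str, str] = {}
        for dep in self.transitive:
            result.update(dep.to_map())
        result.update({path: path for path in self.own})
        result.update(dict(self.renamed))
        return result


library = ToyRunfiles(own=("pkg/data.txt",))
binary = ToyRunfiles(
    own=("pkg/bin",),
    transitive=(library,),
    renamed=(("alias/cfg", "pkg/cfg.json"),),
)
runfiles_map = binary.to_map()
```

Flattening happens only when the map is requested; each target stores references to the sets of its dependencies, not copies of them.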
| 616 | |
| 617 | Runfiles of binaries are represented as an instance of `RunfilesSupport`. This |
| 618 | is different from `Runfiles` because `RunfilesSupport` has the capability of |
| 619 | actually being built (unlike `Runfiles`, which is just a mapping). This |
| 620 | necessitates the following additional components: |
| 621 | |
| 622 | * **The input runfiles manifest.** This is a serialized description of the |
| 623 | runfiles tree. It is used as a proxy for the contents of the runfiles tree |
| 624 | and Bazel assumes that the runfiles tree changes if and only if the contents |
| 625 | of the manifest change. |
| 626 | * **The output runfiles manifest.** This is used by runtime libraries that |
| 627 | handle runfiles trees, notably on Windows, which sometimes doesn't support |
| 628 | symbolic links. |
| 629 | * **The runfiles middleman.** In order for a runfiles tree to exist, one needs |
| 630 | to build the symlink tree and the artifacts the symlinks point to. In order |
| 631 | to decrease the number of dependency edges, the runfiles middleman can be |
| 632 | used to represent all these. |
| 633 | * **Command line arguments** for running the binary whose runfiles the |
| 634 | `RunfilesSupport` object represents. |
| 635 | |
| 636 | ### Aspects |
| 637 | |
| 638 | Aspects are a way to "propagate computation down the dependency graph". They are |
| 639 | described for users of Bazel |
| 640 | [here](https://bazel.build/rules/aspects). A good |
| 641 | motivating example is protocol buffers: a `proto_library` rule should not know |
| 642 | about any particular language, but building the implementation of a protocol |
| 643 | buffer message (the “basic unit” of protocol buffers) in any programming |
| 644 | language should be coupled to the `proto_library` rule so that if two targets in |
| 645 | the same language depend on the same protocol buffer, it gets built only once. |
| 646 | |
| 647 | Just like configured targets, they are represented in Skyframe as a `SkyValue` |
| 648 | and the way they are constructed is very similar to how configured targets are |
| 649 | built: they have a factory class called `ConfiguredAspectFactory` that has |
| 650 | access to a `RuleContext`, but unlike configured target factories, it also knows |
| 651 | about the configured target it is attached to and its providers. |
| 652 | |
| 653 | The set of aspects propagated down the dependency graph is specified for each |
| 654 | attribute using the `Attribute.Builder.aspects()` function. There are a few |
| 655 | confusingly-named classes that participate in the process: |
| 656 | |
| 657 | 1. `AspectClass` is the implementation of the aspect. It can be either in Java |
| 658 | (in which case it's a subclass) or in Starlark (in which case it's an |
| 659 | instance of `StarlarkAspectClass`). It's analogous to |
| 660 | `RuleConfiguredTargetFactory`. |
| 661 | 2. `AspectDefinition` is the definition of the aspect; it includes the |
| 662 | providers it requires, the providers it provides and contains a reference to |
| 663 | its implementation, i.e. the appropriate `AspectClass` instance. It's |
| 664 | analogous to `RuleClass`. |
| 665 | 3. `AspectParameters` is a way to parametrize an aspect that is propagated down |
| 666 | the dependency graph. It's currently a string to string map. A good example |
| 667 | of why it's useful is protocol buffers: if a language has multiple APIs, the |
| 668 | information as to which API the protocol buffers should be built for should |
| 669 | be propagated down the dependency graph. |
| 670 | 4. `Aspect` represents all the data that's needed to compute an aspect that |
| 671 | propagates down the dependency graph. It consists of the aspect class, its |
| 672 | definition and its parameters. |
| 673 | 5. `RuleAspect` is the function that determines which aspects a particular rule |
| 674 | should propagate. It's a `Rule` -> `Aspect` function. |
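
The motivating example above (a shared protocol buffer being built only once per language) can be sketched as memoization keyed by the (aspect, target) pair. This is a toy model: the labels, the `java_proto` aspect name and the `apply_aspect` function are assumptions for illustration, and in Bazel the caching is done by Skyframe, not by a dictionary.

```python
# Toy model of aspect propagation; all names and labels are illustrative.
from typing import Dict, List, Tuple

# proto_library target -> its direct proto_library dependencies
DEPS: Dict[str, List[str]] = {
    "//proto:a": ["//proto:common"],
    "//proto:b": ["//proto:common"],
    "//proto:common": [],
}

evaluations: List[Tuple[str, str]] = []          # every real evaluation
_cache: Dict[Tuple[str, str], List[str]] = {}    # (aspect, target) -> outputs


def apply_aspect(aspect: str, target: str) -> List[str]:
    """Return the outputs of `aspect` for `target` and its transitive deps."""
    key = (aspect, target)
    if key not in _cache:
        evaluations.append(key)
        outputs: List[str] = []
        for dep in DEPS[target]:
            outputs.extend(apply_aspect(aspect, dep))
        outputs.append(f"{target}.{aspect}.jar")
        _cache[key] = outputs
    return _cache[key]


# Two Java targets each attach the same aspect to a different proto_library.
jars_for_a = apply_aspect("java_proto", "//proto:a")
jars_for_b = apply_aspect("java_proto", "//proto:b")
```

Although both requests reach `//proto:common`, the aspect is evaluated for it only once.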
| 675 | |
| 676 | A somewhat unexpected complication is that aspects can attach to other aspects; |
| 677 | for example, an aspect collecting the classpath for a Java IDE will probably |
| 678 | want to know about all the .jar files on the classpath, but some of them are |
| 679 | protocol buffers. In that case, the IDE aspect will want to attach to the |
| 680 | (`proto_library` rule + Java proto aspect) pair. |
| 681 | |
| 682 | The complexity of aspects on aspects is captured in the class |
| 683 | `AspectCollection`. |
| 684 | |
| 685 | ### Platforms and toolchains |
| 686 | |
| 687 | Bazel supports multi-platform builds, that is, builds where there may be |
| 688 | multiple architectures where build actions run and multiple architectures for |
| 689 | which code is built. These architectures are referred to as _platforms_ in Bazel |
| 690 | parlance (full documentation |
| 691 | [here](https://bazel.build/docs/platforms)). |
| 692 | |
| 693 | A platform is described by a key-value mapping from _constraint settings_ (e.g. |
| 694 | the concept of "CPU architecture") to _constraint values_ (e.g. a particular CPU |
| 695 | like `x86_64`). We have a "dictionary" of the most commonly used constraint |
| 696 | settings and values in the `@platforms` repository. |
| 697 | |
| 698 | The concept of _toolchain_ comes from the fact that depending on what platforms |
| 699 | the build is running on and what platforms are targeted, one may need to use |
| 700 | different compilers; for example, a particular C++ toolchain may run on a |
| 701 | specific OS and be able to target some other OSes. Bazel must determine the C++ |
| 702 | compiler that is used based on the execution and target platforms that are set |
| 703 | (documentation for toolchains |
| 704 | [here](https://bazel.build/docs/toolchains)). |
| 705 | |
| 706 | In order to do this, toolchains are annotated with the set of execution and |
| 707 | target platform constraints they support. To that end, the definition of a |
| 708 | toolchain is split into two parts: |
| 709 | |
| 710 | 1. A `toolchain()` rule that describes the set of execution and target |
| 711 | constraints a toolchain supports and tells what kind (e.g. C++ or Java) of |
| 712 | toolchain it is (the latter is represented by the `toolchain_type()` rule). |
| 713 | 2. A language-specific rule that describes the actual toolchain (e.g. |
| 714 | `cc_toolchain()`). |
| 715 | |
| 716 | It is done this way because we need to know the constraints for every |
| 717 | toolchain in order to do toolchain resolution and language-specific |
| 718 | `*_toolchain()` rules contain much more information than that, so they take more |
| 719 | time to load. |
| 720 | |
| 721 | Execution platforms are specified in one of the following ways: |
| 722 | |
| 723 | 1. In the WORKSPACE file using the `register_execution_platforms()` function. |
| 724 | 2. On the command line using the `--extra_execution_platforms` command line |
| 725 | option. |
| 726 | |
| 727 | The set of available execution platforms is computed in |
| 728 | `RegisteredExecutionPlatformsFunction`. |
| 729 | |
| 730 | The target platform for a configured target is determined by |
| 731 | `PlatformOptions.computeTargetPlatform()`. It's a list of platforms because we |
| 732 | eventually want to support multiple target platforms, but it's not implemented |
| 733 | yet. |
| 734 | |
| 735 | The set of toolchains to be used for a configured target is determined by |
| 736 | `ToolchainResolutionFunction`. It is a function of: |
| 737 | |
| 738 | * The set of registered toolchains (in the WORKSPACE file and the |
| 739 | configuration) |
| 740 | * The desired execution and target platforms (in the configuration) |
| 741 | * The set of toolchain types that are required by the configured target (in |
| 742 | `UnloadedToolchainContextKey`) |
| 743 | * The set of execution platform constraints of the configured target (the |
| 744 | `exec_compatible_with` attribute) and the configuration |
| 745 | (`--experimental_add_exec_constraints_to_targets`), in |
| 746 | `UnloadedToolchainContextKey` |
| 747 | |
| 748 | Its result is an `UnloadedToolchainContext`, which is essentially a map from |
| 749 | toolchain type (represented as a `ToolchainTypeInfo` instance) to the label of |
| 750 | the selected toolchain. It's called "unloaded" because it does not contain the |
| 751 | toolchains themselves, only their labels. |
| 752 | |
| 753 | Then the toolchains are actually loaded using `ResolvedToolchainContext.load()` |
| 754 | and used by the implementation of the configured target that requested them. |
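
A rough sketch of the resolution step: given the required toolchain types, the registered toolchains, the candidate execution platforms and the target platform, select labels of matching toolchains. This is a simplified toy model, not the algorithm in `ToolchainResolutionFunction`; the class names, the `resolve` function and all labels are assumptions, and constraints are reduced to plain sets of strings.

```python
# Toy model of toolchain resolution; names and data are illustrative.
from dataclasses import dataclass
from typing import Dict, FrozenSet, Optional, Sequence, Tuple


@dataclass(frozen=True)
class Platform:
    label: str
    constraints: FrozenSet[str]


@dataclass(frozen=True)
class Toolchain:
    label: str
    toolchain_type: str
    exec_compatible_with: FrozenSet[str]
    target_compatible_with: FrozenSet[str]


def resolve(
    required_types: Sequence[str],
    toolchains: Sequence[Toolchain],
    exec_platforms: Sequence[Platform],
    target: Platform,
) -> Optional[Tuple[str, Dict[str, str]]]:
    """Pick the first execution platform that satisfies every required type.

    Returns (execution platform label, {toolchain type: toolchain label}),
    i.e. labels only, similar in spirit to an "unloaded" context.
    """
    for exec_platform in exec_platforms:
        selected: Dict[str, str] = {}
        for toolchain_type in required_types:
            for tc in toolchains:
                if (tc.toolchain_type == toolchain_type
                        and tc.exec_compatible_with <= exec_platform.constraints
                        and tc.target_compatible_with <= target.constraints):
                    selected[toolchain_type] = tc.label
                    break
        if len(selected) == len(set(required_types)):
            return exec_platform.label, selected
    return None


linux = Platform("//platforms:linux_x86", frozenset({"linux", "x86_64"}))
mac = Platform("//platforms:mac_arm", frozenset({"macos", "arm64"}))
android = Platform("//platforms:android_arm", frozenset({"android", "arm64"}))
registered = [
    Toolchain("//tc:cc_linux", "cc", frozenset({"linux"}), frozenset({"linux"})),
    Toolchain("//tc:cc_mac_android", "cc", frozenset({"macos"}), frozenset({"android"})),
]
result = resolve(["cc"], registered, [linux, mac], android)
```

Here the Linux execution platform is skipped because no registered C++ toolchain that runs on it can target Android, so the macOS platform and its toolchain are selected.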
| 755 | |
| 756 | We also have a legacy system that relies on there being one single "host" |
| 757 | configuration and target configurations being represented by various |
| 758 | configuration flags, e.g. `--cpu`. We are gradually transitioning to the above |
| 759 | system. In order to handle cases where people rely on the legacy configuration |
| 760 | values, we have implemented |
| 761 | "[platform mappings](https://docs.google.com/document/d/1Vg_tPgiZbSrvXcJ403vZVAGlsWhH9BUDrAxMOYnO0Ls)" |
| 762 | to translate between the legacy flags and the new-style platform constraints. |
| 763 | Their code is in `PlatformMappingFunction` and uses a non-Starlark "little |
| 764 | language". |
| 765 | |
| 766 | ### Constraints |
| 767 | |
| 768 | Sometimes one wants to designate a target as being compatible with only a few |
| 769 | platforms. Bazel has (unfortunately) multiple mechanisms to achieve this end: |
| 770 | |
| 771 | * Rule-specific constraints |
| 772 | * `environment_group()` / `environment()` |
| 773 | * Platform constraints |
| 774 | |
| 775 | Rule-specific constraints are mostly used within Google for Java rules; they are |
| 776 | on their way out and they are not available in Bazel, but the source code may |
| 777 | contain references to them. The attribute that governs this is called |
| 778 | `constraints=`. |
| 779 | |
| 780 | #### environment_group() and environment() |
| 781 | |
| 782 | These rules are a legacy mechanism and are not widely used. |
| 783 | |
| 784 | All build rules can declare which "environments" they can be built for, where an |
| 785 | "environment" is an instance of the `environment()` rule. |
| 786 | |
| 787 | There are various ways supported environments can be specified for a rule: |
| 788 | |
| 789 | 1. Through the `restricted_to=` attribute. This is the most direct form of |
| 790 | specification; it declares the exact set of environments the rule supports |
| 791 | for this group. |
| 792 | 2. Through the `compatible_with=` attribute. This declares environments a rule |
| 793 | supports in addition to "standard" environments that are supported by |
| 794 | default. |
| 795 | 3. Through the package-level attributes `default_restricted_to=` and |
| 796 | `default_compatible_with=`. |
| 797 | 4. Through default specifications in `environment_group()` rules. Every |
| 798 | environment belongs to a group of thematically related peers (e.g. "CPU |
| 799 | architectures", "JDK versions" or "mobile operating systems"). The |
| 800 | definition of an environment group includes which of these environments |
| 801 | should be supported by "default" if not otherwise specified by the |
| 802 | `restricted_to=` / `compatible_with=` attributes. A rule with no such |
| 803 | attributes inherits all defaults. |
| 804 | 5. Through a rule class default. This overrides global defaults for all |
| 805 | instances of the given rule class. This can be used, for example, to make |
| 806 | all `*_test` rules testable without each instance having to explicitly |
| 807 | declare this capability. |
| 808 | |
| 809 | `environment()` is implemented as a regular rule, whereas `environment_group()` |
| 810 | is both a subclass of `Target` that is not a `Rule` (`EnvironmentGroup`) and a |
| 811 | function that is available by default from Starlark |
| 812 | (`StarlarkLibrary.environmentGroup()`) which eventually creates an eponymous |
| 813 | target. This is to avoid a cyclic dependency that would arise because each |
| 814 | environment needs to declare the environment group it belongs to and each |
| 815 | environment group needs to declare its default environments. |
| 816 | |
| 817 | A build can be restricted to a certain environment with the |
| 818 | `--target_environment` command line option. |
| 819 | |
| 820 | The implementation of the constraint check is in |
| 821 | `RuleContextConstraintSemantics` and `TopLevelConstraintSemantics`. |
| 822 | |
| 823 | #### Platform constraints |
| 824 | |
| 825 | The current "official" way to describe what platforms a target is compatible |
| 826 | with is by using the same constraints used to describe toolchains and platforms. |
| 827 | It's under review in pull request |
| 828 | [#10945](https://github.com/bazelbuild/bazel/pull/10945). |
| 829 | |
| 830 | ### Visibility |
| 831 | |
| 832 | If you work on a large codebase with a lot of developers (like at Google), you |
| 833 | don't necessarily want everyone else to be able to depend on your code so that |
| 834 | you retain the liberty to change things that you deem to be implementation |
| 835 | details (otherwise, as per [Hyrum's law](https://www.hyrumslaw.com/), people |
| 836 | _will_ come to depend on all parts of your code). |
| 837 | |
| 838 | Bazel supports this by the mechanism called _visibility_: using the visibility |
| 839 | attribute, you can declare which packages are allowed to depend on a particular |
| 840 | rule (documentation |
| 841 | [here](https://bazel.build/reference/be/common-definitions#common-attributes)). |
| 842 | This attribute is a little special because unlike every other attribute, the set |
| 843 | of dependencies it generates is not simply the set of labels listed (yes, this |
| 844 | is a design flaw). |
| 845 | |
| 846 | This is implemented in the following places: |
| 847 | |
| 848 | * The `RuleVisibility` interface represents a visibility declaration. It can |
| 849 | be either a constant (fully public or fully private) or a list of labels. |
| 850 | * Labels can refer to package groups (predefined lists of packages), to |
| 851 | packages directly (`//pkg:__pkg__`) or to subtrees of packages |
| 852 | (`//pkg:__subpackages__`). This is different from the command line syntax, |
| 853 | which uses `//pkg:*` or `//pkg/...`. |
| 854 | * Package groups are implemented as their own target and configured target |
| 855 | types (`PackageGroup` and `PackageGroupConfiguredTarget`). We could probably |
| 856 | replace these with simple rules if we wanted to. |
| 857 | * The conversion from visibility label lists to dependencies is done in |
| 858 | `DependencyResolver.visitTargetVisibility` and a few other miscellaneous |
| 859 | places. |
| 860 | * The actual check is done in |
| 861 | `CommonPrerequisiteValidator.validateDirectPrerequisiteVisibility()`. |
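
The label forms listed above can be illustrated with a small checker. This is a sketch under stated assumptions, not the logic of `CommonPrerequisiteValidator`: the `is_visible` function is hypothetical, package groups are left out, and packages are given as plain strings without the leading `//`.

```python
# Toy visibility check; the function name is illustrative and package
# groups are omitted.
from typing import Iterable

PUBLIC = "//visibility:public"
PRIVATE = "//visibility:private"


def is_visible(visibility: Iterable[str], target_pkg: str, from_pkg: str) -> bool:
    """Return True if a target in target_pkg may be depended on from from_pkg."""
    if from_pkg == target_pkg:
        return True  # targets in the same package can always see each other
    for spec in visibility:
        if spec == PUBLIC:
            return True
        if spec == PRIVATE:
            continue
        pkg, _, name = spec.partition(":")
        pkg = pkg.lstrip("/")  # "//a/b" -> "a/b"
        if name == "__pkg__" and from_pkg == pkg:
            return True
        if name == "__subpackages__" and (
            pkg == "" or from_pkg == pkg or from_pkg.startswith(pkg + "/")
        ):
            return True
    return False
```

For example, `is_visible(["//a:__subpackages__"], "lib", "a/b")` is true, while the same declaration does not grant access to a package named `ab`, because the match is on whole path segments.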
| 862 | |
| 863 | ### Nested sets |
| 864 | |
| 865 | Oftentimes, a configured target aggregates a set of files from its dependencies, |
| 866 | adds its own, and wraps the aggregate set into a transitive info provider so |
| 867 | that configured targets that depend on it can do the same. Examples: |
| 868 | |
| 869 | * The C++ header files used for a build |
| 870 | * The object files that represent the transitive closure of a `cc_library` |
| 871 | * The set of .jar files that need to be on the classpath for a Java rule to |
| 872 | compile or run |
| 873 | * The set of Python files in the transitive closure of a Python rule |
| 874 | |
| 875 | If we did this the naive way by using e.g. `List` or `Set`, we'd end up with |
| 876 | quadratic memory usage: if there is a chain of N rules and each rule adds a |
| 877 | file, we'd have 1+2+...+N collection members. |
| 878 | |
| 879 | In order to get around this problem, we came up with the concept of a |
| 880 | `NestedSet`. It's a data structure that is composed of other `NestedSet` |
| 881 | instances and some members of its own, thereby forming a directed acyclic graph |
| 882 | of sets. They are immutable and their members can be iterated over. We define |
| 883 | multiple iteration orders (`NestedSet.Order`): preorder, postorder, topological |
| 884 | (a node always comes after its ancestors) and "don't care, but it should be the |
| 885 | same each time". |
| 886 | |
| 887 | The same data structure is called `depset` in Starlark. |
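
The memory argument can be checked with a toy version of the data structure. This is a minimal sketch, not Bazel's `NestedSet`: the class `ToyNestedSet`, the helper functions and the single (postorder) iteration order are assumptions made to keep the example short.

```python
# Toy nested set; Bazel's real class is far more optimized.
from typing import Iterable, List, Set


class ToyNestedSet:
    """Immutable set built from direct members and other nested sets."""

    def __init__(self, direct: Iterable[str] = (),
                 transitive: Iterable["ToyNestedSet"] = ()):
        self._direct = tuple(direct)
        self._transitive = tuple(transitive)

    def to_list(self) -> List[str]:
        """Postorder: members of transitive sets first, then direct ones."""
        seen_sets: Set[int] = set()
        seen_items: Set[str] = set()
        out: List[str] = []

        def visit(node: "ToyNestedSet") -> None:
            if id(node) in seen_sets:
                return  # a shared subgraph is traversed only once
            seen_sets.add(id(node))
            for child in node._transitive:
                visit(child)
            for item in node._direct:
                if item not in seen_items:
                    seen_items.add(item)
                    out.append(item)

        visit(self)
        return out


def stored_references(root: ToyNestedSet) -> int:
    """Count references held by all distinct nodes reachable from root."""
    seen: Set[int] = set()
    stack = [root]
    total = 0
    while stack:
        node = stack.pop()
        if id(node) in seen:
            continue
        seen.add(id(node))
        total += len(node._direct) + len(node._transitive)
        stack.extend(node._transitive)
    return total


def chain(n: int) -> List[ToyNestedSet]:
    """A chain of n rules; each adds one file on top of its dependency."""
    sets: List[ToyNestedSet] = []
    for i in range(n):
        sets.append(ToyNestedSet([f"file{i}"], sets[-1:]))
    return sets


sets = chain(50)
nested_cost = stored_references(sets[-1])        # grows linearly with n
naive_cost = sum(len(s.to_list()) for s in sets) # 1 + 2 + ... + n
```

For a chain of 50 rules, the nested representation holds 99 references in total, whereas giving every rule its own flat list would hold 1275 entries.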
| 888 | |
| 889 | ### Artifacts and Actions |
| 890 | |
| 891 | The actual build consists of a set of commands that need to be run to produce |
| 892 | the output the user wants. The commands are represented as instances of the |
| 893 | class `Action` and the files are represented as instances of the class |
| 894 | `Artifact`. They are arranged in a bipartite, directed, acyclic graph called the |
| 895 | "action graph". |
| 896 | |
| 897 | Artifacts come in two kinds: source artifacts (i.e. ones that are available |
| 898 | before Bazel starts executing) and derived artifacts (ones that need to be |
| 899 | built). Derived artifacts can themselves be multiple kinds: |
| 900 | |
| 901 | 1. **Regular artifacts.** These are checked for up-to-dateness by computing |
| 902 | their checksum, with ctime as a shortcut; we don't checksum the file if its |
| 903 | ctime hasn't changed. |
| 904 | 2. **Unresolved symlink artifacts.** These are checked for up-to-dateness by |
| 905 | calling readlink(). Unlike regular artifacts, these can be dangling |
| 906 | symlinks. They are usually used in cases where one then packs up some files |
| 907 | into an archive of some sort. |
| 908 | 3. **Tree artifacts.** These are not single files, but directory trees. They |
| 909 | are checked for up-to-dateness by checking the set of files in them and their |
| 910 | contents. They are represented as a `TreeArtifact`. |
| 911 | 4. **Constant metadata artifacts.** Changes to these artifacts don't trigger a |
| 912 | rebuild. This is used exclusively for build stamp information: we don't want |
| 913 | to do a rebuild just because the current time changed. |
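
The check for regular artifacts (skip the checksum when the ctime is unchanged) can be sketched as follows. This is an illustration only: `FileState`, `snapshot` and `is_unchanged` are hypothetical names, SHA-256 is an arbitrary choice of digest, and Bazel's own bookkeeping is more involved.

```python
# Toy up-to-dateness check for a regular file; not Bazel's implementation.
import hashlib
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class FileState:
    ctime_ns: int
    digest: str


def sha256(path: str) -> str:
    with open(path, "rb") as f:
        return hashlib.sha256(f.read()).hexdigest()


def snapshot(path: str) -> FileState:
    """Record what is needed to later decide whether the file changed."""
    return FileState(os.stat(path).st_ctime_ns, sha256(path))


def is_unchanged(path: str, previous: FileState) -> bool:
    if os.stat(path).st_ctime_ns == previous.ctime_ns:
        return True  # shortcut: same ctime, so the checksum is skipped
    return sha256(path) == previous.digest  # fall back to the contents
```

The shortcut trades a small risk (a modification that does not move the recorded timestamp) for not having to read every output file on every build.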
| 914 | |
| 915 | There is no fundamental reason why source artifacts cannot be tree artifacts or |
| 916 | unresolved symlink artifacts; it's just that we haven't implemented it yet (we |
| 917 | should, though: referencing a source directory in a BUILD file is one of the |
| 918 | few known long-standing incorrectness issues with Bazel; we have an |
| 919 | implementation that kind of works which is enabled by the |
| 920 | `BAZEL_TRACK_SOURCE_DIRECTORIES=1` JVM property). |
| 921 | |
| 922 | Middlemen are a notable kind of `Artifact`. They are indicated by `Artifact` |
| 923 | instances that are the outputs of `MiddlemanAction`. They are used to |
| 924 | special-case some things: |
| 925 | |
| 926 | * Aggregating middlemen are used to group artifacts together. This is so that |
| 927 | if a lot of actions use the same large set of inputs, we don't have N\*M |
| 928 | dependency edges, only N+M (they are being replaced with nested sets). |
| 929 | * Scheduling dependency middlemen ensure that an action runs before another. |
| 930 | They are mostly used for linting but also for C++ compilation (see |
| 931 | `CcCompilationContext.createMiddleman()` for an explanation). |
| 932 | * Runfiles middlemen are used to ensure the presence of a runfiles tree so |
| 933 | that one does not separately need to depend on the output manifest and every |
| 934 | single artifact referenced by the runfiles tree. |
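
The edge-count argument for aggregating middlemen is simple arithmetic; the following lines spell it out for N actions that all consume the same M inputs. The function names are made up for this illustration.

```python
# Edge counts for N actions that all consume the same M inputs.
def edges_without_middleman(num_actions: int, num_inputs: int) -> int:
    # every action depends directly on every input
    return num_actions * num_inputs


def edges_with_middleman(num_actions: int, num_inputs: int) -> int:
    # the middleman depends on the M inputs once,
    # and each of the N actions depends only on the middleman
    return num_inputs + num_actions


direct = edges_without_middleman(1000, 500)
grouped = edges_with_middleman(1000, 500)
```

With 1000 actions and 500 shared inputs, this is 500,000 edges without a middleman and 1,500 with one.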
| 935 | |
| 936 | Actions are best understood as a command that needs to be run, the environment |
| 937 | it needs and the set of outputs it produces. The following things are the main |
| 938 | components of the description of an action: |
| 939 | |
| 940 | * The command line that needs to be run |
| 941 | * The input artifacts it needs |
| 942 | * The environment variables that need to be set |
| 943 | * Annotations that describe the environment (e.g. platform) it needs to run in |
| 945 | |
| 946 | There are also a few other special cases, like writing a file whose content is |
| 947 | known to Bazel. They are subclasses of `AbstractAction`. Most of the actions are |
| 948 | a `SpawnAction` or a `StarlarkAction` (these are effectively the same and should |
| 949 | arguably not be separate classes), although Java and C++ have their own action |
| 950 | types (`JavaCompileAction`, `CppCompileAction` and `CppLinkAction`). |
| 951 | |
| 952 | We eventually want to move everything to `SpawnAction`; `JavaCompileAction` is |
| 953 | pretty close, but C++ is a bit of a special-case due to .d file parsing and |
| 954 | include scanning. |
| 955 | |
| 956 | The action graph is mostly "embedded" into the Skyframe graph: conceptually, the |
| 957 | execution of an action is represented as an invocation of |
| 958 | `ActionExecutionFunction`. The mapping from an action graph dependency edge to a |
| 959 | Skyframe dependency edge is described in |
| 960 | `ActionExecutionFunction.getInputDeps()` and `Artifact.key()` and has a few |
| 961 | optimizations in order to keep the number of Skyframe edges low: |
| 962 | |
| 963 | * Derived artifacts do not have their own `SkyValue`s. Instead, |
| 964 | `Artifact.getGeneratingActionKey()` is used to find out the key for the |
| 965 | action that generates it. |
| 966 | * Nested sets have their own Skyframe key. |
| 967 | |
| 968 | ### Shared actions |
| 969 | |
| 970 | Some actions are generated by multiple configured targets; Starlark rules are |
| 971 | more limited since they are only allowed to put their derived artifacts into a |
| 972 | directory determined by their configuration and their package (but even so, |
| 973 | rules in the same package can conflict), but rules implemented in Java can put |
| 974 | derived artifacts anywhere. |
| 975 | |
| 976 | This is considered to be a misfeature, but getting rid of it is really hard |
| 977 | because it produces significant savings in execution time when e.g. a source |
| 978 | file needs to be processed somehow and that file is referenced by multiple rules |
| 979 | (handwave-handwave). This comes at the cost of some RAM: each instance of a |
| 980 | shared action needs to be stored in memory separately. |
| 981 | |
| 982 | If two actions generate the same output file, they must be exactly the same: |
| 983 | have the same inputs, the same outputs and run the same command line. This |
| 984 | equivalence relation is implemented in `Actions.canBeShared()` and it is |
| 985 | verified between the analysis and execution phases by looking at every Action. |
| 986 | This is implemented in `SkyframeActionExecutor.findAndStoreArtifactConflicts()` |
| 987 | and is one of the few places in Bazel that requires a "global" view of the |
| 988 | build. |
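
The rule that two actions writing the same output must be identical can be sketched as a single pass over all actions. This is a toy model, not the code in `Actions.canBeShared()` or `SkyframeActionExecutor`: the `ToyAction` class, the function names and the example labels are assumptions.

```python
# Toy model of the output-conflict check; names are illustrative.
from dataclasses import dataclass
from typing import Dict, Iterable, List, Tuple


@dataclass(frozen=True)
class ToyAction:
    owner: str                  # label of the configured target that made it
    inputs: Tuple[str, ...]
    outputs: Tuple[str, ...]
    argv: Tuple[str, ...]


def can_be_shared(a: ToyAction, b: ToyAction) -> bool:
    """Actions may share outputs only if they are identical except for owner."""
    return (a.inputs, a.outputs, a.argv) == (b.inputs, b.outputs, b.argv)


def find_conflicts(actions: Iterable[ToyAction]) -> List[Tuple[str, str, str]]:
    """Return (output, first owner, conflicting owner) for each conflict."""
    generating: Dict[str, ToyAction] = {}
    conflicts: List[Tuple[str, str, str]] = []
    for action in actions:
        for output in action.outputs:
            previous = generating.setdefault(output, action)
            if previous is not action and not can_be_shared(previous, action):
                conflicts.append((output, previous.owner, action.owner))
    return conflicts


a = ToyAction("//x:a", ("src/gen.in",), ("out/gen.h",), ("tool", "src/gen.in"))
b = ToyAction("//x:b", ("src/gen.in",), ("out/gen.h",), ("tool", "src/gen.in"))
c = ToyAction("//y:c", ("src/gen.in",), ("out/gen.h",), ("tool", "--fast", "src/gen.in"))
```

The check needs to see every action at once, which is why it is one of the few places that requires a global view of the build.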
| 989 | |
| 990 | ## The execution phase |
| 991 | |
| 992 | This is when Bazel actually starts running build actions, i.e. commands that |
| 993 | produce outputs. |
| 994 | |
| 995 | The first thing Bazel does after the analysis phase is to determine what |
| 996 | Artifacts need to be built. The logic for this is encoded in |
| 997 | `TopLevelArtifactHelper`; roughly speaking, it's the `filesToBuild` of the |
| 998 | configured targets on the command line and the contents of a special output |
| 999 | group for the explicit purpose of expressing "if this target is on the command |
| 1000 | line, build these artifacts". |
| 1001 | |
| 1002 | The next step is creating the execution root. Since Bazel has the option to read |
| 1003 | source packages from different locations in the file system (`--package_path`), |
| 1004 | it needs to provide locally executed actions with a full source tree. This is |
| 1005 | handled by the class `SymlinkForest` and works by taking note of every target |
| 1006 | used in the analysis phase and building up a single directory tree that symlinks |
| 1007 | every package with a used target from its actual location. An alternative would |
| 1008 | be to pass the correct paths to commands (taking `--package_path` into account). |
| 1009 | This is undesirable because: |
| 1010 | |
| 1011 | * It changes action command lines when a package is moved from a package path |
| 1012 | entry to another (used to be a common occurrence) |
| 1013 | * It results in different command lines if an action is run remotely than if |
| 1014 | it's run locally |
| 1015 | * It requires a command line transformation specific to the tool in use |
| 1016 | (consider the difference between e.g. Java classpaths and C++ include paths) |
| 1017 | * Changing the command line of an action invalidates its action cache entry |
| 1018 | * `--package_path` is slowly and steadily being deprecated |
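
To make the symlink forest idea concrete, here is a minimal sketch in Python. It is an illustration only: `plant_symlink_forest` and `demo` are hypothetical names, and the real `SymlinkForest` handles cases this ignores (for example nested packages and external repositories).

```python
import os
import tempfile


def plant_symlink_forest(exec_root, package_path, used_packages):
    """Symlink every used package into exec_root from the first root that has it."""
    for package in used_packages:
        for root in package_path:
            source = os.path.join(root, package)
            if os.path.isdir(source):
                link = os.path.join(exec_root, package)
                os.makedirs(os.path.dirname(link), exist_ok=True)
                os.symlink(source, link)
                break
        else:
            raise FileNotFoundError(package)


def demo():
    """Build two package path roots and report where each package resolves to."""
    with tempfile.TemporaryDirectory() as base:
        roots = [os.path.join(base, "root1"), os.path.join(base, "root2")]
        os.makedirs(os.path.join(roots[0], "foo"))
        os.makedirs(os.path.join(roots[1], "bar", "baz"))
        exec_root = os.path.join(base, "execroot")
        plant_symlink_forest(exec_root, roots, ["foo", "bar/baz"])
        real_base = os.path.realpath(base)
        return {
            package: os.path.relpath(
                os.path.realpath(os.path.join(exec_root, package)), real_base)
            for package in ["foo", "bar/baz"]
        }
```

Actions see `foo` and `bar/baz` under one root, so their command lines do not depend on which package path entry a package came from.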
| 1019 | |
| 1020 | Then, Bazel starts traversing the action graph (the bipartite, directed graph |
| 1021 | composed of actions and their input and output artifacts) and running actions. |
| 1022 | The execution of each action is represented by an instance of the `SkyValue` |
| 1023 | class `ActionExecutionValue`. |
| 1024 | |
| 1025 | Since running an action is expensive, we have a few layers of caching that can |
| 1026 | be hit behind Skyframe: |
| 1027 | |
| 1028 | * `ActionExecutionFunction.stateMap` contains data to make Skyframe restarts |
| 1029 | of `ActionExecutionFunction` cheap |
| 1030 | * The local action cache contains data about the state of the file system |
| 1031 | * Remote execution systems usually also contain their own cache |
| 1032 | |
| 1033 | ### The local action cache |
| 1034 | |
| 1035 | This cache is another layer that sits behind Skyframe; even if an action is |
| 1036 | re-executed in Skyframe, it can still be a hit in the local action cache. It |
| 1037 | represents the state of the local file system and it's serialized to disk, which |
| 1038 | means that when one starts up a new Bazel server, one can get local action cache |
| 1039 | hits even though the Skyframe graph is empty. |
| 1040 | |
| 1041 | This cache is checked for hits using the method |
| 1042 | `ActionCacheChecker.getTokenIfNeedToExecute()`. |
| 1043 | |
| 1044 | Contrary to its name, it's a map from the path of a derived artifact to the |
| 1045 | action that emitted it. The action is described as: |
| 1046 | |
| 1047 | 1. The set of its input and output files and their checksums |
| 1048 | 2. Its "action key", which is usually the command line that was executed, but |
| 1049 | in general, represents everything that's not captured by the checksum of the |
| 1050 | input files (e.g. for `FileWriteAction`, it's the checksum of the data |
| 1051 | that's written) |
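
As a rough mental model (not the actual on-disk format or the `ActionCacheChecker` API), the cache can be thought of as a dictionary keyed by output path. All names below are hypothetical.

```python
import hashlib


def digest(data):
    """Checksum helper used for both file contents and action keys."""
    return hashlib.sha256(data).hexdigest()


class LocalActionCache:
    """Maps the path of a derived artifact to a description of its action."""

    def __init__(self):
        self._entries = {}   # output path -> (action key, {file path: checksum})

    def needs_execution(self, output, action_key, file_digests):
        """True unless the recorded action key and file checksums both match."""
        return self._entries.get(output) != (action_key, file_digests)

    def record(self, output, action_key, file_digests):
        self._entries[output] = (action_key, dict(file_digests))


# Example: one compile action, described by its files and its action key.
FILES = {"a.c": digest(b"int x;"), "a.o": digest(b"object code")}
KEY = digest(b"gcc -c a.c -o a.o")
CACHE = LocalActionCache()
MISS_BEFORE = CACHE.needs_execution("a.o", KEY, FILES)
CACHE.record("a.o", KEY, FILES)
```

A changed input checksum or a changed action key both turn a hit into a miss, which is exactly the split between the two parts of the description above.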
| 1052 | |
| 1053 | There is also a highly experimental “top-down action cache” that is still under |
| 1054 | development, which uses transitive hashes to avoid going to the cache as many |
| 1055 | times. |
| 1056 | |
| 1057 | ### Input discovery and input pruning |
| 1058 | |
| 1059 | Some actions are more complicated than just having a set of inputs. Changes to |
| 1060 | the set of inputs of an action come in two forms: |
| 1061 | |
| 1062 | * An action may discover new inputs before its execution or decide that some |
| 1063 | of its inputs are not actually necessary. The canonical example is C++, |
| 1064 | where it's better to make an educated guess about what header files a C++ |
| 1065 | file uses from its transitive closure so that we don't need to send every |
| 1066 | file to remote executors; therefore, we have an option not to register every |
| 1067 | header file as an "input", but scan the source file for transitively |
| 1068 | included headers and only mark those header files as inputs that are |
| 1069 | mentioned in `#include` statements (we overestimate so that we don't need to |
| 1070 | implement a full C preprocessor). This option is currently hard-wired to |
| 1071 | "false" in Bazel and is only used at Google. |
| 1072 | * An action may realize that some files were not used during its execution. In |
| 1073 | C++, this is done via ".d files": the compiler tells which header files were |
| 1074 | used after the fact, and in order to avoid the embarrassment of having worse |
| 1075 | incrementality than Make, Bazel makes use of this fact. This offers a better |
| 1076 | estimate than the include scanner because it relies on the compiler. |
| 1077 | |
| 1078 | These are implemented using methods on Action: |
| 1079 | |
| 1080 | 1. `Action.discoverInputs()` is called. It should return a nested set of |
| 1081 | Artifacts that are determined to be required. These must be source artifacts |
| 1082 | so that there are no dependency edges in the action graph that don't have an |
| 1083 | equivalent in the configured target graph. |
| 1084 | 2. The action is executed by calling `Action.execute()`. |
| 1085 | 3. At the end of `Action.execute()`, the action can call |
| 1086 | `Action.updateInputs()` to tell Bazel that not all of its inputs were |
| 1087 | needed. This can result in incorrect incremental builds if a used input is |
| 1088 | reported as unused. |
| 1089 | |
| 1090 | When an action cache returns a hit on a fresh Action instance (e.g. created |
| 1091 | after a server restart), Bazel calls `updateInputs()` itself so that the set of |
| 1092 | inputs reflects the result of input discovery and pruning done before. |
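
The "educated guess" made by include scanning can be sketched as follows. This is a toy scanner written for this document, not the one used at Google: it follows quoted `#include` lines transitively and ignores `#if`/`#ifdef`, which is what makes the result an overestimate.

```python
import re

INCLUDE_RE = re.compile(r'^\s*#\s*include\s+"([^"]+)"', re.MULTILINE)


def scan_includes(source_text, declared_headers):
    """Return the declared headers that are transitively mentioned in #include lines.

    declared_headers maps a header path to its contents. Preprocessor
    conditionals are not evaluated, so headers behind #ifdef are still found.
    """
    discovered = set()
    worklist = [source_text]
    while worklist:
        text = worklist.pop()
        for name in INCLUDE_RE.findall(text):
            if name in declared_headers and name not in discovered:
                discovered.add(name)
                worklist.append(declared_headers[name])
    return discovered


HEADERS = {"a.h": '#include "b.h"\n', "b.h": "", "c.h": "", "win.h": ""}
SOURCE = '#include "a.h"\n#ifdef _WIN32\n#include "win.h"\n#endif\n'
```

Here `c.h` is pruned because nothing mentions it, while `win.h` is kept even though the `#ifdef` may be false. A `.d` file produced by the compiler would give the exact answer after the fact.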
| 1093 | |
| 1094 | Starlark actions can make use of the facility to declare some inputs as unused |
| 1095 | using the `unused_inputs_list=` argument of |
| 1096 | [`ctx.actions.run()`](https://bazel.build/rules/lib/actions#run). |
| 1097 | |
| 1098 | ### Various ways to run actions: Strategies/ActionContexts |
| 1099 | |
| 1100 | Some actions can be run in different ways. For example, a command line can be |
| 1101 | executed locally, locally but in various kinds of sandboxes, or remotely. The |
| 1102 | concept that embodies this is called an `ActionContext` (or `Strategy`, since we |
| 1103 | successfully went only halfway with a rename...) |
| 1104 | |
| 1105 | The life cycle of an action context is as follows: |
| 1106 | |
| 1107 | 1. When the execution phase is started, `BlazeModule` instances are asked what |
| 1108 | action contexts they have. This happens in the constructor of |
| 1109 | `ExecutionTool`. Action context types are identified by a Java `Class` |
| 1110 | instance that refers to a sub-interface of `ActionContext`; the action |
| 1111 | context must implement that interface. |
| 1112 | 2. The appropriate action context is selected from the available ones and is |
| 1113 | forwarded to `ActionExecutionContext` and `BlazeExecutor`. |
| 1114 | 3. Actions request contexts using `ActionExecutionContext.getContext()` and |
| 1115 | `BlazeExecutor.getStrategy()` (there should really be only one way to do |
| 1116 | it…) |
| 1117 | |
| 1118 | Strategies are free to call other strategies to do their jobs; this is used, for |
| 1119 | example, in the dynamic strategy that starts actions both locally and remotely, |
| 1120 | then uses whichever finishes first. |
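
The racing idea behind the dynamic strategy can be sketched with two threads. This is only an illustration under simplifying assumptions: strategies are plain functions, the names are invented, and the loser is left to finish instead of being cancelled as a real implementation would need to do.

```python
import concurrent.futures
import time


def run_dynamically(local_strategy, remote_strategy, spawn):
    """Start both strategies and return the result of whichever finishes first."""
    pool = concurrent.futures.ThreadPoolExecutor(max_workers=2)
    futures = [pool.submit(local_strategy, spawn),
               pool.submit(remote_strategy, spawn)]
    done, _ = concurrent.futures.wait(
        futures, return_when=concurrent.futures.FIRST_COMPLETED)
    pool.shutdown(wait=False)   # do not block on the slower strategy
    return next(iter(done)).result()


def slow_local(spawn):
    """Pretend the local machine is busy."""
    time.sleep(0.5)
    return ("local", spawn)


def fast_remote(spawn):
    """Pretend the remote cache already had the result."""
    return ("remote", spawn)
```

The point of the sketch is that one strategy delegates to two others rather than doing any work itself.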
| 1121 | |
| 1122 | One notable strategy is the one that implements persistent worker processes |
| 1123 | (`WorkerSpawnStrategy`). The idea is that some tools have a long startup time |
| 1124 | and should therefore be reused between actions instead of starting one anew for |
| 1125 | every action. (This does represent a potential correctness issue, since Bazel |
| 1126 | relies on the promise of the worker process that it doesn't carry observable |
| 1127 | state between individual requests.) |
| 1128 | |
| 1129 | If the tool changes, the worker process needs to be restarted. Whether a worker |
| 1130 | can be reused is determined by computing a checksum of the tool with |
| 1131 | `WorkerFilesHash`. It relies on knowing which inputs of the action are part of |
| 1132 | the tool and which are ordinary inputs; this is determined by the creator |
| 1133 | of the Action: `Spawn.getToolFiles()` and the runfiles of the `Spawn` are |
| 1134 | counted as parts of the tool. |
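
A minimal sketch of the reuse decision, assuming the tool is given as a mapping from file path to contents. The function and class names are hypothetical and the real `WorkerFilesHash` works on Bazel's own artifact metadata, not raw bytes.

```python
import hashlib


def worker_files_hash(tool_files):
    """Combine the path and content digest of every tool file into one hash."""
    combined = hashlib.sha256()
    for path in sorted(tool_files):
        combined.update(path.encode())
        combined.update(b"\0")
        combined.update(hashlib.sha256(tool_files[path]).digest())
    return combined.hexdigest()


class WorkerPool:
    """Keeps one worker per mnemonic and restarts it when the tool changes."""

    def __init__(self):
        self._workers = {}   # mnemonic -> (files hash, worker id)
        self._next_id = 0

    def get_worker(self, mnemonic, tool_files):
        files_hash = worker_files_hash(tool_files)
        cached = self._workers.get(mnemonic)
        if cached is None or cached[0] != files_hash:
            self._next_id += 1   # tool changed or first use: start a fresh worker
            cached = (files_hash, self._next_id)
            self._workers[mnemonic] = cached
        return cached[1]


POOL = WorkerPool()
FIRST = POOL.get_worker("Javac", {"javac.jar": b"v1"})
```

Ordinary inputs deliberately do not enter the hash, so changing a source file never restarts the worker.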
| 1135 | |
| 1136 | More information about strategies (or action contexts!): |
| 1137 | |
| 1138 | * Information about various strategies for running actions is available |
| 1139 | [here](https://jmmv.dev/2019/12/bazel-strategies.html). |
| 1140 | * Information about the dynamic strategy, one where we run an action both |
| 1141 | locally and remotely to see which finishes first, is available |
| 1142 | [here](https://jmmv.dev/series.html#Bazel%20dynamic%20execution). |
| 1143 | * Information about the intricacies of executing actions locally is available |
| 1144 | [here](https://jmmv.dev/2019/11/bazel-process-wrapper.html). |
| 1145 | |
| 1146 | ### The local resource manager |
| 1147 | |
| 1148 | Bazel _can_ run many actions in parallel. The number of local actions that |
| 1149 | _should_ be run in parallel differs from action to action: the more resources an |
| 1150 | action requires, the fewer instances should be running at the same time to avoid |
| 1151 | overloading the local machine. |
| 1152 | |
| 1153 | This is implemented in the class `ResourceManager`: each action has to be |
| 1154 | annotated with an estimate of the local resources it requires in the form of a |
| 1155 | `ResourceSet` instance (CPU and RAM). Then when action contexts do something |
| 1156 | that requires local resources, they call `ResourceManager.acquireResources()` |
| 1157 | and are blocked until the required resources are available. |
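
The blocking behavior can be illustrated with a condition variable. This is a toy model with made-up names and units (CPU count and megabytes of RAM as dictionary entries), not the scheduling logic of the real `ResourceManager`.

```python
import threading


class ResourceManager:
    """Hands out CPU and RAM; callers block until their request fits."""

    def __init__(self, cpu, ram_mb):
        self._available = {"cpu": cpu, "ram_mb": ram_mb}
        self._changed = threading.Condition()

    def _fits(self, request):
        return all(self._available[k] >= v for k, v in request.items())

    def acquire(self, request):
        with self._changed:
            self._changed.wait_for(lambda: self._fits(request))
            for key, amount in request.items():
                self._available[key] -= amount

    def release(self, request):
        with self._changed:
            for key, amount in request.items():
                self._available[key] += amount
            self._changed.notify_all()


def demo():
    """A small request has to wait while a big one holds both CPUs."""
    manager = ResourceManager(cpu=2, ram_mb=1024)
    big = {"cpu": 2, "ram_mb": 512}
    small = {"cpu": 1, "ram_mb": 256}
    manager.acquire(big)
    got_it = threading.Event()

    def worker():
        manager.acquire(small)
        got_it.set()

    waiter = threading.Thread(target=worker)
    waiter.start()
    blocked = not got_it.wait(0.1)   # still waiting while "big" is held
    manager.release(big)
    waiter.join()
    return blocked, got_it.is_set()
```

The quality of the resource estimates matters more than the locking: an action that underestimates its needs can still overload the machine.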
| 1158 | |
| 1159 | A more detailed description of local resource management is available |
| 1160 | [here](https://jmmv.dev/2019/12/bazel-local-resources.html). |
| 1161 | |
| 1162 | ### The structure of the output directory |
| 1163 | |
| 1164 | Each action requires a separate place in the output directory where it places |
| 1165 | its outputs. The location of derived artifacts is usually as follows: |
| 1166 | |
| 1167 | ``` |
| 1168 | $EXECROOT/bazel-out/<configuration>/bin/<package>/<artifact name> |
| 1169 | ``` |
| 1170 | |
| 1171 | How is the name of the directory that is associated with a particular |
| 1172 | configuration determined? There are two conflicting desirable properties: |
| 1173 | |
| 1174 | 1. If two configurations can occur in the same build, they should have |
| 1175 | different directories so that both can have their own version of the same |
| 1176 | action; otherwise, if the two configurations disagree about e.g. the command |
| 1177 | line of an action producing the same output file, Bazel doesn't know which |
| 1178 | action to choose (an "action conflict") |
| 1179 | 2. If two configurations represent "roughly" the same thing, they should have |
| 1180 | the same name so that actions executed in one can be reused for the other if |
| 1181 | the command lines match: for example, changes to the command line options to |
| 1182 | the Java compiler should not result in C++ compile actions being re-run. |
| 1183 | |
| 1184 | So far, we have not come up with a principled way of solving this problem, which |
| 1185 | has similarities to the problem of configuration trimming. A longer discussion |
| 1186 | of options is available |
| 1187 | [here](https://docs.google.com/document/d/1fZI7wHoaS-vJvZy9SBxaHPitIzXE_nL9v4sS4mErrG4/edit). |
| 1188 | The main problematic areas are Starlark rules (whose authors usually aren't |
| 1189 | intimately familiar with Bazel) and aspects, which add another dimension to the |
| 1190 | space of things that can produce the "same" output file. |
| 1191 | |
| 1192 | The current approach is that the path segment for the configuration is |
| 1193 | `<CPU>-<compilation mode>` with various suffixes added so that configuration |
| 1194 | transitions implemented in Java don't result in action conflicts. In addition, a |
| 1195 | checksum of the set of Starlark configuration transitions is added so that users |
| 1196 | can't cause action conflicts. It is far from perfect. This is implemented in |
| 1197 | `OutputDirectories.buildMnemonic()` and relies on each configuration fragment |
| 1198 | adding its own part to the name of the output directory. |
| 1199 | |
| 1200 | ## Tests |
| 1201 | |
| 1202 | Bazel has rich support for running tests. It supports: |
| 1203 | |
| 1204 | * Running tests remotely (if a remote execution backend is available) |
| 1205 | * Running tests multiple times in parallel (for deflaking or gathering timing |
| 1206 | data) |
| 1207 | * Sharding tests (splitting test cases in the same test over multiple processes |
| 1208 | for speed) |
| 1209 | * Re-running flaky tests |
| 1210 | * Grouping tests into test suites |
| 1211 | |
| 1212 | Tests are regular configured targets that have a TestProvider, which describes |
| 1213 | how the test should be run: |
| 1214 | |
| 1215 | * The artifacts whose building results in the test being run. This is a "cache |
| 1216 | status" file that contains a serialized `TestResultData` message |
| 1217 | * The number of times the test should be run |
| 1218 | * The number of shards the test should be split into |
| 1219 | * Some parameters about how the test should be run (e.g. the test timeout) |
| 1220 | |
| 1221 | ### Determining which tests to run |
| 1222 | |
| 1223 | Determining which tests are run is an elaborate process. |
| 1224 | |
| 1225 | First, during target pattern parsing, test suites are recursively expanded. The |
| 1226 | expansion is implemented in `TestsForTargetPatternFunction`. A somewhat |
| 1227 | surprising wrinkle is that if a test suite declares no tests, it refers to |
| 1228 | _every_ test in its package. This is implemented in `Package.beforeBuild()` by |
| 1229 | adding an implicit attribute called `$implicit_tests` to test suite rules. |
| 1230 | |
| 1231 | Then, tests are filtered for size, tags, timeout and language according to the |
| 1232 | command line options. This is implemented in `TestFilter` and is called from |
| 1233 | `TargetPatternPhaseFunction.determineTests()` during target parsing and the |
| 1234 | result is put into `TargetPatternPhaseValue.getTestsToRunLabels()`. The rule |
| 1235 | attributes that can be filtered on are not configurable because this filtering |
| 1236 | happens before the analysis phase, when the configuration is not yet |
| 1237 | available. |
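
A simplified picture of the filtering step, assuming tests are plain dictionaries holding their (non-configurable) `size` and `tags` attributes. The helper names are invented; the treatment of a leading `-` as an exclusion mirrors how `--test_tag_filters` is written on the command line, but the details of `TestFilter` are not reproduced here.

```python
def matches_tag_filter(tags, tag_filter):
    """tag_filter looks like ["fast", "-flaky"]; a leading "-" excludes a tag."""
    required = {t for t in tag_filter if not t.startswith("-")}
    excluded = {t[1:] for t in tag_filter if t.startswith("-")}
    if excluded & set(tags):
        return False
    return not required or bool(required & set(tags))


def filter_tests(tests, sizes=None, tag_filter=()):
    """Return the labels of the tests that pass the size and tag filters."""
    return [t["label"] for t in tests
            if (not sizes or t["size"] in sizes)
            and matches_tag_filter(t["tags"], tag_filter)]


EXAMPLE_TESTS = [
    {"label": "//a:small_test", "size": "small", "tags": ["fast"]},
    {"label": "//a:flaky_test", "size": "small", "tags": ["fast", "flaky"]},
    {"label": "//a:big_test", "size": "large", "tags": []},
]
```

Because only raw attribute values are available at this point, nothing here can depend on a `select()`.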
| 1238 | |
| 1239 | This is then processed further in `BuildView.createResult()`: targets whose |
| 1240 | analysis failed are filtered out and tests are split into exclusive and |
| 1241 | non-exclusive tests. It's then put into `AnalysisResult`, which is how |
| 1242 | `ExecutionTool` knows which tests to run. |
| 1243 | |
| 1244 | In order to lend some transparency to this elaborate process, the `tests()` |
| 1245 | query operator (implemented in `TestsFunction`) is available to tell which tests |
| 1246 | are run when a particular target is specified on the command line. It's |
| 1247 | unfortunately a reimplementation, so it probably deviates from the above in |
| 1248 | multiple subtle ways. |
| 1249 | |
| 1250 | ### Running tests |
| 1251 | |
| 1252 | The way the tests are run is by requesting cache status artifacts. This then |
| 1253 | results in the execution of a `TestRunnerAction`, which eventually calls the |
| 1254 | `TestActionContext` chosen by the `--test_strategy` command line option that |
| 1255 | runs the test in the requested way. |
| 1256 | |
| 1257 | Tests are run according to an elaborate protocol that uses environment variables |
| 1258 | to tell tests what's expected from them. A detailed description of what Bazel |
| 1259 | expects from tests and what tests can expect from Bazel is available |
| 1260 | [here](https://bazel.build/reference/test-encyclopedia). At the |
| 1261 | simplest, an exit code of 0 means success, anything else means failure. |
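
As one example of the protocol, sharding is communicated through the `TEST_TOTAL_SHARDS`, `TEST_SHARD_INDEX` and `TEST_SHARD_STATUS_FILE` environment variables. The sketch below shows how a test runner might react to them; the function name is made up, and the round-robin assignment of cases to shards is an assumption (any deterministic partition would do).

```python
import os


def cases_for_this_shard(all_cases, environ=None):
    """Pick the test cases this process should run, based on the environment."""
    environ = os.environ if environ is None else environ
    total = int(environ.get("TEST_TOTAL_SHARDS", "1"))
    index = int(environ.get("TEST_SHARD_INDEX", "0"))
    status_file = environ.get("TEST_SHARD_STATUS_FILE")
    if status_file:
        # Touching this file signals that the runner understands sharding.
        open(status_file, "a").close()
    # Round-robin over a sorted list so every shard computes the same split.
    return [case for i, case in enumerate(sorted(all_cases)) if i % total == index]


CASES = ["test_a", "test_b", "test_c", "test_d", "test_e"]
```

Each shard is a separate process, so the only coordination between them is that they all derive the same partition from the same inputs.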
| 1262 | |
| 1263 | In addition to the cache status file, each test process emits a number of other |
| 1264 | files. They are put in the "test log directory" which is the subdirectory called |
| 1265 | `testlogs` of the output directory of the target configuration: |
| 1266 | |
| 1267 | * `test.xml`, a JUnit-style XML file detailing the individual test cases in |
| 1268 | the test shard |
| 1269 | * `test.log`, the console output of the test. stdout and stderr are not |
| 1270 | separated. |
| 1271 | * `test.outputs`, the "undeclared outputs directory"; this is used by tests |
| 1272 | that want to output files in addition to what they print to the terminal. |
| 1273 | |
| 1274 | There are two things that can happen during test execution that cannot during |
| 1275 | building regular targets: exclusive test execution and output streaming. |
| 1276 | |
| 1277 | Some tests need to be executed in exclusive mode, i.e. not in parallel with |
| 1278 | other tests. This can be elicited either by adding `tags=["exclusive"]` to the |
| 1279 | test rule or running the test with `--test_strategy=exclusive`. Each exclusive |
| 1280 | test is run by a separate Skyframe invocation requesting the execution of the |
| 1281 | test after the "main" build. This is implemented in |
| 1282 | `SkyframeExecutor.runExclusiveTest()`. |
| 1283 | |
| 1284 | Unlike regular actions, whose terminal output is dumped when the action |
| 1285 | finishes, the user can request the output of tests to be streamed so that they |
| 1286 | get informed about the progress of a long-running test. This is specified by the |
| 1287 | `--test_output=streamed` command line option and implies exclusive test |
| 1288 | execution so that outputs of different tests are not interspersed. |
| 1289 | |
| 1290 | This is implemented in the aptly-named `StreamedTestOutput` class and works by |
| 1291 | polling changes to the `test.log` file of the test in question and dumping new |
| 1292 | bytes to the terminal where Bazel runs. |
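
Polling a log file for new bytes can be sketched like this. `LogTail` and `demo` are hypothetical names for illustration and say nothing about how `StreamedTestOutput` is structured internally.

```python
import os
import tempfile


class LogTail:
    """Remembers how far a file has been read and returns only new bytes."""

    def __init__(self, path):
        self._path = path
        self._offset = 0

    def poll(self):
        """Return bytes appended since the previous poll (empty if none)."""
        try:
            with open(self._path, "rb") as log:
                log.seek(self._offset)
                data = log.read()
        except FileNotFoundError:
            return b""   # the test has not started writing yet
        self._offset += len(data)
        return data


def demo():
    """Simulate a test appending to test.log between polls."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "test.log")
        tail = LogTail(path)
        seen = [tail.poll()]
        for line in (b"running case 1\n", b"running case 2\n"):
            with open(path, "ab") as log:
                log.write(line)
            seen.append(tail.poll())
        return seen
```

Since there is a single terminal, this only makes sense when one test runs at a time, which is why streaming implies exclusive execution.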
| 1293 | |
| 1294 | Results of the executed tests are available on the event bus by observing |
| 1295 | various events (e.g. `TestAttempt`, `TestResult` or `TestingCompleteEvent`). |
| 1296 | They are dumped to the Build Event Protocol and they are emitted to the console |
| 1297 | by `AggregatingTestListener`. |
| 1298 | |
| 1299 | ### Coverage collection |
| 1300 | |
| 1301 | Coverage is reported by the tests in LCOV format in the files |
| 1302 | `bazel-testlogs/$PACKAGE/$TARGET/coverage.dat`. |
| 1303 | |
| 1304 | To collect coverage, each test execution is wrapped in a script called |
| 1305 | `collect_coverage.sh`. |
| 1306 | |
| 1307 | This script sets up the environment of the test to enable coverage collection |
| 1308 | and determine where the coverage files are written by the coverage runtime(s). |
| 1309 | It then runs the test. A test may itself run multiple subprocesses and consist |
| 1310 | of parts written in multiple different programming languages (with separate |
| 1311 | coverage collection runtimes). The wrapper script is responsible for converting |
| 1312 | the resulting files to LCOV format if necessary, and for merging them into a |
| 1313 | single file. |
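
Merging can be pictured as summing per-line execution counts. The sketch below understands only the `SF:`, `DA:` and `end_of_record` lines of the LCOV tracefile format and ignores everything else (function and branch records, for instance); it is an illustration, not what `collect_coverage.sh` runs.

```python
from collections import defaultdict


def parse_lcov(text):
    """Return {source file: {line number: execution count}} for one report."""
    hits = defaultdict(dict)
    current = None
    for line in text.splitlines():
        if line.startswith("SF:"):
            current = line[3:]
        elif line.startswith("DA:") and current is not None:
            number, count = line[3:].split(",")[:2]
            hits[current][int(number)] = hits[current].get(int(number), 0) + int(count)
        elif line == "end_of_record":
            current = None
    return hits


def merge_lcov(reports):
    """Sum the execution counts of several reports line by line."""
    merged = defaultdict(dict)
    for report in reports:
        for source, lines in parse_lcov(report).items():
            for number, count in lines.items():
                merged[source][number] = merged[source].get(number, 0) + count
    return {source: dict(lines) for source, lines in merged.items()}


JAVA_PART = "SF:Foo.java\nDA:1,1\nDA:2,0\nend_of_record\n"
CC_PART = "SF:foo.cc\nDA:10,3\nend_of_record\nSF:Foo.java\nDA:2,4\nend_of_record\n"
```

A line counts as covered in the merged result if any of the inputs executed it.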
| 1314 | |
| 1315 | The interposition of `collect_coverage.sh` is done by the test strategies and |
| 1316 | requires `collect_coverage.sh` to be on the inputs of the test. This is |
| 1317 | accomplished by the implicit attribute `:coverage_support` which is resolved to |
| 1318 | the value of the configuration flag `--coverage_support` (see |
| 1319 | `TestConfiguration.TestOptions.coverageSupport`). |
| 1320 | |
| 1321 | Some languages do offline instrumentation, meaning that the coverage |
| 1322 | instrumentation is added at compile time (e.g. C++) and others do online |
| 1323 | instrumentation, meaning that coverage instrumentation is added at execution |
| 1324 | time. |
| 1325 | |
| 1326 | Another core concept is _baseline coverage_. This is the coverage of a library, |
| 1327 | binary, or test if no code in it was run. The problem it solves is that if you |
| 1328 | want to compute the test coverage for a binary, it is not enough to merge the |
| 1329 | coverage of all of the tests because there may be code in the binary that is not |
| 1330 | linked into any test. Therefore, what we do is to emit a coverage file for every |
| 1331 | binary which contains only the files we collect coverage for with no covered |
| 1332 | lines. The baseline coverage file for a target is at |
| 1333 | `bazel-testlogs/$PACKAGE/$TARGET/baseline_coverage.dat`. It is also generated |
| 1334 | for binaries and libraries in addition to tests if you pass the |
| 1335 | `--nobuild_tests_only` flag to Bazel. |
| 1336 | |
| 1337 | Baseline coverage is currently broken. |
| 1338 | |
| 1339 | We track two groups of files for coverage collection for each rule: the set of |
| 1340 | instrumented files and the set of instrumentation metadata files. |
| 1341 | |
| 1342 | The set of instrumented files is just that, a set of files to instrument. For |
| 1343 | online coverage runtimes, this can be used at runtime to decide which files to |
| 1344 | instrument. It is also used to implement baseline coverage. |
| 1345 | |
| 1346 | The set of instrumentation metadata files is the set of extra files a test needs |
| 1347 | to generate the LCOV files Bazel requires from it. In practice, this consists of |
| 1348 | runtime-specific files; for example, gcc emits .gcno files during compilation. |
| 1349 | These are added to the set of inputs of test actions if coverage mode is |
| 1350 | enabled. |
| 1351 | |
| 1352 | Whether or not coverage is being collected is stored in the |
| 1353 | `BuildConfiguration`. This is handy because it is an easy way to change the test |
| 1354 | action and the action graph depending on this bit, but it also means that if |
| 1355 | this bit is flipped, all targets need to be re-analyzed (some languages, e.g. |
| 1356 | C++ require different compiler options to emit code that can collect coverage, |
| 1357 | which mitigates this issue somewhat, since then a re-analysis is needed anyway). |
| 1358 | |
| 1359 | The coverage support files are depended on through labels in an implicit |
| 1360 | dependency so that they can be overridden by the invocation policy, which allows |
| 1361 | them to differ between the different versions of Bazel. Ideally, these |
| 1362 | differences would be removed and we would standardize on one version. |
| 1363 | |
| 1364 | We also generate a "coverage report" which merges the coverage collected for |
| 1365 | every test in a Bazel invocation. This is handled by |
| 1366 | `CoverageReportActionFactory` and is called from `BuildView.createResult()`. It |
| 1367 | gets access to the tools it needs by looking at the `:coverage_report_generator` |
| 1368 | attribute of the first test that is executed. |
| 1369 | |
| 1370 | ## The query engine |
| 1371 | |
| 1372 | Bazel has a |
| 1373 | [little language](https://bazel.build/docs/query-how-to) |
| 1374 | used to ask it various things about various graphs. The following query kinds |
| 1375 | are provided: |
| 1376 | |
| 1377 | * `bazel query` is used to investigate the target graph |
| 1378 | * `bazel cquery` is used to investigate the configured target graph |
| 1379 | * `bazel aquery` is used to investigate the action graph |
| 1380 | |
| 1381 | Each of these is implemented by subclassing `AbstractBlazeQueryEnvironment`. |
| 1382 | Additional query functions can be implemented by subclassing `QueryFunction`. |
| 1383 | In order to allow streaming query results, instead of collecting them to some |
| 1384 | data structure, a `query2.engine.Callback` is passed to `QueryFunction`, which |
| 1385 | calls it for results it wants to return. |
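
The callback style can be sketched with a toy `deps`-like function over a dictionary graph. Everything here is hypothetical (the graph, the function, the use of a plain Python callable as the callback); the only thing taken from the text is that results are handed to a callback as they are found instead of being returned as one collection.

```python
def deps(graph, roots, callback):
    """Report every target reachable from roots as soon as it is discovered."""
    visited = set()
    stack = list(roots)
    while stack:
        target = stack.pop()
        if target in visited:
            continue
        visited.add(target)
        callback([target])   # a partial result, delivered immediately
        stack.extend(graph.get(target, ()))


GRAPH = {"//app": ["//lib", "//util"], "//lib": ["//util"], "//util": []}
STREAMED = []
deps(GRAPH, ["//app"], STREAMED.extend)
```

An output formatter plugged in as the callback can start printing before the traversal has finished, which keeps memory use flat on large graphs.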
| 1386 | |
| 1387 | The result of a query can be emitted in various ways: labels, labels and rule |
| 1388 | classes, XML, protobuf and so on. These are implemented as subclasses of |
| 1389 | `OutputFormatter`. |
| 1390 | |
| 1391 | A subtle requirement of some query output formats (proto, definitely) is that |
| 1392 | Bazel needs to emit _all_ the information that package loading provides so that |
| 1393 | one can diff the output and determine whether a particular target has changed. |
| 1394 | As a consequence, attribute values need to be serializable, which is why there |
| 1395 | are so few attribute types and why no attribute can hold a complex Starlark |
| 1396 | value. The usual workaround is to use a label, and attach the complex |
| 1397 | information to the rule with that label. It's not a very satisfying workaround |
| 1398 | and it would be very nice to lift this requirement. |
| 1399 | |
| 1400 | ## The module system |
| 1401 | |
| 1402 | Bazel can be extended by adding modules to it. Each module must subclass |
| 1403 | `BlazeModule` (the name is a relic of the history of Bazel when it used to be |
| 1404 | called Blaze) and gets information about various events during the execution of |
| 1405 | a command. |
| 1406 | |
| 1407 | They are mostly used to implement various pieces of "non-core" functionality |
| 1408 | that only some versions of Bazel (e.g. the one we use at Google) need: |
| 1409 | |
| 1410 | * Interfaces to remote execution systems |
| 1411 | * New commands |
| 1412 | |
| 1413 | The set of extension points `BlazeModule` offers is somewhat haphazard. Don't |
| 1414 | use it as an example of good design principles. |
| 1415 | |
| 1416 | ## The event bus |
| 1417 | |
| 1418 | The main way `BlazeModule`s communicate with the rest of Bazel is by an event bus |
| 1419 | (`EventBus`): a new instance is created for every build, various parts of Bazel |
| 1420 | can post events to it and modules can register listeners for the events they are |
| 1421 | interested in. For example, the following things are represented as events: |
| 1422 | |
| 1423 | * The list of build targets to be built has been determined |
| 1424 | (`TargetParsingCompleteEvent`) |
| 1425 | * The top-level configurations have been determined |
| 1426 | (`BuildConfigurationEvent`) |
| 1427 | * A target was built, successfully or not (`TargetCompleteEvent`) |
| 1428 | * A test was run (`TestAttempt`, `TestSummary`) |
| 1429 | |
| 1430 | Some of these events are represented outside of Bazel in the |
| 1431 | [Build Event Protocol](https://bazel.build/docs/build-event-protocol) |
| 1432 | (they are `BuildEvent`s). This allows not only `BlazeModule`s, but also things |
| 1433 | outside the Bazel process to observe the build. They are accessible either as a |
| 1434 | file that contains protocol messages or Bazel can connect to a server (called |
| 1435 | the Build Event Service) to stream events. |
| 1436 | |
| 1437 | This is implemented in the `build.lib.buildeventservice` and |
| 1438 | `build.lib.buildeventstream` Java packages. |
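
The publish/subscribe pattern behind the event bus can be sketched in a few lines. This toy bus uses explicit registration by event type, which is an assumption made to keep the example short; the event class is a hypothetical stand-in for the Java class of the same name.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class TargetCompleteEvent:
    """Hypothetical stand-in for the Java event of the same name."""
    label: str
    success: bool


class EventBus:
    """Delivers each posted event to the listeners registered for its type."""

    def __init__(self):
        self._listeners = defaultdict(list)

    def register(self, event_type, listener):
        self._listeners[event_type].append(listener)

    def post(self, event):
        for listener in self._listeners[type(event)]:
            listener(event)


FAILED = []


def record_failure(event):
    """Example listener a module might register to track failed targets."""
    if not event.success:
        FAILED.append(event.label)


BUS = EventBus()   # one bus per build
BUS.register(TargetCompleteEvent, record_failure)
BUS.post(TargetCompleteEvent("//foo:bar", True))
BUS.post(TargetCompleteEvent("//foo:baz", False))
```

The poster does not know who is listening, which is what lets modules observe the build without the core depending on them.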
| 1439 | |
| 1440 | ## External repositories |
| 1441 | |
| 1442 | Whereas Bazel was originally designed to be used in a monorepo (a single source |
| 1443 | tree containing everything one needs to build), Bazel lives in a world where |
| 1444 | this is not necessarily true. "External repositories" are an abstraction used to |
| 1445 | bridge these two worlds: they represent code that is necessary for the build but |
| 1446 | is not in the main source tree. |
| 1447 | |
| 1448 | ### The WORKSPACE file |
| 1449 | |
| 1450 | The set of external repositories is determined by parsing the WORKSPACE file. |
| 1451 | For example, a declaration like this: |
| 1452 | |
| 1453 | ``` |
| 1454 | local_repository(name="foo", path="/foo/bar") |
| 1455 | ``` |
| 1456 | |
| 1457 | results in the repository called `@foo` being available. Where this gets |
| 1458 | complicated is that one can define new repository rules in Starlark files, which |
| 1459 | can then be used to load new Starlark code, which can be used to define new |
| 1460 | repository rules and so on… |
| 1461 | |
| 1462 | To handle this case, the parsing of the WORKSPACE file (in |
| 1463 | `WorkspaceFileFunction`) is split up into chunks delineated by `load()` |
| 1464 | statements. The chunk index is indicated by `WorkspaceFileKey.getIndex()` and |
| 1465 | computing `WorkspaceFileFunction` until index X means evaluating it until the |
| 1466 | Xth `load()` statement. |
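
The chunking can be pictured with a WORKSPACE file reduced to a list of `(kind, argument)` pairs. This is a sketch under assumptions: the representation is invented, and the rule used here (a new chunk starts when a `load()` follows a non-`load()` statement) is one plausible reading of "delineated by `load()` statements", not a description of `WorkspaceFileFunction`.

```python
def split_into_chunks(statements):
    """Split statements into chunks; a load() after other statements starts a new one."""
    chunks = [[]]
    previous_kind = None
    for kind, argument in statements:
        if kind == "load" and previous_kind not in (None, "load"):
            chunks.append([])
        chunks[-1].append((kind, argument))
        previous_kind = kind
    return chunks


def chunk_defining(chunks, repository):
    """Index of the first chunk that declares the repository, or None."""
    for index, chunk in enumerate(chunks):
        if ("repo", repository) in chunk:
            return index
    return None


# A WORKSPACE that declares @rules, loads from it, declares @foo, and so on.
WORKSPACE = [
    ("repo", "rules"),
    ("load", "@rules//:defs.bzl"),
    ("repo", "foo"),
    ("load", "@foo//:deps.bzl"),
    ("repo", "bar"),
]
CHUNKS = split_into_chunks(WORKSPACE)
```

To resolve the `load()` from `@rules`, only chunk 0 has to be evaluated, which breaks the apparent cycle between loading Starlark code and defining the repository that contains it.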
| 1467 | |
| 1468 | ### Fetching repositories |
| 1469 | |
| 1470 | Before the code of the repository is available to Bazel, it needs to be |
| 1471 | _fetched_. This results in Bazel creating a directory under |
| 1472 | `$OUTPUT_BASE/external/<repository name>`. |
| 1473 | |
| 1474 | Fetching the repository happens in the following steps: |
| 1475 | |
| 1476 | 1. `PackageLookupFunction` realizes that it needs a repository and creates a |
| 1477 | `RepositoryName` as a `SkyKey`, which invokes `RepositoryLoaderFunction` |
| 1478 | 2. `RepositoryLoaderFunction` forwards the request to |
| 1479 | `RepositoryDelegatorFunction` for unclear reasons (the code says it's to |
| 1480 | avoid re-downloading things in case of Skyframe restarts, but that is not |
| 1481 | very solid reasoning) |
| 1482 | 3. `RepositoryDelegatorFunction` finds out the repository rule it's asked to |
| 1483 | fetch by iterating over the chunks of the WORKSPACE file until the requested |
| 1484 | repository is found |
| 1485 | 4. The appropriate `RepositoryFunction` is found that implements the repository |
| 1486 | fetching; it's either the Starlark implementation of the repository or a |
| 1487 | hard-coded map for repositories that are implemented in Java. |
| 1488 | |
| 1489 | There are various layers of caching since fetching a repository can be very |
| 1490 | expensive: |
| 1491 | |
| 1492 | 1. There is a cache for downloaded files that is keyed by their checksum |
| 1493 | (`RepositoryCache`). This requires the checksum to be available in the |
| 1494 | WORKSPACE file, but that's good for hermeticity anyway. This is shared by |
| 1495 | every Bazel server instance on the same workstation, regardless of which |
| 1496 | workspace or output base they are running in. |
| 1497 | 2. A "marker file" is written for each repository under `$OUTPUT_BASE/external` |
| 1498 | that contains a checksum of the rule that was used to fetch it. If the Bazel |
| 1499 | server restarts but the checksum does not change, it's not re-fetched. This |
| 1500 | is implemented in `RepositoryDelegatorFunction.DigestWriter`. |
| 1501 | 3. The `--distdir` command line option designates another cache that is used to |
| 1502 | look up artifacts to be downloaded. This is useful in enterprise settings |
| 1503 | where Bazel should not fetch random things from the Internet. This is |
| 1504 | implemented by `DownloadManager`. |
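
The first layer, a download cache keyed by checksum, can be sketched as a content-addressable directory. This is an illustration only: the class reuses the name `RepositoryCache` for readability but is a hypothetical toy, as are `fetch`, `demo` and the example URLs.

```python
import hashlib
import os
import tempfile


class RepositoryCache:
    """Toy content-addressable store: files are looked up by SHA-256 checksum."""

    def __init__(self, root):
        self._root = root

    def get(self, sha256):
        path = os.path.join(self._root, sha256)
        if not os.path.exists(path):
            return None
        with open(path, "rb") as cached:
            return cached.read()

    def put(self, data):
        sha256 = hashlib.sha256(data).hexdigest()
        with open(os.path.join(self._root, sha256), "wb") as cached:
            cached.write(data)
        return sha256


def fetch(url, sha256, cache, download):
    """Return the file with the given checksum, downloading only on a cache miss."""
    data = cache.get(sha256)
    if data is None:
        data = download(url)
        if hashlib.sha256(data).hexdigest() != sha256:
            raise ValueError("checksum mismatch for " + url)
        cache.put(data)
    return data


def demo():
    """Fetch the same archive twice from different URLs; only one download happens."""
    payload = b"archive bytes"
    checksum = hashlib.sha256(payload).hexdigest()
    downloads = []

    def fake_download(url):
        downloads.append(url)
        return payload

    with tempfile.TemporaryDirectory() as root:
        cache = RepositoryCache(root)
        fetch("https://example.com/a.zip", checksum, cache, fake_download)
        fetch("https://mirror.example.com/a.zip", checksum, cache, fake_download)
    return downloads
```

Keying by checksum rather than URL is what lets the cache be shared across workspaces and mirrors, and it is also why the checksum has to be known before the download starts.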
| 1505 | |
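The marker-file idea from item 2 can be sketched in a few lines. This is only an
illustration and not the code in `RepositoryDelegatorFunction.DigestWriter`: the
class `MarkerFileSketch`, its method names, and the choice of hashing a plain
string rendering of the rule with SHA-256 are all assumptions made for the
example.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical sketch of a marker-file check; not Bazel's actual DigestWriter.
final class MarkerFileSketch {
  // Returns the hex-encoded SHA-256 of the textual rule definition.
  static String digest(String ruleDefinition) {
    try {
      MessageDigest md = MessageDigest.getInstance("SHA-256");
      byte[] hash = md.digest(ruleDefinition.getBytes(StandardCharsets.UTF_8));
      StringBuilder hex = new StringBuilder();
      for (byte b : hash) {
        hex.append(String.format("%02x", b));
      }
      return hex.toString();
    } catch (NoSuchAlgorithmException e) {
      throw new IllegalStateException("SHA-256 should always be available", e);
    }
  }

  // True if the marker exists and records the digest of the current rule,
  // in which case the repository would not need to be fetched again.
  static boolean isUpToDate(Path markerFile, String ruleDefinition) {
    if (!Files.exists(markerFile)) {
      return false;
    }
    try {
      String recorded =
          new String(Files.readAllBytes(markerFile), StandardCharsets.UTF_8).trim();
      return recorded.equals(digest(ruleDefinition));
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  // Records the digest of the rule after a successful fetch.
  static void writeMarker(Path markerFile, String ruleDefinition) {
    try {
      Files.write(markerFile, digest(ruleDefinition).getBytes(StandardCharsets.UTF_8));
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }
}
```
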
Once a repository is downloaded, the artifacts in it are treated as source
artifacts. This poses a problem because Bazel usually checks whether source
artifacts are up to date by calling `stat()` on them, but these artifacts must
also be invalidated when the definition of the repository they are in changes.
Thus, `FileStateValue`s for an artifact in an external repository need to
depend on their external repository. This is handled by `ExternalFilesHelper`.

### Managed directories

Sometimes, external repositories need to modify files under the workspace root
(e.g. a package manager that houses the downloaded packages in a subdirectory of
the source tree). This is at odds with two things Bazel does: it assumes that
source files are only modified by the user and not by Bazel itself, and it
allows packages to refer to every directory under the workspace root. In order
to make this kind of external repository work, Bazel does two things:

1. Allows the user to specify subdirectories of the workspace Bazel is not
   allowed to reach into. They are listed in a file called `.bazelignore` and
   the functionality is implemented in `BlacklistedPackagePrefixesFunction`.
2. Encodes the mapping from the subdirectory of the workspace to the external
   repository it is handled by into `ManagedDirectoriesKnowledge` and handles
   `FileStateValue`s referring to them in the same way as those for regular
   external repositories.

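The prefix check behind item 1 can be illustrated as follows. This is a sketch
under stated assumptions, not the logic of `BlacklistedPackagePrefixesFunction`:
the class `IgnoredPrefixes` is hypothetical, and it assumes each non-blank line
is a workspace-relative directory that is matched component by component (so
`tmp` covers `tmp/x` but not `tmpfiles`).

```java
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of an ignored-prefix lookup; not Bazel's implementation.
final class IgnoredPrefixes {
  private final List<Path> prefixes = new ArrayList<>();

  // Each non-blank line is taken to be a workspace-relative directory.
  IgnoredPrefixes(List<String> lines) {
    for (String line : lines) {
      String trimmed = line.trim();
      if (!trimmed.isEmpty()) {
        prefixes.add(Paths.get(trimmed).normalize());
      }
    }
  }

  // True if the path is one of the listed directories or lies below one.
  boolean isIgnored(String workspaceRelativePath) {
    Path path = Paths.get(workspaceRelativePath).normalize();
    for (Path prefix : prefixes) {
      // Path.startsWith compares whole path components, not characters.
      if (path.startsWith(prefix)) {
        return true;
      }
    }
    return false;
  }
}
```
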
### Repository mappings

It can happen that multiple repositories want to depend on the same repository,
but in different versions (this is an instance of the "diamond dependency
problem"). For example, if two binaries in separate repositories in the build
want to depend on Guava, they will presumably both refer to Guava with labels
starting with `@guava//` and expect that to mean different versions of it.

Therefore, Bazel allows one to re-map external repository labels so that the
string `@guava//` can refer to one Guava repository (e.g. `@guava1//`) in the
repository of one binary and another Guava repository (e.g. `@guava2//`) in the
repository of the other.

Alternatively, this can also be used to **join** diamonds. If a repository
depends on `@guava1//`, and another depends on `@guava2//`, repository mapping
allows one to re-map both repositories to use a canonical `@guava//` repository.

The mapping is specified in the WORKSPACE file as the `repo_mapping` attribute
of individual repository definitions. It then appears in Skyframe as a member of
`WorkspaceFileValue`, where it is plumbed to:

* `Package.Builder.repositoryMapping` which is used to transform label-valued
  attributes of rules in the package by
  `RuleClass.populateRuleAttributeValues()`
* `Package.repositoryMapping` which is used in the analysis phase (for
  resolving things like `$(location)` which are not parsed in the loading
  phase)
* `BzlLoadFunction` for resolving labels in `load()` statements

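What applying such a mapping to a label amounts to can be sketched as below.
The class `RepoMappingSketch` and its string-based label handling are
assumptions made for illustration; Bazel's own code works on structured label
and repository name objects rather than raw strings.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of per-repository label re-mapping; not Bazel's code.
final class RepoMappingSketch {
  // Maps the repository name as written (e.g. "@guava") to the one to use.
  private final Map<String, String> mapping = new HashMap<>();

  RepoMappingSketch map(String from, String to) {
    mapping.put(from, to);
    return this;
  }

  // Rewrites the repository part of a label such as "@guava//jar:guava".
  // Labels without a repository part, or with an unmapped one, are unchanged.
  String apply(String label) {
    if (!label.startsWith("@")) {
      return label; // refers to the current repository
    }
    int slashes = label.indexOf("//");
    if (slashes < 0) {
      return label;
    }
    String repo = label.substring(0, slashes);
    return mapping.getOrDefault(repo, repo) + label.substring(slashes);
  }
}
```
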
## JNI bits

The server of Bazel is _mostly_ written in Java. The exceptions are the parts
that Java cannot do by itself or couldn't do by itself when we implemented
them. This is mostly limited to interaction with the file system, process
control and various other low-level things.

The C++ code lives under `src/main/native` and the Java classes with native
methods are:

* `NativePosixFiles` and `NativePosixFileSystem`
* `ProcessUtils`
* `WindowsFileOperations` and `WindowsFileProcesses`
* `com.google.devtools.build.lib.platform`

## Console output

Emitting console output seems like a simple thing, but the confluence of running
multiple processes (sometimes remotely), fine-grained caching, the desire to
have a nice and colorful terminal output and having a long-running server makes
it non-trivial.

Right after the RPC call comes in from the client, two `RpcOutputStream`
instances are created (for stdout and stderr) that forward the data printed into
them to the client. These are then wrapped in an `OutErr` (an (stdout, stderr)
pair). Anything that needs to be printed on the console goes through these
streams. Then these streams are handed over to
`BlazeCommandDispatcher.execExclusively()`.

Output is by default printed with ANSI escape sequences. When these are not
desired (`--color=no`), they are stripped by an `AnsiStrippingOutputStream`. In
addition, `System.out` and `System.err` are redirected to these output streams.
This is so that debugging information can be printed using
`System.err.println()` and still end up in the terminal output of the client
(which is different from that of the server). Care is taken that if a process
produces binary output (e.g. `bazel query --output=proto`), no munging of stdout
takes place.

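To give a feel for what stripping escape sequences from a stream involves, here
is a small state machine wrapped around an `OutputStream`. It is a simplified
stand-in and not the code of `AnsiStrippingOutputStream`: the class name
`AnsiStripSketch` and the `strip()` helper are hypothetical, and the sketch only
handles `ESC [ … <final byte>` (CSI) sequences and two-byte escapes.

```java
import java.io.ByteArrayOutputStream;
import java.io.FilterOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch of an escape-stripping stream; not Bazel's implementation.
final class AnsiStripSketch extends FilterOutputStream {
  private enum State { TEXT, ESCAPE, CSI }

  private State state = State.TEXT;

  AnsiStripSketch(OutputStream out) {
    super(out);
  }

  @Override
  public void write(int b) throws IOException {
    int c = b & 0xFF;
    switch (state) {
      case TEXT:
        if (c == 0x1B) {
          state = State.ESCAPE; // ESC: swallow it and inspect the next byte
        } else {
          out.write(c);
        }
        break;
      case ESCAPE:
        // "ESC [" opens a CSI sequence; any other byte is treated as the
        // end of a two-byte escape and dropped (a simplification).
        state = (c == '[') ? State.CSI : State.TEXT;
        break;
      case CSI:
        // Parameter bytes are dropped; a byte in 0x40..0x7E ends the sequence.
        if (c >= 0x40 && c <= 0x7E) {
          state = State.TEXT;
        }
        break;
    }
  }

  // Convenience helper for trying the stream out on a string.
  static String strip(String text) {
    ByteArrayOutputStream sink = new ByteArrayOutputStream();
    try (AnsiStripSketch stripper = new AnsiStripSketch(sink)) {
      stripper.write(text.getBytes(StandardCharsets.UTF_8));
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return new String(sink.toByteArray(), StandardCharsets.UTF_8);
  }
}
```
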
Short messages (errors, warnings and the like) are expressed through the
`EventHandler` interface. Notably, these are different from what one posts to
the `EventBus` (this is confusing). Each `Event` has an `EventKind` (error,
warning, info, and a few others) and they may have a `Location` (the place in
the source code that caused the event to happen).

Some `EventHandler` implementations store the events they received. This is used
to replay information to the UI caused by various kinds of cached processing,
for example, the warnings emitted by a cached configured target.

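The store-and-replay pattern described above looks roughly like this. All types
here (`ReplaySketch` and its nested `Kind`, `Event`, `Handler` and
`StoringHandler`) are simplified stand-ins invented for the example; they are
not Bazel's `Event`, `EventKind` or `EventHandler`.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-ins for illustration only; not Bazel's event classes.
final class ReplaySketch {
  enum Kind { ERROR, WARNING, INFO }

  static final class Event {
    final Kind kind;
    final String message;

    Event(Kind kind, String message) {
      this.kind = kind;
      this.message = message;
    }
  }

  interface Handler {
    void handle(Event event);
  }

  // Remembers every event it sees so they can be shown again later, e.g.
  // when a cached result is reused and the original work is not redone.
  static final class StoringHandler implements Handler {
    private final List<Event> events = new ArrayList<>();

    @Override
    public void handle(Event event) {
      events.add(event);
    }

    // Sends the stored events, in their original order, to another handler.
    void replayOn(Handler target) {
      for (Event event : events) {
        target.handle(event);
      }
    }
  }
}
```
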
Some `EventHandler`s also allow posting events that eventually find their way to
the event bus (regular `Event`s do _not_ appear there). These are
implementations of `ExtendedEventHandler` and their main use is to replay cached
`EventBus` events. These `EventBus` events all implement `Postable`, but not
everything that is posted to `EventBus` necessarily implements this interface;
only those that are cached by an `ExtendedEventHandler` do (it would be nice if
everything did, and most things do, but it's not enforced).

Terminal output is _mostly_ emitted through `UiEventHandler`, which is
responsible for all the fancy output formatting and progress reporting Bazel
does. It has two inputs:

* The event bus
* The event stream piped into it through `Reporter`

The only direct connection the command execution machinery (i.e. the rest of
Bazel) has to the RPC stream to the client is through `Reporter.getOutErr()`,
which allows direct access to these streams. It's only used when a command needs
to dump large amounts of possibly binary data (e.g. `bazel query`).

## Profiling Bazel

Bazel is fast. Bazel is also slow, because builds tend to grow until just the
edge of what's bearable. For this reason, Bazel includes a profiler which can be
used to profile builds and Bazel itself. It's implemented in a class that's
aptly named `Profiler`. It's turned on by default, although it records only
abridged data so that its overhead is tolerable; the command line option
`--record_full_profiler_data` makes it record everything it can.

It emits a profile in the Chrome profiler format; it's best viewed in Chrome.
Its data model is that of task stacks: one can start tasks and end tasks and
they are supposed to be neatly nested within each other. Each Java thread gets
its own task stack. **TODO:** How does this work with actions and
continuation-passing style?

The profiler is started and stopped in `BlazeRuntime.initProfiler()` and
`BlazeRuntime.afterCommand()` respectively and attempts to be live for as long
as possible so that we can profile everything. To add something to the profile,
call `Profiler.instance().profile()`. It returns a `Closeable`, whose closure
represents the end of the task. It's best used with try-with-resources
statements.

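The task-stack model and the try-with-resources idiom can be seen together in
the toy profiler below. It is a sketch only: `TaskStackSketch`, its nested
`Task` interface and the `name@depth` record format are made up for this
example, timestamps are left out for brevity, and none of it reflects the
actual signatures of Bazel's `Profiler`.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Deque;
import java.util.List;

// Hypothetical toy profiler illustrating per-thread task stacks.
final class TaskStackSketch {
  // An AutoCloseable whose close() throws no checked exception, so that
  // try-with-resources needs no catch block.
  interface Task extends AutoCloseable {
    @Override
    void close();
  }

  // One stack of open task names per thread.
  private final ThreadLocal<Deque<String>> stacks = ThreadLocal.withInitial(ArrayDeque::new);
  private final List<String> finished = Collections.synchronizedList(new ArrayList<>());

  // Starts a task on the calling thread; closing the result ends it.
  Task profile(String description) {
    Deque<String> stack = stacks.get();
    stack.push(description);
    return () -> {
      if (!description.equals(stack.peek())) {
        throw new IllegalStateException("tasks must end in reverse order of start");
      }
      stack.pop();
      // Record the task with its nesting depth (0 = outermost).
      finished.add(description + "@" + stack.size());
    };
  }

  // Finished tasks in completion order, formatted as "name@depth".
  List<String> finished() {
    synchronized (finished) {
      return new ArrayList<>(finished);
    }
  }
}
```

With this sketch, nesting `try (TaskStackSketch.Task t = profiler.profile("load")) { … }`
blocks produces records whose depth mirrors the nesting of the blocks, and inner
tasks are recorded before the outer ones that contain them.
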
We also do rudimentary memory profiling in `MemoryProfiler`. It's also always on
and it mostly records maximum heap sizes and GC behavior.

## Testing Bazel

Bazel has two main kinds of tests: ones that observe Bazel as a "black box" and
ones that only run the analysis phase. We call the former "integration tests"
and the latter "unit tests", although they are more like integration tests that
are, well, less integrated. We also have some actual unit tests, where they are
necessary.

Of integration tests, we have two kinds:

1. Ones implemented using a very elaborate bash test framework under
   `src/test/shell`
2. Ones implemented in Java. These are implemented as subclasses of
   `BuildIntegrationTestCase`

`BuildIntegrationTestCase` is the preferred integration testing framework as it
is well-equipped for most testing scenarios. As it is a Java framework, it
provides debuggability and seamless integration with many common development
tools. There are many examples of `BuildIntegrationTestCase` classes in the
Bazel repository.

Analysis tests are implemented as subclasses of `BuildViewTestCase`. There is a
scratch file system you can use to write BUILD files, then various helper
methods can request configured targets, change the configuration and assert
various things about the result of the analysis.