Blame - CODEBASE.md - bazel

laurentlb	4f2991c5	2020-08-12 11:37:32 -0700	[diff] [blame]	1	# The Bazel Code Base
				2
				3	This document is a description of the code base and how Bazel is structured. It
				4	is intended for people willing to contribute to Bazel, not for end-users.
				5
				6	## Introduction
				7
				8	The code base of Bazel is large (~350 KLOC production code and ~260 KLOC test
				9	code) and no one is familiar with the whole landscape: everyone knows their
				10	particular valley very well, but few know what lies over the hills in every
				11	direction.
				12
				13	In order for people midway upon the journey not to find themselves within a
				14	forest dark with the straightforward pathway being lost, this document tries to
				15	give an overview of the code base so that it's easier to get started with
				16	working on it.
				17
				18	The public version of the source code of Bazel lives on GitHub at
				19	http://github.com/bazelbuild/bazel . This is not the “source of truth”; it’s
				20	derived from a Google-internal source tree that contains additional
				21	functionality that is not useful outside Google. The long term goal is to make
				22	GitHub the source of truth.
				23
				24	Contributions are accepted through the regular GitHub pull request mechanism,
				25	and manually imported by a Googler into the internal source tree, then
				26	re-exported back out to GitHub.
				27
				28	## Client/server architecture
				29
				30	The bulk of Bazel resides in a server process that stays in RAM between builds.
				31	This allows Bazel to maintain state between builds.
				32
				33	This is why the Bazel command line has two kinds of options: startup and
				34	command. In a command line like this:
				35
				36	```
				37	bazel --host_jvm_args=-Xmx8G build -c opt //foo:bar
				38	```
				39
				40	Some options (`--host_jvm_args=`) are before the name of the command to be run
				41	and some are after (`-c opt`); the former kind is called a "startup option" and
				42	affects the server process as a whole, whereas the latter kind, the "command
				43	option", only affects a single command.
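
As a rough illustration, the split between the two kinds of options can be sketched in Python. This is a toy model, not the client's actual parser: `split_command_line` is a hypothetical helper, and it assumes every startup option is written as a single `--name=value` token.

```python
def split_command_line(argv):
    """Split the arguments after `bazel` into (startup options, command, rest).

    Toy model: every token starting with "-" that appears before the command
    name is treated as a startup option (assumed to be a single token).
    """
    startup_options = []
    for index, arg in enumerate(argv):
        if arg.startswith("-"):
            startup_options.append(arg)
        else:
            # The first non-option token is the command; everything after it
            # (command options and targets) belongs to that command.
            return startup_options, arg, argv[index + 1:]
    return startup_options, None, []


startup_options, command, command_args = split_command_line(
    ["--host_jvm_args=-Xmx8G", "build", "-c", "opt", "//foo:bar"]
)
```

Here `startup_options` affects the server as a whole, while `command_args` only matters for the single `build` invocation.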
				44
				45	Each server instance has a single associated source tree ("workspace") and each
				46	workspace usually has a single active server instance. This can be circumvented
				47	by specifying a custom output base (see the "Directory layout" section for more
				48	information).
				49
				50	Bazel is distributed as a single ELF executable that is also a valid .zip file.
				51	When you type `bazel`, the above ELF executable implemented in C++ (the
				52	"client") gets control. It sets up an appropriate server process using the
				53	following steps:
				54
				55	1. Checks whether it has already extracted itself. If not, it does that. This
				56	is where the implementation of the server comes from.
				57	2. Checks whether there is an active server instance that works: it is running,
				58	it has the right startup options and uses the right workspace directory. It
				59	finds the running server by looking at the directory `$OUTPUT_BASE/server`
				60	where there is a lock file with the port the server is listening on.
				61	3. If needed, kills the old server process.
				62	4. If needed, starts up a new server process.
				63
				64	After a suitable server process is ready, the command that needs to be run is
				65	communicated to it over a gRPC interface, then the output of Bazel is piped back
				66	to the terminal. Only one command can be running at the same time. This is
				67	implemented using an elaborate locking mechanism with parts in C++ and parts in
				68	Java. There is some infrastructure for running multiple commands in parallel,
				69	since the inability to run e.g. `bazel version` in parallel with another command
jingwen	f8b2d3b	2020-10-02 06:35:24 -0700	[diff] [blame]	70	is somewhat embarrassing. The main blocker is the life cycle of `BlazeModule`s
				71	and some state in `BlazeRuntime`.
laurentlb	4f2991c5	2020-08-12 11:37:32 -0700	[diff] [blame]	72
				73	At the end of a command, the Bazel server transmits the exit code the client
				74	should return. An interesting wrinkle is the implementation of `bazel run`: the
				75	job of this command is to run something Bazel just built, but it can't do that
				76	from the server process because it doesn't have a terminal. So instead it tells
				77	the client what binary it should exec() and with what arguments.
				78
				79	When one presses Ctrl-C, the client translates it to a Cancel call on the gRPC
				80	connection, which tries to terminate the command as soon as possible. After the
				81	third Ctrl-C, the client sends a SIGKILL to the server instead.
				82
				83	The source code of the client is under `src/main/cpp` and the protocol used to
				84	communicate with the server is in `src/main/protobuf/command_server.proto`.
				85
jingwen	f8b2d3b	2020-10-02 06:35:24 -0700	[diff] [blame]	86	The main entry point of the server is `BlazeRuntime.main()` and the gRPC calls
laurentlb	4f2991c5	2020-08-12 11:37:32 -0700	[diff] [blame]	87	from the client are handled by `GrpcServerImpl.run()`.
				88
				89	## Directory layout
				90
				91	Bazel creates a somewhat complicated set of directories during a build. A full
				92	description is available
fwe	ad37a37	2022-03-08 03:27:15 -0800	[diff] [blame]	93	[here](https://bazel.build/docs/output_directories).
laurentlb	4f2991c5	2020-08-12 11:37:32 -0700	[diff] [blame]	94
				95	The "workspace" is the source tree Bazel is run in. It usually corresponds to
				96	something you checked out from source control.
				97
				98	Bazel puts all of its data under the "output user root". This is usually
				99	`$HOME/.cache/bazel/_bazel_${USER}`, but can be overridden using the
				100	`--output_user_root` startup option.
				101
				102	The "install base" is where Bazel is extracted to. This is done automatically
				103	and each Bazel version gets a subdirectory based on its checksum under the
				104	install base. It's at `$OUTPUT_USER_ROOT/install` by default and can be changed
				105	using the `--install_base` command line option.
				106
				107	The "output base" is the place where the Bazel instance attached to a specific
				108	workspace writes to. Each output base has at most one Bazel server instance
				109	running at any time. It's usually at `$OUTPUT_USER_ROOT/<checksum of the path
				110	to the workspace>`. It can be changed using the `--output_base` startup option,
				111	which is, among other things, useful for getting around the limitation that only
				112	one Bazel instance can be running in any workspace at any given time.
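
A minimal sketch of how the default output base path could be derived from the workspace path. The text above only says "checksum"; using the MD5 hex digest of the workspace path is an assumption made for this example, and `default_output_base` is a hypothetical helper, not a function in the client.

```python
import hashlib
import posixpath


def default_output_base(output_user_root, workspace_dir):
    """Return <output_user_root>/<checksum of workspace path> (MD5 assumed)."""
    checksum = hashlib.md5(workspace_dir.encode("utf-8")).hexdigest()
    return posixpath.join(output_user_root, checksum)


root = "/home/alice/.cache/bazel/_bazel_alice"
base_a = default_output_base(root, "/home/alice/src/project_a")
base_b = default_output_base(root, "/home/alice/src/project_b")
```

Two different workspaces get two different output bases, which is why each of them can have its own server instance.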
				113
				114	The output base contains, among other things:
				115
jingwen	f8b2d3b	2020-10-02 06:35:24 -0700	[diff] [blame]	116	* The fetched external repositories at `$OUTPUT_BASE/external`.
laurentlb	4f2991c5	2020-08-12 11:37:32 -0700	[diff] [blame]	117	* The exec root, i.e. a directory that contains symlinks to all the source
				118	code for the current build. It's located at `$OUTPUT_BASE/execroot`. During
				119	the build, the working directory is `$EXECROOT/<name of main
				120	repository>`. We are planning to change this to `$EXECROOT`, although it's a
				121	long term plan because it's a very incompatible change.
				122	* Files built during the build.
				123
				124	## The process of executing a command
				125
				126	Once the Bazel server gets control and is informed about a command it needs to
				127	execute, the following sequence of events happens:
				128
jingwen	f8b2d3b	2020-10-02 06:35:24 -0700	[diff] [blame]	129	1. `BlazeCommandDispatcher` is informed about the new request. It decides
laurentlb	4f2991c5	2020-08-12 11:37:32 -0700	[diff] [blame]	130	whether the command needs a workspace to run in (almost every command except
				131	for ones that don't have anything to do with source code, e.g. version or
				132	help) and whether another command is running.
				133
				134	2. The right command is found. Each command must implement the interface
jingwen	f8b2d3b	2020-10-02 06:35:24 -0700	[diff] [blame]	135	`BlazeCommand` and must have the `@Command` annotation (this is a bit of an
laurentlb	4f2991c5	2020-08-12 11:37:32 -0700	[diff] [blame]	136	antipattern; it would be nice if all the metadata a command needs was
jingwen	f8b2d3b	2020-10-02 06:35:24 -0700	[diff] [blame]	137	described by methods on `BlazeCommand`).
laurentlb	4f2991c5	2020-08-12 11:37:32 -0700	[diff] [blame]	138
				139	3. The command line options are parsed. Each command has different command line
				140	options, which are described in the `@Command` annotation.
				141
				142	4. An event bus is created. The event bus is a stream for events that happen
				143	during the build. Some of these are exported to outside of Bazel under the
				144	aegis of the Build Event Protocol in order to tell the world how the build
				145	goes.
				146
				147	5. The command gets control. The most interesting commands are those that run a
				148	build: build, test, run, coverage and so on: this functionality is
				149	implemented by `BuildTool`.
				150
				151	6. The set of target patterns on the command line is parsed and wildcards like
				152	`//pkg:all` and `//pkg/...` are resolved. This is implemented in
				153	`AnalysisPhaseRunner.evaluateTargetPatterns()` and reified in Skyframe as
				154	`TargetPatternPhaseValue`.
				155
				156	7. The loading/analysis phase is run to produce the action graph (a directed
				157	acyclic graph of commands that need to be executed for the build).
				158
				159	8. The execution phase is run. This means running every action required to
				160	build the requested top-level targets.
				161
				162	## Command line options
				163
				164	The command line options for a Bazel invocation are described in an
				165	`OptionsParsingResult` object, which in turn contains a map from "option
				166	classes" to the values of the options. An "option class" is a subclass of
				167	`OptionsBase` and groups command line options together that are related to each
				168	other. For example:
				169
				170	1. Options related to a programming language (`CppOptions` or `JavaOptions`).
				171	These should be a subclass of `FragmentOptions` and are eventually wrapped
				172	into a `BuildOptions` object.
				173	2. Options related to the way Bazel executes actions (`ExecutionOptions`)
				174
				175	These options are designed to be consumed in the analysis phase (either
				176	through `RuleContext.getFragment()` in Java or `ctx.fragments` in Starlark).
				177	Some of them (for example, whether to do C++ include scanning or not) are read
				178	in the execution phase, but that always requires explicit plumbing since
				179	`BuildConfiguration` is not available then. For more information, see the
				180	section “Configurations”.
				181
				182	WARNING: We like to pretend that `OptionsBase` instances are immutable and
				183	use them that way (e.g. as part of `SkyKeys`). This is not the case and
				184	modifying them is a really good way to break Bazel in subtle ways that are hard
				185	to debug. Unfortunately, making them actually immutable is a large endeavor.
				186	(Modifying a `FragmentOptions` immediately after construction before anyone else
				187	gets a chance to keep a reference to it and before `equals()` or `hashCode()` is
				188	called on it is okay.)
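
The hazard behind this warning is the usual one with mutable hash keys. The sketch below shows it in Python rather than Java; `ToyOptions` is a hypothetical stand-in for an options class that is mutable but compared and hashed by value, and the dictionary stands in for anything keyed by it.

```python
class ToyOptions:
    """Hypothetical options object: mutable, but hashed and compared by value."""

    def __init__(self, jobs):
        self.jobs = jobs

    def __eq__(self, other):
        return isinstance(other, ToyOptions) and self.jobs == other.jobs

    def __hash__(self):
        return hash(self.jobs)


cache = {}
options = ToyOptions(jobs=4)
cache[options] = "result computed with jobs=4"

# Mutating the key after it has been stored: the entry was filed under the
# old hash, but the key object now compares equal to a different value.
options.jobs = 5

found_with_old_value = ToyOptions(jobs=4) in cache
found_with_new_value = ToyOptions(jobs=5) in cache
entries = len(cache)
```

The entry is still in the map, but neither the old nor the new value finds it. Modifying the object before anyone has hashed it or kept a reference to it avoids the problem, which is why that case is considered okay.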
				189
				190	Bazel learns about option classes in the following ways:
				191
				192	1. Some are hard-wired into Bazel (`CommonCommandOptions`)
				193	2. From the `@Command` annotation on each Bazel command
				194	3. From `ConfiguredRuleClassProvider` (these are command line options related
				195	to individual programming languages)
				196	4. Starlark rules can also define their own options (see
fwe	ad37a37	2022-03-08 03:27:15 -0800	[diff] [blame]	197	[here](https://bazel.build/rules/config))
laurentlb	4f2991c5	2020-08-12 11:37:32 -0700	[diff] [blame]	198
				199	Each option (excluding Starlark-defined options) is a member variable of a
				200	`FragmentOptions` subclass that has the `@Option` annotation, which specifies
				201	the name and the type of the command line option along with some help text.
				202
				203	The Java type of the value of a command line option is usually something simple
				204	(a string, an integer, a Boolean, a label, etc.). However, we also support
				205	options of more complicated types; in this case, the job of converting from the
				206	command line string to the data type falls to an implementation of
				207	`com.google.devtools.common.options.Converter`.
				208
				209	## The source tree, as seen by Bazel
				210
				211	Bazel is in the business of building software, which happens by reading and
				212	interpreting the source code. The totality of the source code Bazel operates on
				213	is called "the workspace" and it is structured into repositories, packages and
				214	rules. A description of these concepts for the users of Bazel is available
fwe	ad37a37	2022-03-08 03:27:15 -0800	[diff] [blame]	215	[here](https://bazel.build/concepts/build-ref).
laurentlb	4f2991c5	2020-08-12 11:37:32 -0700	[diff] [blame]	216
				217	### Repositories
				218
				219	A "repository" is a source tree on which a developer works; it usually
jingwen	f8b2d3b	2020-10-02 06:35:24 -0700	[diff] [blame]	220	represents a single project. Bazel's ancestor, Blaze, operated on a monorepo,
laurentlb	4f2991c5	2020-08-12 11:37:32 -0700	[diff] [blame]	221	i.e. a single source tree that contains all source code used to run the build.
				222	Bazel, in contrast, supports projects whose source code spans multiple
				223	repositories. The repository from which Bazel is invoked is called the “main
				224	repository”, the others are called “external repositories”.
				225
				226	A repository is marked by a file called `WORKSPACE` (or `WORKSPACE.bazel`) in
				227	its root directory. This file contains information that is "global" to the whole
				228	build, for example, the set of available external repositories. It works like a
				229	regular Starlark file which means that one can `load()` other Starlark files.
				230	This is commonly used to pull in repositories that are needed by a repository
				231	that's explicitly referenced (we call this the "`deps.bzl` pattern").
				232
				233	Code of external repositories is symlinked or downloaded under
				234	`$OUTPUT_BASE/external`.
				235
				236	When running the build, the whole source tree needs to be pieced together; this
				237	is done by `SymlinkForest`, which symlinks every package in the main repository to
				238	`$EXECROOT` and every external repository to either `$EXECROOT/external` or
				239	`$EXECROOT/..` (the former of course makes it impossible to have a package
				240	called `external` in the main repository; that's why we are migrating away from
				241	it).
				242
				243	### Packages
				244
				245	Every repository is composed of packages, i.e. a collection of related files and
				246	a specification of the dependencies. These are specified by a file called
				247	`BUILD` or `BUILD.bazel`. If both exist, Bazel prefers `BUILD.bazel`; the reason
jingwen	f8b2d3b	2020-10-02 06:35:24 -0700	[diff] [blame]	248	why BUILD files are still accepted is that Bazel’s ancestor, Blaze, used this
laurentlb	4f2991c5	2020-08-12 11:37:32 -0700	[diff] [blame]	249	file name. However, it turned out to be a commonly used path segment, especially
				250	on Windows, where file names are case-insensitive.
				251
				252	Packages are independent of each other: changes to the BUILD file of a package
				253	cannot cause other packages to change. The addition or removal of BUILD files
				254	_can_ change other packages, since recursive globs stop at package boundaries
				255	and thus the presence of a BUILD file stops the recursion.
				256
				257	The evaluation of a BUILD file is called "package loading". It's implemented in
				258	the class `PackageFactory`, works by calling the Starlark interpreter and
				259	requires knowledge of the set of available rule classes. The result of package
				260	loading is a `Package` object. It's mostly a map from a string (the name of a
				261	target) to the target itself.
				262
				263	A large chunk of complexity during package loading is globbing: Bazel does not
				264	require every source file to be explicitly listed and instead can run globs
				265	(e.g. `glob(["**/*.java"])`). Unlike the shell, it supports recursive globs that
				266	descend into subdirectories (but not into subpackages). This requires access to
				267	the file system and since that can be slow, we implement all sorts of tricks to
				268	make it run in parallel and as efficiently as possible.
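
To make the "subdirectories but not subpackages" rule concrete, here is a small Python sketch that works on an in-memory list of files instead of the file system. It is not how `LegacyGlobber` or any real globber is written; `recursive_glob` is a hypothetical helper that only handles patterns of the form `**/<basename pattern>`.

```python
import fnmatch
import posixpath

BUILD_FILE_NAMES = ("BUILD", "BUILD.bazel")


def recursive_glob(package_dir, basename_pattern, all_files):
    """Toy version of glob(["**/<basename_pattern>"]) for one package.

    `all_files` holds workspace-relative paths. Directories below the package
    that contain a BUILD file are subpackages and are not descended into.
    """
    prefix = package_dir + "/"
    subpackages = {
        posixpath.dirname(path)
        for path in all_files
        if posixpath.basename(path) in BUILD_FILE_NAMES
        and posixpath.dirname(path).startswith(prefix)
    }
    matches = []
    for path in sorted(all_files):
        if not path.startswith(prefix):
            continue
        directory = posixpath.dirname(path)
        # Skip anything that lives in a subpackage or below one.
        if any(directory == sub or directory.startswith(sub + "/")
               for sub in subpackages):
            continue
        if fnmatch.fnmatchcase(posixpath.basename(path), basename_pattern):
            matches.append(path[len(prefix):])
    return matches


files = [
    "app/BUILD",
    "app/Main.java",
    "app/util/Helper.java",
    "app/util/notes.txt",
    "app/sub/BUILD.bazel",
    "app/sub/Sub.java",
    "app/sub/deep/Deep.java",
]
java_sources = recursive_glob("app", "*.java", files)
```

Removing `app/sub/BUILD.bazel` from `files` would make the same glob also return the two files under `app/sub`, which is how adding or removing a BUILD file can change another package.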
				269
				270	Globbing is implemented in the following classes:
				271
				272	* `LegacyGlobber`, a fast and blissfully Skyframe-unaware globber
				273	* `SkyframeHybridGlobber`, a version that uses Skyframe and reverts back to
				274	the legacy globber in order to avoid “Skyframe restarts” (described below)
				275
				276	The `Package` class itself contains some members that are exclusively used to
				277	parse the WORKSPACE file and which do not make sense for real packages. This is
				278	a design flaw because objects describing regular packages should not contain
				279	fields that describe something else. These include:
				280
				281	* The repository mappings
				282	* The registered toolchains
				283	* The registered execution platforms
				284
				285	Ideally, there would be more separation between parsing the WORKSPACE file and
				286	parsing regular packages so that `Package` does not need to cater for the needs
				287	of both. This is unfortunately difficult to do because the two are intertwined
				288	quite deeply.
				289
				290	### Labels, Targets and Rules
				291
				292	Packages are composed of targets, which have the following types:
				293
				294	1. Files: things that are either the input or the output of the build. In
				295	Bazel parlance, we call them _artifacts_ (discussed elsewhere). Not all
				296	files created during the build are targets; it’s common for an output of
				297	Bazel not to have an associated label.
				298	2. Rules: these describe steps to derive their outputs from their inputs. They
				299	are generally associated with a programming language (e.g. `cc_library`,
				300	`java_library` or `py_library`), but there are some language-agnostic ones
				301	(e.g. `genrule` or `filegroup`).
				302	3. Package groups: discussed in the [Visibility](#visibility) section.
				303
				304	The name of a target is called a _Label_. The syntax of labels is
				305	`@repo//pac/kage:name`, where `repo` is the name of the repository the Label is
				306	in, `pac/kage` is the directory its BUILD file is in and `name` is the path of
				307	the file (if the label refers to a source file) relative to the directory of the
				308	package. When referring to a target on the command line, some parts of the label
				309	can be omitted:
				310
				311	1. If the repository is omitted, the label is taken to be in the main
				312	repository.
				313	2. If the package part is omitted (e.g. `name` or `:name`), the label is taken
				314	to be in the package of the current working directory. Relative paths
				315	containing uplevel references (`..`) are not allowed.
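
The label syntax and the two omission rules can be sketched as a small parser. This is a simplified Python model, not the `Label` class: `parse_label` is a hypothetical function, an empty repository name stands for the main repository, and the `//pac/kage` shorthand for `//pac/kage:kage` is included even though the text above does not mention it.

```python
def parse_label(label, current_package=""):
    """Return (repository, package, name) for a label string.

    `current_package` models the package of the current working directory,
    used when the package part is omitted on the command line.
    """
    repository = ""  # empty string models the main repository
    rest = label
    if rest.startswith("@"):
        repository, separator, remainder = rest[1:].partition("//")
        if not separator:
            raise ValueError("expected '//' after the repository name")
        rest = "//" + remainder
    if rest.startswith("//"):
        package, separator, name = rest[2:].partition(":")
        if not separator:
            # Shorthand: //pac/kage means //pac/kage:kage
            name = package.rsplit("/", 1)[-1]
    else:
        package = current_package
        name = rest[1:] if rest.startswith(":") else rest
    if ".." in name.split("/"):
        raise ValueError("uplevel references are not allowed")
    return repository, package, name


full = parse_label("@repo//pac/kage:name")
in_main_repo = parse_label("//pac/kage:name")
relative = parse_label(":name", current_package="cur/pkg")
```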
				316
				317	A kind of a rule (e.g. "C++ library") is called a "rule class". Rule classes may
				318	be implemented either in Starlark (the `rule()` function) or in Java (so called
				319	“native rules”, type `RuleClass`). In the long term, every language-specific
				320	rule will be implemented in Starlark, but some legacy rule families (e.g. Java
				321	or C++) are still in Java for the time being.
				322
				323	Starlark rule classes need to be imported at the beginning of BUILD files using
				324	the `load()` statement, whereas Java rule classes are "innately" known by Bazel,
				325	by virtue of being registered with the `ConfiguredRuleClassProvider`.
				326
				327	Rule classes contain information such as:
				328
				329	1. Its attributes (e.g., `srcs`, `deps`): their types, default values,
				330	constraints, etc.
				331	2. The configuration transitions and aspects attached to each attribute, if any
				332	3. The implementation of the rule
				333	4. The transitive info providers the rule "usually" creates
				334
				335	Terminology note: In the code base, we often use “Rule” to mean the target
				336	created by a rule class. But in Starlark and in user-facing documentation,
				337	“Rule” should be used exclusively to refer to the rule class itself; the target
				338	is just a “target”. Also note that despite `RuleClass` having “class” in its
				339	name, there is no Java inheritance relationship between a rule class and targets
				340	of that type.
				341
				342	## Skyframe
				343
				344	The evaluation framework underlying Bazel is called Skyframe. Its model is that
				345	everything that needs to be built during a build is organized into a directed
				346	acyclic graph with edges pointing from any piece of data to its dependencies,
				347	that is, other pieces of data that need to be known to construct it.
				348
				349	The nodes in the graph are called `SkyValue`s and their names are called
				350	`SkyKey`s. Both are deeply immutable, i.e. only immutable objects should be
				351	reachable from them. This invariant almost always holds, and in case it doesn't
				352	(e.g. for the individual options classes in `BuildOptions`, which is a member of
				353	`BuildConfigurationValue` and its `SkyKey`), we try really hard not to change
				354	them or to change them only in ways that are not observable from the outside.
				355	From this it follows that everything that is computed within Skyframe (e.g.
				356	configured targets) must also be immutable.
				357
				358	The most convenient way to observe the Skyframe graph is to run `bazel dump
				359	--skyframe=detailed`, which dumps the graph, one `SkyValue` per line. It's best
				360	to do it for tiny builds, since it can get pretty large.
				361
				362	Skyframe lives in the `com.google.devtools.build.skyframe` package. The
				363	similarly-named package `com.google.devtools.build.lib.skyframe` contains the
				364	implementation of Bazel on top of Skyframe. More information about Skyframe is
				365	available [here](https://bazel.build/designs/skyframe.html).
				366
				367	Generating a new `SkyValue` involves the following steps:
				368
				369	1. Running the associated `SkyFunction`
				370	2. Declaring the dependencies (i.e. `SkyValue`s) that the `SkyFunction` needs
				371	to do its job. This is done by calling the various overloads of
				372	`SkyFunction.Environment.getValue()`.
				373	3. If a dependency is not available, Skyframe signals that by returning null
				374	from `getValue()`. In this case, the `SkyFunction` is expected to yield
				375	control to Skyframe by returning null, then Skyframe evaluates the
				376	dependencies that haven't been evaluated yet and calls the `SkyFunction`
				377	again, thus going back to (1).
				378	4. Constructing the resulting `SkyValue`
				379
				380	A consequence of this is that if not all dependencies are available in (3), the
				381	function needs to be completely restarted and thus computation needs to be
nharmata	17e84b3	2022-01-05 10:57:58 -0800	[diff] [blame]	382	re-done, which is obviously inefficient. `SkyFunction.Environment.getState()`
				383	lets us directly work around this issue by having Skyframe maintain the
				384	`SkyKeyComputeState` instance between calls to `SkyFunction.compute` for the
				385	same `SkyKey`. Check out the example in the javadoc for
				386	`SkyFunction.Environment.getState()`, as well as real usages in the Bazel
				387	codebase.
				388
				389	Other indirect workarounds:
laurentlb	4f2991c5	2020-08-12 11:37:32 -0700	[diff] [blame]	390
				391	1. Declaring dependencies of `SkyFunction`s in groups so that if a function
				392	has, say, 10 dependencies, it only needs to restart once instead of ten
				393	times.
				394	2. Splitting `SkyFunction`s so that one function does not need to be restarted
				395	many times. This has the side effect of interning data into Skyframe that
				396	may be internal to the `SkyFunction`, thus increasing memory use.
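
The restart protocol and the effect of grouping dependencies can be sketched with a toy evaluator. This Python model is a deliberate simplification: it is single-threaded, `Environment`, `get_value`, `get_values` and `evaluate` are hypothetical names loosely mirroring the Java API described above, and `None` plays the role of Java's `null`.

```python
class Environment:
    """Toy counterpart of SkyFunction.Environment for one function call."""

    def __init__(self, done):
        self._done = done   # key -> value of already evaluated nodes
        self.missing = []   # dependencies requested but not yet available

    def get_value(self, key):
        if key in self._done:
            return self._done[key]
        self.missing.append(key)
        return None

    def get_values(self, keys):
        """Request a group of dependencies; None if any of them is missing."""
        values = [self.get_value(key) for key in keys]
        return None if any(value is None for value in values) else values


def evaluate(key, functions, done, calls):
    """Run the function for `key`, restarting it until its deps are available."""
    while True:
        calls[key] = calls.get(key, 0) + 1
        env = Environment(done)
        result = functions[key](env)
        if result is not None:
            done[key] = result
            return result
        for dep in env.missing:  # evaluate what was missing, then restart
            evaluate(dep, functions, done, calls)


DEPS = ["a", "b", "c"]


def sum_one_by_one(env):
    total = 0
    for dep in DEPS:
        value = env.get_value(dep)
        if value is None:
            return None  # yield control; this function will be re-run
        total += value
    return total


def sum_grouped(env):
    values = env.get_values(DEPS)
    return None if values is None else sum(values)


functions = {
    "a": lambda env: 1, "b": lambda env: 2, "c": lambda env: 3,
    "one_by_one": sum_one_by_one, "grouped": sum_grouped,
}
calls_one_by_one, calls_grouped = {}, {}
result_one_by_one = evaluate("one_by_one", functions, {}, calls_one_by_one)
result_grouped = evaluate("grouped", functions, {}, calls_grouped)
```

With three dependencies requested one at a time, the function body runs four times; requesting them as a group brings that down to two runs, at the cost of having to know the dependencies up front.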
laurentlb	4f2991c5	2020-08-12 11:37:32 -0700	[diff] [blame]	397
nharmata	17e84b3	2022-01-05 10:57:58 -0800	[diff] [blame]	398	These are all just workarounds for the limitations of Skyframe, which
laurentlb	4f2991c5	2020-08-12 11:37:32 -0700	[diff] [blame]	399	are mostly a consequence of the fact that Java doesn't support lightweight
				400	threads and that we routinely have hundreds of thousands of in-flight Skyframe
				401	nodes.
				402
				403	## Starlark
				404
				405	Starlark is the domain-specific language people use to configure and extend
				406	Bazel. It's conceived as a restricted subset of Python that has far fewer types,
				407	more restrictions on control flow, and most importantly, strong immutability
				408	guarantees to enable concurrent reads. It is not Turing-complete, which
				409	discourages some (but not all) users from trying to accomplish general
				410	programming tasks within the language.
				411
				412	Starlark is implemented in the `com.google.devtools.build.lib.syntax` package.
				413	It also has an independent Go implementation
				414	[here](https://github.com/google/starlark-go). The Java implementation used in
				415	Bazel is currently an interpreter.
				416
				417	Starlark is used in four contexts:
				418
				419	1. The BUILD language. This is where new build targets are defined. Starlark code
				420	running in this context only has access to the contents of the BUILD file
				421	itself and Starlark files loaded by it.
				422	2. Rule definitions. This is how new rules (e.g. support for a new
				423	language) are defined. Starlark code running in this context has access to
				424	the configuration and data provided by its direct dependencies (more on this
				425	later).
				426	3. The WORKSPACE file. This is where external repositories (code that's not
				427	in the main source tree) are defined.
				428	4. Repository rule definitions. This is where new external repository types
				429	are defined. Starlark code running in this context can run arbitrary code on
				430	the machine where Bazel is running, and reach outside the workspace.
				431
				432	The dialects available for BUILD and .bzl files are slightly different because
				433	they express different things. A list of differences is available
fwe	ad37a37	2022-03-08 03:27:15 -0800	[diff] [blame]	434	[here](https://bazel.build/rules/language#differences-between-build-and-bzl-files).
laurentlb	4f2991c5	2020-08-12 11:37:32 -0700	[diff] [blame]	435
				436	More information about Starlark is available
fwe	ad37a37	2022-03-08 03:27:15 -0800	[diff] [blame]	437	[here](https://bazel.build/rules/language).
laurentlb	4f2991c5	2020-08-12 11:37:32 -0700	[diff] [blame]	438
				439	## The loading/analysis phase
				440
				441	The loading/analysis phase is where Bazel determines what actions are needed to
				442	build a particular rule. Its basic unit is a "configured target", which is,
				443	quite sensibly, a (target, configuration) pair.
				444
				445	It's called the "loading/analysis phase" because it can be split into two
				446	distinct parts, which used to be serialized, but they can now overlap in time:
				447
				448	1. Loading packages, that is, turning BUILD files into the `Package` objects
				449	that represent them
				450	2. Analyzing configured targets, that is, running the implementation of the
				451	rules to produce the action graph
				452
				453	Each configured target in the transitive closure of the configured targets
				454	requested on the command line must be analyzed bottom-up, i.e. leaf nodes first,
				455	then up to the ones on the command line. The inputs to the analysis of a single
				456	configured target are:
				457
				458	1. The configuration. ("how" to build that rule; for example, the target
				459	platform but also things like command line options the user wants to be
				460	passed to the C++ compiler)
				461	2. The direct dependencies. Their transitive info providers are available
				462	to the rule being analyzed. They are called like that because they provide a
				463	"roll-up" of the information in the transitive closure of the configured
				464	target, e.g. all the .jar files on the classpath or all the .o files that
				465	need to be linked into a C++ binary.
				466	3. The target itself. This is the result of loading the package the target
				467	is in. For rules, this includes its attributes, which is usually what
				468	matters.
				469	4. The implementation of the configured target. For rules, this can either
				470	be in Starlark or in Java. All non-rule configured targets are implemented
				471	in Java.
				472
				473	The output of analyzing a configured target is:
				474
				475	1. The transitive info providers that configured targets that depend on it can
				476	access
				477	2. The artifacts it can create and the actions that produce them.
				478
				479	The API offered to Java rules is `RuleContext`, which is the equivalent of the
				480	`ctx` argument of Starlark rules. Its API is more powerful, but at the same
				481	time, it's easier to do Bad Things™, for example to write code whose time or
				482	space complexity is quadratic (or worse), to make the Bazel server crash with a
				483	Java exception or to violate invariants (e.g. by inadvertently modifying an
				484	`Options` instance or by making a configured target mutable).
				485
				486	The algorithm that determines the direct dependencies of a configured target
				487	lives in `DependencyResolver.dependentNodeMap()`.
				488
				489	### Configurations
				490
				491	Configurations are the "how" of building a target: for what platform, with what
				492	command line options, etc.
				493
				494	The same target can be built for multiple configurations in the same build. This
				495	is useful, for example, when the same code is used for a tool that's run during
				496	the build and for the target code and we are cross-compiling or when we are
				497	building a fat Android app (one that contains native code for multiple CPU
				498	architectures).
				499
				500	Conceptually, the configuration is a `BuildOptions` instance. However, in
				501	practice, `BuildOptions` is wrapped by `BuildConfiguration` that provides
				502	additional sundry pieces of functionality. It propagates from the top of the
				503	dependency graph to the bottom. If it changes, the build needs to be
				504	re-analyzed.
				505
				506	This results in anomalies like having to re-analyze the whole build if e.g. the
				507	number of requested test runs changes, even though that only affects test
				508	targets (we have plans to "trim" configurations so that this is not the case,
				509	but it's not ready yet).
				510
				511	When a rule implementation needs part of the configuration, it needs to declare
				512	it in its definition using `RuleClass.Builder.requiresConfigurationFragments()`.
				513	This is both to avoid mistakes (e.g. Python rules using the Java fragment) and
				514	to facilitate configuration trimming so that e.g. if Python options change, C++
				515	targets don't need to be re-analyzed.
				516
				517	The configuration of a rule is not necessarily the same as that of its "parent"
				518	rule. The process of changing the configuration in a dependency edge is called a
				519	"configuration transition". It can happen in two places:
				520
				521	1. On a dependency edge. These transitions are specified in
				522	`Attribute.Builder.cfg()` and are functions from a `Rule` (where the
				523	transition happens) and a `BuildOptions` (the original configuration) to one
				524	or more `BuildOptions` (the output configuration).
				525	2. On any incoming edge to a configured target. These are specified in
				526	`RuleClass.Builder.cfg()`.
				527
				528	The relevant classes are `TransitionFactory` and `ConfigurationTransition`.
				529
				530	Configuration transitions are used, for example:
				531
				532	1. To declare that a particular dependency is used during the build and it
				533	should thus be built in the execution architecture
				534	2. To declare that a particular dependency must be built for multiple
				535	architectures (e.g. for native code in fat Android APKs)
				536
				537	If a configuration transition results in multiple configurations, it's called a
				538	_split transition._
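
As a rough mental model (not Bazel's actual Java API), a transition can be
pictured as a function from one set of options to one or more sets of options.
The Python sketch below uses plain dictionaries in place of `BuildOptions`; the
option names (`cpu`, `exec_cpu`, `compilation_mode`) and all function names are
illustrative assumptions.

```python
from typing import Callable, Dict, List

# Stand-in for BuildOptions: a flat mapping from option name to value.
Options = Dict[str, str]
# A transition maps one configuration to one or more configurations.
Transition = Callable[[Options], List[Options]]


def exec_transition(options: Options) -> List[Options]:
    """Hypothetical 1:1 transition: build the dependency for the execution architecture."""
    changed = dict(options)  # never mutate the incoming configuration
    changed["cpu"] = options["exec_cpu"]
    return [changed]


def fat_apk_split(cpus: List[str]) -> Transition:
    """Hypothetical split transition: one output configuration per requested CPU."""
    def transition(options: Options) -> List[Options]:
        return [{**options, "cpu": cpu} for cpu in cpus]
    return transition


def is_split(transition: Transition, options: Options) -> bool:
    """A transition is a split transition if it yields more than one configuration."""
    return len(transition(options)) > 1


top_level = {"cpu": "arm64", "exec_cpu": "x86_64", "compilation_mode": "opt"}
tool_configs = exec_transition(top_level)
native_configs = fat_apk_split(["armeabi-v7a", "arm64-v8a"])(top_level)
```

Returning a list even in the 1:1 case keeps both kinds of transition behind one
signature, which matches the "one or more `BuildOptions`" description above.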
				539
				540	Configuration transitions can also be implemented in Starlark (documentation
fwe	ad37a37	2022-03-08 03:27:15 -0800	[diff] [blame]	541	[here](https://bazel.build/rules/config)).
laurentlb	4f2991c5	2020-08-12 11:37:32 -0700	[diff] [blame]	542
				543	### Transitive info providers
				544
				545	Transitive info providers are a way (and the _only_ way) for configured targets
				546	to tell things about themselves to the configured targets that depend on them.
				547	The reason "transitive" is in their name is that a provider is usually some sort
				548	of roll-up of the transitive closure of a configured target.
				549
				550	There is generally a 1:1 correspondence between Java transitive info providers
				551	and Starlark ones (the exception is `DefaultInfo` which is an amalgamation of
				552	`FileProvider`, `FilesToRunProvider` and `RunfilesProvider` because that API was
				553	deemed to be more Starlark-ish than a direct transliteration of the Java one).
				554	Their key is one of the following things:
				555
				556	1. A Java Class object. This is only available for providers that are not
				557	accessible from Starlark. These providers are a subclass of
				558	`TransitiveInfoProvider`.
				559	2. A string. This is legacy and heavily discouraged since it's susceptible to
				560	name clashes. Such transitive info providers are direct subclasses of
				561	`build.lib.packages.Info`.
				562	3. A provider symbol. This can be created from Starlark using the `provider()`
				563	function and is the recommended way to create new providers. The symbol is
				564	represented by a `Provider.Key` instance in Java.
				565
				566	New providers implemented in Java should be implemented using `BuiltinProvider`.
				567	`NativeProvider` is deprecated (we haven't had time to remove it yet) and
				568	`TransitiveInfoProvider` subclasses cannot be accessed from Starlark.
				569
				570	### Configured targets
				571
				572	Configured targets are implemented as `RuleConfiguredTargetFactory`. There is a
				573	subclass for each rule class implemented in Java. Starlark configured targets
Xavier Bonaventura	fbb19fb	2021-06-02 09:53:05 -0700	[diff] [blame]	574	are created through `StarlarkRuleConfiguredTargetUtil.buildRule()`.
laurentlb	4f2991c5	2020-08-12 11:37:32 -0700	[diff] [blame]	575
				576	Configured target factories should use `RuleConfiguredTargetBuilder` to
				577	construct their return value. It consists of the following things:
				578
				579	1. Their `filesToBuild`, i.e. the hazy concept of "the set of files this rule
				580	represents". These are the files that get built when the configured target
				581	is on the command line or in the srcs of a genrule.
				582	2. Their runfiles, regular and data.
				583	3. Their output groups. These are various "other sets of files" the rule can
				584	build. They can be accessed using the `output_group` attribute of the
				585	`filegroup` rule in BUILD files and using the `OutputGroupInfo` provider in Java.
				586
				587	### Runfiles
				588
				589	Some binaries need data files to run. A prominent example is tests that need
				590	input files. This is represented in Bazel by the concept of "runfiles". A
				591	"runfiles tree" is a directory tree of the data files for a particular binary.
				592	It is created in the file system as a symlink tree with individual symlinks
				593	pointing to the files in the source or output trees.
				594
				595	A set of runfiles is represented as a `Runfiles` instance. It is conceptually a
				596	map from the path of a file in the runfiles tree to the `Artifact` instance that
				597	represents it. It's a little more complicated than a single `Map` for two
				598	reasons:
				599
				600	* Most of the time, the runfiles path of a file is the same as its execpath.
				601	We use this to save some RAM.
				602	* There are various legacy kinds of entries in runfiles trees, which also need
				603	to be represented.
				604
				605	Runfiles are collected using `RunfilesProvider`: an instance of this class
				606	represents the runfiles that a configured target (e.g. a library) and its
				607	transitive closure need. They are gathered like a nested set (in fact, they are
				608	implemented using nested sets under the covers): each target unions the runfiles
				609	of its dependencies, adds some of its own, then sends the resulting set upwards
				610	in the dependency graph. A `RunfilesProvider` instance contains two `Runfiles`
				611	instances, one for when the rule is depended on through the "data" attribute and
				612	one for every other kind of incoming dependency. This is because a target
				613	sometimes presents different runfiles when depended on through a data attribute
				614	than otherwise. This is undesired legacy behavior that we haven't gotten around
				615	to removing yet.
				616
				617	Runfiles of binaries are represented as an instance of `RunfilesSupport`. This
				618	is different from `Runfiles` because `RunfilesSupport` has the capability of
				619	actually being built (unlike `Runfiles`, which is just a mapping). This
				620	necessitates the following additional components:
				621
				622	* The input runfiles manifest. This is a serialized description of the
				623	runfiles tree. It is used as a proxy for the contents of the runfiles tree
				624	and Bazel assumes that the runfiles tree changes if and only if the contents
				625	of the manifest change.
				626	* The output runfiles manifest. This is used by runtime libraries that
				627	handle runfiles trees, notably on Windows, which sometimes doesn't support
				628	symbolic links.
				629	* The runfiles middleman. In order for a runfiles tree to exist, one needs
				630	to build the symlink tree and the artifacts the symlinks point to. In order
				631	to decrease the number of dependency edges, the runfiles middleman can be
				632	used to represent all these.
				633	* Command line arguments for running the binary whose runfiles the
				634	`RunfilesSupport` object represents.
				635
				636	### Aspects
				637
				638	Aspects are a way to "propagate computation down the dependency graph". They are
				639	described for users of Bazel
fwe	ad37a37	2022-03-08 03:27:15 -0800	[diff] [blame]	640	[here](https://bazel.build/rules/aspects). A good
laurentlb	4f2991c5	2020-08-12 11:37:32 -0700	[diff] [blame]	641	motivating example is protocol buffers: a `proto_library` rule should not know
				642	about any particular language, but building the implementation of a protocol
				643	buffer message (the “basic unit” of protocol buffers) in any programming
				644	language should be coupled to the `proto_library` rule so that if two targets in
				645	the same language depend on the same protocol buffer, it gets built only once.
				646
				647	Just like configured targets, they are represented in Skyframe as a `SkyValue`
				648	and the way they are constructed is very similar to how configured targets are
				649	built: they have a factory class called `ConfiguredAspectFactory` that has
				650	access to a `RuleContext`, but unlike configured target factories, it also knows
				651	about the configured target it is attached to and its providers.
				652
				653	The set of aspects propagated down the dependency graph is specified for each
				654	attribute using the `Attribute.Builder.aspects()` function. There are a few
				655	confusingly-named classes that participate in the process:
				656
				657	1. `AspectClass` is the implementation of the aspect. It can be either in Java
				658	(in which case it's a subclass) or in Starlark (in which case it's an
Xavier Bonaventura	fbb19fb	2021-06-02 09:53:05 -0700	[diff] [blame]	659	instance of `StarlarkAspectClass`). It's analogous to
laurentlb	4f2991c5	2020-08-12 11:37:32 -0700	[diff] [blame]	660	`RuleConfiguredTargetFactory`.
				661	2. `AspectDefinition` is the definition of the aspect; it includes the
				662	providers it requires, the providers it provides and contains a reference to
				663	its implementation, i.e. the appropriate `AspectClass` instance. It's
				664	analogous to `RuleClass`.
				665	3. `AspectParameters` is a way to parametrize an aspect that is propagated down
				666	the dependency graph. It's currently a string to string map. A good example
				667	of why it's useful is protocol buffers: if a language has multiple APIs, the
				668	information as to which API the protocol buffers should be built for should
				669	be propagated down the dependency graph.
				670	4. `Aspect` represents all the data that's needed to compute an aspect that
				671	propagates down the dependency graph. It consists of the aspect class, its
				672	definition and its parameters.
				673	5. `RuleAspect` is the function that determines which aspects a particular rule
				674	should propagate. It's a `Rule` -> `Aspect` function.
				675
				676	A somewhat unexpected complication is that aspects can attach to other aspects;
				677	for example, an aspect collecting the classpath for a Java IDE will probably
				678	want to know about all the .jar files on the classpath, but some of them are
				679	protocol buffers. In that case, the IDE aspect will want to attach to the
				680	(`proto_library` rule + Java proto aspect) pair.
				681
				682	The complexity of aspects on aspects is captured in the class
				683	`AspectCollection`.
				684
				685	### Platforms and toolchains
				686
				687	Bazel supports multi-platform builds, that is, builds where there may be
				688	multiple architectures where build actions run and multiple architectures for
				689	which code is built. These architectures are referred to as _platforms_ in Bazel
				690	parlance (full documentation
fwe	ad37a37	2022-03-08 03:27:15 -0800	[diff] [blame]	691	[here](https://bazel.build/docs/platforms)).
laurentlb	4f2991c5	2020-08-12 11:37:32 -0700	[diff] [blame]	692
				693	A platform is described by a key-value mapping from _constraint settings_ (e.g.
				694	the concept of "CPU architecture") to _constraint values_ (e.g. a particular CPU
				695	like x86\_64). We have a "dictionary" of the most commonly used constraint
				696	settings and values in the `@platforms` repository.
				697
				698	The concept of _toolchain_ comes from the fact that depending on what platforms
				699	the build is running on and what platforms are targeted, one may need to use
				700	different compilers; for example, a particular C++ toolchain may run on a
				701	specific OS and be able to target some other OSes. Bazel must determine the C++
				702	compiler that is used based on the execution and target platforms that are set
				703	(documentation for toolchains
fwe	ad37a37	2022-03-08 03:27:15 -0800	[diff] [blame]	704	[here](https://bazel.build/docs/toolchains)).
laurentlb	4f2991c5	2020-08-12 11:37:32 -0700	[diff] [blame]	705
				706	In order to do this, toolchains are annotated with the set of execution and
				707	target platform constraints they support. To make that possible, the definition
				708	of a toolchain is split into two parts:
				709
				710	1. A `toolchain()` rule that describes the set of execution and target
				711	constraints a toolchain supports and tells what kind (e.g. C++ or Java) of
				712	toolchain it is (the latter is represented by the `toolchain_type()` rule)
				713	2. A language-specific rule that describes the actual toolchain (e.g.
				714	`cc_toolchain()`)
				715
				716	This is done in this way because we need to know the constraints for every
				717	toolchain in order to do toolchain resolution and language-specific
				718	`*_toolchain()` rules contain much more information than that, so they take more
				719	time to load.
				720
				721	Execution platforms are specified in one of the following ways:
				722
				723	1. In the WORKSPACE file using the `register_execution_platforms()` function
				724	2. On the command line using the `--extra_execution_platforms` command line
				725	option
				726
				727	The set of available execution platforms is computed in
				728	`RegisteredExecutionPlatformsFunction`.
				729
				730	The target platform for a configured target is determined by
				731	`PlatformOptions.computeTargetPlatform()`. It's a list of platforms because we
				732	eventually want to support multiple target platforms, but it's not implemented
				733	yet.
				734
				735	The set of toolchains to be used for a configured target is determined by
				736	`ToolchainResolutionFunction`. It is a function of:
				737
				738	* The set of registered toolchains (in the WORKSPACE file and the
				739	configuration)
				740	* The desired execution and target platforms (in the configuration)
				741	* The set of toolchain types that are required by the configured target (in
				742	`UnloadedToolchainContextKey`)
				743	* The set of execution platform constraints of the configured target (the
				744	`exec_compatible_with` attribute) and the configuration
				745	(`--experimental_add_exec_constraints_to_targets`), in
				746	`UnloadedToolchainContextKey`
				747
				748	Its result is an `UnloadedToolchainContext`, which is essentially a map from
				749	toolchain type (represented as a `ToolchainTypeInfo` instance) to the label of
				750	the selected toolchain. It's called "unloaded" because it does not contain the
				751	toolchains themselves, only their labels.
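
The matching step can be sketched as follows. This is a simplified Python model,
not the logic of `ToolchainResolutionFunction`: it assumes a single, already
chosen execution platform, treats constraints as plain dictionaries, and picks
the first registered toolchain that matches. The class, function and label names
are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List

# Constraint setting -> constraint value, e.g. {"os": "linux", "cpu": "x86_64"}.
Constraints = Dict[str, str]


@dataclass
class Toolchain:
    label: str
    toolchain_type: str
    exec_compatible_with: Constraints
    target_compatible_with: Constraints


def satisfies(platform: Constraints, required: Constraints) -> bool:
    """A platform matches if it carries every required constraint value."""
    return all(platform.get(setting) == value for setting, value in required.items())


def resolve(toolchains: List[Toolchain], required_types: List[str],
            exec_platform: Constraints, target_platform: Constraints) -> Dict[str, str]:
    """Return toolchain type -> label of the first registered toolchain that matches."""
    selected = {}
    for toolchain_type in required_types:
        for tc in toolchains:
            if (tc.toolchain_type == toolchain_type
                    and satisfies(exec_platform, tc.exec_compatible_with)
                    and satisfies(target_platform, tc.target_compatible_with)):
                selected[toolchain_type] = tc.label
                break
        else:
            raise LookupError(f"no matching toolchain for {toolchain_type}")
    return selected


registered = [
    Toolchain("//tc:cc_linux_to_arm", "cc", {"os": "linux"}, {"cpu": "arm64"}),
    Toolchain("//tc:cc_linux_native", "cc", {"os": "linux"}, {"cpu": "x86_64"}),
    Toolchain("//tc:java_any", "java", {}, {}),
]
linux_x86 = {"os": "linux", "cpu": "x86_64"}
android_arm = {"os": "android", "cpu": "arm64"}
unloaded = resolve(registered, ["cc", "java"], linux_x86, android_arm)
```

Like `UnloadedToolchainContext`, the result holds only labels; loading the
toolchain targets themselves would be a separate step.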
				752
				753	Then the toolchains are actually loaded using `ResolvedToolchainContext.load()`
				754	and used by the implementation of the configured target that requested them.
				755
				756	We also have a legacy system that relies on there being one single "host"
				757	configuration and target configurations being represented by various
				758	configuration flags, e.g. `--cpu`. We are gradually transitioning to the above
				759	system. In order to handle cases where people rely on the legacy configuration
				760	values, we have implemented
				761	"[platform mappings](https://docs.google.com/document/d/1Vg_tPgiZbSrvXcJ403vZVAGlsWhH9BUDrAxMOYnO0Ls)"
				762	to translate between the legacy flags and the new-style platform constraints.
				763	Their code is in `PlatformMappingFunction` and uses a non-Starlark "little
				764	language".
				765
				766	### Constraints
				767
				768	Sometimes one wants to designate a target as being compatible with only a few
				769	platforms. Bazel has (unfortunately) multiple mechanisms to achieve this end:
				770
				771	* Rule-specific constraints
				772	* `environment_group()` / `environment()`
				773	* Platform constraints
				774
				775	Rule-specific constraints are mostly used within Google for Java rules; they are
				776	on their way out and they are not available in Bazel, but the source code may
				777	contain references to them. The attribute that governs this is called
				778	`constraints=`.
				779
				780	#### environment_group() and environment()
				781
				782	These rules are a legacy mechanism and are not widely used.
				783
				784	All build rules can declare which "environments" they can be built for, where an
				785	"environment" is an instance of the `environment()` rule.
				786
				787	There are various ways supported environments can be specified for a rule:
				788
				789	1. Through the `restricted_to=` attribute. This is the most direct form of
				790	specification; it declares the exact set of environments the rule supports
				791	for this group.
				792	2. Through the `compatible_with=` attribute. This declares environments a rule
				793	supports in addition to "standard" environments that are supported by
				794	default.
				795	3. Through the package-level attributes `default_restricted_to=` and
				796	`default_compatible_with=`.
				797	4. Through default specifications in `environment_group()` rules. Every
				798	environment belongs to a group of thematically related peers (e.g. "CPU
				799	architectures", "JDK versions" or "mobile operating systems"). The
				800	definition of an environment group includes which of these environments
				801	should be supported by "default" if not otherwise specified by the
				802	`restricted_to=` / `compatible_with=` attributes. A rule with no such
				803	attributes inherits all defaults.
				804	5. Through a rule class default. This overrides global defaults for all
				805	instances of the given rule class. This can be used, for example, to make
				806	all `*_test` rules testable without each instance having to explicitly
				807	declare this capability.
				808
				809	`environment()` is implemented as a regular rule whereas `environment_group()`
				810	is both a subclass of `Target` that is not a `Rule` (`EnvironmentGroup`) and a
				811	function that is available by default from Starlark
				812	(`StarlarkLibrary.environmentGroup()`) which eventually creates an eponymous
				813	target. This is to avoid a cyclic dependency that would arise because each
				814	environment needs to declare the environment group it belongs to and each
				815	environment group needs to declare its default environments.
				816
				817	A build can be restricted to a certain environment with the
				818	`--target_environment` command line option.
				819
				820	The implementation of the constraint check is in
				821	`RuleContextConstraintSemantics` and `TopLevelConstraintSemantics`.
				822
				823	#### Platform constraints
				824
				825	The current "official" way to describe what platforms a target is compatible
				826	with is by using the same constraints used to describe toolchains and platforms.
				827	It's under review in pull request
				828	[#10945](https://github.com/bazelbuild/bazel/pull/10945).
				829
				830	### Visibility
				831
				832	If you work on a large codebase with a lot of developers (like at Google), you
				833	don't necessarily want everyone else to be able to depend on your code so that
				834	you retain the liberty to change things that you deem to be implementation
				835	details (otherwise, as per [Hyrum's law](https://www.hyrumslaw.com/), people
				836	_will_ come to depend on all parts of your code).
				837
				838	Bazel supports this by the mechanism called _visibility_: you can declare which
				839	packages may depend on a particular rule using the `visibility` attribute
				840	(documentation
fwe	ad37a37	2022-03-08 03:27:15 -0800	[diff] [blame]	841	[here](https://bazel.build/reference/be/common-definitions#common-attributes)).
laurentlb	4f2991c5	2020-08-12 11:37:32 -0700	[diff] [blame]	842	This attribute is a little special because unlike every other attribute, the set
				843	of dependencies it generates is not simply the set of labels listed (yes, this
				844	is a design flaw).
				845
				846	This is implemented in the following places:
				847
				848	* The `RuleVisibility` interface represents a visibility declaration. It can
				849	be either a constant (fully public or fully private) or a list of labels.
				850	* Labels can refer to either package groups (predefined list of packages), to
				851	packages directly (`//pkg:__pkg__`) or subtrees of packages
				852	(`//pkg:__subpackages__`). This is different from the command line syntax,
				853	which uses `//pkg:*` or `//pkg/...`.
				854	* Package groups are implemented as their own target and configured target
				855	types (`PackageGroup` and `PackageGroupConfiguredTarget`). We could probably
				856	replace these with simple rules if we wanted to.
				857	* The conversion from visibility label lists to dependencies is done in
				858	`DependencyResolver.visitTargetVisibility` and a few other miscellaneous
				859	places.
				860	* The actual check is done in
				861	`CommonPrerequisiteValidator.validateDirectPrerequisiteVisibility()`
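
To make the label forms above concrete, here is a small Python sketch of the
check. It is a simplification under stated assumptions, not the logic of
`CommonPrerequisiteValidator`: it ignores package groups and the rule that
targets in the same package can always see each other, and the function name
`is_visible` is hypothetical.

```python
from typing import List


def is_visible(visibility: List[str], dependent_package: str) -> bool:
    """Return True if a target in dependent_package may depend on the rule.

    visibility is a list of labels such as //foo:__pkg__; dependent_package is
    a package path without the leading slashes, e.g. "foo/bar".
    """
    for spec in visibility:
        if spec == "//visibility:public":
            return True
        if spec == "//visibility:private":
            continue  # grants nothing beyond the rule's own package
        package, _, name = spec[2:].partition(":")
        if name == "__pkg__" and dependent_package == package:
            return True
        if name == "__subpackages__" and (
            package == ""
            or dependent_package == package
            or dependent_package.startswith(package + "/")
        ):
            return True
    return False
```

Note the `package + "/"` prefix test: `//foo:__subpackages__` covers `foo/bar`
but must not cover an unrelated package that merely starts with the same
characters, such as `foobar`.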
				862
				863	### Nested sets
				864
				865	Oftentimes, a configured target aggregates a set of files from its dependencies,
				866	adds its own, and wraps the aggregate set into a transitive info provider so
				867	that configured targets that depend on it can do the same. Examples:
				868
				869	* The C++ header files used for a build
				870	* The object files that represent the transitive closure of a `cc_library`
				871	* The set of .jar files that need to be on the classpath for a Java rule to
				872	compile or run
				873	* The set of Python files in the transitive closure of a Python rule
				874
				875	If we did this the naive way by using e.g. `List` or `Set`, we'd end up with
				876	quadratic memory usage: if there is a chain of N rules and each rule adds a
				877	file, we'd have 1+2+...+N collection members.
				878
				879	In order to get around this problem, we came up with the concept of a
				880	`NestedSet`. It's a data structure that is composed of other `NestedSet`
				881	instances and some members of its own, thereby forming a directed acyclic graph
				882	of sets. They are immutable and their members can be iterated over. We define
				883	multiple iteration orders (`NestedSet.Order`): preorder, postorder, topological
				884	(a node always comes after its ancestors) and "don't care, but it should be the
				885	same each time".
				886
				887	The same data structure is called `depset` in Starlark.
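
The memory argument can be illustrated with a toy version of the idea. The
Python sketch below is not Bazel's `NestedSet` implementation; it only models a
set that stores its direct members plus references to child sets, flattens on
demand in postorder, and counts how many references a chain of 100 rules stores
compared with flat copies.

```python
class NestedSet:
    """Immutable-by-convention set made of direct members plus child sets."""

    def __init__(self, direct=(), transitive=()):
        self.direct = tuple(direct)
        self.transitive = tuple(transitive)

    def to_list(self):
        """Postorder traversal: children first, then direct members, deduplicated."""
        seen_sets, seen_items, result = set(), set(), []

        def visit(node):
            if id(node) in seen_sets:
                return  # shared subgraphs (diamonds) are walked only once
            seen_sets.add(id(node))
            for child in node.transitive:
                visit(child)
            for item in node.direct:
                if item not in seen_items:
                    seen_items.add(item)
                    result.append(item)

        visit(self)
        return result


def build_chain(n):
    """Chain of n rules, each adding one file on top of its dependency's set."""
    sets, current = [], None
    for i in range(n):
        current = NestedSet([f"file{i}"], [current] if current else [])
        sets.append(current)
    return sets


chain = build_chain(100)
# References actually stored across all 100 nested sets: linear in N.
stored_refs = sum(len(s.direct) + len(s.transitive) for s in chain)
# Members stored if every rule kept its own flat list: 1 + 2 + ... + N.
flat_copies = sum(range(1, 101))
```

For this chain the nested representation stores 199 references, whereas flat
lists would hold 5050 members in total.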
				888
				889	### Artifacts and Actions
				890
				891	The actual build consists of a set of commands that need to be run to produce
				892	the output the user wants. The commands are represented as instances of the
				893	class `Action` and the files are represented as instances of the class
				894	`Artifact`. They are arranged in a bipartite, directed, acyclic graph called the
				895	"action graph".
				896
				897	Artifacts come in two kinds: source artifacts (i.e. ones that are available
				898	before Bazel starts executing) and derived artifacts (ones that need to be
				899	built). Derived artifacts can themselves be multiple kinds:
				900
				901	1. Regular artifacts. These are checked for up-to-dateness by computing
				902	their checksum, with mtime as a shortcut; we don't checksum the file if its
				903	ctime hasn't changed.
				904	2. Unresolved symlink artifacts. These are checked for up-to-dateness by
				905	calling readlink(). Unlike regular artifacts, these can be dangling
				906	symlinks. Usually used in cases where one then packs up some files into an
				907	archive of some sort.
				908	3. Tree artifacts. These are not single files, but directory trees. They
				909	are checked for up-to-dateness by checking the set of files in them and their
				910	contents. They are represented as a `TreeArtifact`.
				911	4. Constant metadata artifacts. Changes to these artifacts don't trigger a
				912	rebuild. This is used exclusively for build stamp information: we don't want
				913	to do a rebuild just because the current time changed.
				914
				915	There is no fundamental reason why source artifacts cannot be tree artifacts or
				916	unresolved symlink artifacts; it's just that we haven't implemented it yet (we
				917	should, though -- referencing a source directory in a BUILD file is one of the
				918	few known long-standing incorrectness issues with Bazel; we have an
				919	implementation that kind of works which is enabled by the
				920	`BAZEL_TRACK_SOURCE_DIRECTORIES=1` JVM property).
				921
				922	A notable kind of `Artifact` are middlemen. They are indicated by `Artifact`
				923	instances that are the outputs of `MiddlemanAction`. They are used to
				924	special-case some things:
				925
				926	* Aggregating middlemen are used to group artifacts together. This is so that
				927	if a lot of actions use the same large set of inputs, we don't have N\*M
				928	dependency edges, only N+M (they are being replaced with nested sets)
				929	* Scheduling dependency middlemen ensure that an action runs before another.
				930	They are mostly used for linting but also for C++ compilation (see
				931	`CcCompilationContext.createMiddleman()` for an explanation)
				932	* Runfiles middlemen are used to ensure the presence of a runfiles tree so
				933	that one does not separately need to depend on the output manifest and every
				934	single artifact referenced by the runfiles tree.
				935
				936	Actions are best understood as a command that needs to be run, the environment
				937	it needs and the set of outputs it produces. The following things are the main
				938	components of the description of an action:
				939
				940	* The command line that needs to be run
				941	* The input artifacts it needs
				942	* The environment variables that need to be set
				943	* Annotations that describe the environment (e.g. platform) it needs to run in
				945
				946	There are also a few other special cases, like writing a file whose content is
				947	known to Bazel. They are subclasses of `AbstractAction`. Most actions are
				948	a `SpawnAction` or a `StarlarkAction` (these are essentially the same and arguably
				949	should not be separate classes), although Java and C++ have their own action types
				950	(`JavaCompileAction`, `CppCompileAction` and `CppLinkAction`).
				951
				952	We eventually want to move everything to `SpawnAction`; `JavaCompileAction` is
				953	pretty close, but C++ is a bit of a special-case due to .d file parsing and
				954	include scanning.
				955
				956	The action graph is mostly "embedded" into the Skyframe graph: conceptually, the
				957	execution of an action is represented as an invocation of
				958	`ActionExecutionFunction`. The mapping from an action graph dependency edge to a
				959	Skyframe dependency edge is described in
				960	`ActionExecutionFunction.getInputDeps()` and `Artifact.key()` and has a few
				961	optimizations in order to keep the number of Skyframe edges low:
				962
				963	* Derived artifacts do not have their own `SkyValue`s. Instead,
				964	`Artifact.getGeneratingActionKey()` is used to find out the key for the
				965	action that generates it
				966	* Nested sets have their own Skyframe key.
				967
				968	### Shared actions
				969
				970	Some actions are generated by multiple configured targets; Starlark rules are
				971	more limited since they are only allowed to put their derived artifacts into a
				972	directory determined by their configuration and their package (but even so,
				973	rules in the same package can conflict), but rules implemented in Java can put
				974	derived artifacts anywhere.
				975
				976	This is considered to be a misfeature, but getting rid of it is really hard
				977	because it produces significant savings in execution time when e.g. a source
				978	file needs to be processed somehow and that file is referenced by multiple rules
				979	(handwave-handwave). This comes at the cost of some RAM: each instance of a
				980	shared action needs to be stored in memory separately.
				981
				982	If two actions generate the same output file, they must be exactly the same:
				983	have the same inputs, the same outputs and run the same command line. This
				984	equivalence relation is implemented in `Actions.canBeShared()` and it is
				985	verified between the analysis and execution phases by looking at every Action.
				986	This is implemented in `SkyframeActionExecutor.findAndStoreArtifactConflicts()`
				987	and is one of the few places in Bazel that requires a "global" view of the
				988	build.
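
The equivalence check and the conflict scan can be sketched like this. The
Python below is a simplified model under stated assumptions, not the code of
`Actions.canBeShared()` or `SkyframeActionExecutor`: actions are reduced to
their inputs, outputs and command line, and all names are hypothetical.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Tuple


@dataclass(frozen=True)
class Action:
    owner: str                  # label of the configured target that created it
    inputs: FrozenSet[str]
    outputs: FrozenSet[str]
    command_line: Tuple[str, ...]


def can_be_shared(a: Action, b: Action) -> bool:
    """Two actions are shareable if they agree on everything except their owner."""
    return (a.inputs, a.outputs, a.command_line) == (b.inputs, b.outputs, b.command_line)


def find_conflicts(actions: List[Action]) -> List[Tuple[str, str, str]]:
    """Scan every action and report outputs produced by non-equivalent actions."""
    generating = {}
    conflicts = []
    for action in actions:
        for output in sorted(action.outputs):
            previous = generating.setdefault(output, action)
            if previous is not action and not can_be_shared(previous, action):
                conflicts.append((output, previous.owner, action.owner))
    return conflicts
```

The scan has to look at all actions of the build at once, which is why this
check needs the "global" view mentioned above.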
				989
				990	## The execution phase
				991
				992	This is when Bazel actually starts running build actions, i.e. commands that
				993	produce outputs.
				994
				995	The first thing Bazel does after the analysis phase is to determine what
				996	Artifacts need to be built. The logic for this is encoded in
				997	`TopLevelArtifactHelper`; roughly speaking, it's the `filesToBuild` of the
				998	configured targets on the command line and the contents of a special output
				999	group for the explicit purpose of expressing "if this target is on the command
				1000	line, build these artifacts".
				1001
				1002	The next step is creating the execution root. Since Bazel has the option to read
				1003	source packages from different locations in the file system (`--package_path`),
				1004	it needs to provide locally executed actions with a full source tree. This is
				1005	handled by the class `SymlinkForest` and works by taking note of every target
				1006	used in the analysis phase and building up a single directory tree that symlinks
				1007	every package with a used target from its actual location. An alternative would
				1008	be to pass the correct paths to commands (taking `--package_path` into account).
				1009	This is undesirable because:
				1010
				1011	* It changes action command lines when a package is moved from a package path
				1012	entry to another (used to be a common occurrence)
				1013	* It results in different command lines if an action is run remotely than if
				1014	it's run locally
				1015	* It requires a command line transformation specific to the tool in use
				1016	(consider the difference between e.g. Java classpaths and C++ include paths)
				1017	* Changing the command line of an action invalidates its action cache entry
				1018	* `--package_path` is slowly and steadily being deprecated
				1019
				1020	Then, Bazel starts traversing the action graph (the bipartite, directed graph
				1021	composed of actions and their input and output artifacts) and running actions.
				1022	The execution of each action is represented by an instance of the `SkyValue`
				1023	class `ActionExecutionValue`.
				1024
				1025	Since running an action is expensive, we have a few layers of caching that can
				1026	be hit behind Skyframe:
				1027
				1028	* `ActionExecutionFunction.stateMap` contains data to make Skyframe restarts
				1029	of `ActionExecutionFunction` cheap
				1030	* The local action cache contains data about the state of the file system
				1031	* Remote execution systems usually also contain their own cache
				1032
				1033	### The local action cache
				1034
				1035	This cache is another layer that sits behind Skyframe; even if an action is
				1036	re-executed in Skyframe, it can still be a hit in the local action cache. It
				1037	represents the state of the local file system and it's serialized to disk which
				1038	means that when one starts up a new Bazel server, one can get local action cache
				1039	hits even though the Skyframe graph is empty.
				1040
				1041	This cache is checked for hits using the method
				1042	`ActionCacheChecker.getTokenIfNeedToExecute()` .
				1043
				1044	Contrary to its name, it's a map from the path of a derived artifact to the
				1045	action that emitted it. The action is described as:
				1046
1. The set of its input and output files and their checksums
2. Its "action key", which is usually the command line that was executed, but
in general, represents everything that's not captured by the checksums of the
input files (e.g. for `FileWriteAction`, it's the checksum of the data
that's written)
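
The lookup described above can be modelled in a few lines. This is a minimal sketch in Python, not Bazel's Java implementation: `ToyActionCache`, `CacheEntry` and `digest` are hypothetical names, and SHA-256 is an arbitrary choice of checksum.

```python
import hashlib
from dataclasses import dataclass


def digest(data: bytes) -> str:
    """Checksum helper; SHA-256 is an arbitrary choice for this sketch."""
    return hashlib.sha256(data).hexdigest()


@dataclass(frozen=True)
class CacheEntry:
    action_key: str      # e.g. a digest of the command line
    file_digests: tuple  # sorted (path, checksum) pairs for inputs and outputs


class ToyActionCache:
    """Maps the path of a derived artifact to a description of its action."""

    def __init__(self):
        self._entries = {}

    def needs_execution(self, output_path, action_key, files):
        """True unless a recorded entry matches the key and every checksum."""
        current = CacheEntry(action_key, tuple(sorted(files.items())))
        return self._entries.get(output_path) != current

    def record(self, output_path, action_key, files):
        """Stores the description of the action that just produced the output."""
        self._entries[output_path] = CacheEntry(
            action_key, tuple(sorted(files.items())))
```

A changed command line changes the action key, and a changed file changes its checksum; either one makes `needs_execution()` return `True` again.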
				1052
				1053	There is also a highly experimental “top-down action cache” that is still under
				1054	development, which uses transitive hashes to avoid going to the cache as many
				1055	times.
				1056
				1057	### Input discovery and input pruning
				1058
				1059	Some actions are more complicated than just having a set of inputs. Changes to
				1060	the set of inputs of an action come in two forms:
				1061
* An action may discover new inputs before its execution or decide that some
of its inputs are not actually necessary. The canonical example is C++,
where it's better to make an educated guess about what header files a C++
file uses from its transitive closure so that we don't need to send every
file to remote executors; therefore, we have an option not to register every
header file as an "input", but scan the source file for transitively
included headers and only mark those header files as inputs that are
mentioned in `#include` statements (we overestimate so that we don't need to
implement a full C preprocessor). This option is currently hard-wired to
"false" in Bazel and is only used at Google.
* An action may realize that some files were not used during its execution. In
C++, this is called ".d files": the compiler tells which header files were
used after the fact, and in order to avoid the embarrassment of having worse
incrementality than Make, Bazel makes use of this fact. This offers a better
estimate than the include scanner because it relies on the compiler.
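
To illustrate why include scanning overestimates, here is a rough Python sketch of a scanner that follows `#include` lines transitively without evaluating preprocessor conditionals. It is not the scanner used at Google: `scan_includes` and `read_file` are hypothetical names, and include path resolution is skipped by treating the included name as the file path.

```python
import re

# Matches `#include "x.h"` and `#include <x.h>` at the start of a line.
INCLUDE_RE = re.compile(r'^\s*#\s*include\s*[<"]([^">]+)[">]', re.MULTILINE)


def scan_includes(source, read_file):
    """Returns every header reachable from `source` through #include lines.

    Conditional compilation is ignored on purpose, so a header that is only
    included inside an inactive #ifdef branch is still reported. `read_file`
    maps a path to its text, or returns None if the file is unknown.
    """
    seen = set()
    stack = [source]
    while stack:
        text = read_file(stack.pop())
        if text is None:
            continue
        for header in INCLUDE_RE.findall(text):
            if header not in seen:
                seen.add(header)
                stack.append(header)
    return seen
```

The compiler-produced `.d` file would omit headers behind inactive conditionals, which is why the text calls it the better estimate.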
				1077
These are implemented using methods on `Action`:
				1079
				1080	1. `Action.discoverInputs()` is called. It should return a nested set of
				1081	Artifacts that are determined to be required. These must be source artifacts
				1082	so that there are no dependency edges in the action graph that don't have an
				1083	equivalent in the configured target graph.
				1084	2. The action is executed by calling `Action.execute()`.
				1085	3. At the end of `Action.execute()`, the action can call
				1086	`Action.updateInputs()` to tell Bazel that not all of its inputs were
				1087	needed. This can result in incorrect incremental builds if a used input is
				1088	reported as unused.
				1089
				1090	When an action cache returns a hit on a fresh Action instance (e.g. created
				1091	after a server restart), Bazel calls `updateInputs()` itself so that the set of
				1092	inputs reflects the result of input discovery and pruning done before.
				1093
Starlark actions can make use of the facility to declare some inputs as unused
using the `unused_inputs_list=` argument of
<code>[ctx.actions.run()](https://bazel.build/rules/lib/actions#run)</code>.

### Various ways to run actions: Strategies/ActionContexts
				1099
Some actions can be run in different ways. For example, a command line can be
executed locally, locally but in various kinds of sandboxes, or remotely. The
concept that embodies this is called an `ActionContext` (or `Strategy`, since we
successfully went only halfway with a rename...).
				1104
				1105	The life cycle of an action context is as follows:
				1106
1. When the execution phase is started, `BlazeModule` instances are asked what
action contexts they have. This happens in the constructor of
`ExecutionTool`. Action context types are identified by a Java `Class`
instance that refers to a sub-interface of `ActionContext`; this is the
interface the action context must implement.
2. The appropriate action context is selected from the available ones and is
forwarded to `ActionExecutionContext` and `BlazeExecutor`.
3. Actions request contexts using `ActionExecutionContext.getContext()` and
`BlazeExecutor.getStrategy()` (there should really be only one way to do
it…).
				1117
				1118	Strategies are free to call other strategies to do their jobs; this is used, for
				1119	example, in the dynamic strategy that starts actions both locally and remotely,
				1120	then uses whichever finishes first.
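
The racing idea behind the dynamic strategy can be sketched with two threads. This is only an illustration in Python with a hypothetical `run_dynamically` helper; unlike a real strategy, it cannot interrupt the losing branch once it has started.

```python
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait


def run_dynamically(run_local, run_remote):
    """Starts both callables and returns (winner name, result of the winner)."""
    pool = ThreadPoolExecutor(max_workers=2)
    futures = {
        pool.submit(run_local): "local",
        pool.submit(run_remote): "remote",
    }
    done, pending = wait(futures, return_when=FIRST_COMPLETED)
    for future in pending:
        # Best effort only: a callable that is already running keeps running.
        future.cancel()
    pool.shutdown(wait=False)
    winner = next(iter(done))
    return futures[winner], winner.result()
```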
				1121
One notable strategy is the one that implements persistent worker processes
(`WorkerSpawnStrategy`). The idea is that some tools have a long startup time
and should therefore be reused between actions instead of starting one anew for
every action. (This does represent a potential correctness issue, since Bazel
relies on the promise of the worker process that it doesn't carry observable
state between individual requests.)
				1128
				1129	If the tool changes, the worker process needs to be restarted. Whether a worker
				1130	can be reused is determined by computing a checksum for the tool used using
				1131	`WorkerFilesHash`. It relies on knowing which inputs of the action represent
				1132	part of the tool and which represent inputs; this is determined by the creator
				1133	of the Action: `Spawn.getToolFiles()` and the runfiles of the `Spawn` are
				1134	counted as parts of the tool.
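
The reuse decision can be pictured as follows. This Python sketch is an assumption-laden model, not `WorkerFilesHash` itself: `tool_files_hash` and `ToyWorkerPool` are hypothetical, and the way paths and digests are combined is invented for illustration.

```python
import hashlib


def tool_files_hash(tool_files):
    """Combines paths and content digests of the tool files into one checksum.

    `tool_files` maps a path to the digest of its contents; sorting makes the
    result independent of iteration order.
    """
    combined = hashlib.sha256()
    for path, content_digest in sorted(tool_files.items()):
        combined.update(path.encode())
        combined.update(b"\0")
        combined.update(content_digest.encode())
        combined.update(b"\0")
    return combined.hexdigest()


class ToyWorkerPool:
    """Keeps one worker per mnemonic; restarts it when the tool changes."""

    def __init__(self):
        self._workers = {}  # mnemonic -> (tool hash, worker id)
        self._next_id = 0

    def get_worker(self, mnemonic, tool_files):
        wanted = tool_files_hash(tool_files)
        current = self._workers.get(mnemonic)
        if current is None or current[0] != wanted:
            self._next_id += 1
            current = (wanted, self._next_id)  # stands in for a fresh process
            self._workers[mnemonic] = current
        return current[1]
```

Only the files counted as part of the tool feed the checksum, so changing an ordinary input of the action does not force a restart.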
				1135
				1136	More information about strategies (or action contexts!):
				1137
				1138	* Information about various strategies for running actions is available
				1139	[here](https://jmmv.dev/2019/12/bazel-strategies.html).
* Information about the dynamic strategy, one where we run an action both
locally and remotely to see whichever finishes first, is available
[here](https://jmmv.dev/series.html#Bazel%20dynamic%20execution).
				1143	* Information about the intricacies of executing actions locally is available
				1144	[here](https://jmmv.dev/2019/11/bazel-process-wrapper.html).
				1145
				1146	### The local resource manager
				1147
Bazel _can_ run many actions in parallel. The number of local actions that
_should_ be run in parallel differs from action to action: the more resources an
action requires, the fewer instances should be running at the same time to avoid
overloading the local machine.
				1152
				1153	This is implemented in the class `ResourceManager`: each action has to be
				1154	annotated with an estimate of the local resources it requires in the form of a
				1155	`ResourceSet` instance (CPU and RAM). Then when action contexts do something
				1156	that requires local resources, they call `ResourceManager.acquireResources()`
				1157	and are blocked until the required resources are available.
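
The blocking behavior can be sketched with a condition variable. This Python model is illustrative only: `ToyResourceManager` is a hypothetical class that tracks just CPU and RAM and makes no attempt at fairness or at rejecting requests larger than the machine.

```python
import threading


class ToyResourceManager:
    """Blocks callers until the requested CPU and RAM are both available."""

    def __init__(self, cpu, ram_mb):
        self._free_cpu = cpu
        self._free_ram = ram_mb
        self._cond = threading.Condition()

    def acquire(self, cpu, ram_mb):
        """Waits until the request fits, then reserves the resources."""
        with self._cond:
            self._cond.wait_for(
                lambda: cpu <= self._free_cpu and ram_mb <= self._free_ram)
            self._free_cpu -= cpu
            self._free_ram -= ram_mb

    def release(self, cpu, ram_mb):
        """Returns resources and wakes up every waiting caller to re-check."""
        with self._cond:
            self._free_cpu += cpu
            self._free_ram += ram_mb
            self._cond.notify_all()
```

In this sketch a request that exceeds the total capacity would wait forever, so a real implementation needs to cap or reject such requests.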
				1158
				1159	A more detailed description of local resource management is available
				1160	[here](https://jmmv.dev/2019/12/bazel-local-resources.html).
				1161
				1162	### The structure of the output directory
				1163
				1164	Each action requires a separate place in the output directory where it places
				1165	its outputs. The location of derived artifacts is usually as follows:
				1166
				1167	```
				1168	$EXECROOT/bazel-out/<configuration>/bin/<package>/<artifact name>
				1169	```
				1170
				1171	How is the name of the directory that is associated with a particular
				1172	configuration determined? There are two conflicting desirable properties:
				1173
				1174	1. If two configurations can occur in the same build, they should have
				1175	different directories so that both can have their own version of the same
				1176	action; otherwise, if the two configurations disagree about e.g. the command
				1177	line of an action producing the same output file, Bazel doesn't know which
				1178	action to choose (an "action conflict")
				1179	2. If two configurations represent "roughly" the same thing, they should have
				1180	the same name so that actions executed in one can be reused for the other if
				1181	the command lines match: for example, changes to the command line options to
				1182	the Java compiler should not result in C++ compile actions being re-run.
				1183
				1184	So far, we have not come up with a principled way of solving this problem, which
				1185	has similarities to the problem of configuration trimming. A longer discussion
				1186	of options is available
				1187	[here](https://docs.google.com/document/d/1fZI7wHoaS-vJvZy9SBxaHPitIzXE_nL9v4sS4mErrG4/edit).
				1188	The main problematic areas are Starlark rules (whose authors usually aren't
				1189	intimately familiar with Bazel) and aspects, which add another dimension to the
				1190	space of things that can produce the "same" output file.
				1191
				1192	The current approach is that the path segment for the configuration is
				1193	`<CPU>-<compilation mode>` with various suffixes added so that configuration
				1194	transitions implemented in Java don't result in action conflicts. In addition, a
				1195	checksum of the set of Starlark configuration transitions is added so that users
				1196	can't cause action conflicts. It is far from perfect. This is implemented in
				1197	`OutputDirectories.buildMnemonic()` and relies on each configuration fragment
				1198	adding its own part to the name of the output directory.
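
As a rough picture of the naming scheme, the following Python sketch assembles a directory name from the pieces mentioned above. It does not reproduce `OutputDirectories.buildMnemonic()`: the function name, the `-ST-` separator, the hash length and the canonical form of the transition are all assumptions made for illustration.

```python
import hashlib


def output_directory_name(cpu, compilation_mode, fragment_suffixes=(),
                          starlark_changes=None):
    """Builds `<CPU>-<compilation mode>` plus optional suffixes and a checksum."""
    name = "-".join([cpu, compilation_mode, *fragment_suffixes])
    if starlark_changes:
        # Sort the changed options so that equal transitions hash equally.
        canonical = ";".join(
            f"{key}={value}" for key, value in sorted(starlark_changes.items()))
        checksum = hashlib.sha256(canonical.encode()).hexdigest()[:12]
        name += "-ST-" + checksum
    return name
```

Two configurations that differ only in a Starlark transition get different names here, which avoids action conflicts at the cost of never sharing actions between them.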
				1199
				1200	## Tests
				1201
				1202	Bazel has rich support for running tests. It supports:
				1203
				1204	* Running tests remotely (if a remote execution backend is available)
				1205	* Running tests multiple times in parallel (for deflaking or gathering timing
				1206	data)
* Sharding tests (splitting test cases in the same test over multiple processes
for speed)
				1209	* Re-running flaky tests
				1210	* Grouping tests into test suites
				1211
Tests are regular configured targets that have a `TestProvider`, which describes
how the test should be run:

* The artifacts whose building results in the test being run. This is a "cache
status" file that contains a serialized `TestResultData` message
				1217	* The number of times the test should be run
				1218	* The number of shards the test should be split into
				1219	* Some parameters about how the test should be run (e.g. the test timeout)
				1220
				1221	### Determining which tests to run
				1222
				1223	Determining which tests are run is an elaborate process.
				1224
				1225	First, during target pattern parsing, test suites are recursively expanded. The
				1226	expansion is implemented in `TestsForTargetPatternFunction`. A somewhat
				1227	surprising wrinkle is that if a test suite declares no tests, it refers to
				1228	_every_ test in its package. This is implemented in `Package.beforeBuild()` by
				1229	adding an implicit attribute called `$implicit_tests` to test suite rules.
				1230
Then, tests are filtered for size, tags, timeout and language according to the
command line options. This is implemented in `TestFilter` and is called from
`TargetPatternPhaseFunction.determineTests()` during target parsing, and the
result is put into `TargetPatternPhaseValue.getTestsToRunLabels()`. The reason
why rule attributes which can be filtered for are not configurable is that this
happens before the analysis phase, so the configuration is not yet available.
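
The filtering step amounts to a conjunction of simple predicates over attributes that are known without a configuration. The Python sketch below is a hedged model, not `TestFilter`: `CandidateTest` and `filter_tests` are hypothetical, and the tag semantics (at least one required tag, no excluded tag) are an assumption of this sketch.

```python
from dataclasses import dataclass, field


@dataclass
class CandidateTest:
    label: str
    size: str = "medium"
    timeout: str = "moderate"
    language: str = ""
    tags: set = field(default_factory=set)


def filter_tests(tests, sizes=None, timeouts=None, languages=None,
                 required_tags=frozenset(), excluded_tags=frozenset()):
    """Returns the labels of tests that pass every filter that was given."""
    selected = []
    for test in tests:
        if sizes and test.size not in sizes:
            continue
        if timeouts and test.timeout not in timeouts:
            continue
        if languages and test.language not in languages:
            continue
        if required_tags and not (test.tags & set(required_tags)):
            continue
        if test.tags & set(excluded_tags):
            continue
        selected.append(test.label)
    return selected
```

Because every predicate reads a plain attribute value, none of them can depend on a `select()`, which matches the restriction described above.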
				1238
				1239	This is then processed further in `BuildView.createResult()`: targets whose
				1240	analysis failed are filtered out and tests are split into exclusive and
				1241	non-exclusive tests. It's then put into `AnalysisResult`, which is how
				1242	`ExecutionTool` knows which tests to run.
				1243
				1244	In order to lend some transparency to this elaborate process, the `tests()`
				1245	query operator (implemented in `TestsFunction`) is available to tell which tests
				1246	are run when a particular target is specified on the command line. It's
				1247	unfortunately a reimplementation, so it probably deviates from the above in
				1248	multiple subtle ways.
				1249
				1250	### Running tests
				1251
				1252	The way the tests are run is by requesting cache status artifacts. This then
				1253	results in the execution of a `TestRunnerAction`, which eventually calls the
				1254	`TestActionContext` chosen by the `--test_strategy` command line option that
				1255	runs the test in the requested way.
				1256
Tests are run according to an elaborate protocol that uses environment variables
to tell tests what's expected from them. A detailed description of what Bazel
expects from tests and what tests can expect from Bazel is available
[here](https://bazel.build/reference/test-encyclopedia). At the
simplest, an exit code of 0 means success, anything else means failure.
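
From the point of view of a test, the protocol looks roughly like this. The sketch is written in Python and uses the `TEST_TOTAL_SHARDS` and `TEST_SHARD_INDEX` variables from the test encyclopedia; the round-robin assignment of cases to shards and the helper names are assumptions, since each test runner is free to distribute cases as it sees fit.

```python
import os


def cases_for_this_shard(all_cases, environ=os.environ):
    """Selects the (name, callable) cases this process is responsible for."""
    total = int(environ.get("TEST_TOTAL_SHARDS", "1"))
    index = int(environ.get("TEST_SHARD_INDEX", "0"))
    return [case for position, case in enumerate(all_cases)
            if position % total == index]


def run(all_cases, environ=os.environ):
    """Runs this shard's cases; returns 0 on success and 1 on any failure."""
    failures = [name for name, case in cases_for_this_shard(all_cases, environ)
                if not case()]
    return 0 if not failures else 1
```

A real test would pass the return value of `run()` to `sys.exit()` so that Bazel sees the exit code.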
				1262
				1263	In addition to the cache status file, each test process emits a number of other
				1264	files. They are put in the "test log directory" which is the subdirectory called
				1265	`testlogs` of the output directory of the target configuration:
				1266
				1267	* `test.xml`, a JUnit-style XML file detailing the individual test cases in
				1268	the test shard
				1269	* `test.log`, the console output of the test. stdout and stderr are not
				1270	separated.
				1271	* `test.outputs`, the "undeclared outputs directory"; this is used by tests
				1272	that want to output files in addition to what they print to the terminal.
				1273
				1274	There are two things that can happen during test execution that cannot during
				1275	building regular targets: exclusive test execution and output streaming.
				1276
Some tests need to be executed in exclusive mode, i.e. not in parallel with
other tests. This can be elicited either by adding `tags=["exclusive"]` to the
test rule or running the test with `--test_strategy=exclusive`. Each exclusive
test is run by a separate Skyframe invocation requesting the execution of the
test after the "main" build. This is implemented in
`SkyframeExecutor.runExclusiveTest()`.
				1283
				1284	Unlike regular actions, whose terminal output is dumped when the action
				1285	finishes, the user can request the output of tests to be streamed so that they
				1286	get informed about the progress of a long-running test. This is specified by the
				1287	`--test_output=streamed` command line option and implies exclusive test
				1288	execution so that outputs of different tests are not interspersed.
				1289
This is implemented in the aptly-named `StreamedTestOutput` class and works by
polling changes to the `test.log` file of the test in question and dumping new
bytes to the terminal where Bazel runs.
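
The polling approach can be sketched as a small file tailer. This is a minimal Python model, not the `StreamedTestOutput` class: `LogTail` is a hypothetical name, and the caller is assumed to invoke `poll()` on a timer and forward the returned bytes to the terminal.

```python
class LogTail:
    """Remembers how much of a log file has been read and returns the rest."""

    def __init__(self, path):
        self._path = path
        self._offset = 0

    def poll(self):
        """Returns bytes appended since the last call; b"" if there are none."""
        try:
            with open(self._path, "rb") as log:
                log.seek(self._offset)
                data = log.read()
        except FileNotFoundError:
            # The test may not have created its log yet.
            return b""
        self._offset += len(data)
        return data
```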
				1293
				1294	Results of the executed tests are available on the event bus by observing
				1295	various events (e.g. `TestAttempt`, `TestResult` or `TestingCompleteEvent`).
				1296	They are dumped to the Build Event Protocol and they are emitted to the console
				1297	by `AggregatingTestListener`.
				1298
				1299	### Coverage collection
				1300
Coverage is reported by the tests in LCOV format in the files
`bazel-testlogs/$PACKAGE/$TARGET/coverage.dat`.

To collect coverage, each test execution is wrapped in a script called
`collect_coverage.sh`.
				1306
This script sets up the environment of the test to enable coverage collection
and determines where the coverage files are written by the coverage runtime(s).
It then runs the test. A test may itself run multiple subprocesses and consist
of parts written in multiple different programming languages (with separate
coverage collection runtimes). The wrapper script is responsible for converting
the resulting files to LCOV format if necessary and merging them into a single
file.
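
The merge step can be illustrated on the core of the LCOV tracefile format. The Python sketch below is not what `collect_coverage.sh` runs: `merge_lcov` is a hypothetical helper that only understands `SF:`, `DA:` and `end_of_record` lines and sums the hit counts per source line.

```python
from collections import defaultdict


def merge_lcov(reports):
    """Merges LCOV tracefiles given as strings, summing hit counts per line."""
    hits = defaultdict(lambda: defaultdict(int))
    for report in reports:
        source = None
        for line in report.splitlines():
            line = line.strip()
            if line.startswith("SF:"):
                source = line[3:]
            elif line.startswith("DA:") and source is not None:
                fields = line[3:].split(",")
                hits[source][int(fields[0])] += int(fields[1])
            elif line == "end_of_record":
                source = None
    merged = []
    for source in sorted(hits):
        merged.append(f"SF:{source}")
        for line_number in sorted(hits[source]):
            merged.append(f"DA:{line_number},{hits[source][line_number]}")
        merged.append("end_of_record")
    return "\n".join(merged) + "\n"
```

A report whose counts are all zero still contributes its files to the result, so code that no test executed shows up as uncovered instead of disappearing.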
				1314
The interposition of `collect_coverage.sh` is done by the test strategies and
requires `collect_coverage.sh` to be on the inputs of the test. This is
accomplished by the implicit attribute `:coverage_support` which is resolved to
the value of the configuration flag `--coverage_support` (see
`TestConfiguration.TestOptions.coverageSupport`).
				1320
				1321	Some languages do offline instrumentation, meaning that the coverage
				1322	instrumentation is added at compile time (e.g. C++) and others do online
				1323	instrumentation, meaning that coverage instrumentation is added at execution
				1324	time.
				1325
				1326	Another core concept is _baseline coverage_. This is the coverage of a library,
				1327	binary, or test if no code in it was run. The problem it solves is that if you
				1328	want to compute the test coverage for a binary, it is not enough to merge the
				1329	coverage of all of the tests because there may be code in the binary that is not
				1330	linked into any test. Therefore, what we do is to emit a coverage file for every
				1331	binary which contains only the files we collect coverage for with no covered
lines. The baseline coverage file for a target is at
`bazel-testlogs/$PACKAGE/$TARGET/baseline_coverage.dat`. It is also generated
for binaries and libraries in addition to tests if you pass the
`--nobuild_tests_only` flag to Bazel.
				1336
				1337	Baseline coverage is currently broken.
				1338
				1339	We track two groups of files for coverage collection for each rule: the set of
				1340	instrumented files and the set of instrumentation metadata files.
				1341
				1342	The set of instrumented files is just that, a set of files to instrument. For
				1343	online coverage runtimes, this can be used at runtime to decide which files to
				1344	instrument. It is also used to implement baseline coverage.
				1345
				1346	The set of instrumentation metadata files is the set of extra files a test needs
				1347	to generate the LCOV files Bazel requires from it. In practice, this consists of
				1348	runtime-specific files; for example, gcc emits .gcno files during compilation.
				1349	These are added to the set of inputs of test actions if coverage mode is
				1350	enabled.
				1351
Whether or not coverage is being collected is stored in the
`BuildConfiguration`. This is handy because it is an easy way to change the test
action and the action graph depending on this bit, but it also means that if
this bit is flipped, all targets need to be re-analyzed (some languages, e.g.
C++, require different compiler options to emit code that can collect coverage,
which mitigates this issue somewhat, since then a re-analysis is needed anyway).
				1358
The coverage support files are depended on through labels in an implicit
dependency so that they can be overridden by the invocation policy, which allows
them to differ between the different versions of Bazel. Ideally, these
differences would be removed and we would standardize on one of them.
				1363
We also generate a "coverage report" which merges the coverage collected for
every test in a Bazel invocation. This is handled by
`CoverageReportActionFactory` and is called from `BuildView.createResult()`. It
gets access to the tools it needs by looking at the `:coverage_report_generator`
attribute of the first test that is executed.
				1369
				1370	## The query engine
				1371
Bazel has a
[little language](https://bazel.build/docs/query-how-to)
used to ask it various things about various graphs. The following query kinds
are provided:
				1376
				1377	* `bazel query` is used to investigate the target graph
				1378	* `bazel cquery` is used to investigate the configured target graph
				1379	* `bazel aquery` is used to investigate the action graph
				1380
Each of these is implemented by subclassing `AbstractBlazeQueryEnvironment`.
Additional query functions can be implemented by subclassing `QueryFunction`.
In order to allow streaming query results, instead of collecting them to some
data structure, a `query2.engine.Callback` is passed to `QueryFunction`, which
calls it for results it wants to return.
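
The callback style can be sketched as follows. This Python example is only an analogy for how a query function hands out results in batches instead of returning one large collection; `deps` and the dictionary-based graph are hypothetical and unrelated to the real `QueryFunction` interface.

```python
def deps(graph, start, callback):
    """Streams `start` and its transitive dependencies to `callback` in batches.

    `graph` maps a label to the labels it depends on directly. Each batch is
    emitted before the next one is computed, so the consumer can start
    printing while the traversal is still running.
    """
    seen = {start}
    frontier = [start]
    while frontier:
        callback(frontier)
        next_frontier = []
        for node in frontier:
            for dep in graph.get(node, ()):
                if dep not in seen:
                    seen.add(dep)
                    next_frontier.append(dep)
        frontier = next_frontier
```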
				1386
				1387	The result of a query can be emitted in various ways: labels, labels and rule
				1388	classes, XML, protobuf and so on. These are implemented as subclasses of
				1389	`OutputFormatter`.
				1390
A subtle requirement of some query output formats (proto, definitely) is that
Bazel needs to emit _all_ the information that package loading provides so that
one can diff the output and determine whether a particular target has changed.
As a consequence, attribute values need to be serializable, which is why there
are only a few attribute types and none of them can hold complex Starlark
values. The usual workaround is to use a label, and attach the complex
information to the rule with that label. It's not a very satisfying workaround
and it would be very nice to lift this requirement.
				1399
				1400	## The module system
				1401
Bazel can be extended by adding modules to it. Each module must subclass
`BlazeModule` (the name is a relic of the history of Bazel when it used to be
called Blaze) and gets information about various events during the execution of
a command.
				1406
				1407	They are mostly used to implement various pieces of "non-core" functionality
				1408	that only some versions of Bazel (e.g. the one we use at Google) need:
				1409
				1410	* Interfaces to remote execution systems
				1411	* New commands
				1412
The set of extension points `BlazeModule` offers is somewhat haphazard. Don't
use it as an example of good design principles.
				1415
				1416	## The event bus
				1417
The main way `BlazeModule`s communicate with the rest of Bazel is by an event bus
(`EventBus`): a new instance is created for every build, various parts of Bazel
can post events to it and modules can register listeners for the events they are
interested in. For example, the following things are represented as events:
				1422
				1423	* The list of build targets to be built has been determined
				1424	(`TargetParsingCompleteEvent`)
				1425	* The top-level configurations have been determined
				1426	(`BuildConfigurationEvent`)
				1427	* A target was built, successfully or not (`TargetCompleteEvent`)
				1428	* A test was run (`TestAttempt`, `TestSummary`)
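
The publish/subscribe pattern behind this can be sketched in a few lines. The Python below is a toy model, not the `EventBus` class Bazel uses: `ToyEventBus` and `ToyTargetComplete` are hypothetical, and listeners are matched on the exact event type only.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class ToyTargetComplete:
    label: str
    success: bool


class ToyEventBus:
    """Delivers each posted event to the listeners registered for its type."""

    def __init__(self):
        self._listeners = defaultdict(list)

    def register(self, event_type, listener):
        self._listeners[event_type].append(listener)

    def post(self, event):
        # Events without a registered listener are silently dropped.
        for listener in self._listeners.get(type(event), ()):
            listener(event)
```

Creating a new bus per build, as the text describes, means listeners from a previous build can never observe events from the next one.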
				1429
Some of these events are represented outside of Bazel in the
[Build Event Protocol](https://bazel.build/docs/build-event-protocol)
(they are `BuildEvent`s). This allows not only `BlazeModule`s, but also things
outside the Bazel process to observe the build. They are accessible either as a
file that contains protocol messages or through a server (called the Build
Event Service) that Bazel connects to in order to stream events.
				1436
				1437	This is implemented in the `build.lib.buildeventservice` and
				1438	`build.lib.buildeventstream` Java packages.
				1439
				1440	## External repositories
				1441
				1442	Whereas Bazel was originally designed to be used in a monorepo (a single source
				1443	tree containing everything one needs to build), Bazel lives in a world where
				1444	this is not necessarily true. "External repositories" are an abstraction used to
				1445	bridge these two worlds: they represent code that is necessary for the build but
				1446	is not in the main source tree.
				1447
				1448	### The WORKSPACE file
				1449
				1450	The set of external repositories is determined by parsing the WORKSPACE file.
				1451	For example, a declaration like this:
				1452
				1453	```
				1454	local_repository(name="foo", path="/foo/bar")
				1455	```
				1456
results in the repository called `@foo` being available. Where this gets
complicated is that one can define new repository rules in Starlark files, which
can then be used to load new Starlark code, which can be used to define new
repository rules and so on…
				1461
				1462	To handle this case, the parsing of the WORKSPACE file (in
				1463	`WorkspaceFileFunction`) is split up into chunks delineated by `load()`
				1464	statements. The chunk index is indicated by `WorkspaceFileKey.getIndex()` and
				1465	computing `WorkspaceFileFunction` until index X means evaluating it until the
				1466	Xth `load()` statement.
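
The chunking can be pictured with a simplified splitter. This Python sketch makes a strong assumption: it treats every `load()` statement as the start of a new chunk, which may not match how `WorkspaceFileFunction` groups consecutive `load()` statements. The function name and the repository names in the example are hypothetical.

```python
def split_into_chunks(statements):
    """Splits WORKSPACE statements into chunks that each begin at a load().

    Chunk 0 holds everything before the first load(). Evaluating chunk N
    requires the repositories defined in chunks 0..N-1, because its load()
    may refer to one of them.
    """
    chunks = [[]]
    for statement in statements:
        if statement.lstrip().startswith("load(") and chunks[-1]:
            chunks.append([])
        chunks[-1].append(statement)
    return chunks
```

For a file that defines `@foo`, loads a `.bzl` file from `@foo` and then uses the loaded rule, the load and its uses land in the second chunk, after `@foo` is known.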
				1467
				1468	### Fetching repositories
				1469
				1470	Before the code of the repository is available to Bazel, it needs to be
				1471	_fetched_. This results in Bazel creating a directory under
				1472	`$OUTPUT_BASE/external/<repository name>`.
				1473
				1474	Fetching the repository happens in the following steps:
				1475
				1476	1. `PackageLookupFunction` realizes that it needs a repository and creates a
				1477	`RepositoryName` as a `SkyKey`, which invokes `RepositoryLoaderFunction`
2. `RepositoryLoaderFunction` forwards the request to
`RepositoryDelegatorFunction` for unclear reasons (the code says it's to
avoid re-downloading things in case of Skyframe restarts, but that's not
very solid reasoning)
				1482	3. `RepositoryDelegatorFunction` finds out the repository rule it's asked to
				1483	fetch by iterating over the chunks of the WORKSPACE file until the requested
				1484	repository is found
				1485	4. The appropriate `RepositoryFunction` is found that implements the repository
				1486	fetching; it's either the Starlark implementation of the repository or a
				1487	hard-coded map for repositories that are implemented in Java.
				1488
				1489	There are various layers of caching since fetching a repository can be very
				1490	expensive:
				1491
				1492	1. There is a cache for downloaded files that is keyed by their checksum
				1493	(`RepositoryCache`). This requires the checksum to be available in the
				1494	WORKSPACE file, but that's good for hermeticity anyway. This is shared by
				1495	every Bazel server instance on the same workstation, regardless of which
				1496	workspace or output base they are running in.
2. A "marker file" is written for each repository under `$OUTPUT_BASE/external`
that contains a checksum of the rule that was used to fetch it. If the Bazel
server restarts but the checksum does not change, it's not re-fetched. This
is implemented in `RepositoryDelegatorFunction.DigestWriter`.
3. The `--distdir` command line option designates another cache that is used to
look up artifacts to be downloaded. This is useful in enterprise settings
where Bazel should not fetch random things from the Internet. This is
implemented by `DownloadManager`.
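
The first layer, a download cache keyed by checksum, can be sketched like this. The Python below is a hedged model rather than `RepositoryCache`: `ToyDownloadCache` and `fetch` are hypothetical, the on-disk layout is invented, and `download` stands in for whatever actually retrieves the bytes.

```python
import hashlib
import os


class ToyDownloadCache:
    """Content-addressed store: files are saved under their hex SHA-256."""

    def __init__(self, root):
        self._root = root
        os.makedirs(root, exist_ok=True)

    def get(self, sha256):
        """Returns the cached bytes for a checksum, or None on a miss."""
        try:
            with open(os.path.join(self._root, sha256), "rb") as cached:
                return cached.read()
        except FileNotFoundError:
            return None

    def put(self, data):
        sha256 = hashlib.sha256(data).hexdigest()
        with open(os.path.join(self._root, sha256), "wb") as out:
            out.write(data)
        return sha256


def fetch(url, expected_sha256, cache, download):
    """Returns the file contents, downloading only when the cache misses."""
    cached = cache.get(expected_sha256)
    if cached is not None:
        return cached
    data = download(url)
    if hashlib.sha256(data).hexdigest() != expected_sha256:
        raise ValueError(f"checksum mismatch for {url}")
    cache.put(data)
    return data
```

Because the key is the checksum and not the URL, the cache can be shared across workspaces and still hits when the same archive is listed under a different mirror.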
				1505
Once a repository is downloaded, the artifacts in it are treated as source
artifacts. This poses a problem because Bazel usually checks for up-to-dateness
of source artifacts by calling `stat()` on them, and these artifacts are also
invalidated when the definition of the repository they are in changes. Thus,
`FileStateValue`s for an artifact in an external repository need to depend on
their external repository. This is handled by `ExternalFilesHelper`.
				1512
				1513	### Managed directories
				1514
Sometimes, external repositories need to modify files under the workspace root
(e.g. a package manager that houses the downloaded packages in a subdirectory of
the source tree). This is at odds with two assumptions Bazel makes: that source
files are only modified by the user and not by Bazel itself, and that packages
can refer to every directory under the workspace root. In order to make this
kind of external repository work, Bazel does two things:
				1521
1. Allows the user to specify subdirectories of the workspace Bazel is not
allowed to reach into. They are listed in a file called `.bazelignore` and
the functionality is implemented in `BlacklistedPackagePrefixesFunction`.
2. Encodes the mapping from the subdirectory of the workspace to the external
repository it is handled by into `ManagedDirectoriesKnowledge` and handles
`FileStateValue`s referring to them in the same way as those for regular
external repositories.
				1529
				1530	### Repository mappings
				1531
				1532	It can happen that multiple repositories want to depend on the same repository,
				1533	but in different versions (this is an instance of the "diamond dependency
				1534	problem"). For example, if two binaries in separate repositories in the build
				1535	want to depend on Guava, they will presumably both refer to Guava with labels
				1536	starting `@guava//` and expect that to mean different versions of it.
				1537
Therefore, Bazel allows one to re-map external repository labels so that the
string `@guava//` can refer to one Guava repository (e.g. `@guava1//`) in the
repository of one binary and another Guava repository (e.g. `@guava2//`) in the
repository of the other.
				1542
				1543	Alternatively, this can also be used to join diamonds. If a repository
				1544	depends on `@guava1//`, and another depends on `@guava2//`, repository mapping
				1545	allows one to re-map both repositories to use a canonical `@guava//` repository.
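
At its core, applying a mapping is a rewrite of the repository part of a label. The following Python sketch is a simplification with a hypothetical `apply_repo_mapping` helper; it handles only labels of the form `@repo//pkg:target` and leaves everything else untouched.

```python
def apply_repo_mapping(label, mapping):
    """Rewrites the repository part of `label` according to `mapping`.

    `mapping` maps apparent names such as "@guava" to the repository that
    should be used instead, such as "@guava1".
    """
    if not label.startswith("@"):
        return label  # refers to the current repository; nothing to re-map
    repo, separator, rest = label.partition("//")
    return mapping.get(repo, repo) + separator + rest
```

Each repository carries its own mapping, so the same string `@guava//:guava` can resolve to `@guava1//:guava` in one repository and to `@guava2//:guava` in another.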
				1546
				1547	The mapping is specified in the WORKSPACE file as the `repo_mapping` attribute
				1548	of individual repository definitions. It then appears in Skyframe as a member of
				1549	`WorkspaceFileValue`, where it is plumbed to:
				1550
				1551	* `Package.Builder.repositoryMapping` which is used to transform label-valued
				1552	attributes of rules in the package by
				1553	`RuleClass.populateRuleAttributeValues()`
				1554	* `Package.repositoryMapping` which is used in the analysis phase (for
				1555	resolving things like `$(location)` which are not parsed in the loading
				1556	phase)
* `BzlLoadFunction` for resolving labels in `load()` statements

## JNI bits
				1560
The server of Bazel is _mostly_ written in Java. The exception is the parts that
Java cannot do by itself or couldn't do by itself when we implemented it. This
is mostly limited to interaction with the file system, process control and
various other low-level things.
				1565
				1566	The C++ code lives under `src/main/native` and the Java classes with native
				1567	methods are:
				1568
				1569	* `NativePosixFiles` and `NativePosixFileSystem`
				1570	* `ProcessUtils`
				1571	* `WindowsFileOperations` and `WindowsFileProcesses`
				1572	* `com.google.devtools.build.lib.platform`
				1573
				1574	## Console output
				1575
				1576	Emitting console output seems like a simple thing, but the confluence of running
				1577	multiple processes (sometimes remotely), fine-grained caching, the desire to
				1578	have a nice and colorful terminal output and having a long-running server makes
				1579	it non-trivial.
				1580
				1581	Right after the RPC call comes in from the client, two `RpcOutputStream`
				1582	instances are created (for stdout and stderr) that forward the data printed into
				1583	them to the client. These are then wrapped in an `OutErr` (an (stdout, stderr)
				1584	pair). Anything that needs to be printed on the console goes through these
				1585	streams. Then these streams are handed over to
jingwen	f8b2d3b	2020-10-02 06:35:24 -0700	[diff] [blame]	1586	`BlazeCommandDispatcher.execExclusively()`.
laurentlb	4f2991c5	2020-08-12 11:37:32 -0700	[diff] [blame]	1587
				1588	Output is by default printed with ANSI escape sequences. When these are not
				1589	desired (`--color=no`), they are stripped by an `AnsiStrippingOutputStream`. In
				1590	addition, `System.out` and `System.err` are redirected to these output streams.
				1591	This is so that debugging information can be printed using
				1592	`System.err.println()` and still end up in the terminal output of the client
				1593	(which is different from that of the server). Care is taken that if a process
				1594	produces binary output (e.g. `bazel query --output=proto`), no munging of stdout
				1595	takes place.
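
The two mechanisms in this paragraph, redirecting `System.err` and stripping ANSI escape sequences, can be illustrated with plain JDK calls. This is a minimal sketch under simplifying assumptions: the class `ConsoleRedirectSketch` and its methods are hypothetical, and the stripping here works on a whole string with a regular expression, whereas `AnsiStrippingOutputStream` is a stream.

```java
import java.io.ByteArrayOutputStream;
import java.io.PrintStream;

/** Toy illustration of output redirection and ANSI stripping; not Bazel code. */
final class ConsoleRedirectSketch {
  /** Removes ANSI CSI sequences such as ESC[31m (red) or ESC[0m (reset). */
  static String stripAnsi(String text) {
    return text.replaceAll("\u001B\\[[0-9;]*[A-Za-z]", "");
  }

  /** Runs a task with System.err redirected and returns what it printed. */
  static String captureStderr(Runnable task) {
    PrintStream original = System.err;
    ByteArrayOutputStream sink = new ByteArrayOutputStream();
    try (PrintStream redirected = new PrintStream(sink, true)) {
      System.setErr(redirected);
      task.run();
    } finally {
      System.setErr(original); // Always restore the real stderr.
    }
    return sink.toString();
  }

  public static void main(String[] args) {
    String captured = captureStderr(() -> System.err.println("\u001B[31mERROR\u001B[0m: boom"));
    System.out.print(stripAnsi(captured));
  }
}
```

In the server, the redirected streams forward to the client rather than to an in-memory buffer; the buffer here only keeps the example self-contained.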
				1596
				1597	Short messages (errors, warnings and the like) are expressed through the
				1598	`EventHandler` interface. Notably, these are different from what one posts to
				1599	the `EventBus` (this is confusing). Each `Event` has an `EventKind` (error,
				1600	warning, info, and a few others) and they may have a `Location` (the place in
				1601	the source code that caused the event to happen).
				1602
				1603	Some `EventHandler` implementations store the events they received. This is used
				1604	to replay information to the UI caused by various kinds of cached processing,
				1605	for example, the warnings emitted by a cached configured target.
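
The store-and-replay idea can be sketched with simplified stand-in types. Everything below is hypothetical: `Event`, `EventKind` and `EventHandler` are cut-down models that only share names with the concepts above, and `RecordingHandler` with its `replayTo()` method is invented for this example rather than taken from Bazel.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model of an event handler that stores events for later replay. */
final class StoredEventsSketch {
  enum EventKind { ERROR, WARNING, INFO }

  /** Simplified event: a kind, a message and an optional source location. */
  static final class Event {
    final EventKind kind;
    final String message;
    final String location; // May be null when no location is known.

    Event(EventKind kind, String message, String location) {
      this.kind = kind;
      this.message = message;
      this.location = location;
    }
  }

  interface EventHandler {
    void handle(Event event);
  }

  /** Remembers every event it receives so it can be replayed on a cache hit. */
  static final class RecordingHandler implements EventHandler {
    private final List<Event> stored = new ArrayList<>();

    @Override
    public void handle(Event event) {
      stored.add(event);
    }

    /** Sends all stored events to another handler, in the original order. */
    void replayTo(EventHandler target) {
      for (Event event : stored) {
        target.handle(event);
      }
    }
  }

  public static void main(String[] args) {
    RecordingHandler recorded = new RecordingHandler();
    recorded.handle(new Event(EventKind.WARNING, "deprecated attribute", "foo/BUILD:3"));
    // Later, when the cached result is reused, show the warning again.
    recorded.replayTo(e -> System.out.println(e.kind + ": " + e.location + ": " + e.message));
  }
}
```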
				1606
				1607	Some `EventHandler`s also allow posting events that eventually find their way to
				1608	the event bus (regular `Event`s do _not_ appear there). These are
				1609	implementations of `ExtendedEventHandler` and their main use is to replay cached
				1610	`EventBus` events. These `EventBus` events all implement `Postable`, but not
				1611	everything that is posted to `EventBus` necessarily implements this interface;
				1612	only those that are cached by an `ExtendedEventHandler` do (it would be nice if
				1613	everything did, and most things do, but it's not enforced).
				1614
				1615	Terminal output is _mostly_ emitted through `UiEventHandler`, which is
				1616	responsible for all the fancy output formatting and progress reporting Bazel
				1617	does. It has two inputs:
				1618
				1619	* The event bus
				1620	* The event stream piped into it through `Reporter`
				1621
				1622	The only direct connection the command execution machinery (i.e. the rest of
				1623	Bazel) has to the RPC stream to the client is through `Reporter.getOutErr()`,
				1624	which allows direct access to these streams. It's only used when a command needs
				1625	to dump large amounts of possibly binary data (e.g. `bazel query`).
				1626
				1627	## Profiling Bazel
				1628
				1629	Bazel is fast. Bazel is also slow, because builds tend to grow until just the
				1630	edge of what's bearable. For this reason, Bazel includes a profiler which can be
				1631	used to profile builds and Bazel itself. It's implemented in a class that's
				1632	aptly named `Profiler`. It's turned on by default, although it records only
				1633	abridged data so that its overhead is tolerable; the command line option
				1634	`--record_full_profiler_data` makes it record everything it can.
				1635
				1636	It emits a profile in the Chrome profiler format; it's best viewed in Chrome.
				1637	Its data model is that of task stacks: one can start tasks and end tasks and
				1638	they are supposed to be neatly nested within each other. Each Java thread gets
				1639	its own task stack. TODO: How does this work with actions and
				1640	continuation-passing style?
				1641
jingwen	f8b2d3b	2020-10-02 06:35:24 -0700	[diff] [blame]	1642	The profiler is started and stopped in `BlazeRuntime.initProfiler()` and
				1643	`BlazeRuntime.afterCommand()` respectively and attempts to be live for as long
laurentlb	4f2991c5	2020-08-12 11:37:32 -0700	[diff] [blame]	1644	as possible so that we can profile everything. To add something to the profile,
				1645	call `Profiler.instance().profile()`. It returns a `Closeable`, whose closure
				1646	represents the end of the task. It's best used with try-with-resources
				1647	statements.
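
The task-stack model and the try-with-resources pattern can be sketched with a toy profiler. This is not the `Profiler` class: `TaskStackSketch`, its `Task` interface and its `completed()` method are hypothetical names, and the sketch records only nesting depth and description instead of timings.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Deque;
import java.util.List;

/** Toy profiler with one task stack per thread; not Bazel's Profiler. */
final class TaskStackSketch {
  /** Handle for a running task; closing it marks the end of the task. */
  interface Task extends AutoCloseable {
    @Override
    void close(); // Narrowed so that callers need no catch block.
  }

  private final ThreadLocal<Deque<String>> stack = ThreadLocal.withInitial(ArrayDeque::new);
  private final List<String> completed = Collections.synchronizedList(new ArrayList<>());

  /** Starts a task on the calling thread's stack and returns its handle. */
  Task profile(String description) {
    Deque<String> tasks = stack.get();
    tasks.push(description);
    int depth = tasks.size();
    return () -> {
      // Tasks must end in reverse order of their start, i.e. be neatly nested.
      String top = tasks.pop();
      if (!top.equals(description)) {
        throw new IllegalStateException("task ended out of order: " + description);
      }
      completed.add(depth + " " + description);
    };
  }

  /** Returns a snapshot of finished tasks as "depth description" strings. */
  List<String> completed() {
    synchronized (completed) {
      return new ArrayList<>(completed);
    }
  }

  public static void main(String[] args) {
    TaskStackSketch profiler = new TaskStackSketch();
    try (Task outer = profiler.profile("analysis")) {
      try (Task inner = profiler.profile("load package")) {
        // Work attributed to the inner task would happen here.
      }
    }
    System.out.println(profiler.completed());
  }
}
```

Because try-with-resources closes resources in reverse order of their declaration, nested `try` blocks keep the tasks properly nested without any extra bookkeeping by the caller.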
				1648
				1649	We also do rudimentary memory profiling in `MemoryProfiler`. It's also always on
				1650	and it mostly records maximum heap sizes and GC behavior.
				1651
				1652	## Testing Bazel
				1653
				1654	Bazel has two main kinds of tests: ones that observe Bazel as a "black box" and
				1655	ones that only run the analysis phase. We call the former "integration tests"
				1656	and the latter "unit tests", although they are more like integration tests that
				1657	are, well, less integrated. We also have some actual unit tests, where they are
				1658	necessary.
				1659
				1660	Of integration tests, we have two kinds:
				1661
				1662	1. Ones implemented using a very elaborate bash test framework under
				1663	`src/test/shell`
				1664	2. Ones implemented in Java. These are implemented as subclasses of
dacek	f474a3b	2022-01-11 08:22:04 -0800	[diff] [blame]	1665	`BuildIntegrationTestCase`
laurentlb	4f2991c5	2020-08-12 11:37:32 -0700	[diff] [blame]	1666
dacek	d72ae00	2022-01-10 09:13:33 -0800	[diff] [blame]	1667	`BuildIntegrationTestCase` is the preferred integration testing framework as it
				1668	is well-equipped for most testing scenarios. As it is a Java framework, it
				1669	provides debuggability and seamless integration with many common development
				1670	tools. There are many examples of `BuildIntegrationTestCase` classes in the
				1671	Bazel repository.
laurentlb	4f2991c5	2020-08-12 11:37:32 -0700	[diff] [blame]	1672
				1673	Analysis tests are implemented as subclasses of `BuildViewTestCase`. There is a
				1674	scratch file system you can use to write BUILD files, then various helper
				1675	methods can request configured targets, change the configuration and assert
				1676	various things about the result of the analysis.