Blame - docs/skyframe.md - bazel

blob: f09839a2aef8e0061f6978ed152e80d4d89887de [file] [log] [blame] [view]

Ulf Adams	81e0858	2015-03-13 12:07:01 +0000	[diff] [blame]	1	# Skyframe - The parallel evaluation and incrementality model of Bazel
				2
				3	## Data model
				4
				5	The data model consists of the following items:
				6
				7	- `SkyValue`. Also called nodes. `SkyValues` are immutable objects that contain all the data built
				8	over the course of the build and the inputs of the build. Examples are: input files, output
				9	files, targets and configured targets.
				10	- `SkyKey`. A short immutable name to reference a `SkyValue`, for example, `FILECONTENTS:/tmp/foo`
				11	or `PACKAGE://foo`.
				12	- `SkyFunction`. Builds nodes based on their keys and dependent nodes.
				13	- Node graph. A data structure containing the dependency relationship between nodes.
				14	- `Skyframe`. Code name for the incremental evaluation framework Bazel is based on.
				15
				16
				17	## Evaluation
				18
				19	A build consists of evaluating the node that represents the build request (this is the state we are
				20	striving for, but there is a lot of legacy code in the way). First its `SkyFunction` is found and
				21	called with the key of the top-level `SkyKey`. The function then requests the evaluation of the
				22	nodes it needs to evaluate the top-level node, which in turn result in other function invocations,
				23	and so on, until the leaf nodes are reached (which are usually nodes representing input files in
				24	the file system). Finally, we end up with the value of the top-level `SkyValue`, some side effects
				25	(e.g. output files in the file system) and a directed acyclic graph of the dependencies between the
				26	nodes that were involved in the build.
				27
				28	A `SkyFunction` can request `SkyKeys` in multiple passes if it cannot tell in advance all of the
				29	nodes it needs to do its job. A simple example is evaluating an input file node that turns out to
				30	be a symlink: the function tries to read the file, realizes that it's a symlink, and thus fetches
				31	the file system node representing the target of the symlink. But that itself can be a symlink, in
				32	which case the original function will need to fetch its target, too.
				33
				34	The functions are represented in the code by the interface `SkyFunction` and the services
				35	provided to it by an interface called `SkyFunction.Environment`. These are the things functions can
				36	do:
				37
				38	- Request the evaluation of another node by way of calling `env.getValue`. If the node is
				39	available, its value is returned, otherwise, `null` is returned and the function itself is
				40	expected to return `null`. In the latter case, the dependent node is evaluated, and then the
				41	original node builder is invoked again, but this time the same `env.getValue` call will return a
				42	non-`null` value.
				43	- Request the evaluation of multiple other nodes by calling `env.getValues()`. This does
				44	essentially the same, except that the dependent nodes are evaluated in parallel.
				45	- Do computation during their invocation
				46	- Have side effects, for example, writing files to the file system. Care needs to be taken that two
				47	different functions do not step on each other's toes. In general, write side effects (where
				48	data flows outwards from Bazel) are okay, read side effects (where data flows inwards into Bazel
				49	without a registered dependency) are not, because they are an unregistered dependency and as
				50	such, can cause incorrect incremental builds.
				51
				52	`SkyFunction` implementations should not access data in any other way than requesting dependencies
				53	(e.g. by directly reading the file system), because that results in Bazel not registering the data
				54	dependency on the file that was read, thus resulting in incorrect incremental builds.
				55
				56	Once a function has enough data to do its job, it should return a non-`null` value indicating
				57	completion.
				58
				59	This evaluation strategy has a number of benefits:
				60
				61	- Hermeticity. If functions only request input data by way of depending on other nodes, Bazel
				62	can guarantee that if the input state is the same, the same data is returned. If all sky
				63	functions are deterministic, this means that the whole build will also be deterministic.
				64	- Correct and perfect incrementality. If all the input data of all functions is recorded, Bazel
				65	can invalidate only the exact set of nodes that need to be invalidated when the input data
				66	changes.
				67	- Parallelism. Since functions can only interact with each other by way of requesting
				68	dependencies, functions that do not depend on each other can be run in parallel and Bazel can
				69	guarantee that the result is the same as if they were run sequentially.
				70
				71	## Incrementality
				72
				73	Since functions can only access input data by depending on other nodes, Bazel can build up a
				74	complete data flow graph from the input files to the output files, and use this information to only
				75	rebuild those nodes that actually need to be rebuilt: the reverse transitive closure of the set of
				76	changed input files.
				77
				78	In particular, two possible incrementality strategies exist: the bottom-up one and the top-down one.
				79	Which one is optimal depends on how the dependency graph looks like.
				80
				81	- During bottom-up invalidation, after a graph is built and the set of changed inputs is known,
				82	all the nodes are invalidated that transitively depend on changed files. This is optimal
				83	if we know that the same top-level node will be built again.
				84	Note that bottom-up invalidation requires running `stat()` on all input files of the previous
				85	build to determine if they were changed. This can be improved by using `inotify` or a similar
				86	mechanism to learn about changed files.
				87
				88	- During top-down invalidation, the transitive closure of the top-level node is checked and only
				89	those nodes are kept whose transitive closure is clean. This is better if we know that the
				90	current node graph is large, but we only need a small subset of it in the next build: bottom-up
				91	invalidation would invalidate the larger graph of the first build, unlike top-down invalidation,
				92	which just walks the small graph of second build.
				93
				94	We currently only do bottom-up invalidation.
				95
				96	To get further incrementality, we use _change pruning_: if a node is invalidated, but upon rebuild,
				97	it is discovered that its new value is the same as its old value, the nodes that were invalidated
				98	due to a change in this node are "resurrected".
				99
				100	This is useful, for example, if one changes a comment in a C++ file: then the `.o` file generated
				101	from it will be the same, thus, we don't need to call the linker again.
				102
				103	## Incremental Linking / Compilation
				104
				105	The main limitation of this model is that the invalidation of a node is an all-or-nothing affair:
				106	when a dependency changes, the dependent node is always rebuilt from scratch, even if a better
				107	algorithm would exist that would mutate the old value of the node based on the changes. A few
				108	examples where this would be useful:
				109
				110	- Incremental linking
				111	- When a single `.class` file changes in a `.jar`, we could theoretically modify the `.jar` file
				112	instead of building it from scratch again.
				113
				114	The reason why Bazel currently does not support these things in a principled way (we have some
				115	measure of support for incremental linking, but it's not implemented within Skyframe) is twofold:
				116	we only had limited performance gains and it was hard to guarantee that the result of the mutation
				117	is the same as that of a clean rebuild would be, and Google values builds that are bit-for-bit
				118	repeatable.
				119
				120	Until now, we could always achieve good enough performance by simply decomposing an expensive build
				121	step and achieving partial re-evaluation that way: it splits all the classes in an app into
				122	multiple groups and does dexing on them separately. This way, if classes in a group do not change,
				123	the dexing does not have to be redone.
				124
				125	## Restarting SkyFunctions
				126
				127	Another inefficiency is that, currently, if a `SkyFunction` implementation cannot complete its job
				128	because one of its dependencies is missing, it needs to be completely restarted instead of resuming
				129	where it left off. This is currently not a big problem because we usually learn all the
				130	dependencies after a small amount of work. The only exceptions are package loading and execution of
				131	actions; these are both external processes that are expensive to restart. We allow package loading
				132	to proceed fully, store the loaded package away, record the dependencies in the graph, and on
				133	re-execution of the function return the already loaded package. I.e., we allow the function to keep
				134	state between executions.
				135
				136	If this turns out to be a significant performance or code health problem, there are alternative ways
				137	to add a more principled mechanism to keep state between executions:
				138
				139	- Splitting each node into multiple ones so that each smaller node only has to do one round of
				140	dependency discovery (effectively continuation passing); this requires explicit code.
				141	- By reimplementing Skyframe on some sort of lightweight thread infrastructure (e.g.
				142	[Quasar](http://docs.paralleluniverse.co/quasar/)) so that function execution can be suspended
				143	and resumed without a large performance hit and without requiring this to be explicit in the
				144	code.
				145	- By maintaining state for each `SkyFunction` instance between restarts (this is the workaround we
				146	are using for package loading, but is not implemented as a first-class feature of the evaluation
				147	framework).
				148
				149	## Mapping to Bazel concepts
				150
				151	This is a rough overview of some of the `SkyFunction` implementations Bazel uses to perform a build:
				152
				153	- FileStateValue. The result of an `lstat()`. For existent files, we also compute additional
				154	information in order to detect changes to the file. This is the lowest level node in the Skyframe
				155	graph and has no dependencies.
				156	- FileValue. Used by anything that cares about the actual contents and/or resolved path of a
				157	file. Depends on the corresponding `FileStateValue` and any symlinks that need to be resolved
				158	(e.g. the `FileValue` for `a/b` needs the resolved path of `a` and the resolved path of `a/b`).
				159	The distinction between `FileStateValue` is important because in some cases (for example,
				160	evaluating file system globs (e.g. `srcs=glob(["/.java"])`) the contents of the file are not
				161	actually needed.
				162	- DirectoryListingValue. Essentially the result of `readdir()`. Depends on the associated
				163	`FileValue` associated with the directory.
				164	- PackageValue. Represents the parsed version of a BUILD file. Depends on the `FileValue` of
				165	the associated `BUILD` file, and also transitively on any `DirectoryListingValue` that is used
				166	to resolve the globs in the package (the data structure representing the contents of a `BUILD`
				167	file internally)
				168	- ConfiguredTargetValue. Represents a configured target, which is a tuple of the set of actions
				169	generated during the analysis of a target and information provided to configured targets that
				170	depend on this one. Depends on the `PackageValue` the corresponding target is in, the
				171	`ConfiguredTargetValues` of direct dependencies, and a special node representing the build
				172	configuration.
				173	- ArtifactValue. Represents a file in the build, be it a source or an output artifacts
				174	(artifacts are almost equivalent to files, and are used to refer to files during the actual
				175	execution of build steps). For source files, it depends on the `FileValue` of the associated
				176	node, for output artifacts, it depends on the `ActionExecutionValue` of whatever action generates
				177	the artifact.
				178	- ActionExecutionValue. Represents the execution of an action. Depends on the `ArtifactValues`
				179	of its input files. The action it executes is currently contained within its sky key, which is
				180	contrary to the concept that sky keys should be small. We are working on solving this
				181	discrepancy (note that `ActionExecutionValue` and `ArtifactValue` are unused if we do not run the
				182	execution phase on Skyframe).
				183