Status: Implementing
Author: Damien Martin-Guillerez
Remote repositories are fetched the first time a build that depends on a repository is launched. The next time the same build happens, the already fetched repositories are not refetched, saving on download times or other expensive operations.
This behavior persists across Bazel server restarts because the repository rule in the workspace file is serialized: a file named @<repositoryName>.marker is created for each repository with a fingerprint of the serialized rule. On the next fetch, if that fingerprint has not changed, the rule is not refetched. This does not apply if the repository rule is marked as local, because fetching a local repository is assumed to be fast.
These considerations were well suited to a time when the implementation of repository rules did not depend on Skylark files. With the introduction of Skylark repositories, several issues appeared:
[...] bazel clean --expunge.
Right now rules are not invalidated on the environment: repository_ctx.os.environ would generate invalidation on environment variables that might be volatile (e.g. CC when you want to use one C++ compiler and you reset your environment) and might miss other environment variables due to computed variable names.
[...] repository_ctx.execute.
This document proposes to add a way to declare a dependency on an environment variable value that would trigger a refetch of a repository. An optional attribute environ would be added to the repository_rule method, taking a list of strings, and would trigger invalidation of the repository on any change to those environment variables. E.g.:
my_repo = repository_rule(impl = _impl, environ = ["FOO", "BAR"])
my_repo would be refetched on any change to the environment variables FOO or BAR, but not if the environment variable BAZ changes.
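The intended effect of the environ attribute can be modeled in a few lines of plain Python. This is only a sketch of the proposed semantics, not Bazel code: needs_refetch and its arguments are hypothetical names, and recorded_env stands for whatever values were saved at the previous fetch.

```python
def needs_refetch(declared_environ, recorded_env, current_env):
    """Return True if any declared variable changed since the last fetch.

    Variables outside declared_environ are ignored. A variable that appears,
    disappears, or changes value counts as a change.
    """
    return any(
        recorded_env.get(name) != current_env.get(name)
        for name in declared_environ
    )


# Mirrors environ = ["FOO", "BAR"] from the example above.
declared = ["FOO", "BAR"]
previous = {"FOO": "1", "BAR": "2", "BAZ": "3"}

# Only BAZ changed: it is not declared, so no refetch is triggered.
baz_changed = needs_refetch(declared, previous, {"FOO": "1", "BAR": "2", "BAZ": "other"})

# FOO changed: it is declared, so a refetch is triggered.
foo_changed = needs_refetch(declared, previous, {"FOO": "new", "BAR": "2", "BAZ": "3"})
```

Comparing with dict.get means that an unset variable and a set one are treated as different, which is one possible reading of "any change"; the document does not spell this case out.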
To be consistent with the new environment specification mechanism, the environment available through repository_ctx.os.environ or transmitted to repository_ctx.execute will take values from the --action_env flag, when specified. I.e., if --action_env FOO=BAR --action_env BAR are specified, and the environment sets FOO=BAZ, BAR=FOO, BAZ=BAR, then the actual repository_ctx.os.environ map would contain {"FOO": "BAR", "BAR": "FOO", "BAZ": "BAR"}. This would ensure that the environment seen by repository rules is consistent with the one seen by actions (a repository rule sees more than an action does, leaving the rule writer the ability to filter the environment more finely).
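The resolution rule in the example above can be sketched as follows. This is a hedged illustration in plain Python, assuming that a NAME=VALUE flag overrides the client value and that a bare NAME flag keeps the client value; resolve_repository_environ is a hypothetical name, not a Bazel function.

```python
def resolve_repository_environ(client_env, action_env_flags):
    """Compute the environment a repository rule would see.

    client_env: dict of the client's environment variables.
    action_env_flags: values passed to --action_env, either "NAME=VALUE"
    or a bare "NAME".
    """
    # A repository rule sees the whole client environment by default.
    resolved = dict(client_env)
    for flag in action_env_flags:
        name, separator, value = flag.partition("=")
        if separator:
            # Explicit value given on the command line wins.
            resolved[name] = value
        # A bare NAME inherits the client value, which is already present.
    return resolved


client = {"FOO": "BAZ", "BAR": "FOO", "BAZ": "BAR"}
seen_by_rule = resolve_repository_environ(client, ["FOO=BAR", "BAR"])
```

With the inputs from the text, seen_by_rule holds FOO=BAR (overridden by the flag), BAR=FOO (inherited) and BAZ=BAR (untouched client value).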
Both these changes should allow Bazel to do auto-configuration based on environment variables:
[...] --action_env flag, and fix this environment using bazel info client-env.
A local rule will be invalidated when any of its Skyframe dependencies change. For a non-local rule, a marker file will be stored in the external directory with a summary of the dependencies of the rule. At each fetch operation, we check the existence of the marker file and verify each dependency. If one of them has changed, we refetch that repository.
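The marker-file check for non-local rules could look roughly like the sketch below. The on-disk format (one tab-separated key and value per line) and the function names are assumptions made for illustration; the actual format used by Bazel is not specified here.

```python
import os


def write_marker(path, dependencies):
    """Store a summary of the rule's dependencies next to the repository."""
    with open(path, "w") as marker:
        for key in sorted(dependencies):
            # Assumed format: keys and values contain no tabs or newlines.
            marker.write(f"{key}\t{dependencies[key]}\n")


def read_marker(path):
    """Load the dependency summary written by write_marker."""
    with open(path) as marker:
        return dict(
            line.rstrip("\n").split("\t", 1) for line in marker if line.strip()
        )


def should_refetch(path, current_dependencies):
    """Refetch if the marker file is missing or any dependency differs."""
    if not os.path.exists(path):
        return True
    return read_marker(path) != current_dependencies
```

A fetch would call should_refetch first and, after a successful fetch, call write_marker with the values it observed.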
To avoid unnecessary re-download of artifacts, a content-addressable cache has been developed for downloads (and is thus not discussed here).
The marker file will be a manifest containing the following items:
[...] maven_jar).
[...] environ attribute of the repository rule.
FileValue-s requested by getPathFromLabel and the corresponding file content digest.
[...] Extension defining the repository rule. This transitive hash is computed from the hash of the current extension and the extensions loaded from it. This means that a repository function will get invalidated as soon as the extension file content changes, which is an over-invalidation. However, getting an optimal result would require correct serialization of Skylark extensions.
[...] SkylarkRepositoryFunction#getClientEnvironment method to get the values from the --action_env flag.
[...] markerData map argument to RepositoryFunction#fetch so SkylarkRepositoryFunction can include those changes. This attribute should be mutable so a repository can add more data to be stored in the marker file. Adds a corresponding function for verification, verifyMarkerManifest, that would take a marker data map and return a tri-state: true if the repository is up to date, false if it needs a refetch, and null if additional Skyframe dependencies need to be resolved before answering.
[...] environ attribute to the repository_rule function and the dependency on the Skyframe values for the environment. Also create a SkyFunction for the processed environment after the --action_env flag.
[...] environ values to the marker file through the getMarkerManifest function.
[...] FileValue-s to the marker file, adding all the files requested through the getPath method to a specific builder that will be passed to the SkylarkRepositoryContext.
[...] transitiveHashCode of the Skylark Environment to the marker manifest.
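The tri-state result of the verification function described above can be illustrated with a small Python model. It is a sketch only: Python's True, False and None stand in for the true, false and null results, PENDING is a hypothetical stand-in for a Skyframe value that has not been computed yet, and the marker keys are made up for the example.

```python
PENDING = object()  # hypothetical: the current value is not available yet


def verify_marker_manifest(marker_data, lookup_current):
    """Compare recorded marker data against current values.

    Returns True if everything matches, False if any available value
    differs, and None if nothing differs so far but some values are
    still pending.
    """
    needs_more_deps = False
    for key, recorded in marker_data.items():
        current = lookup_current(key)
        if current is PENDING:
            needs_more_deps = True
        elif current != recorded:
            # A single mismatch is enough to require a refetch.
            return False
    return None if needs_more_deps else True


marker = {"ENV:FOO": "BAR", "FILE://tools:config": "abc123"}

up_to_date = verify_marker_manifest(marker, dict(marker).get)
stale = verify_marker_manifest(marker, {"ENV:FOO": "changed"}.get)
undecided = verify_marker_manifest(
    marker, lambda key: {"ENV:FOO": "BAR"}.get(key, PENDING)
)
```

Returning False as soon as a mismatch is found, even when other values are pending, is a design choice of this sketch; the document only states what each of the three results means.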