---
layout: contribute
title: Recursive WORKSPACE file parsing
---
# Recursive WORKSPACE file parsing
**Status**: Unimplemented
**Author**: [kchodorow@](mailto:kchodorow@google.com)
## Objective
Users are annoyed by having to specify all deps in their WORKSPACE file. To
avoid this inconvenience, Bazel could load subprojects' WORKSPACE files
automatically.
## Non-Goals
* Solve the problem of people specifying two different repositories with the
same name (e.g., `@util`).
* Solve the problem of people specifying two different names for the same
repository (`@guava` and `@com_google_guava`).
## Resolution
When a repository is defined in multiple files, which definition "wins"? What
causes a conflict/error?
### Defined in the main repository's WORKSPACE file
This definition wins, regardless of other definitions.
### In a line
Note: as an intermediate step, this can be disabled, but the end goal is to
allow this so that intermediate dependencies that the top level doesn't care
about don't need to be resolved.
Suppose we have a main repository that depends on repo x, and x depends on repo
y:
<img src="/assets/ws-line.jpg" class="img-responsive">
In this case, version 1 of "foo" wins. This way, if a library has already
figured out which version works for them, its reverse dependencies do not have
to think about it.
This will also work if a parent overrides its children's versions, even if it
has multiple children.
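
As a rough sketch of the in-a-line rule (not Bazel's implementation), the Python below walks a chain of WORKSPACE files from the main repository outward and returns the first definition it finds. The `resolve_chain` helper and the data layout are hypothetical, and the example assumes my reading of the figure above: x pins `foo` at version 1 while y asks for version 2.

```python
def resolve_chain(chain, name):
    """Return (declaring_repo, version) for the definition closest to the main repository.

    `chain` is ordered from the main repository outward, e.g. main, x, y.
    Each entry is (repo_name, {dependency_name: version}).
    """
    for repo, definitions in chain:
        if name in definitions:
            return repo, definitions[name]
    return None  # nobody in this chain declares `name`


# Assumed layout: main -> x -> y, where x pins foo 1 and y asks for foo 2.
chain = [
    ("main", {"x": "1"}),
    ("x", {"y": "1", "foo": "1"}),
    ("y", {"foo": "2"}),
]

print(resolve_chain(chain, "foo"))

# If the main repository declares foo itself, that definition wins instead.
overridden = [("main", {"x": "1", "foo": "3"})] + chain[1:]
print(resolve_chain(overridden, "foo"))
```

The only design choice here is the ordering of `chain`: because the main repository comes first, its definitions take precedence without any special casing.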
### Different lines
If there is no obvious hierarchy and multiple versions are specified, error out.
Report each chain of dependencies that requested the repository, and at which
version:
<img src="/assets/ws-multiline.jpg" class="img-responsive">
In this case, Bazel would error out with:
```
ERROR: Conflicting definitions of 'foo': bazel-external/y/WORKSPACE:2 repository(name = 'foo', version = '1')
  requested by bazel-external/x/WORKSPACE:2 repository(name = 'y')
  requested by WORKSPACE:3 repository(name = 'x')
vs. bazel-external/a/WORKSPACE:2 repository(name = 'foo', version = '2')
  requested by WORKSPACE:2 repository(name = 'a')
```
This is also the case with diamond dependencies:
<img src="/assets/ws-diamond.jpg" class="img-responsive">
This would print:
```
ERROR: Conflicting definitions of 'foo': bazel-external/x/WORKSPACE:2 repository(name = 'foo', version = '2')
  requested by WORKSPACE:2 repository(name = 'x')
vs. bazel-external/z/WORKSPACE:2 repository(name = 'foo', version = '1')
  requested by bazel-external/y/WORKSPACE:2 repository(name = 'z')
  requested by WORKSPACE:3 repository(name = 'y')
```
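
The rule for different lines can be sketched as a traversal that records every definition not already overridden by an ancestor, then flags names that end up with more than one version. This is only an illustration under assumed names: `find_conflicts`, the `workspaces` mapping, and the use of plain version strings are all hypothetical and do not reflect how Bazel stores repository rules.

```python
from collections import defaultdict


def find_conflicts(workspaces, root="main"):
    """Return {name: [(version, chain), ...]} for names defined at several versions.

    `workspaces` maps a repository name to the {dependency: version} pairs
    declared in that repository's WORKSPACE file.
    """
    candidates = defaultdict(list)

    def visit(repo, chain, pinned):
        declared = workspaces.get(repo, {})
        path = chain + [repo]
        for name, version in declared.items():
            if name in pinned:
                continue  # an ancestor already defines it, so the ancestor wins
            candidates[name].append((version, path))
        # Everything declared here overrides definitions further down this line.
        child_pinned = pinned | set(declared)
        for name in declared:
            if name not in pinned:
                visit(name, path, child_pinned)

    visit(root, [], frozenset())
    return {
        name: defs
        for name, defs in candidates.items()
        if len({version for version, _ in defs}) > 1
    }


# The diamond case: x wants foo 2, while y -> z wants foo 1.
diamond = {
    "main": {"x": "1", "y": "1"},
    "x": {"foo": "2"},
    "y": {"z": "1"},
    "z": {"foo": "1"},
}

for name, defs in find_conflicts(diamond).items():
    print(f"Conflicting definitions of '{name}':")
    for version, chain in defs:
        print(f"  version {version} requested via {' -> '.join(chain)}")
```

Keeping the full chain with each candidate is what makes it possible to report who requested each version, as in the error messages above.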
## Upgrade path
I think that this should be fairly straightforward, as any repository used by
the main repository or any subrepository had to be declared in the WORKSPACE
already, so it will take precedence.
To be extra safe, we can start by adding a `recursive = False` attribute to
the `workspace` rule, whose default we can later flip.
## Implementation
There are two options for implementing this:
* We always download/link every external dependency before a build can happen.
E.g., if @x isn't defined in the WORKSPACE file, we have to recursively
traverse all of the repositories to know which repository does define it and
if there are any conflicting definitions. This is correct, but will be
frustrating to users and may not even work in some cases (e.g., if an
OS-X-only skylark repository rule is fetched on Linux).
* Every time a new WORKSPACE file is fetched, we check its repository rules
against the ones already defined and look for version conflicts. This would
entirely miss certain version conflicts until certain dependencies are built,
but will have better performance.
I think users will rebel unless we go with the second option. However, this can have
some weird effects: suppose we have the diamond dependency above, and the user's
BUILD file contains:
```
cc_library(
    name = "bar",
    deps = ["@x//:dep"],  # using @foo version 2
)

cc_library(
    name = "baz",
    deps = ["@y//:dep"],  # using @foo version 1
)
```
If they build :bar and their coworker builds :baz, the two builds will work and
get different versions of @foo. However, as soon as one of them tries to build
both, they'll get the version mismatch error.
This is suboptimal, but I can't think of a way that all three of these can be
satisfied:
* The user doesn't have to declare everything at the top level.
* Bazel doesn't have to load everything.
* Bazel can immediately detect any conflicts.
Conflict detection could instead be enforced by a CI system on presubmit, which
I think is good enough.
Whenever Bazel creates a new repository, it will attempt to parse the WORKSPACE
file and do Skyframe lookups against each repository name. If the repository
name is not defined, it will be initialized to the current WORKSPACE's
definition. If it already exists, the existing value will be compared against
the new definition.
For now, we'll be very picky about equality: `maven_jar` and `new_http_archive`
of the same Maven artifact will count as different repositories. Both native
and Skylark repository rules must be exactly equal to avoid a conflict.
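
The incremental check described above, and the `:bar` / `:baz` surprise it produces, can be sketched as follows. Everything here is hypothetical: `RepositoryRegistry` stands in for the Skyframe lookups, definitions are plain dictionaries compared with strict equality, and the in-a-line override is left out (the intermediate step mentioned earlier), so only the main WORKSPACE takes precedence.

```python
class ConflictError(Exception):
    """Raised when two WORKSPACE files define one repository name differently."""


class RepositoryRegistry:
    def __init__(self, main_definitions):
        self._main = dict(main_definitions)  # main WORKSPACE definitions always win
        self._fetched = {}  # name -> (definition, WORKSPACE that supplied it)

    def register_workspace(self, source, definitions):
        """Check a newly fetched WORKSPACE file against what is already known."""
        for name, definition in definitions.items():
            if name in self._main:
                continue
            if name not in self._fetched:
                self._fetched[name] = (definition, source)
                continue
            existing, existing_source = self._fetched[name]
            if existing != definition:  # strict equality: kind and all attributes
                raise ConflictError(
                    f"Conflicting definitions of '{name}': "
                    f"{existing_source} vs. {source}"
                )

    def lookup(self, name):
        if name in self._main:
            return self._main[name]
        return self._fetched[name][0]


MAIN = {"x": {"kind": "repository"}, "y": {"kind": "repository"}}
X_WORKSPACE = {"foo": {"kind": "repository", "version": "2"}}
Y_WORKSPACE = {"z": {"kind": "repository"}}
Z_WORKSPACE = {"foo": {"kind": "repository", "version": "1"}}


def build_bar(registry):
    # Building :bar only needs @x, so only x's WORKSPACE is fetched.
    registry.register_workspace("bazel-external/x/WORKSPACE", X_WORKSPACE)


def build_baz(registry):
    # Building :baz needs @y, which in turn pulls in @z.
    registry.register_workspace("bazel-external/y/WORKSPACE", Y_WORKSPACE)
    registry.register_workspace("bazel-external/z/WORKSPACE", Z_WORKSPACE)


both = RepositoryRegistry(MAIN)
build_bar(both)  # fine on its own
try:
    build_baz(both)  # the mismatch only shows up once both are built
except ConflictError as error:
    print(f"ERROR: {error}")
```

Because dictionaries compare by content, a `maven_jar` and a `new_http_archive` entry for the same artifact would differ in `kind` and therefore count as a conflict, matching the picky equality described above.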
One issue is a little tricky, but I think it will work out: the WORKSPACE
file is parsed incrementally. Suppose the main WORKSPACE loads x.bzl, which
declares @y and @z. If @y depends on @foo version 1 and @z depends on @foo
version 2, this will throw a Skyframe error, even if @foo is later declared in
the WORKSPACE file. However, this should be okay, because if these dependencies
actually need @foo, it would already need to be declared before them in the
WORKSPACE file.
## Supplementary changes
Not strictly required, but as part of this I'm planning to implement:
* A `bazel-external` convenience symlink (to the `[output_base]/external`
directory) so users can easily inspect their external repositories.
* Add an option to generate all WORKSPACE definitions (so generate a flat
WORKSPACE file from the hierarchy).
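
One possible shape for the flattening option, as a sketch only: walk the hierarchy breadth-first from the main repository, keep the first definition seen for each name, and print one line per repository. The `flatten` function is hypothetical, it assumes conflicts have already been resolved or reported, and the `repository(...)` syntax is borrowed from the placeholder used in the error messages above rather than from any real rule.

```python
from collections import deque


def flatten(workspaces, root="main"):
    """Return flat WORKSPACE text listing every repository reachable from `root`.

    `workspaces` maps a repository name to the {dependency: version} pairs
    declared in its WORKSPACE file. The definition closest to the root is kept.
    """
    resolved = {}
    queue = deque([root])
    while queue:
        repo = queue.popleft()
        for name, version in workspaces.get(repo, {}).items():
            if name not in resolved:
                resolved[name] = version
                queue.append(name)  # this repository may declare more of its own
    return "\n".join(
        f"repository(name = '{name}', version = '{version}')"
        for name, version in sorted(resolved.items())
    )


hierarchy = {
    "main": {"x": "1"},
    "x": {"y": "1", "foo": "1"},
    "y": {"foo": "2"},
}
print(flatten(hierarchy))
```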
## Concerns
Questions users might have.
*Where did @x come from?*
Bazel will create a `bazel-external/@x.version` file, which should contain the
WORKSPACE (or .bzl file) where we got @x's definition and any other WORKSPACE
files that contain it.
*Which version of @x is going to be chosen?*
See the Resolution section above. Perhaps people could query for //external:x?
*I want to use a different version of @x.*
Declare @x in your WORKSPACE file; it will override "lower" rules.
*When I update @x, what else will change?*
Because @x might declare repo @y and @y's version might change as well, we'd
need a different way to query for this. We could implement deps() for repo
rules or have some other mechanism for this.
## Thoughts on future development
Moving towards the user-as-conflict-resolver model (vs. the
user-as-transcriber-of-deps model) means that repositories that the user may not
even be aware of might be available in their workspace. I think this kind of
paves the way towards a nice auto-fetch system where a user could just depend on
`@com_google_guava//whatever` in their BUILD file, and Bazel could figure out
how to make `@com_google_guava` available.
## References
[So you want to write a package manager](https://medium.com/@sdboyer/so-you-want-to-write-a-package-manager-4ae9c17d9527#.d90oxolzk)
does a good job outlining many of these challenges, but their suggested approach
(use semantic versioning for dependency resolution) cannot be used by Bazel for
the general case.