commit | 68e4319600d8c68c4142c99e730a5732d85fd2d3 | [log] [tgz] |
---|---|---|
author | gregce <gregce@google.com> | Thu Oct 31 15:00:10 2019 -0700 |
committer | Copybara-Service <copybara-worker@google.com> | Thu Oct 31 15:01:10 2019 -0700 |
tree | 6eac9f295332c48fa613256dcdc30efa1e8f9dce | |
parent | 9c5f8508d4bf378cb17069262918ca6775095223 [diff] |
Fix non-deterministic Blaze crash with "--aspects -k". The crash is: java.lang.ClassCastException: class com.google.devtools.build.lib.skyframe.AspectValue$AspectKey cannot be cast to class com.google.devtools.build.lib.skyframe.ConfiguredTargetKey (com.google.devtools.build.lib.skyframe.AspectValue$AspectKey and com.google.devtools.build.lib.skyframe.ConfiguredTargetKey are in unnamed module of loader 'app') at com.google.devtools.build.lib.skyframe.ConfiguredTargetCycleReporter.asTransitiveTargetKey(ConfiguredTargetCycleReporter.java:94) ... This only works with the following magical combination of features: 1) Building with a top-level aspect (--aspects). 2) Building -keep_going (-k). 3) The rule you're building or one of its transitive deps is a genquery. 4) The genquery hits a loading phase cycle. 5) Actually building whatever the genquery visits doesn't hit a cycle because a select() makes the cycle go away in the actual build. While this sounds incomprehensible, the basic story is this crash only happens in a pure loading phase cycle based from TransitiveTargetValue. genquery is the only rule that uses TransitiveTargetValue. Its "scope = " attribute normally guarantees whatever rule has the cycle will first be analyzed and hit the cycle during configured target evaluation, which skips this code path (you'll see the nice non-crashy message "This cycle occurred because of a configuration option" in those cases). But select()able deps short-circuit this: The select() means the cycle *doesn't* appear in analysis because the build's flags happen not to trigger whatever dependency path would trigger the cycle. So "scope = " finishes just fine. Then genquery gets into its rule-specific analysis and calls TransitiveTargetValue, which doesn't know which build flags are set so chooses *all* deps, including the bad ones. And we get a "loading-phase only" error, which propagates up to the ConfiguredTargetCycleReporter. Even *this* is normally fine: ConfiguredTargetCycleReporter already has logic to handle loading phase errors. But that logic assumes the top-level analyzing unit was a configured target, not an aspect. Bringing in --aspects breaks that assumption. Code details: This causes both a ConfiguredTarget and Aspect to be requested as top-level nodes at: https://github.com/bazelbuild/bazel/blob/b2e1fe367269103e127718131163cd702cefcdff/src/main/java/com/google/devtools/build/lib/analysis/BuildView.java#L403-L404 -keep_going guarantees both the aspect and configured target are evaluated in Skyframe. Both will return null on a cycle error. Both have their error recorded at https://github.com/bazelbuild/bazel/blob/f998f4e8b5c172286074f4de58eb3fb966ed8243/src/main/java/com/google/devtools/build/skyframe/EvaluationResult.java#L171 which stores them in a HashMap<SkyKey, ErrorInfo>: https://github.com/bazelbuild/bazel/blob/f998f4e8b5c172286074f4de58eb3fb966ed8243/src/main/java/com/google/devtools/build/skyframe/EvaluationResult.java#L157 which is then non-deterministically iterated over for cycle detection in: https://github.com/bazelbuild/bazel/blob/3740768a2f625e7b3310f69bcec47b5dd0898cf1/src/main/java/com/google/devtools/build/lib/skyframe/SkyframeBuildView.java#L527. If the top-level build request includes a configured target and aspect, that means we're randomly going to get one of [ConfiguredTargetKey, AspectKey] first. If AspectKey comes first, then https://github.com/bazelbuild/bazel/blob/209847415ac251d0b73e98e705bcb1a411490a2e/src/main/java/com/google/devtools/build/lib/skyframe/ConfiguredTargetCycleReporter.java#L94 crashes. If it comes second, that code path is skipped because of https://github.com/bazelbuild/bazel/blob/6505f7c5db239a823e8503dcc6798702b2d54519/src/main/java/com/google/devtools/build/lib/skyframe/AbstractLabelCycleReporter.java#L85. I have no idea how to test this, given how deeply this depends on Skyframe's evaluation model. ======================================================================== Minimal repro: $ mkdir testapp $ cat >testapp/defs.bzl <<EOF def _rule_impl(ctx): pass custom_rule = rule( implementation = _rule_impl, attrs = { "deps": attr.label_list(), }, ) AspectProvider = provider() def _aspect_impl(target, ctx): pass my_aspect = aspect( implementation = _aspect_impl, attr_aspects = ["deps"], ) EOF $ cat >testapp/BUILD <<EOF load("//testapp:defs.bzl", "custom_rule") config_setting( name = "not_invoked", define_values = {"dont_set_me": "1"}, ) custom_rule( name = "my_rule", deps = [":maybe_cycle_dep"], ) custom_rule( name = "maybe_cycle_dep", deps = select({ "not_invoked": [":my_rule"], "//conditions:default": [], }), ) genquery( name = "my_query", expression = "(//testapp:my_rule)", scope = ["//testapp:my_rule"], ) EOF $ blaze build //testapp:my_query --aspects=//testapp:defs.bzl%my_aspect -k Because of the non-determinism, this may or may not crash Blaze. PiperOrigin-RevId: 277807980
{Fast, Correct} - Choose two
Build and test software of any size, quickly and reliably.
Speed up your builds and tests: Bazel rebuilds only what is necessary. With advanced local and distributed caching, optimized dependency analysis and parallel execution, you get fast and incremental builds.
One tool, multiple languages: Build and test Java, C++, Android, iOS, Go, and a wide variety of other language platforms. Bazel runs on Windows, macOS, and Linux.
Scalable: Bazel helps you scale your organization, codebase, and continuous integration solution. It handles codebases of any size, in multiple repositories or a huge monorepo.
Extensible to your needs: Easily add support for new languages and platforms with Bazel's familiar extension language. Share and re-use language rules written by the growing Bazel community.
Follow our tutorials:
See CONTRIBUTING.md