Use Artifact#getGeneratingActionKey to avoid creating Artifact nodes in the graph for all "normal" generated artifacts.

Any generated artifact that does not represent multiple other artifacts (aggregating middleman and tree artifacts are the exceptions) can be looked up directly from the generating action's ActionExecutionValue. The artifact's type gives a conservative decision procedure for this: non-middleman, non-tree generated artifacts are definitely ok. Non-aggregating middlemen and tree artifacts that are not produced by template expansion are also ok, but knowing that requires checking more data, so we continue to call into ArtifactFunction for those artifacts.
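The conservative type check described above can be sketched as follows. This is only an illustration: the class, the Kind enum, and the method name are hypothetical stand-ins and are not Bazel's actual types or API.

```java
import java.util.EnumSet;
import java.util.Set;

/** Hypothetical sketch of the type-based decision; not Bazel's real classes. */
final class DirectLookupSketch {
  /** Illustrative artifact categories (assumed for this sketch). */
  enum Kind { SOURCE, GENERATED_FILE, MIDDLEMAN, TREE }

  /** Kinds that might stand for several other artifacts, so the type alone is not enough. */
  private static final Set<Kind> MAY_AGGREGATE = EnumSet.of(Kind.MIDDLEMAN, Kind.TREE);

  /**
   * Returns true only when the type alone proves that the artifact's metadata
   * can be read from its generating action's value. Middlemen and tree
   * artifacts return false even though some of them would qualify, because
   * deciding that needs more data than the type.
   */
  static boolean canUseGeneratingActionKey(Kind kind) {
    // Source artifacts have no generating action at all.
    return kind != Kind.SOURCE && !MAY_AGGREGATE.contains(kind);
  }

  public static void main(String[] args) {
    for (Kind kind : Kind.values()) {
      System.out.println(kind + " -> " + canUseGeneratingActionKey(kind));
    }
  }
}
```

The check errs on the side of returning false, which matches the "conservative" wording: a false answer only costs an extra node, while a wrong true answer would look for metadata in the wrong place.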

Saves about 60 MB out of roughly 4100 MB of post-execution heap on a medium-sized build, or 1.5% of memory:

$ diffheap.pl ~/blaze/histo{_before,}.txt
 objsize  chg   instances       space KB    class name
------------------------------------------------------
      77   +0        -711        -735 KB    [B
      32   +0      -43594       -1362 KB    com.google.devtools.build.lib.actions.FileArtifactValue$RegularFileArtifactValue
      24   +0      -90829       -2128 KB    java.util.ArrayList
     746 -173          -5       -9230 KB    [Ljava.util.concurrent.ConcurrentHashMap$Node;
      32   +0     -309720       -9678 KB    java.util.concurrent.ConcurrentHashMap$Node
      40   +0     -309641      -12095 KB    com.google.devtools.build.skyframe.InMemoryNodeEntry
      90   +0     -400463      -26764 KB    [Ljava.lang.Object;
------------------------------------------------------
 total change:                 -62092 KB

There are three main external changes in this CL:

(1) Every consumer of Artifact metadata must pass all artifacts through ArtifactSkyKey#key and friends, since generated artifacts' keys are no longer necessarily themselves. This is something of a partial rollback of https://github.com/bazelbuild/bazel/commit/bf4123df23b5f93e572cd920f15afba340f92391: see, for example, https://github.com/bazelbuild/bazel/commit/bf4123df23b5f93e572cd920f15afba340f92391#diff-619984696e738a6f3ccb9b3802ab7d90.

(2) Similarly, every such consumer must be prepared for the returned SkyValue for an Artifact to be an ActionExecutionValue, not a FileArtifactValue or TreeArtifactValue. This means that the consumer must iterate over the original list of Artifacts, constructing each key on the fly, rather than over the returned map, since the returned map may not have any indication of the original Artifacts. The construction of some FileArtifactValues on the fly here may slightly increase garbage (although not in the Google-internal or common Bazel case, since we store FileArtifactValues directly in ActionExecutionValue there), but it should be dominated by the memory savings. I'm hoping to clean up ActionExecutionValues in a separate change, so that we don't have this overlapping data, although the garbage issue may still remain.

The main complication in (2) is in ActionExecutionFunction, where we need to construct keys for non-mandatory source artifacts.
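The consumer-side pattern from (1) and (2) might look roughly like the sketch below. Every type here (Artifact, FileValue, ActionValue) and the key() helper are simplified, hypothetical stand-ins invented for illustration; they are not the real Artifact, FileArtifactValue, ActionExecutionValue, or ArtifactSkyKey#key.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch of a metadata consumer; not Bazel's real classes. */
final class ArtifactLookupSketch {
  /** Stand-in for an artifact; generatingAction is null for source artifacts. */
  static final class Artifact {
    final String path;
    final String generatingAction;
    Artifact(String path, String generatingAction) {
      this.path = path;
      this.generatingAction = generatingAction;
    }
  }

  /** Stand-in for per-file metadata. */
  static final class FileValue {
    final String digest;
    FileValue(String digest) { this.digest = digest; }
  }

  /** Stand-in for an action's result: metadata of each output, keyed by path. */
  static final class ActionValue {
    final Map<String, FileValue> outputs;
    ActionValue(Map<String, FileValue> outputs) { this.outputs = outputs; }
  }

  /** An ordinary generated artifact is keyed by its generating action, not by itself. */
  static Object key(Artifact artifact) {
    return artifact.generatingAction != null ? artifact.generatingAction : artifact;
  }

  /**
   * Iterates over the original artifacts (not over the returned values),
   * rebuilding each key, because an action's value says nothing about which
   * of its outputs the caller asked for.
   */
  static Map<Artifact, FileValue> metadataFor(
      List<Artifact> inputs, Map<Object, Object> graphValues) {
    Map<Artifact, FileValue> result = new LinkedHashMap<>();
    for (Artifact artifact : inputs) {
      Object value = graphValues.get(key(artifact));
      if (value instanceof ActionValue) {
        // Extract this artifact's metadata from the action's value.
        result.put(artifact, ((ActionValue) value).outputs.get(artifact.path));
      } else if (value instanceof FileValue) {
        result.put(artifact, (FileValue) value);
      }
      // A null value (dependency not yet available) is skipped in this sketch.
    }
    return result;
  }

  public static void main(String[] args) {
    Artifact source = new Artifact("a.cc", null);
    Artifact object = new Artifact("a.o", "compile_a");
    Map<String, FileValue> outputs = new HashMap<>();
    outputs.put("a.o", new FileValue("digest-of-a.o"));
    Map<Object, Object> graph = new HashMap<>();
    graph.put(source, new FileValue("digest-of-a.cc"));
    graph.put("compile_a", new ActionValue(outputs));
    metadataFor(Arrays.asList(source, object), graph)
        .forEach((artifact, value) -> System.out.println(artifact.path + " -> " + value.digest));
  }
}
```

Driving the loop from the input list is what makes the second branch possible: the action-keyed value alone carries no record of which output was requested.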

(3) Action rewinding no longer needs to invalidate ordinary generated Artifact nodes in the graph. This simplifies the resulting graphs, but can complicate the rewinding logic, since some Artifacts will still have nodes while others won't. Instead of tracking on the basis of artifacts, we now track on the basis of actions and artifacts.
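One way to picture tracking "on the basis of actions and artifacts" is the sketch below. It is an interpretation of the paragraph above, not the actual ActionRewindStrategy logic: the LostArtifact class, the hasOwnNode flag, and the string node labels are all hypothetical.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical sketch of choosing graph nodes to invalidate when rewinding. */
final class RewindPlanSketch {
  /** Stand-in for a lost generated artifact (assumed shape, for illustration). */
  static final class LostArtifact {
    final String path;
    final String generatingAction;
    /** True for artifacts that still get their own node, e.g. tree artifacts. */
    final boolean hasOwnNode;
    LostArtifact(String path, String generatingAction, boolean hasOwnNode) {
      this.path = path;
      this.generatingAction = generatingAction;
      this.hasOwnNode = hasOwnNode;
    }
  }

  /**
   * Always invalidates the generating action; invalidates an artifact node
   * only when that artifact actually has one in the graph.
   */
  static Set<String> nodesToInvalidate(List<LostArtifact> lost) {
    Set<String> nodes = new LinkedHashSet<>();
    for (LostArtifact artifact : lost) {
      nodes.add("action:" + artifact.generatingAction);
      if (artifact.hasOwnNode) {
        nodes.add("artifact:" + artifact.path);
      }
    }
    return nodes;
  }

  public static void main(String[] args) {
    List<LostArtifact> lost = Arrays.asList(
        new LostArtifact("a.o", "compile_a", false),
        new LostArtifact("gen_dir", "make_tree", true));
    System.out.println(nodesToInvalidate(lost));
  }
}
```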

While modifying ActionRewindStrategy, I took the liberty of doing some clean-ups: for instance, after https://github.com/bazelbuild/bazel/commit/efb3f1595ee897484c477168b8da42b67602e10e, the HashMultimap<DerivedArtifact, ActionInput> lostInputsByDepOwners was only using its values for a check that was guaranteed to succeed, so I just made it a set.

PiperOrigin-RevId: 252701678
16 files changed
tree: fe0a7e9623d6bcc264eb017ed5f769e94d71f33c