commit	ed279ab4fa2d4356be00b54266f56edcc5aeae78	[log] [tgz]
author	nharmata <nharmata@google.com>	Tue Dec 07 08:54:00 2021 -0800
committer	Copybara-Service <copybara-worker@google.com>	Tue Dec 07 08:55:36 2021 -0800
tree	df23947568f2cef37c493f5ad6aa8e5cbe351cb2
parent	882321bd2cd126e586cb555e7f4ae535d965ec3d [diff]

Introduce a way for `mySkyFunction.compute(k, env)` to avoid having to redo work for `SkyKey` `k` on each such call to `#compute` (after dependencies are ready).

The comment block I added to `SkyFunction.java` has a great example.

Background and motivation
-------------------------

This has been a theoretical weakness since the initial introduction of Skyframe into the codebase (~mid 2013). The TODO comment in `PackageLookupFunction#findPackageByBuildFile` gives a practical example that never actually occurs for any important invocations (i.e. we don't know of any important invocations that pass `--package_path=<some list with 100+ elements>`).

I recently discovered that `ConfiguredTargetFunction#computeDependencies` (used by both `ConfiguredTargetFunction` and `AspectFunction`) has a very impactful example of this weakness. I've measured it accounting for 3-5% of the analysis phase cpu time and wall time for most invocations, including several important ones internally at Google. Therefore, I've decided to finally address this weakness in a holistic manner. This CL here is the first step towards that.

Historically, Blaze-on-Skyframe `SkyFunction`s have addressed this weakness in an ad-hoc manner. The typical pattern is to have the SkyFunction instance have a `Cache<SkyKey, DataProducedByInternalComputation>`, and populate the entry for `k` on calls to `compute(k, env)`, finally removing the entry when `k` has been fully computed (including being in error). `PackageFunction`, `BzlLoadFunction`, and `ActionExecutionFunction` all have examples of this pattern. This pattern has a few downsides:
* It's very easy for the Blaze-on-Skyframe `SkyFunction` to forget to remove the cache entry for `k`.
* In a nokeep_going evaluation, the `SkyFunction` doesn't have an opportunity to remove the cache entry. Therefore, the pattern is for `SkyframeExecutor#preparePackageLoading` itself to clear all the caches inter-invocation just in case.
* Because the `Cache` is associated with the `SkyFunction` instance, and the Blaze server typically retains all the Blaze-on-Skyframe `SkyFunction` instances forever, bugs in either of the above category can have the consequence of creating logically memory leaks. Internal Google CL 105253634 (2015) fixed an example of the first category.

Additionally, these ad-hoc caches don't have insight into what the Skyframe engine internals are currently doing, and so they are forced to be fairly naive. For example, the three current instances of this "ad-hoc cache" pattern use an unbounded cache data structure. This means the [peak temporary] memory usage is linear in the [peak] number of inflight Skyframe nodes of the appropriate type, which is neither under the control of or even knowable by the SkyFunction implementation. There's an example of this in the "TODO(b/136156191)" in `ActionExecutionFunction.java`. Therefore, an additional benefit of the principled mechanism introduced in this CL is that we have the ability to make this mechanism fancier in the future. I added a TODO for this in `InterComputeStateManager.java`.

Next steps
-----------

* Get rid of the "ad-hoc cache" pattern. (Mentioned above)
* Reimplement `ConfiguredTargetFunction` and `AspectFunction` to use the new mechanism. (Mentioned above)
* Reimplement `PackageLookupFunction` to use the new mechanism. (Mentioned above)
* Reimplement other Blaze-on-Skyframe `SkyFunction`s that can benefit from the new mechanism. For example, the profiling procedure I did [that led me to the `ConfiguredTargetFunction#computeDependencies` problem] identified `FileFunction` and `GlobFunction` as having a relatively large amount of waste here (but with very small absolute impact in benchmarks).

Implementation notes
-------------------------------

* I originally designed and implemented continuation passing style (CPS) support in Skyframe as the solution to the performance weakness in question. This worked great, and even led to the code in `PackageLookupFunction#findPackageByBuildFile` looking quite pretty imo... but it led to the code in `ConfiguredTargetFunction` turning into an unreadable mess that I'd feel bad leaving behind for future Blaze developers. Therefore, I concluded the CPS idea was a net negative wrt code readability.
* Kudos for lberki@ for the alternate solution suggestion of letting `SkyFunction`s mutate state. This doesn't have the readability concerns the CPS approach does.
* Rather than introduce `SkyFunction#compute(SkyKey, InterComputeState, Environment)`, it seems more tasteful to keep `SkyFunction#compute(SkyKey, Environment)` as the only entry point and instead introduce `Environment#getState` as the way for a `SkyFunction` implementation to get the current `InterComputeState` object. This may be, but unfortunately lots of existing unit tests would need to be rewritten. For example, `lib/skyframe/SkyFunctionEnvironmentForTesting.java` becomes impossible to implement. We can revisit this decision in the future; the important point is that this CL here allows us to make iterative improvements to the codebase.

PiperOrigin-RevId: 414738960

6 files changed

tree: df23947568f2cef37c493f5ad6aa8e5cbe351cb2

README.md

Bazel

{Fast, Correct} - Choose two

Build and test software of any size, quickly and reliably.

Speed up your builds and tests: Bazel rebuilds only what is necessary. With advanced local and distributed caching, optimized dependency analysis and parallel execution, you get fast and incremental builds.
One tool, multiple languages: Build and test Java, C++, Android, iOS, Go, and a wide variety of other language platforms. Bazel runs on Windows, macOS, and Linux.
Scalable: Bazel helps you scale your organization, codebase, and continuous integration solution. It handles codebases of any size, in multiple repositories or a huge monorepo.
Extensible to your needs: Easily add support for new languages and platforms with Bazel's familiar extension language. Share and re-use language rules written by the growing Bazel community.

Getting Started

Documentation

Reporting a Vulnerability

To report a security issue, please email security@bazel.build with a description of the issue, the steps you took to create the issue, affected versions, and, if known, mitigations for the issue. Our vulnerability management team will respond within 3 working days of your email. If the issue is confirmed as a vulnerability, we will open a Security Advisory. This project follows a 90 day disclosure timeline.

Contributing to Bazel

See CONTRIBUTING.md