commit    4f189866eade44a85ec114ca91355d363ea3e03e
author    ulfjack <ulfjack@google.com>    Fri Oct 26 09:59:33 2018 -0700
committer Copybara-Service <copybara-piper@google.com>    Fri Oct 26 10:01:28 2018 -0700
tree      6343caac13e6b2b54af4e318e00f0c4c865ce363
parent    25df709f22ac1acd382eb469b79cb9fd3bd7b173
Remove all uses of I/O in Resource sets

While these purported to model I/O usage, they only created an artificial bottleneck: an action modeled as requiring x I/O could never run in more than 1/x copies (rounded down) in parallel, regardless of how powerful the underlying machine was, since the available I/O was always modeled as 1.0. At the same time, most actions were modeled as zero I/O, so there was no constraint on the total number of actions running; the constraints mostly applied to actions of the same type.

In theory, overcommitting a machine on memory is the most problematic, as it can cause arbitrary process kills and extreme slowdowns (due to excessive swapping). Overcommitting a machine on CPU is the next biggest problem; while Linux handles CPU overcommit very gracefully, we know that macOS does a much worse job. However, we recently increased the minimum allocation to one core per process, which makes overcommit less likely to happen, and we allow users to override specific instances with the 'cpu:n' tag.

Overcommitting on I/O primarily slows down execution of concurrent processes. On a spinning platter, this can be particularly bad due to seek thrashing: multiple processes each try to read files sequentially, but because their reads interleave, the disk ends up spending most of its time in seeks, and this can dominate performance. With the increasing use of SSDs, which have no seek time, this becomes much less of a concern. In any case, Bazel isn't in a good position to control this, and the current approach is basically just broken: a good disk scheduler can handle multiple processes without too much of a slowdown, whereas Bazel would have to make conservative restrictions about which processes it runs in parallel. In addition, Bazel cannot even detect whether it's running on an SSD, HDD, network file system, or ramdisk, so there's no way to adapt automatically.
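The bottleneck described above can be sketched with a toy calculation. This is only an illustration of the arithmetic, not Bazel's actual scheduler code; the function and constant names are hypothetical:

```python
import math

# Available I/O was always modeled as a fixed 1.0, independent of the machine.
AVAILABLE_IO = 1.0

def max_parallel_actions(per_action_io: float, cores: int) -> int:
    """Upper bound on concurrent actions under the old I/O resource model."""
    if per_action_io <= 0:
        # Zero-I/O actions were effectively unconstrained by the I/O resource.
        return cores
    return min(cores, math.floor(AVAILABLE_IO / per_action_io))

# Tests modeled at 0.1 I/O: at most 10 run concurrently, even on 64 cores.
print(max_parallel_actions(0.1, 64))  # 10
# Link actions modeled at ~0.3 I/O: at most 3 run concurrently.
print(max_parallel_actions(0.3, 64))  # 3
```

Note how the cap depends only on the per-action I/O estimate, never on the actual hardware, which is why removing the I/O dimension increases parallelism on large machines.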
Overall, it seems better to rely on the existing mechanisms that prevent memory and CPU overcommit to also prevent I/O overcommit. Individual users can tweak the existing settings (e.g., by reducing --jobs, by setting --local_resources, or by increasing cpu settings on a per-rule or per-action basis), if they are affected at all. There is reason to expect this change to be a net win in performance and predictability for the majority of Bazel users, even if it's worse for a small fraction.

Tests: These settings were preventing more than 10 medium or large tests from running at the same time. For tests that are actually in-memory unit tests, this unnecessarily reduces parallelism on machines with 10+ cores. On machines with fewer cores, we're already constraining the number of tests with jobs (by default, jobs = cores), or, as of the recent changes to CPU settings, with the core counter. The additional specification of 0.1 I/O seems unnecessary and more likely to do harm than good. For enormous tests, it seems advisable to use the 'cpu:n' tag to give Bazel a hint about the nature of the test. Unfortunately, test size is not a well-defined concept, so it's unclear whether we should increase the default core resource count for larger tests; there are arguments both for and against such a move. Ideally, we'd come up with a crisper definition, which would allow us to make a better call here.

CppLinkAction: These settings were preventing more than 3 link actions from running at the same time, regardless of how heavyweight those actions are. This seems rather unfortunate: while we know that some link actions are so large that they monopolize the machine, that should be modeled with memory instead. While link actions may be the most likely candidates for inducing seek thrashing, they are also the most predictable for a disk scheduler to handle.

PiperOrigin-RevId: 218869304
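As a sketch of the per-target override mentioned above, a BUILD file can attach the 'cpu:n' tag to an unusually heavy test so Bazel reserves more cores for it. The target name and source file here are hypothetical examples, not part of this change:

```python
# BUILD file fragment (Starlark). The 'cpu:4' tag hints that this test
# should be scheduled as if it needed four cores, reducing overcommit.
cc_test(
    name = "heavy_integration_test",  # hypothetical target
    srcs = ["heavy_integration_test.cc"],
    tags = ["cpu:4"],
)
```

Users who still see overcommit after this change can additionally lower overall parallelism with --jobs or adjust machine-wide limits with --local_resources.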