commit    4f189866eade44a85ec114ca91355d363ea3e03e
author    ulfjack <ulfjack@google.com>    Fri Oct 26 09:59:33 2018 -0700
committer Copybara-Service <copybara-piper@google.com>    Fri Oct 26 10:01:28 2018 -0700
tree      6343caac13e6b2b54af4e318e00f0c4c865ce363
parent    25df709f22ac1acd382eb469b79cb9fd3bd7b173
Remove all uses of I/O in Resource sets

While these purported to model I/O usage, they only created an artificial bottleneck: an action modeled as requiring x I/O could never run in more than 1/x copies (rounded down) in parallel, regardless of how powerful the underlying machine was, since the available I/O was always modeled as 1.0. At the same time, most actions were modeled as zero I/O, so there was no constraint on the total number of actions running; the constraints mostly applied to actions of the same type.

In theory, overcommitting a machine on memory is the most problematic, as it can cause arbitrary process kills and extreme slowdowns (due to excessive swapping). Overcommitting a machine on CPU is the next biggest problem; while Linux handles CPU overcommit very gracefully, we know that macOS does a much worse job. However, we recently increased the minimum allocation to one core per process, which makes overcommit less likely to happen, and we allow users to override specific instances with the 'cpu:n' tag.

Overcommitting on I/O primarily slows down execution of concurrent processes. On a spinning platter, this can be particularly bad due to seek thrashing: multiple processes each try to read files sequentially, but because their reads interleave, the disk ends up spending most of its time in seeks, and this can dominate performance. With the increasing use of SSDs, which have no seek time, this becomes much less of a concern. In any case, Bazel isn't in a good position to control this, and the current approach is basically just broken: a good disk scheduler can handle multiple processes without too much of a slowdown, whereas Bazel would have to make conservative restrictions about which processes it runs in parallel. In addition, Bazel cannot even detect whether it's running on an SSD, HDD, network file system, or ramdisk, so there's no way to adapt automatically.
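The bottleneck described above can be sketched with a toy calculation. This is only an illustration of the arithmetic, not Bazel's actual scheduler code; the function and constant names are hypothetical:

```python
import math

# Available I/O was always modeled as a fixed 1.0, independent of the machine.
AVAILABLE_IO = 1.0

def max_parallel_actions(per_action_io: float, cores: int) -> int:
    """Upper bound on concurrent actions under the old I/O resource model."""
    if per_action_io <= 0:
        # Zero-I/O actions were effectively unconstrained by the I/O resource.
        return cores
    return min(cores, math.floor(AVAILABLE_IO / per_action_io))

# Tests modeled at 0.1 I/O: at most 10 run concurrently, even on 64 cores.
print(max_parallel_actions(0.1, 64))  # 10
# Link actions modeled at ~0.3 I/O: at most 3 run concurrently.
print(max_parallel_actions(0.3, 64))  # 3
```

Note how the cap depends only on the per-action I/O estimate, never on the actual hardware, which is why removing the I/O dimension increases parallelism on large machines.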
Overall, it seems better to rely on the existing mechanisms that prevent memory and CPU overcommit to also prevent I/O overcommit. Individual users can tweak the existing settings (e.g., by reducing --jobs, by setting --local_resources, or by increasing cpu settings on a per-rule or per-action basis), if they are affected at all. There is reason to expect this change to be a net win in performance and predictability for the majority of Bazel users, even if it's worse for a small fraction.

Tests: These settings were preventing more than 10 medium or large tests from running at the same time. For tests that are actually in-memory unit tests, this unnecessarily reduces parallelism on machines with 10+ cores. On machines with fewer cores, we're already constraining the number of tests with jobs (by default, jobs = cores), or, as of the recent changes to CPU settings, with the core counter. The additional specification of 0.1 I/O seems unnecessary and more likely to do harm than good. For enormous tests, it seems advisable to use the 'cpu:n' tag to give Bazel a hint about the nature of the test. Unfortunately, test size is not a well-defined concept, so it's unclear whether we should increase the default core resource count for larger tests; there are arguments both for and against such a move. Ideally, we'd come up with a crisper definition, which would allow us to make a better call here.

CppLinkAction: These settings were preventing more than 3 link actions from running at the same time, regardless of how heavyweight those actions are. This seems rather unfortunate: while we know that some link actions are so large that they monopolize the machine, that should be modeled with memory instead. While link actions may be the most likely candidates for inducing seek thrashing, they are also the most predictable for a disk scheduler to handle.

PiperOrigin-RevId: 218869304
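As a sketch of the per-target override mentioned above, a BUILD file can attach the 'cpu:n' tag to an unusually heavy test so Bazel reserves more cores for it. The target name and source file here are hypothetical examples, not part of this change:

```python
# BUILD file fragment (Starlark). The 'cpu:4' tag hints that this test
# should be scheduled as if it needed four cores, reducing overcommit.
cc_test(
    name = "heavy_integration_test",  # hypothetical target
    srcs = ["heavy_integration_test.cc"],
    tags = ["cpu:4"],
)
```

Users who still see overcommit after this change can additionally lower overall parallelism with --jobs or adjust machine-wide limits with --local_resources.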