blob: 6b5a4bdfd2821ccfd637c285c7a10023653719cb [file] [log] [blame] [view]
Project: /_project.yaml
Book: /_book.yaml
# Dynamic Execution
{% include "_buttons.html" %}
__Dynamic execution__ is a feature in Bazel
[since version 0.21](https://blog.bazel.build/2019/02/01/dynamic-spawn-scheduler.html){: .external},
where local and remote execution of the same action are started in parallel,
using the output from the first branch that finishes, cancelling the other
branch. It combines the execution power and/or large shared cache of a remote
build system with the low latency of local execution, providing the best of both
worlds for clean and incremental builds alike.
This page describes how to enable, tune, and debug dynamic execution. If you
have both local and remote execution set up and are trying to adjust Bazel
settings for better performance, this page is for you. If you don't already have
remote execution set up, go to the Bazel
[Remote Execution Overview](/remote/rbe) first.
## Enabling dynamic execution? {:#enabling-dynamic-execution}
The dynamic execution module is part of Bazel, but to make use of dynamic
execution, you must already be able to compile both locally and remotely from
the same Bazel setup.
To enable the dynamic execution module, pass the `--internal_spawn_scheduler`
flag to Bazel. This adds a new execution strategy called `dynamic`. You can now
use this as your strategy for the mnemonics you want to run dynamically, such as
`--strategy=Javac=dynamic`. See the next section for how to pick which mnemonics
to enable dynamic execution for.
For any mnemonic using the dynamic strategy, the remote execution strategies are
taken from the `--dynamic_remote_strategy` flag, and local strategies from the
`--dynamic_local_strategy` flag. Passing
`--dynamic_local_strategy=worker,sandboxed` sets the default for the local
branch of dynamic execution to try with workers or sandboxed execution in that
order. Passing `--dynamic_local_strategy=Javac=worker` overrides the default for
the Javac mnemonic only. The remote version works the same way. Both flags can
be specified multiple times. If an action cannot be executed locally, it is
executed remotely as normal, and vice-versa.
If your remote system has a cache, the `--dynamic_local_execution_delay`
flag adds a delay in milliseconds to the local execution after the remote system
has indicated a cache hit. This avoids running local execution when more cache
hits are likely. The default value is 1000ms, but should be tuned to being just
a bit longer than cache hits usually take. The actual time depends both on the
remote system and on how long a round-trip takes. Usually, the value will be the
same for all users of a given remote system, unless some of them are far enough
away to add roundtrip latency. You can use the
[Bazel profiling features](/rules/performance#performance-profiling)
to look at how long typical cache hits take.
Dynamic execution can be used with local sandboxed strategy as well as with
[persistent workers](/remote/persistent). Persistent workers will
automatically run with sandboxing when used with dynamic execution, and cannot
use [multiplex workers](/remote/multiplex). On Darwin and Windows systems,
the sandboxed strategy can be slow; you can pass
`--reuse_sandbox_directories` to reduce overhead of creating sandboxes on these systems.
Dynamic execution can also run with the `standalone` strategy, though since the
`standalone` strategy must take the output lock when it starts executing, it
effectively blocks the remote strategy from finishing first. The
`--experimental_local_lockfree_output` flag enables a way around this problem by
allowing the local execution to write directly to the output, but be aborted by
the remote execution, should that finish first.
If one of the branches of dynamic execution finishes first but is a failure, the
entire action fails. This is an intentional choice to prevent differences
between local and remote execution from going unnoticed.
For more background on how dynamic execution and its locking works, see Julio
Merino's excellent
[blog posts](https://jmmv.dev/series/bazel-dynamic-execution/){: .external}
## When should I use dynamic execution? {:#when-to-use}
Dynamic execution requires some form of
[remote execution system](/remote/rbe). It is not currently
possible to use a cache-only remote system, as a cache miss would be considered
a failed action.
Not all types of actions are well suited for remote execution. The best
candidates are those that are inherently faster locally, for instance through
the use of [persistent workers](/remote/persistent), or those that run
fast enough that the overhead of remote execution dominates execution time.
Since each locally executed action locks some amount of CPU and memory
resources, running actions that don't fall into those categories merely delays
execution for those that do.
As of release
[5.0.0-pre.20210708.4](https://github.com/bazelbuild/bazel/releases/tag/5.0.0-pre.20210708.4){: .external},
[performance profiling](/rules/performance#performance-profiling)
contains data about worker execution, including time spent finishing a work
request after losing a dynamic execution race. If you see dynamic execution
worker threads spending significant time acquiring resources, or a lot of time
in the `async-worker-finish`, you may have some slow local actions delaying the
worker threads.
<p align="center">
<img width="596px" alt="Profiling data with poor dynamic execution performance"
src="/docs/images/dyn-trace-alldynamic.png">
</p>
In the profile above, which uses 8 Javac workers, we see many Javac workers
having lost the races and finishing their work on the `async-worker-finish`
threads. This was caused by a non-worker mnemonic taking enough resources to
delay the workers.
<p align="center">
<img width="596px" alt="Profiling data with better dynamic execution performance"
src="/docs/images/dyn-trace-javaconly.png">
</p>
When only Javac is run with dynamic execution, only about half of the started
workers end up losing the race after starting their work.
The previously recommended `--experimental_spawn_scheduler` flag is deprecated.
It turns on dynamic execution and sets `dynamic` as the default strategy for all
mnemonics, which would often lead to these kinds of problems.
## Troubleshooting {:#troubleshooting}
Problems with dynamic execution can be subtle and hard to debug, as they can
manifest only under some specific combinations of local and remote execution.
The `--debug_spawn_scheduler` adds extra output from the dynamic
execution system that can help debug these problems. You can also adjust the
`--dynamic_local_execution_delay` flag and number of remote vs. local jobs
to make it easier to reproduce the problems.
If you are experiencing problems with dynamic execution using the `standalone`
strategy, try running without `--experimental_local_lockfree_output`, or run
your local actions sandboxed. This may slow down your build a bit (see above if
you're on Mac or Windows), but removes some possible causes for failures.