| Project: /_project.yaml |
| Book: /_book.yaml |
| |
| # Dynamic Execution |
| |
| {% include "_buttons.html" %} |
| |
| __Dynamic execution__ is a feature in Bazel where local and remote execution of |
| the same action are started in parallel, using the output from the first branch |
| that finishes, cancelling the other branch. It combines the execution power |
| and/or large shared cache of a remote build system with the low latency of local |
| execution, providing the best of both worlds for clean and incremental builds |
| alike. |
| |
| This page describes how to enable, tune, and debug dynamic execution. If you |
| have both local and remote execution set up and are trying to adjust Bazel |
| settings for better performance, this page is for you. If you don't already have |
| remote execution set up, go to the Bazel [Remote Execution |
| Overview](/remote/rbe) first. |
| |
| ## Enabling dynamic execution? {:#enabling-dynamic-execution} |
| |
| The dynamic execution module is part of Bazel, but to make use of dynamic |
| execution, you must already be able to compile both locally and remotely from |
| the same Bazel setup. |
| |
| To enable the dynamic execution module, pass the `--internal_spawn_scheduler` |
| flag to Bazel. This adds a new execution strategy called `dynamic`. You can now |
| use this as your strategy for the mnemonics you want to run dynamically, such as |
| `--strategy=Javac=dynamic`. See the next section for how to pick which mnemonics |
| to enable dynamic execution for. |
| |
| For any mnemonic using the dynamic strategy, the remote execution strategies are |
| taken from the `--dynamic_remote_strategy` flag, and local strategies from the |
| `--dynamic_local_strategy` flag. Passing |
| `--dynamic_local_strategy=worker,sandboxed` sets the default for the local |
| branch of dynamic execution to try with workers or sandboxed execution in that |
| order. Passing `--dynamic_local_strategy=Javac=worker` overrides the default for |
| the Javac mnemonic only. The remote version works the same way. Both flags can |
| be specified multiple times. If an action cannot be executed locally, it is |
| executed remotely as normal, and vice-versa. |
| |
| If your remote system has a cache, the `--dynamic_local_execution_delay` flag |
| adds a delay in milliseconds to the local execution after the remote system has |
| indicated a cache hit. This avoids running local execution when more cache hits |
| are likely. The default value is 1000ms, but should be tuned to being just a bit |
| longer than cache hits usually take. The actual time depends both on the remote |
| system and on how long a round-trip takes. Usually, the value will be the same |
| for all users of a given remote system, unless some of them are far enough away |
| to add roundtrip latency. You can use the [Bazel profiling |
| features](/rules/performance#performance-profiling) to look at how long typical |
| cache hits take. |
| |
| Dynamic execution can be used with local sandboxed strategy as well as with |
| [persistent workers](/remote/persistent). Persistent workers will automatically |
| run with sandboxing when used with dynamic execution, and cannot use [multiplex |
| workers](/remote/multiplex). On Darwin and Windows systems, the sandboxed |
| strategy can be slow; you can pass `--reuse_sandbox_directories` to reduce |
| overhead of creating sandboxes on these systems. |
| |
| Dynamic execution can also run with the `standalone` strategy, though since the |
| `standalone` strategy must take the output lock when it starts executing, it |
| effectively blocks the remote strategy from finishing first. The |
| `--experimental_local_lockfree_output` flag enables a way around this problem by |
| allowing the local execution to write directly to the output, but be aborted by |
| the remote execution, should that finish first. |
| |
| If one of the branches of dynamic execution finishes first but is a failure, the |
| entire action fails. This is an intentional choice to prevent differences |
| between local and remote execution from going unnoticed. |
| |
| For more background on how dynamic execution and its locking works, see Julio |
| Merino's excellent [blog |
| posts](https://jmmv.dev/series/bazel-dynamic-execution/){: .external} |
| |
| ## When should I use dynamic execution? {:#when-to-use} |
| |
| Dynamic execution requires some form of [remote execution system](/remote/rbe). |
| It is not currently possible to use a cache-only remote system, as a cache miss |
| would be considered a failed action. |
| |
| Not all types of actions are well suited for remote execution. The best |
| candidates are those that are inherently faster locally, for instance through |
| the use of [persistent workers](/remote/persistent), or those that run fast |
| enough that the overhead of remote execution dominates execution time. Since |
| each locally executed action locks some amount of CPU and memory resources, |
| running actions that don't fall into those categories merely delays execution |
| for those that do. |
| |
| As of release |
| [5.0.0-pre.20210708.4](https://github.com/bazelbuild/bazel/releases/tag/5.0.0-pre.20210708.4){: .external}, |
| [performance profiling](/rules/performance#performance-profiling) contains data |
| about worker execution, including time spent finishing a work request after |
| losing a dynamic execution race. If you see dynamic execution worker threads |
| spending significant time acquiring resources, or a lot of time in the |
| `async-worker-finish`, you may have some slow local actions delaying the worker |
| threads. |
| |
| <p align="center"> |
| <img width="596px" alt="Profiling data with poor dynamic execution performance" |
| src="/docs/images/dyn-trace-alldynamic.png"> |
| </p> |
| |
| In the profile above, which uses 8 Javac workers, we see many Javac workers |
| having lost the races and finishing their work on the `async-worker-finish` |
| threads. This was caused by a non-worker mnemonic taking enough resources to |
| delay the workers. |
| |
| <p align="center"> |
| <img width="596px" alt="Profiling data with better dynamic execution performance" |
| src="/docs/images/dyn-trace-javaconly.png"> |
| </p> |
| |
| When only Javac is run with dynamic execution, only about half of the started |
| workers end up losing the race after starting their work. |
| |
| The previously recommended `--experimental_spawn_scheduler` flag is deprecated. |
| It turns on dynamic execution and sets `dynamic` as the default strategy for all |
| mnemonics, which would often lead to these kinds of problems. |
| |
| ## Performance {:#performance} |
| |
| The dynamic execution approach assumes there are enough resources available |
| locally and remotely that it's worth spending some extra resources to improve |
| overall performance. But excessive resource usage may slow down Bazel itself or |
| the machine it runs on, or put unexpected pressure on a remote system. There are |
| several options for changing the behaviour of dynamic execution: |
| |
| `--dynamic_local_execution_delay` delays the start of a local branch by a number |
| of milliseconds after the remote branch has started, but only if there has been |
| a remote cache hit during the current build. This makes builds that benefit |
| from remote caching not waste local resources when it is likely that most |
| outputs can be found in the cache. Depending on the quality of the cache, |
| reducing this might improve build speeds, at the cost of using more local |
| resources. |
| |
| `--experimental_dynamic_local_load_factor` is an experimental advanced resource |
| management option. It takes a value from 0 to 1, 0 turning off this feature. |
| When set to a value above 0, Bazel adjusts the number of |
| locally scheduled actions when many actions waiting to |
| be scheduled. Setting it to 1 allows as many actions to be scheduled as there |
| are CPUs available (as per `--local_cpu_resources`). Lower values set the number |
| of actions scheduled to correspondingly fewer as higher numbers of actions are |
| available to run. This may sound counter-intuitive, but with a good remote |
| system, local execution does not help much when many actions are being run, and |
| the local CPU is better spent managing remote actions. |
| |
| `--experimental_dynamic_slow_remote_time` prioritizes starting local branches |
| when the remote branch has been running for at least this long. Normally the |
| most recently scheduled action gets priority, as it has the greatest chance of |
| winning the race, but if the remote system sometimes hangs or takes extra long, |
| this can get a build to move along. This is not enabled by default, because it |
| could hide issues with the remote system that should rather be fixed. Make sure |
| to monitor your remote system performance if you enable this option. |
| |
| `--experimental_dynamic_ignore_local_signals` can be used to let the remote |
| branch take over when a local spawn exits due to a given signal. This is |
| is mainly useful together with worker resource limits (see |
| [`--experimental_worker_memory_limit_mb`](https://bazel.build/reference/command-line-reference#flag--experimental_worker_memory_limit_mb), |
| [`--experimental_worker_sandbox_hardening`](https://bazel.build/reference/command-line-reference#flag--experimental_worker_sandbox_hardening), |
| and |
| [`--experimental_sandbox_memory_limit_mb`)](https://bazel.build/reference/command-line-reference#flag--experimental_sandbox_memory_limit_mb)), |
| where worker processes may be killed when they use too many resources. |
| |
| The [JSON trace profile](/advanced/performance/json-trace-profile) contains a |
| number of performance-related graphs that can help identify ways to improve the |
| trade-off of performance and resource usage. |
| |
| ## Troubleshooting {:#troubleshooting} |
| |
| Problems with dynamic execution can be subtle and hard to debug, as they can |
| manifest only under some specific combinations of local and remote execution. |
| The `--debug_spawn_scheduler` adds extra output from the dynamic execution |
| system that can help debug these problems. You can also adjust the |
| `--dynamic_local_execution_delay` flag and number of remote vs. local jobs to |
| make it easier to reproduce the problems. |
| |
| If you are experiencing problems with dynamic execution using the `standalone` |
| strategy, try running without `--experimental_local_lockfree_output`, or run |
| your local actions sandboxed. This may slow down your build a bit (see above if |
| you're on Mac or Windows), but removes some possible causes for failures. |