---
layout: documentation
title: Persistent Workers
category: extending
---

# Persistent Workers

This page covers how to use persistent workers, their benefits and requirements,
and how workers affect sandboxing.

A persistent worker is a long-running process started by the Bazel server, which
functions as a _wrapper_ around the actual _tool_ (typically a compiler), or is
the _tool_ itself. To benefit from persistent workers, the tool must support
doing a sequence of compilations, and the wrapper needs to translate between the
tool's API and the request/response format described below. The same worker
might be called with and without the `--persistent_worker` flag in the same
build, and is responsible for appropriately starting and talking to the tool, as
well as for shutting the tool down on exit. Each worker instance is assigned (but
not chrooted to) a separate working directory under `<outputBase>/bazel-workers`.
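
A wrapper therefore has to tell the two launch modes apart. The Python sketch
below is one hypothetical way to do it, assuming the conventions used later on
this page: in persistent mode Bazel appends `--persistent_worker` to the start-up
arguments, and otherwise the action's arguments (including a final `@` flag
file) arrive on the command line. The names `parse_invocation` and
`expand_flagfiles` are illustrative, not part of any Bazel library.

```python
from typing import Callable, List, Tuple

PERSISTENT_WORKER_FLAG = "--persistent_worker"


def read_text(path: str) -> str:
    """Default file reader used to expand @flagfiles."""
    with open(path, encoding="utf-8") as f:
        return f.read()


def expand_flagfiles(
    args: List[str], read_file: Callable[[str], str] = read_text
) -> List[str]:
    """Replaces each "@path" argument with the non-empty lines of that file."""
    expanded: List[str] = []
    for arg in args:
        if arg.startswith("@") and len(arg) > 1:
            expanded.extend(line for line in read_file(arg[1:]).splitlines() if line)
        else:
            expanded.append(arg)
    return expanded


def parse_invocation(
    argv: List[str], read_file: Callable[[str], str] = read_text
) -> Tuple[str, List[str]]:
    """Decides how the wrapper was started (hypothetical helper).

    Returns ("persistent", start-up args) when launched as a long-running
    worker, or ("one-shot", fully expanded args) for a single compilation.
    """
    if PERSISTENT_WORKER_FLAG in argv:
        return "persistent", [a for a in argv if a != PERSISTENT_WORKER_FLAG]
    return "one-shot", expand_flagfiles(argv, read_file)
```

A `main` function could call `parse_invocation(sys.argv[1:])` and then either
enter a request loop or run the tool once on the expanded arguments.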

Using persistent workers is an
[execution strategy](https://docs.bazel.build/versions/master/user-manual.html#strategy-options)
that decreases start-up overhead, allows more JIT compilation, and enables
caching of, for example, abstract syntax trees across action executions. This
strategy achieves these improvements by sending multiple requests to a
long-running process.

Persistent workers are implemented for multiple languages, including Java,
[TypeScript](https://bazelbuild.github.io/rules_nodejs/TypeScript.html),
[Scala](https://github.com/bazelbuild/rules_scala),
[Kotlin](https://github.com/bazelbuild/rules_kotlin), and more.

## Using persistent workers <a name="usage"></a>

[Bazel 0.27 and higher](https://blog.bazel.build/2019/06/19/list-strategy.html)
uses persistent workers by default when executing builds, though remote
execution takes precedence. For actions that do not support persistent workers,
Bazel falls back to starting a tool instance for each action. You can explicitly
set your build to use persistent workers by setting the `worker`
[strategy](user-manual.html#strategy-options) for the applicable tool mnemonics.
As a best practice, this example includes specifying `local` as a fallback to
the `worker` strategy:

```
bazel build //my:target --strategy=Javac=worker,local
```

Using the `worker` strategy instead of the `local` strategy can boost compilation
speed significantly, depending on the implementation. For Java, builds can be
2–4 times faster, sometimes more for incremental compilation. Compiling Bazel is
about 2.5 times as fast with workers. For more details, see the
"[Choosing number of workers](#number-of-workers)" section.

If you also have a remote build environment that matches your local build
environment, you can use the experimental
[_dynamic_ strategy](https://blog.bazel.build/2019/02/01/dynamic-spawn-scheduler.html),
which races a remote execution and a worker execution. To enable the dynamic
strategy, pass the
[--experimental_spawn_scheduler](command-line-reference.html#flag--experimental_spawn_scheduler)
flag. This strategy automatically enables workers, so there is no need to
specify the `worker` strategy, but you can still use `local` or `sandboxed` as
fallbacks.

## Choosing number of workers <a name="number-of-workers"></a>

The default number of worker instances per mnemonic is 4, but can be adjusted
with the
[`worker_max_instances`](command-line-reference.html#flag--worker_max_instances)
flag. There is a trade-off between making good use of the available CPUs and the
amount of JIT compilation and cache hits you get. With more workers, more
targets will pay start-up costs of running non-JITted code and hitting cold
caches. If you have a small number of targets to build, a single worker may give
the best trade-off between compilation speed and resource usage (for example,
see [issue #8586](https://github.com/bazelbuild/bazel/issues/8586)). The
`worker_max_instances` flag sets the maximum number of worker instances per
mnemonic and flag set (see below), so in a mixed system you could end up using
quite a lot of memory if you keep the default value. For incremental builds, the
benefit of multiple worker instances is even smaller.

This graph shows the from-scratch compilation times for Bazel (target
`//src:bazel`) on a 6-core hyper-threaded Intel Xeon 3.5 GHz Linux workstation
with 64 GB of RAM. For each worker configuration, five clean builds are run and
the average of the last four is taken.

<p align="center">
<img width="596px" alt="Graph of performance improvements of clean builds" src="/assets/workers-clean-chart.png">
</p>

For this configuration, two workers give the fastest compile, though with only a
14% improvement over one worker. One worker is a good option if you want to use
less memory.

Incremental compilation typically benefits even more. Clean builds are
relatively rare, but changing a single file between compiles is common, in
particular in test-driven development. The example above also includes some
non-Java packaging actions, which can overshadow the incremental compile time.
Recompiling only the Java sources
(`//src/main/java/com/google/devtools/build/lib/bazel:BazelServer_deploy.jar`)
after changing an internal string constant in
[AbstractContainerizingSandboxedSpawn.java](https://github.com/bazelbuild/bazel/blob/master/src/main/java/com/google/devtools/build/lib/sandbox/AbstractContainerizingSandboxedSpawn.java)
gives a 3x speed-up (average of 20 incremental builds with one warmup build
discarded):

<p align="center">
<img width="592px" alt="Graph of performance improvements of incremental builds" src="/assets/workers-incremental-chart.png">
</p>

The speed-up depends on the change being made. In the same setup, a 6x speed-up
is measured when a commonly used constant is changed.

## Modifying persistent workers <a name="options"></a>

You can pass the
[`--worker_extra_flag`](command-line-reference.html#flag--worker_extra_flag)
flag to specify start-up flags to workers, keyed by mnemonic. For instance,
passing `--worker_extra_flag=Javac=--debug` turns on debugging for Javac only.
Only one worker flag can be set per use of this flag, and only for one mnemonic.
Workers are not just created separately for each mnemonic, but also for
variations in their start-up flags. Each combination of mnemonic and start-up
flags is combined into a `WorkerKey`, and for each `WorkerKey` up to
`worker_max_instances` workers may be created. See the next section for how the
action configuration can also specify start-up flags.
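
To see why memory use can grow, here is a simplified Python model of the
grouping described above. It keys workers only on the mnemonic and start-up
flags (Bazel's actual `WorkerKey` includes further fields), and
`count_worker_keys` and `max_worker_processes` are hypothetical helpers that
compute the worst-case number of worker processes as distinct keys times
`worker_max_instances`.

```python
from collections import Counter
from typing import Iterable, NamedTuple, Sequence, Tuple


class WorkerKey(NamedTuple):
    """Simplified key: only the two fields described in the text above."""
    mnemonic: str
    startup_flags: Tuple[str, ...]


def count_worker_keys(actions: Iterable[Tuple[str, Sequence[str]]]) -> Counter:
    """Counts actions per (mnemonic, start-up flags) combination."""
    return Counter(WorkerKey(mnemonic, tuple(flags)) for mnemonic, flags in actions)


def max_worker_processes(
    actions: Iterable[Tuple[str, Sequence[str]]], worker_max_instances: int = 4
) -> int:
    """Upper bound on live worker processes: every key may get its own pool."""
    return len(count_worker_keys(actions)) * worker_max_instances
```

With the default of 4 instances, three distinct keys can already mean up to 12
long-running processes, each with its own memory footprint.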

You can use the
[`--high_priority_workers`](command-line-reference.html#flag--high_priority_workers)
flag to specify a mnemonic that should be run in preference to normal-priority
mnemonics. This can help prioritize actions that are always in the critical
path. If two or more high-priority workers are executing requests, all other
workers are prevented from running. This flag can be used multiple times.
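
The gating rule can be modeled with a condition variable. This is a toy Python
model of the behavior described above, not Bazel's scheduler: `HighPriorityGate`
is a hypothetical class, and it assumes only the two-or-more threshold stated in
this paragraph.

```python
import threading
from typing import Iterable


class HighPriorityGate:
    """Toy model of --high_priority_workers scheduling (hypothetical class).

    Normal-priority work waits while two or more high-priority requests are
    in flight; high-priority work is never held back by this gate.
    """

    def __init__(self, high_priority_mnemonics: Iterable[str]) -> None:
        self._high = frozenset(high_priority_mnemonics)
        self._busy_high = 0  # number of high-priority requests in flight
        self._cond = threading.Condition()

    def acquire(self, mnemonic: str) -> None:
        """Blocks until a request with this mnemonic may start."""
        with self._cond:
            if mnemonic in self._high:
                self._busy_high += 1
                return
            while self._busy_high >= 2:
                self._cond.wait()

    def release(self, mnemonic: str) -> None:
        """Marks a high-priority request as finished and wakes waiters."""
        if mnemonic not in self._high:
            return
        with self._cond:
            self._busy_high -= 1
            self._cond.notify_all()

    def may_start_now(self, mnemonic: str) -> bool:
        """Non-blocking check of the same rule."""
        with self._cond:
            return mnemonic in self._high or self._busy_high < 2
```

A caller would wrap each request in `acquire(mnemonic)` and `release(mnemonic)`;
high-priority mnemonics always pass, while others block once two high-priority
requests are in flight.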

Passing the
[`--worker_sandboxing`](command-line-reference.html#flag--worker_sandboxing)
flag makes each worker request use a separate sandbox directory for all its
inputs. Setting up the [sandbox](sandboxing.md) takes some extra time,
especially on macOS, but gives a better correctness guarantee.

You can use the `--experimental_worker_allow_json_protocol` flag to allow
workers to communicate with Bazel through JSON instead of protocol buffers
(protobuf). The worker and the rule that consumes it can then be modified to
support JSON.
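
As a rough illustration of what JSON support means on the worker side, the
Python sketch below reads requests and writes responses using the proto3 JSON
names of the `WorkRequest` fields (`arguments`, `inputs`, `requestId`) and
`WorkResponse` fields (`exitCode`, `output`, `requestId`). It assumes each
message is one JSON object on its own line; `compile_fn` is a hypothetical hook
standing in for the real tool.

```python
import json
from typing import Callable, List, TextIO, Tuple

# Hypothetical hook: runs the tool on one argument list, returns (exit code, output).
CompileFn = Callable[[List[str]], Tuple[int, str]]


def handle_request(request: dict, compile_fn: CompileFn) -> dict:
    """Turns one decoded WorkRequest into a WorkResponse dict."""
    exit_code, output = compile_fn(request.get("arguments", []))
    response = {"exitCode": exit_code, "output": output}
    if "requestId" in request:  # echo the ID back when Bazel sets one
        response["requestId"] = request["requestId"]
    return response


def serve_json(stdin: TextIO, stdout: TextIO, compile_fn: CompileFn) -> None:
    """Processes requests until stdin is closed, assuming one JSON object per line."""
    for line in stdin:
        line = line.strip()
        if not line:
            continue
        response = handle_request(json.loads(line), compile_fn)
        # stdout carries only responses; diagnostics belong in "output" or stderr.
        stdout.write(json.dumps(response) + "\n")
        stdout.flush()
```

Started as `serve_json(sys.stdin, sys.stdout, my_compile)` (with `my_compile`
being your own hypothetical tool hook), the loop runs until the worker's stdin is
closed.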

The
[`--worker_quit_after_build`](command-line-reference.html#flag--worker_quit_after_build)
flag is mainly useful for debugging and profiling. This flag forces all workers
to quit once a build is done. You can also pass
[`--worker_verbose`](command-line-reference.html#flag--worker_verbose) to get
more output about what the workers are doing.

Workers store their logs in the `<outputBase>/bazel-workers` directory, for
example
`/tmp/_bazel_larsrc/191013354bebe14fdddae77f2679c3ef/bazel-workers/worker-1-Javac.log`.
The file name includes the worker ID and the mnemonic. Since there can be more
than one `WorkerKey` per mnemonic, you may see more than `worker_max_instances`
log files for a given mnemonic.
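
When several `WorkerKey`s share a mnemonic, it can help to group the log files.
This Python sketch assumes the `worker-<ID>-<mnemonic>.log` naming shown above;
`logs_by_mnemonic` is a hypothetical helper.

```python
import re
from collections import defaultdict
from typing import Dict, Iterable, List

# Matches names such as "worker-1-Javac.log" (worker ID, then mnemonic).
_LOG_NAME = re.compile(r"^worker-(\d+)-(.+)\.log$")


def logs_by_mnemonic(filenames: Iterable[str]) -> Dict[str, List[int]]:
    """Groups worker log file names by mnemonic, returning sorted worker IDs."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for name in filenames:
        match = _LOG_NAME.match(name)
        if match:  # skip anything that is not a worker log
            groups[match.group(2)].append(int(match.group(1)))
    return {mnemonic: sorted(ids) for mnemonic, ids in groups.items()}
```

Passing it `os.listdir()` of the `bazel-workers` directory under the path
printed by `bazel info output_base` lists the worker IDs per mnemonic.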

For Android builds, see details at the
[Android Build Performance page](android-build-performance.html).

## Implementing persistent workers <a name="implementation"></a>

See the [creating persistent workers](creating-workers.html) page for
information on how to make a worker.

This example shows a Starlark configuration for a worker that uses JSON:

```python
args_file = ctx.actions.declare_file(ctx.label.name + "_args_file")
ctx.actions.write(
    output = args_file,
    # One argument per line; these become the WorkRequest arguments.
    content = "\n".join(["-g", "-source", "1.5"] + [f.path for f in ctx.files.srcs]),
)
ctx.actions.run(
    mnemonic = "SomeCompiler",
    executable = "bin/some_compiler_wrapper",
    # The flag file must be declared as an input of the action.
    inputs = ctx.files.srcs + [args_file],
    outputs = outputs,  # declared elsewhere in the rule implementation
    # Start-up arguments come first; the final "@" flag file holds the
    # per-request arguments.
    arguments = ["-max_mem=4G", "@%s" % args_file.path],
    execution_requirements = {
        "supports-workers": "1",
        "requires-worker-protocol": "json",
    },
)
```

With this definition, the first use of this action would start by executing the
command line `bin/some_compiler_wrapper -max_mem=4G --persistent_worker`. A
request to compile `Foo.java` would then look like this (shown in protobuf text
format):

```prototext
arguments: [ "-g", "-source", "1.5", "Foo.java" ]
inputs: [
  { path: "symlinkfarm/input1", digest: "d49a..." },
  { path: "symlinkfarm/input2", digest: "093d..." }
]
```

The worker receives this on `stdin` in JSON format (because
`requires-worker-protocol` is set to `json`, and
`--experimental_worker_allow_json_protocol` is passed to the build to enable
this option). The worker then performs the action and sends a JSON-formatted
`WorkResponse` to Bazel on its `stdout`. Bazel then parses this response and
converts it to a `WorkResponse` proto. To communicate with the associated worker
using binary-encoded protobuf instead of JSON, set `requires-worker-protocol` to
`proto`, like this:

```python
execution_requirements = {
    "supports-workers": "1",
    "requires-worker-protocol": "proto",
}
```

If you do not include `requires-worker-protocol` in the execution requirements,
Bazel defaults to protobuf for worker communication.
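
For the default protobuf protocol, messages are length-delimited: each
serialized `WorkRequest` or `WorkResponse` is preceded by its size encoded as a
base-128 varint (the framing that Java's `writeDelimitedTo` and
`parseDelimitedFrom` use). The Python sketch below implements only that framing,
with hypothetical helper names; producing the payload bytes would additionally
require the generated protobuf classes for Bazel's worker protocol, which are
assumed and not shown.

```python
from typing import BinaryIO, Optional


def encode_varint(value: int) -> bytes:
    """Encodes a non-negative int as a protobuf base-128 varint."""
    if value < 0:
        raise ValueError("length prefixes are never negative")
    out = bytearray()
    while True:
        low7 = value & 0x7F
        value >>= 7
        if value:
            out.append(low7 | 0x80)  # continuation bit: more bytes follow
        else:
            out.append(low7)
            return bytes(out)


def read_varint(stream: BinaryIO) -> Optional[int]:
    """Reads one varint; returns None on a clean end of stream."""
    result = 0
    shift = 0
    while True:
        byte = stream.read(1)
        if not byte:
            if shift == 0:
                return None
            raise EOFError("stream ended inside a varint")
        result |= (byte[0] & 0x7F) << shift
        if not byte[0] & 0x80:
            return result
        shift += 7
        if shift >= 64:
            raise ValueError("varint is too long")


def write_delimited(stream: BinaryIO, payload: bytes) -> None:
    """Writes a serialized message preceded by its varint length."""
    stream.write(encode_varint(len(payload)) + payload)
    stream.flush()


def read_delimited(stream: BinaryIO) -> Optional[bytes]:
    """Reads one length-prefixed message; returns None at end of stream."""
    length = read_varint(stream)
    if length is None:
        return None
    payload = stream.read(length)
    if len(payload) != length:
        raise EOFError("stream ended inside a message")
    return payload
```

A worker loop would call `read_delimited(sys.stdin.buffer)` until it returns
`None`, parse each payload as a `WorkRequest`, and answer with
`write_delimited(sys.stdout.buffer, ...)`.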

Bazel derives the `WorkerKey` from the mnemonic and the shared flags, so if this
configuration allowed changing the `max_mem` parameter, a separate worker would
be spawned for each value used. This can lead to excessive memory consumption if
too many variations are used.

Each worker can currently only process one request at a time. The experimental
[multiplex workers](multiplex-worker.html) feature allows using multiple
threads, if the underlying tool is multithreaded and the wrapper is set up to
understand this.
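
With multiplexing, several requests can be in flight at once, so responses may
complete out of order and must carry the `requestId` of the request they answer.
The Python sketch below shows that idea under stated assumptions: requests
arrive already decoded as dicts, `send` is a hypothetical callback that writes
one response, and `compile_fn` is thread-safe. Consult the multiplex workers page
for the actual protocol details.

```python
import threading
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable, List, Tuple

# Hypothetical, thread-safe hook: runs the tool, returns (exit code, output).
CompileFn = Callable[[List[str]], Tuple[int, str]]


def serve_multiplex(
    requests: Iterable[dict],
    send: Callable[[dict], None],
    compile_fn: CompileFn,
    max_threads: int = 4,
) -> None:
    """Handles decoded WorkRequests concurrently (hypothetical helper).

    Each response echoes the requestId of the request it answers, and a lock
    keeps concurrent calls to `send` from interleaving.
    """
    send_lock = threading.Lock()

    def handle(request: dict) -> None:
        exit_code, output = compile_fn(request.get("arguments", []))
        response = {
            "requestId": request.get("requestId", 0),
            "exitCode": exit_code,
            "output": output,
        }
        with send_lock:
            send(response)

    with ThreadPoolExecutor(max_workers=max_threads) as pool:
        futures = [pool.submit(handle, request) for request in requests]
        for future in futures:
            future.result()  # re-raise any exception from a handler thread
```

The lock matters because two threads finishing together would otherwise
interleave their writes to the single output stream.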

In [this GitHub repo](https://github.com/Ubehebe/bazel-worker-examples), you can
see example worker wrappers written in Java as well as in Python. If you are
working in JavaScript or TypeScript, the
[@bazel/worker package](https://www.npmjs.com/package/@bazel/worker) and
[nodejs worker example](https://github.com/bazelbuild/rules_nodejs/tree/stable/examples/worker)
might be helpful.

## How do workers affect sandboxing? <a name="sandboxing"></a>

By default, the `worker` strategy does not run the action in a
[sandbox](sandboxing.md), similar to the `local` strategy. You can set the
`--worker_sandboxing` flag to run all workers inside sandboxes, making sure each
execution of the tool only sees the input files it's supposed to have. The tool
may still leak information between requests internally, for instance through a
cache. Using the `dynamic` strategy
[requires workers to be sandboxed](https://github.com/bazelbuild/bazel/blob/master/src/main/java/com/google/devtools/build/lib/exec/SpawnStrategyRegistry.java).

To allow correct use of compiler caches with workers, a digest is passed along
with each input file. Thus the compiler or the wrapper can check if the input is
still valid without having to read the file.
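
A wrapper could use these digests as cache validators. In this Python sketch,
the hypothetical `DigestCache` class reuses an entry for a path only while the
digest Bazel supplies for it is unchanged, so unchanged inputs are never
re-read; `compute` stands in for expensive work such as parsing, and digests are
treated as opaque strings.

```python
from typing import Callable, Dict, Generic, Tuple, TypeVar

T = TypeVar("T")


class DigestCache(Generic[T]):
    """Per-file cache that trusts an entry only while its digest is unchanged."""

    def __init__(self, compute: Callable[[str], T]) -> None:
        self._compute = compute  # expensive per-file work, e.g. parsing
        self._entries: Dict[str, Tuple[str, T]] = {}
        self.misses = 0  # how many times `compute` actually ran

    def get(self, path: str, digest: str) -> T:
        entry = self._entries.get(path)
        if entry is not None and entry[0] == digest:
            return entry[1]  # digest matches: reuse without reading the file
        self.misses += 1
        value = self._compute(path)
        self._entries[path] = (digest, value)
        return value
```

For each entry `inp` in a request's `inputs`, the wrapper would call
`cache.get(inp["path"], inp["digest"])`.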

Even when using the input digests to guard against unwanted caching, sandboxed
workers offer less strict sandboxing than a pure sandbox, because the tool may
keep other internal state that has been affected by previous requests.

## Further reading <a name="further-reading"></a>

For more information on persistent workers, see:

* [Original persistent workers blog post](https://blog.bazel.build/2015/12/10/java-workers.html)
* [Haskell implementation description](https://www.tweag.io/blog/2019-09-25-bazel-ghc-persistent-worker-internship/)
* [Blog post by Mike Morearty](https://medium.com/@mmorearty/how-to-create-a-persistent-worker-for-bazel-7738bba2cabb)
* [Front End Development with Bazel: Angular/TypeScript and Persistent Workers
  w/ Asana](https://www.youtube.com/watch?v=0pgERydGyqo)
* [Bazel strategies explained](https://jmmv.dev/2019/12/bazel-strategies.html)
* [Informative worker strategy discussion on the bazel-discuss mailing list](https://groups.google.com/forum/#!msg/bazel-discuss/oAEnuhYOPm8/ol7hf4KWJgAJ)