Googler | 2af641c | 2018-11-26 03:01:41 -0800 | [diff] [blame] | 1 | --- |
| 2 | layout: documentation |
Greg Estren | c081675 | 2020-02-20 13:04:29 -0800 | [diff] [blame] | 3 | title: Debugging remote cache hits for remote execution |
Googler | 2af641c | 2018-11-26 03:01:41 -0800 | [diff] [blame] | 4 | --- |
| 5 | |
Googler | a85685a | 2021-01-08 16:53:49 -0800 | [diff] [blame] | 6 | # Debugging Remote Cache Hits for Remote Execution |
Googler | 2af641c | 2018-11-26 03:01:41 -0800 | [diff] [blame] | 7 | |
| 8 | This page describes how to check your cache hit rate and how to investigate |
| 9 | cache misses in the context of remote execution. |
| 10 | |
| 11 | This page assumes that you have a build and/or test that successfully |
| 12 | utilizes remote execution, and you want to ensure that you are effectively |
| 13 | utilizing remote cache. |
| 14 | |
Googler | 2af641c | 2018-11-26 03:01:41 -0800 | [diff] [blame] | 15 | ## Checking your cache hit rate |
| 16 | |
| 17 | In the standard output of your Bazel run, look at the `INFO` line that lists |
| 18 | processes, which roughly correspond to Bazel actions. That line details |
| 19 | where the action was run. Look for the `remote` label, which indicates an action |
| 20 | executed remotely, `linux-sandbox` for actions executed in a local sandbox, |
| 21 | and other values for other execution strategies. An action whose result came |
| 22 | from a remote cache is displayed as `remote cache hit`. |
| 23 | |
| 24 | For example: |
| 25 | |
| 26 | INFO: 7 processes: 3 remote cache hit, 4 remote. |
| 27 | |
| 28 | In this example there were 3 remote cache hits, and 4 actions did not have |
| 29 | cache hits and were executed remotely. Local cache hits are not included in this |
| 30 | summary. If you are getting 0 processes (or a number lower than expected), |
| 31 | run `bazel clean` followed by your build/test command. |
| 32 | |
| 33 | ## Troubleshooting cache hits |
| 34 | |
| 35 | If you are not getting the cache hit rate you are expecting, do the following: |
| 36 | |
Greg Estren | c081675 | 2020-02-20 13:04:29 -0800 | [diff] [blame] | 37 | ### Ensure re-running the same build/test command produces cache hits |
Googler | 2af641c | 2018-11-26 03:01:41 -0800 | [diff] [blame] | 38 | |
| 39 | 1. Run the build(s) and/or test(s) that you expect to populate the cache. The |
| 40 | first time a new build is run on a particular stack, we expect no remote |
| 41 | cache hits. As part of remote execution, action results are stored in the |
| 42 | cache and a subsequent run should pick them up. |
| 43 | |
| 44 | 2. Run `bazel clean`. This command cleans your local cache, which allows |
| 45 | you to investigate remote cache hits without the results being masked by |
| 46 | local cache hits. |
| 47 | |
| 48 | 3. Run the build(s) and test(s) that you are investigating again (on the same |
| 49 | machine). |
| 50 | |
| 51 | 4. Check the `INFO` line for cache hit rate. If you see no processes except |
| 52 | `remote cache hit`, then your cache is being correctly populated and |
| 53 | accessed. In that case, skip to the next section. |
| 54 | |
| 55 | 5. A likely source of discrepancy is something non-hermetic in the build causing |
| 56 | the actions to receive different action keys across the two runs. To find |
| 57 | those actions, do the following: |
| 58 | |
| 59 | a. Re-run the build(s) or test(s) in question to obtain execution logs: |
| 60 | |
| 61 | bazel clean |
Benjamin Peterson | a6efd3b | 2020-07-29 03:44:19 -0700 | [diff] [blame] | 62 | bazel $YOUR_FLAGS build //your:target --execution_log_binary_file=/tmp/exec1.log |
Googler | 2af641c | 2018-11-26 03:01:41 -0800 | [diff] [blame] | 63 | bazel clean |
Benjamin Peterson | a6efd3b | 2020-07-29 03:44:19 -0700 | [diff] [blame] | 64 | bazel $YOUR_FLAGS build //your:target --execution_log_binary_file=/tmp/exec2.log |
Googler | 2af641c | 2018-11-26 03:01:41 -0800 | [diff] [blame] | 65 | |
| 66 | b. [Compare the execution logs](#comparing-the-execution-logs) between the |
| 67 | two runs. Ensure that the actions are identical across the two log files. |
| 68 | Discrepancies provide a clue about the changes that occurred between the |
| 69 | runs. Update your build to eliminate those discrepancies. |
| 70 | |
| 71 | If you are able to resolve the caching problems and now the repeated run |
| 72 | produces all cache hits, skip to the next section. |
| 73 | |
| 74 | If your action IDs are identical but there are no cache hits, then something |
| 75 | in your configuration is preventing caching. Continue with this section to |
| 76 | check for common problems. |
| 77 | |
| 78 | 5. Check that all actions in the execution log have `cacheable` set to true. If |
| 79 | `cacheable` does not appear in the execution log for a give action, that |
| 80 | means that the corresponding rule may have a `no-cache` tag in its |
| 81 | definition in the `BUILD` file. Look at the human-readable `progress_message` |
| 82 | field in the execution log to help determine where the action is coming from. |
| 83 | |
| 84 | 6. If the actions are identical and `cacheable` but there are no cache hits, it |
| 85 | is possible that your command line includes `--noremote_accept_cached` which |
| 86 | would disable cache lookups for a build. |
| 87 | |
| 88 | If figuring out the actual command line is difficult, use the canonical |
| 89 | command line from the |
| 90 | [Build Event Protocol](/build-event-protocol.html) |
| 91 | as follows: |
| 92 | |
| 93 | a. Add `--build_event_text_file=/tmp/bep.txt` to your Bazel command to get |
| 94 | the text version of the log. |
| 95 | |
| 96 | b. Open the text version of the log and search for the |
| 97 | `structured_command_line` message with `command_line_label: "canonical"`. |
| 98 | It will list all the options after expansion. |
| 99 | |
| 100 | c. Search for `remote_accept_cached` and check whether it's set to `false`. |
| 101 | |
| 102 | d. If `remote_accept_cached` is `false`, determine where it is being |
| 103 | set to `false`: either at the command line or in a |
| 104 | [bazelrc](/guide.html#where-are-the-bazelrc-files) file. |
| 105 | |
| 106 | ### Ensure caching across machines |
| 107 | |
| 108 | After cache hits are happening as expected on the same machine, run the |
| 109 | same build(s)/test(s) on a different machine. If you suspect that caching is |
| 110 | not happening across machines, do the following: |
| 111 | |
| 112 | 1. Make a small modification to your build to avoid hitting existing caches. |
| 113 | |
| 114 | 2. Run the build on the first machine: |
| 115 | |
| 116 | bazel clean |
Benjamin Peterson | a6efd3b | 2020-07-29 03:44:19 -0700 | [diff] [blame] | 117 | bazel ... build ... --execution_log_binary_file=/tmp/exec1.log |
Googler | 2af641c | 2018-11-26 03:01:41 -0800 | [diff] [blame] | 118 | |
| 119 | 3. Run the build on the second machine, ensuring the modification from step 1 |
| 120 | is included: |
| 121 | |
| 122 | bazel clean |
Benjamin Peterson | a6efd3b | 2020-07-29 03:44:19 -0700 | [diff] [blame] | 123 | bazel ... build ... --execution_log_binary_file=/tmp/exec2.log |
Googler | 2af641c | 2018-11-26 03:01:41 -0800 | [diff] [blame] | 124 | |
| 125 | 4. [Compare the execution logs](#comparing-the-execution-logs) for the two |
| 126 | runs. If the logs are not identical, investigate your build configurations |
| 127 | for discrepancies as well as properties from the host environment leaking |
| 128 | into either of the builds. |
| 129 | |
| 130 | ## Comparing the execution logs |
| 131 | |
| 132 | Execution logs contain records of all actions executed during the build. For |
| 133 | each action there is a |
| 134 | [SpawnExec](https://github.com/bazelbuild/bazel/blob/42389d9468a954f3793a19f8e026b022b39aefca/src/main/protobuf/spawn.proto#L67) |
| 135 | element containing all of the information from the action key, Thus, if the |
| 136 | logs are identical then so are the action cache keys. |
| 137 | |
| 138 | To compare logs for two builds that are not sharing cache hits as expected, |
| 139 | do the folowing: |
| 140 | |
| 141 | 1. Get the execution logs from each build and store them as `/tmp/exec1.log` and |
| 142 | `/tmp/exec2.log`. |
| 143 | |
| 144 | 2. Download the Bazel source code and navigate to the Bazel folder by using |
| 145 | the command below. You need the source code to parse the |
| 146 | execution logs with the |
| 147 | [execlog parser](https://source.bazel.build/bazel/+/master:src/tools/execlog/). |
| 148 | |
| 149 | git clone https://github.com/bazelbuild/bazel.git |
| 150 | cd bazel |
| 151 | |
| 152 | 3. Use the execution log parser to convert the logs to text. The following |
| 153 | invocation also sorts the actions in the second log to match the action order |
| 154 | in the first log for ease of comparison. |
| 155 | |
| 156 | bazel build src/tools/execlog:parser |
| 157 | bazel-bin/src/tools/execlog/parser \ |
| 158 | --log_path=/tmp/exec1.log \ |
| 159 | --log_path=/tmp/exec2.log \ |
| 160 | --output_path=/tmp/exec1.log.txt \ |
| 161 | --output_path=/tmp/exec2.log.txt |
| 162 | |
| 163 | 4. Use your favourite text differ to diff `/tmp/exec1.log.txt` and |
| 164 | `/tmp/exec2.log.txt`. |