blob: ba92b69578d0d4b40e57c8d29b7f7adf43ec3f39 [file] [log] [blame] [view]
Googler2af641c2018-11-26 03:01:41 -08001---
2layout: documentation
Greg Estrenc0816752020-02-20 13:04:29 -08003title: Debugging remote cache hits for remote execution
Googler2af641c2018-11-26 03:01:41 -08004---
5
Googlera85685a2021-01-08 16:53:49 -08006# Debugging Remote Cache Hits for Remote Execution
Googler2af641c2018-11-26 03:01:41 -08007
8This page describes how to check your cache hit rate and how to investigate
9cache misses in the context of remote execution.
10
11This page assumes that you have a build and/or test that successfully
12utilizes remote execution, and you want to ensure that you are effectively
13utilizing remote cache.
14
Googler2af641c2018-11-26 03:01:41 -080015## Checking your cache hit rate
16
17In the standard output of your Bazel run, look at the `INFO` line that lists
18processes, which roughly correspond to Bazel actions. That line details
19where the action was run. Look for the `remote` label, which indicates an action
20executed remotely, `linux-sandbox` for actions executed in a local sandbox,
21and other values for other execution strategies. An action whose result came
22from a remote cache is displayed as `remote cache hit`.
23
24For example:
25
ecngtngb35944e2021-10-18 02:33:09 -070026 INFO: 11 processes: 6 remote cache hit, 3 internal, 2 remote.
Googler2af641c2018-11-26 03:01:41 -080027
ecngtngb35944e2021-10-18 02:33:09 -070028In this example there were 6 remote cache hits, and 2 actions did not have
29cache hits and were executed remotely. The 3 internal part can be ignored.
30It is typically tiny internal actions, such as creating symbolic links. Local
31cache hits are not included in this summary. If you are getting 0 processes
32(or a number lower than expected), run `bazel clean` followed by your build/test
33command.
Googler2af641c2018-11-26 03:01:41 -080034
35## Troubleshooting cache hits
36
37If you are not getting the cache hit rate you are expecting, do the following:
38
Greg Estrenc0816752020-02-20 13:04:29 -080039### Ensure re-running the same build/test command produces cache hits
Googler2af641c2018-11-26 03:01:41 -080040
411. Run the build(s) and/or test(s) that you expect to populate the cache. The
Googlerf86ad692021-03-08 11:18:47 -080042 first time a new build is run on a particular stack, you can expect no remote
Googler2af641c2018-11-26 03:01:41 -080043 cache hits. As part of remote execution, action results are stored in the
44 cache and a subsequent run should pick them up.
45
462. Run `bazel clean`. This command cleans your local cache, which allows
47 you to investigate remote cache hits without the results being masked by
48 local cache hits.
49
503. Run the build(s) and test(s) that you are investigating again (on the same
51 machine).
52
Googler25a3c7d2021-01-22 18:38:49 +0100534. Check the `INFO` line for cache hit rate. If you see no processes except
ecngtngb35944e2021-10-18 02:33:09 -070054 `remote cache hit` and `internal`, then your cache is being correctly populated and
Googler2af641c2018-11-26 03:01:41 -080055 accessed. In that case, skip to the next section.
56
575. A likely source of discrepancy is something non-hermetic in the build causing
58 the actions to receive different action keys across the two runs. To find
59 those actions, do the following:
60
61 a. Re-run the build(s) or test(s) in question to obtain execution logs:
62
63 bazel clean
ecngtngb35944e2021-10-18 02:33:09 -070064 bazel $YOUR_FLAGS build //your:target --execution_log_json_file=/tmp/exec1.json
Googler2af641c2018-11-26 03:01:41 -080065 bazel clean
ecngtngb35944e2021-10-18 02:33:09 -070066 bazel $YOUR_FLAGS build //your:target --execution_log_json_file=/tmp/exec2.json
Googler2af641c2018-11-26 03:01:41 -080067
68 b. [Compare the execution logs](#comparing-the-execution-logs) between the
69 two runs. Ensure that the actions are identical across the two log files.
70 Discrepancies provide a clue about the changes that occurred between the
71 runs. Update your build to eliminate those discrepancies.
72
73 If you are able to resolve the caching problems and now the repeated run
74 produces all cache hits, skip to the next section.
75
76 If your action IDs are identical but there are no cache hits, then something
77 in your configuration is preventing caching. Continue with this section to
78 check for common problems.
79
805. Check that all actions in the execution log have `cacheable` set to true. If
81 `cacheable` does not appear in the execution log for a give action, that
82 means that the corresponding rule may have a `no-cache` tag in its
83 definition in the `BUILD` file. Look at the human-readable `progress_message`
84 field in the execution log to help determine where the action is coming from.
85
866. If the actions are identical and `cacheable` but there are no cache hits, it
87 is possible that your command line includes `--noremote_accept_cached` which
88 would disable cache lookups for a build.
89
90 If figuring out the actual command line is difficult, use the canonical
91 command line from the
92 [Build Event Protocol](/build-event-protocol.html)
93 as follows:
94
95 a. Add `--build_event_text_file=/tmp/bep.txt` to your Bazel command to get
96 the text version of the log.
97
98 b. Open the text version of the log and search for the
99 `structured_command_line` message with `command_line_label: "canonical"`.
100 It will list all the options after expansion.
101
102 c. Search for `remote_accept_cached` and check whether it's set to `false`.
103
104 d. If `remote_accept_cached` is `false`, determine where it is being
105 set to `false`: either at the command line or in a
106 [bazelrc](/guide.html#where-are-the-bazelrc-files) file.
107
108### Ensure caching across machines
109
110After cache hits are happening as expected on the same machine, run the
111same build(s)/test(s) on a different machine. If you suspect that caching is
112not happening across machines, do the following:
113
1141. Make a small modification to your build to avoid hitting existing caches.
115
1162. Run the build on the first machine:
117
118 bazel clean
ecngtngb35944e2021-10-18 02:33:09 -0700119 bazel ... build ... --execution_log_json_file=/tmp/exec1.json
Googler2af641c2018-11-26 03:01:41 -0800120
1213. Run the build on the second machine, ensuring the modification from step 1
122 is included:
123
124 bazel clean
ecngtngb35944e2021-10-18 02:33:09 -0700125 bazel ... build ... --execution_log_json_file=/tmp/exec2.json
Googler2af641c2018-11-26 03:01:41 -0800126
1274. [Compare the execution logs](#comparing-the-execution-logs) for the two
128 runs. If the logs are not identical, investigate your build configurations
129 for discrepancies as well as properties from the host environment leaking
130 into either of the builds.
131
132## Comparing the execution logs
133
134Execution logs contain records of all actions executed during the build. For
135each action there is a
136[SpawnExec](https://github.com/bazelbuild/bazel/blob/42389d9468a954f3793a19f8e026b022b39aefca/src/main/protobuf/spawn.proto#L67)
137element containing all of the information from the action key, Thus, if the
138logs are identical then so are the action cache keys.
139
140To compare logs for two builds that are not sharing cache hits as expected,
141do the folowing:
142
ecngtngb35944e2021-10-18 02:33:09 -07001431. Get the execution logs from each build and store them as `/tmp/exec1.json` and
144 `/tmp/exec2.json`.
Googler2af641c2018-11-26 03:01:41 -0800145
ecngtngb35944e2021-10-18 02:33:09 -07001462. Use your favourite text differ to diff `/tmp/exec1.json` and
147 `/tmp/exec2.json`.