[7.1.0] Attempt to fix cancellation crash in repo fetching w/ worker thread (#21599)

The stacktrace in https://github.com/bazelbuild/bazel/issues/21478
suggests that the second `workerFuture.get()` (line 183 before) is
snagging on a `CancellationException`. Closer inspection indicates that
the exception handling in this entire block of code is just faulty --
one, `workerFuture.get()` on line 169 is very unlikely to throw an
`InterruptedException` because this call happens after we've received a
`DONE` from the signal queue, which is at the very end of the worker
thread logic (in its own `finally` clause, actually); two, the second
call to `workerFuture.get()` on line 183 doesn't actually do anything
because `get()`-ing a cancelled future would just throw a
`CancellationException` immediately.
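
The second point can be checked in isolation. The following is a minimal sketch, not the Bazel code: the class and method names are made up for illustration, and the sleeping task merely stands in for the worker. It relies only on the documented behavior of `Future.get()` after `cancel()`.

```java
import java.util.concurrent.CancellationException;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical demo class; nothing here is taken from the Bazel sources.
final class CancelledFutureDemo {
  /** Returns true if get() on a cancelled future throws CancellationException. */
  static boolean getThrowsAfterCancel() {
    ExecutorService executor = Executors.newSingleThreadExecutor();
    try {
      // Stand-in for a long-running worker; it would block for a minute if left alone.
      Future<?> workerFuture = executor.submit(() -> {
        try {
          Thread.sleep(60_000);
        } catch (InterruptedException e) {
          Thread.currentThread().interrupt();
        }
      });
      workerFuture.cancel(true);
      try {
        // Throws right away; it does not wait for the task to actually exit.
        workerFuture.get();
        return false;
      } catch (CancellationException expected) {
        return true;
      } catch (InterruptedException | ExecutionException e) {
        throw new IllegalStateException(e);
      }
    } finally {
      executor.shutdownNow();
    }
  }

  public static void main(String[] args) {
    System.out.println(getThrowsAfterCancel());
  }
}
```

So a `get()` placed after cancellation cannot serve as a way to wait for the worker to wind down.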

This CL attempts to fix these two glaring errors. It now tries to handle
interrupts where they are most likely to occur, which is at the call to
`state.signalQueue.take()` -- this is where the Skyframe thread spends
the most time blocked, and where a Ctrl-C from the user is most likely
to land. We catch an `InterruptedException` here and interrupt the
worker thread. To wait for the worker thread to finish, we
uninterruptibly take from the signal queue instead of calling
`workerFuture.get()`.
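
A rough sketch of that shape, under stated assumptions: `signalQueue`, `workerFuture` and `DONE` echo the names used above, but the class, the `workerStarted` latch, the `takeUninterruptibly` helper and the use of `cancel(true)` to deliver the interrupt are illustrative choices, not the actual change. The host thread interrupts itself here to simulate a Ctrl-C.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical demo class; a sketch of the pattern, not the Bazel implementation.
final class InterruptAtTakeDemo {
  enum Signal { DONE }

  /** Takes from the queue, deferring any interrupt until an element arrives. */
  static <T> T takeUninterruptibly(BlockingQueue<T> queue) {
    boolean interrupted = false;
    try {
      while (true) {
        try {
          return queue.take();
        } catch (InterruptedException e) {
          interrupted = true; // Remember the interrupt and keep waiting.
        }
      }
    } finally {
      if (interrupted) {
        Thread.currentThread().interrupt(); // Restore the flag for the caller.
      }
    }
  }

  /** Returns true if the host's interrupt reached the worker and DONE was received. */
  static boolean hostInterruptStopsWorker() {
    BlockingQueue<Signal> signalQueue = new LinkedBlockingQueue<>();
    CountDownLatch workerStarted = new CountDownLatch(1);
    AtomicBoolean workerSawInterrupt = new AtomicBoolean(false);
    ExecutorService workerExecutor = Executors.newSingleThreadExecutor();
    try {
      Future<?> workerFuture = workerExecutor.submit(() -> {
        try {
          workerStarted.countDown();
          Thread.sleep(60_000); // Stand-in for a long fetch.
        } catch (InterruptedException e) {
          workerSawInterrupt.set(true);
        } finally {
          signalQueue.add(Signal.DONE); // Always the worker's last action.
        }
      });
      try {
        // Demo-only: make sure the task is running, so that DONE is guaranteed to arrive.
        workerStarted.await();
      } catch (InterruptedException e) {
        throw new IllegalStateException(e);
      }
      Thread.currentThread().interrupt(); // Simulate a Ctrl-C landing on the host thread.
      try {
        signalQueue.take(); // Where the host spends most of its time blocked.
        return false; // Not reached in this demo: take() sees the pending interrupt.
      } catch (InterruptedException e) {
        workerFuture.cancel(true); // Interrupt the worker thread.
        Signal last = takeUninterruptibly(signalQueue); // Wait for the worker to finish.
        return last == Signal.DONE && workerSawInterrupt.get();
      }
    } finally {
      workerExecutor.shutdownNow();
    }
  }

  public static void main(String[] args) {
    System.out.println(hostInterruptStopsWorker());
  }
}
```

Waiting on the queue works because the worker posts `DONE` from its `finally` clause, whereas `get()` on the cancelled future would return control before the worker has exited.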

Additionally, we now correctly handle the worker thread being
interrupted by someone other than the host Skyframe thread (the memory
pressure handler, in all likelihood), by simply retrying the fetch
instead of crashing Bazel.
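
The retry idea can be sketched as follows. This is an illustration only: the `Outcome` enum, the class and method names, and the self-interrupt that simulates a third party (such as a memory pressure handler) are all assumptions, and the real code decides whether to retry in its own way.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical demo class; a sketch of "retry instead of crash", not the Bazel code.
final class RetryOnForeignInterrupt {
  enum Outcome { SUCCESS, INTERRUPTED }

  /** Runs the fake fetch until it succeeds and returns the number of attempts. */
  static int fetchWithRetry() {
    ExecutorService workerExecutor = Executors.newSingleThreadExecutor();
    AtomicInteger attempts = new AtomicInteger();
    try {
      while (true) {
        Future<Outcome> workerFuture = workerExecutor.submit(() -> {
          int attempt = attempts.incrementAndGet();
          try {
            if (attempt == 1) {
              // Simulate an interrupt from someone other than the host thread.
              Thread.currentThread().interrupt();
            }
            Thread.sleep(10); // Stand-in for the fetch; throws if interrupted.
            return Outcome.SUCCESS;
          } catch (InterruptedException e) {
            return Outcome.INTERRUPTED;
          }
        });
        Outcome outcome;
        try {
          outcome = workerFuture.get();
        } catch (InterruptedException | ExecutionException e) {
          throw new IllegalStateException(e);
        }
        if (outcome == Outcome.SUCCESS) {
          return attempts.get();
        }
        // The host never asked for cancellation, so treat this as spurious and retry.
      }
    } finally {
      workerExecutor.shutdownNow();
    }
  }

  public static void main(String[] args) {
    System.out.println(fetchWithRetry());
  }
}
```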

Fixes https://github.com/bazelbuild/bazel/issues/21478 (maybe...?)

Commit
https://github.com/bazelbuild/bazel/commit/fd769f0211b7039ee7ccafe852dd1f8da828c8d6

PiperOrigin-RevId: 613348046
Change-Id: I692fa750cb8873f1bd403f16764d1845410a29f1

Co-authored-by: Googler <wyv@google.com>
1 file changed
tree: fe956cf1513c459889be52df489bd5aca1243ea9
  1. .bazelci/
  2. .github/
  3. examples/
  4. scripts/
  5. site/
  6. src/
  7. third_party/
  8. tools/
  9. .bazelrc
  10. .bazelversion
  11. .gitattributes
  12. .gitignore
  13. AUTHORS
  14. bazel_downloader.cfg
  15. BUILD
  16. CHANGELOG.md
  17. CODE_OF_CONDUCT.md
  18. CODEOWNERS
  19. combine_distfiles.py
  20. combine_distfiles_to_tar.sh
  21. compile.sh
  22. CONTRIBUTING.md
  23. CONTRIBUTORS
  24. distdir.bzl
  25. distdir_deps.bzl
  26. extensions.bzl
  27. LICENSE
  28. maven_install.json
  29. MODULE.bazel
  30. MODULE.bazel.lock
  31. rbe_extension.bzl
  32. README.md
  33. repositories.bzl
  34. requirements.txt
  35. SECURITY.md
  36. WORKSPACE
  37. WORKSPACE.bzlmod
  38. workspace_deps.bzl
README.md

Bazel

{Fast, Correct} - Choose two

Build and test software of any size, quickly and reliably.

  • Speed up your builds and tests: Bazel rebuilds only what is necessary. With advanced local and distributed caching, optimized dependency analysis and parallel execution, you get fast and incremental builds.

  • One tool, multiple languages: Build and test Java, C++, Android, iOS, Go, and a wide variety of other language platforms. Bazel runs on Windows, macOS, and Linux.

  • Scalable: Bazel helps you scale your organization, codebase, and continuous integration solution. It handles codebases of any size, in multiple repositories or a huge monorepo.

  • Extensible to your needs: Easily add support for new languages and platforms with Bazel's familiar extension language. Share and re-use language rules written by the growing Bazel community.

Getting Started

Documentation

Reporting a Vulnerability

To report a security issue, please email security@bazel.build with a description of the issue, the steps you took to create the issue, affected versions, and, if known, mitigations for the issue. Our vulnerability management team will respond within 3 working days of your email. If the issue is confirmed as a vulnerability, we will open a Security Advisory. This project follows a 90 day disclosure timeline.

Contributing to Bazel

See CONTRIBUTING.md

Build status