Reimplement AsynchronousFileOutputStream to use a writer thread

Background: the original code is implementing the OutputStream interface. Given a sequence of write calls, the code puts each of these writes into a queue. On the other side of the queue is an unbounded thread pool, which takes the writes off the queue one by one, and then does individual blocking writes with a fixed file offset.

There are three problems with the original code:
1. Writes are sent to the Kernel one-by-one. Imagine if the incoming writes a single-byte writes, then we do one kernel call for every single byte. This is the worst case.

2. Due to multithreading, the writes can get reordered. In the worst case, the order is reversed, and the kernel flushes each write to disk individually. Since the writes are not aligned to disk block boundaries, each write has to first *read* the block from disk, overwrite a few bytes, and then flush the block back to disk. On a spinning platter, this is the worst possible sequence of operations: write a block, read the same block back, write the same block again, read the same block back, etc., with each operation having to wait for a full disk rotation.

Note that this gets worse if there's high thread and / or disk contention, e.g., when running a build system in the background.

3. Due to the unbounded thread pool, it may end up creating a new thread for every single write (possibly as many as one per byte written). This is also the worst case, although it's probably negligible compared to 1+2.

Compared to that, this change uses an in-memory buffer before sending writes to the kernel so non-block sized writes are batched, it writes sequentially, and it uses a single thread created at the start. A single thread should be more than able to fully saturate local disk I/O, so multi-threading should ~never be a improvement here.

This might look different if we had perfectly aligned writes of an integer multiple of the disk block size to a distributed network file system or a local SSD raid. If you look at the clients of this class, that's definitely not the case: this code is primarily used for local file BEP transports - we wouldn't use local file BEP transports to write to a network file system, we'd use the BES instead. It's also used to write a couple of log files, also all local - otherwise we'd add the data to BEP. These are all unaligned, ~random-sized writes.

If we created a lot of these files, then using a thread pool with fewer threads than files and using non-blocking I/O might be an improvement due to the reduction in thread count, but I think it's very unlikely that we'll ever need that complexity.

PiperOrigin-RevId: 200694423
2 files changed
tree: dff3ce3d36308110c2fcad3911f32d6aee07c389
  1. .bazelci/
  2. examples/
  3. scripts/
  4. site/
  5. src/
  6. third_party/
  7. tools/
  8. .gitattributes
  9. .gitignore
  10. AUTHORS
  11. BUILD
  12. CHANGELOG.md
  13. combine_distfiles.py
  14. combine_distfiles_to_tar.sh
  15. compile.sh
  16. CONTRIBUTING.md
  17. CONTRIBUTORS
  18. distdir.bzl
  19. ISSUE_TEMPLATE.md
  20. LICENSE
  21. README.md
  22. WORKSPACE
README.md

Bazel

{Fast, Correct} - Choose two

Build and test software of any size, quickly and reliably.

  • Speed up your builds and tests: Bazel only rebuilds what is necessary. With advanced local and distributed caching, optimized dependency analysis and parallel execution, you get fast and incremental builds.

  • One tool, multiple languages: Build and test Java, C++, Android, iOS, Go, and a wide variety of other language platforms. Bazel runs on Windows, macOS, and Linux.

  • Scalable: Bazel helps you scale your organization, codebase, and continuous integration solution. It handles codebases of any size, in multiple repositories or a huge monorepo.

  • Extensible to your needs: Easily add support for new languages and platforms with Bazel's familiar extension language. Share and re-use language rules written by the growing Bazel community.

Getting Started

Documentation

Contributing to Bazel

See CONTRIBUTING.md

Build status

Bazel is released in ‘Beta’. See the product roadmap to learn about the path toward a stable 1.0 release.