Address TarFileWriter.add_tar performance issue

Addresses the TarFileWriter.add_tar performance issue by maintaining the seek position during traversal.

Based on the implementation of Python's tarfile.py, calling `tarfile.extractfile(<string>)` appears to force an upfront load of all the "members" in the tarfile, and each subsequent lookup by string does a sequential scan across the member list. If a `tarinfo` is passed instead, the initial scan isn't necessary: `extractfile` can work out where to start reading from the offset information contained in the passed `tarinfo`.
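A minimal sketch of the pattern described above, assuming an uncompressed tar held in memory. `build_tar` and `copy_members` are hypothetical helpers modeled loosely on an add_tar-style copy loop; they are not the actual `TarFileWriter.add_tar` code. The key line passes the `TarInfo` yielded by iteration straight to `extractfile` instead of its name.

```python
import io
import tarfile
from typing import Dict


def build_tar(files: Dict[str, bytes]) -> bytes:
    """Hypothetical helper: build an uncompressed in-memory tar from name -> data."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w:") as tar:
        for name, data in files.items():
            info = tarfile.TarInfo(name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()


def copy_members(src_bytes: bytes) -> bytes:
    """Hypothetical helper: copy every member of one in-memory tar into a new one.

    Modeled loosely on an add_tar-style loop; not the actual Bazel implementation.
    """
    out = io.BytesIO()
    with tarfile.open(fileobj=io.BytesIO(src_bytes), mode="r:") as intar, \
            tarfile.open(fileobj=out, mode="w:") as outtar:
        for tarinfo in intar:
            if tarinfo.isfile():
                # Pass the TarInfo itself, not tarinfo.name: extractfile can then
                # seek directly to this member's data using the offsets it carries,
                # instead of loading the full member list and scanning it by name.
                outtar.addfile(tarinfo, intar.extractfile(tarinfo))
            else:
                # Directories, symlinks, etc. carry no data payload.
                outtar.addfile(tarinfo)
    # Closing a TarFile opened with fileobj= leaves the BytesIO open.
    return out.getvalue()
```

For archives without duplicate member names, swapping in `intar.extractfile(tarinfo.name)` produces the same output but reintroduces the name lookup described above.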

I discovered this while trying to use `container_layer` from https://github.com/bazelbuild/rules_docker. When creating a layer from a large tarball (~1.2GB, 100k files; yes, I know, it's ridiculous :-P), I measured ~150ms of overhead _per file_ in the input tar, and the overhead seemed to scale with the size of the tarball (creating layers from much smaller tarballs showed very little per-file overhead). With this change, the layer builds successfully in 120s _total_.
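To observe the scaling on your own machine, a rough timing harness along these lines may help. It is a hypothetical sketch, not the measurement setup behind the numbers above: it builds a synthetic tar of many one-byte files and reads each file once, either by name or by `TarInfo`. The helper names are illustrative, and absolute timings will vary by machine and Python version.

```python
import io
import tarfile
import time
from typing import Tuple


def make_synthetic_tar(n_files: int, payload: bytes = b"x") -> bytes:
    """Hypothetical helper: uncompressed in-memory tar with n_files small files."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w:") as tar:
        for i in range(n_files):
            info = tarfile.TarInfo(f"f{i:06d}.txt")
            info.size = len(payload)
            tar.addfile(info, io.BytesIO(payload))
    return buf.getvalue()


def read_all(src: bytes, by_name: bool) -> Tuple[int, float]:
    """Read every regular file once; return (bytes_read, elapsed_seconds).

    by_name=True passes tarinfo.name to extractfile (the slow pattern);
    by_name=False passes the TarInfo object itself.
    """
    total = 0
    start = time.perf_counter()
    with tarfile.open(fileobj=io.BytesIO(src), mode="r:") as tar:
        for tarinfo in tar:
            if not tarinfo.isfile():
                continue
            member = tarinfo.name if by_name else tarinfo
            total += len(tar.extractfile(member).read())
    return total, time.perf_counter() - start


if __name__ == "__main__":
    src = make_synthetic_tar(5000)
    for by_name in (True, False):
        nbytes, secs = read_all(src, by_name)
        label = "by name   " if by_name else "by TarInfo"
        print(f"{label}: {nbytes} bytes in {secs:.3f}s")
```

For scale: at the reported ~150ms per file, 100k files work out to roughly 0.15 s × 100,000 ≈ 15,000 s (a little over four hours) of overhead, versus the 120s total reported after the change.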

edit: typos/clarity

Closes #6300.

PiperOrigin-RevId: 216712964
1 file changed
tree: 69afb22310eb8567f89cc92c0441962d21bff423
README.md

Bazel

{Fast, Correct} - Choose two

Build and test software of any size, quickly and reliably.

  • Speed up your builds and tests: Bazel only rebuilds what is necessary. With advanced local and distributed caching, optimized dependency analysis and parallel execution, you get fast and incremental builds.

  • One tool, multiple languages: Build and test Java, C++, Android, iOS, Go, and a wide variety of other language platforms. Bazel runs on Windows, macOS, and Linux.

  • Scalable: Bazel helps you scale your organization, codebase, and continuous integration solution. It handles codebases of any size, in multiple repositories or a huge monorepo.

  • Extensible to your needs: Easily add support for new languages and platforms with Bazel's familiar extension language. Share and re-use language rules written by the growing Bazel community.

Getting Started

Documentation

Contributing to Bazel

See CONTRIBUTING.md

Build status

Bazel is released in ‘Beta’. See the product roadmap to learn about the path toward a stable 1.0 release.