commit | 6b4a261df32d12b265fe0b47250cf4cb2d20a658 | |
---|---|---|
author | Robert Gay <robert.gay@redfin.com> | Thu Oct 11 09:55:29 2018 -0700 |
committer | Copybara-Service <copybara-piper@google.com> | Thu Oct 11 09:57:09 2018 -0700 |
tree | 69afb22310eb8567f89cc92c0441962d21bff423 | |
parent | 88a5891c9d5533192d23ec14437461d092a56349 | |

Address TarFileWriter.add_tar performance issue

Addresses TarFileWriter.add_tar performance issue by maintaining seek position during traverse.

Based on looking at the implementation of Python's tarfile.py, calling `tarfile.extractfile(<string>)` appears to force an upfront load of all the "members" in the tarfile, and subsequent lookups by string do a sequential scan across the member list. If a `tarinfo` is passed, the initial scan isn't necessary, and `extractfile` is able to figure out where it should start looking based on offset info contained in the passed `tarinfo`.

I discovered this while trying to use `container_layer` from https://github.com/bazelbuild/rules_docker. When trying to create a layer from a large (~1.2GB, 100k files) tarball (Yes, I know, it's ridiculous :-P), I measured ~150ms of overhead _per file_ in the input tar, and the overhead seemed to scale with the size of the tarball (i.e., creating layers from much smaller tarballs showed very little overhead per-file). This change allows the layer to be built successfully in 120s _total_.

edit: typos/clarity

Closes #6300.

PiperOrigin-RevId: 216712964

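
The pattern described in the commit message can be illustrated with a minimal sketch. This is not the actual `TarFileWriter.add_tar` code; `build_sample_tar` and `copy_members` are hypothetical helpers written for this example, and only standard-library `tarfile` calls are used. It walks the source archive once and hands each `TarInfo` object (rather than its name string) to `extractfile`.

```python
import io
import tarfile


def build_sample_tar(num_files=3):
    """Build a small in-memory tarball (hypothetical helper for this sketch)."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        for i in range(num_files):
            data = ("contents of file %d" % i).encode("utf-8")
            info = tarfile.TarInfo(name="dir/file%d.txt" % i)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    buf.seek(0)
    return buf


def copy_members(src_fileobj, dst_fileobj):
    """Copy every member from one tar stream to another in a single pass.

    Each TarInfo yielded by the iteration is passed straight to
    extractfile, so no lookup by member name is performed.
    """
    copied = []
    with tarfile.open(fileobj=src_fileobj, mode="r:") as src, \
         tarfile.open(fileobj=dst_fileobj, mode="w") as dst:
        for tarinfo in src:
            if tarinfo.isfile():
                # Pass the TarInfo itself, not tarinfo.name.
                dst.addfile(tarinfo, src.extractfile(tarinfo))
            else:
                # Directories, links, etc. carry no file payload.
                dst.addfile(tarinfo)
            copied.append(tarinfo.name)
    return copied
```

Replacing `src.extractfile(tarinfo)` with `src.extractfile(tarinfo.name)` would give the same output archive, but would go through the by-name lookup that the commit message identifies as the source of the per-file overhead.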
{Fast, Correct} - Choose two
Build and test software of any size, quickly and reliably.
Speed up your builds and tests: Bazel only rebuilds what is necessary. With advanced local and distributed caching, optimized dependency analysis and parallel execution, you get fast and incremental builds.
One tool, multiple languages: Build and test Java, C++, Android, iOS, Go, and a wide variety of other language platforms. Bazel runs on Windows, macOS, and Linux.
Scalable: Bazel helps you scale your organization, codebase, and continuous integration solution. It handles codebases of any size, in multiple repositories or a huge monorepo.
Extensible to your needs: Easily add support for new languages and platforms with Bazel's familiar extension language. Share and re-use language rules written by the growing Bazel community.
Follow our tutorials:
See CONTRIBUTING.md
Bazel is released in ‘Beta’. See the product roadmap to learn about the path toward a stable 1.0 release.