UnixFileSystem: read cached hashes from extended attributes
In certain workloads, Bazel's running time is dominated by checksum
computation. Examples include:
- Projects that add local_repository() rules pointing to networked
  file shares.
- Repositories that contain very large input files.
When using remote execution, we need to compute digests to be able to
place such files in input roots. In many cases, a centralized CAS will
already contain these files. It would be nice if Bazel could efficiently
check for the existence of such objects without having to read the whole
file locally.
This change extends UnixFileSystem to call getxattr() on an extended
attribute holding a precomputed digest before falling back to reading
the file contents.
~~The attribute is named "user.checksum.${algo}". There is no true~~
~~standard on how these extended attributes should be named, but~~
~~"user.checksum.${algo}" already has some precedent. It is, for~~
~~example, used by BuildGrid internally:~~
~~https://gitlab.com/BuildGrid/buildbox/buildbox-fuse/-/merge_requests/9~~
**EDIT:** The name of the extended attribute is now configurable.
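The lookup order described above (try the extended attribute first, hash the contents only if that fails) can be sketched in plain Java. This is not Bazel's UnixFileSystem code: it uses the JDK's `UserDefinedFileAttributeView` instead of a native getxattr() binding. On Linux that view maps names into the `user.` namespace, so the name `checksum.sha256` corresponds to the xattr `user.checksum.sha256`. The class name `XattrDigest` and that attribute name are hypothetical. The sketch also assumes the attribute stores the raw digest bytes, and it treats a value of the wrong length as absent.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.ByteBuffer;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.attribute.UserDefinedFileAttributeView;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public final class XattrDigest {
  /**
   * Returns the digest of {@code file}, preferring a precomputed value stored in the
   * user-defined extended attribute {@code attrName} (hypothetical naming; configurable).
   */
  public static byte[] digest(Path file, String attrName, String algorithm)
      throws IOException, NoSuchAlgorithmException {
    MessageDigest md = MessageDigest.getInstance(algorithm);
    byte[] cached = readXattr(file, attrName);
    // Assumption: the attribute holds the raw digest, so only accept values of that length.
    if (cached != null && cached.length == md.getDigestLength()) {
      return cached;
    }
    // Fallback: read and hash the full file contents (the slow path this change avoids).
    try (InputStream in = Files.newInputStream(file)) {
      byte[] buf = new byte[64 * 1024];
      int n;
      while ((n = in.read(buf)) != -1) {
        md.update(buf, 0, n);
      }
    }
    return md.digest();
  }

  /** Returns the attribute's value, or null if it is missing or unsupported. */
  private static byte[] readXattr(Path file, String attrName) {
    UserDefinedFileAttributeView view =
        Files.getFileAttributeView(file, UserDefinedFileAttributeView.class);
    if (view == null) {
      return null; // The file system does not support user-defined attributes.
    }
    try {
      ByteBuffer buf = ByteBuffer.allocate(view.size(attrName));
      view.read(attrName, buf);
      buf.flip();
      byte[] out = new byte[buf.remaining()];
      buf.get(out);
      return out;
    } catch (IOException e) {
      return null; // Attribute absent or unreadable: caller falls back to hashing.
    }
  }
}
```

For example, `XattrDigest.digest(path, "checksum.sha256", "SHA-256")` returns the cached bytes when a corpus maintainer has tagged the file, and otherwise behaves like an ordinary streaming SHA-256.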
Storing this information in extended attributes also seems to be a
fairly common approach; apparently Google uses it internally as well:
https://groups.google.com/g/bazel-discuss/c/6VmjSOLySnY/m/v2dpwt8jBgAJ
This change does not yet add code to let Bazel write these attributes
to disk. The main goal for now is to speed up access to read-only
corpora whose maintainers have made the effort to add these attributes.
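Because Bazel only reads the attributes, maintainers of a read-only corpus have to populate them themselves. A minimal sketch of such a tool follows. The class name `XattrPopulator`, the attribute name, and the choice to store the raw digest bytes are all hypothetical, and they must match whatever attribute name and digest function Bazel is configured with. The tool walks a directory tree and writes each regular file's digest through the JDK's `UserDefinedFileAttributeView` (the `user.` namespace on Linux).

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.ByteBuffer;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.attribute.UserDefinedFileAttributeView;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public final class XattrPopulator {
  /** Hashes one file's contents with the given algorithm (e.g. "SHA-256"). */
  public static byte[] hashFile(Path file, String algorithm)
      throws IOException, NoSuchAlgorithmException {
    MessageDigest md = MessageDigest.getInstance(algorithm);
    try (InputStream in = Files.newInputStream(file)) {
      byte[] buf = new byte[64 * 1024];
      int n;
      while ((n = in.read(buf)) != -1) {
        md.update(buf, 0, n);
      }
    }
    return md.digest();
  }

  /**
   * Stores the raw digest of every regular file under {@code root} in the user-defined
   * attribute {@code attrName}. Returns the number of files tagged.
   */
  public static int populate(Path root, String attrName, String algorithm)
      throws IOException, NoSuchAlgorithmException {
    List<Path> files;
    try (Stream<Path> paths = Files.walk(root)) {
      files = paths.filter(Files::isRegularFile).collect(Collectors.toList());
    }
    for (Path p : files) {
      UserDefinedFileAttributeView view =
          Files.getFileAttributeView(p, UserDefinedFileAttributeView.class);
      if (view == null) {
        throw new IOException("user-defined attributes not supported for " + p);
      }
      // Overwrites any existing value, so re-running after content changes refreshes it.
      view.write(attrName, ByteBuffer.wrap(hashFile(p, algorithm)));
    }
    return files.size();
  }
}
```

Any later edit to a file leaves a stale attribute behind, which is one reason the approach targets read-only corpora.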
Closes #11662.
(@janakdr made some modifications to the original pull request, mainly to
resolve merge conflicts and conform to Google-internal style.)
PiperOrigin-RevId: 332256967