Status: Implemented
Author: Klaus Aehlig
protoc
Bazel depends on a protocol buffer compiler, protoc, to generate code, especially Java code, from an abstract description of the protocol buffer; in particular, files generated by protoc are machine-independent. Most of the time, Bazel uses the latest version of protoc. New versions of protoc that contain incompatible changes to the programming interface are released frequently.
The current approach to the protoc dependency is to have checked-in statically-linked executables for all the supported platforms (where some platforms, like FreeBSD, have to use Linux-compatibility features). The full source tree of the protobuf compiler is also part of the repository. However, for generating files, the committed binaries are always used.
The current approach has certain shortcomings.
Having up-to-date binaries for all the supported platforms does not scale well as the number of platforms Bazel should run on is increasing.
The requirement of having a suitable executable in the code base adds additional complexity to the process of bootstrapping a new architecture.
Binaries in the code base do not follow standard open-source principles; in fact, meaningful reviews of changes that update them are hard and in practice often boil down to a question of trust in the person making the change.
Committed binaries make the “source” repository unnecessarily big. Currently, a checkout at head contains over 250MB in committed .exe and .dll files.
BUILD to compile protoc from source
The BUILD file for third_party/protobuf is changed in such a way that protoc is compiled from source instead of being selected from the committed pre-built binaries; the pre-built binaries are removed from the source tree. As the protoc sources are already part of the repository, this is not a huge change; also, as protoc is written in C++, no additional dependencies are introduced that way.
Note that then every user who already has a working (bootstrap) bazel can build bazel from source, without depending on committed binaries or having a protoc already on the machine. The problem of building your first bazel will be addressed in the next sections.
This change also removes an internal consistency requirement from the code base. It was always assumed that the binaries actually match the accompanying sources.
A new target //:bazel-distfile will be added. This will be an archive containing
all source files in their respective places, including the files under third_party, site, scripts, etc., as well as
under a subdirectory derived, all the files generated by protoc that are needed to compile a bootstrap version of bazel.
For convenience, the derived subdirectory may also contain other generated architecture-independent files, like an HTML version of the documentation for local browsing. A corollary of the archive layout is that removing the derived directory yields a checkout of the upstream sources.
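The archive layout described above can be illustrated with a minimal sketch. It is not the implementation of //:bazel-distfile; the function names build_distfile and upstream_members, the use of an uncompressed tar archive, and the file names in the usage note are assumptions made only for illustration.

```python
import io
import tarfile


def build_distfile(sources, generated):
    """Return the bytes of a tar archive holding sources plus derived/ files.

    `sources` and `generated` map relative paths to file contents (bytes).
    Both the function and the archive format are hypothetical.
    """
    entries = dict(sources)
    # Generated files go under derived/ so they never mix with true sources.
    entries.update({"derived/" + name: data for name, data in generated.items()})

    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w") as archive:
        for name in sorted(entries):
            info = tarfile.TarInfo(name)
            info.size = len(entries[name])
            archive.addfile(info, io.BytesIO(entries[name]))
    return buffer.getvalue()


def upstream_members(distfile_bytes):
    """List the archive members outside derived/, i.e. the upstream sources."""
    with tarfile.open(fileobj=io.BytesIO(distfile_bytes), mode="r") as archive:
        return [n for n in archive.getnames() if not n.startswith("derived/")]
```

For example, building an archive from a source map containing compile.sh and a generated map containing a hypothetical Foo.java, then calling upstream_members on the result, returns only the source paths. This mirrors the corollary that dropping derived leaves an upstream checkout.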
This new artifact will be built for every release and made available along with the other release artifacts (like packages, installers, executables). The same means of certifying integrity (like hashes, SSL certificates) will be used.
The compile.sh script will be modified to first check whether a derived directory exists and, if so, to assume that all the files generated by protoc are already present there; only if no such directory is present will it try to generate the needed output of protoc for bootstrapping, assuming that the PROTOC environment variable points to a good protoc binary.
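The decision described above can be sketched as follows. This is only an illustration of the intended control flow, written in Python rather than as the actual shell code of compile.sh; the function name choose_proto_source and the error message are assumptions.

```python
from pathlib import Path


def choose_proto_source(workspace, environ):
    """Decide where a bootstrap build takes its protoc output from.

    Returns ("derived", path) when a derived directory exists in the
    workspace, otherwise ("protoc", binary) when PROTOC is set.
    Hypothetical helper; compile.sh itself is a shell script.
    """
    derived = Path(workspace) / "derived"
    if derived.is_dir():
        # Distribution archive: the generated files are assumed to be present.
        return ("derived", str(derived))

    protoc = environ.get("PROTOC")
    if protoc:
        # Repository checkout: generate the files with the given protoc.
        return ("protoc", protoc)

    raise RuntimeError("no derived directory found and PROTOC is not set")
```

Calling choose_proto_source(os.getcwd(), os.environ) from the top of an unpacked distribution archive would select the derived directory, while the same call in a plain checkout falls back to the PROTOC environment variable.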
So, there will be three ways to build bazel.
If one has an old bazel binary already, a new one can be built from a checkout of the source repository. This approach is useful for developers. It might also be used by users who want to upgrade their old bazel binary to the next release.
By downloading the distribution artifact, the compile.sh script can be used to build bazel. Again, no protoc has to be installed ahead of time. This approach is useful for source distributions, as well as for bringing Bazel to a new platform.
If one already has the correct version of protoc on the machine, the compile.sh script can be used by setting the PROTOC environment variable. This approach is useful for distributions that want to provide snapshots of bazel in between official releases and maintain a protoc package anyway.
protoc binary installed
This would be the standard open-source approach of requiring the user to have the required dependencies installed ahead of time. Unfortunately, new versions of protoc introduce incompatible changes too frequently, so this would be an unreasonable burden. Note that bootstrapping from your own protoc and a repository checkout is still possible with the suggested approach.
protoc output
Another approach would be to make the output of protoc part of the versioned sources instead of generating it for the distribution file. As with all approaches based on committing generated files, this would introduce another consistency requirement to the repository. In this case, the requirement would be that the generated files be up-to-date with respect to the respective .proto files. Of course, such consistency could be verified by an appropriate test. Nevertheless, it seems cleaner and probably more manageable to version only true source files and generate derived files from the respective sources.
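To make the consistency requirement of this rejected alternative concrete, a test of the kind mentioned above might compare the committed generated files against a freshly generated copy. The sketch below assumes that a fresh protoc run has already written its output to a separate directory; the function names file_digests and stale_generated_files are hypothetical and not part of Bazel.

```python
import hashlib
from pathlib import Path


def file_digests(root):
    """Map each file's path relative to `root` to its SHA-256 digest."""
    root = Path(root)
    return {
        path.relative_to(root).as_posix(): hashlib.sha256(path.read_bytes()).hexdigest()
        for path in root.rglob("*")
        if path.is_file()
    }


def stale_generated_files(committed_dir, regenerated_dir):
    """Return the paths that are missing on either side or differ in content."""
    committed = file_digests(committed_dir)
    fresh = file_digests(regenerated_dir)
    return sorted(
        name
        for name in committed.keys() | fresh.keys()
        if committed.get(name) != fresh.get(name)
    )
```

A test would fail whenever stale_generated_files returns a non-empty list. The proposed design avoids the need for such a check by not versioning generated files at all.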