site/en/basics/build-systems.md - bazel - Git at Google

 Project: /_project.yaml
 Book: /_book.yaml

 # Why a Build System?

 {% include "_buttons.html" %}

 This page discusses what build systems are, what they do, why you should use a
 build system, and why compilers and build scripts aren't the best choice as your
 organization starts to scale. It's intended for developers who don't have much
 experience with a build system.

 ## What is a build system?

 Fundamentally, all build systems have a straightforward purpose: they transform
 the source code written by engineers into executable binaries that can be read
 by machines. Build systems aren't just for human-authored code; they also allow
 machines to create builds automatically, whether for testing or for releases to
 production. In an organization with thousands of engineers, it's common that
 most builds are triggered automatically rather than directly by engineers.

 ### Can't I just use a compiler?

 The need for a build system might not be immediately obvious. Most engineers
 don't use a build system while learning to code: most start by invoking tools
 like `gcc` or `javac` directly from the command line, or the equivalent in an
 integrated development environment (IDE). As long as all the source code is in
 the same directory, a command like this works fine:

 ```posix-terminal
 javac *.java
 ```

 This instructs the Java compiler to take every Java source file in the current
 directory and turn it into a binary class file. In the simplest case, this is
 all you need.

 However, as soon as code expands, the complications begin. `javac` is smart
 enough to look in subdirectories of the current directory to find code to
 import. But it has no way of finding code stored in _other parts_ of the
 filesystem (perhaps a library shared by several projects). It also only knows
 how to build Java code. Large systems often involve different pieces written in
 a variety of programming languages with webs of dependencies among those pieces,
 meaning no compiler for a single language can possibly build the entire system.

 Once you're dealing with code from multiple languages or multiple compilation
 units, building code is no longer a one-step process. Now you must evaluate what
 your code depends on and build those pieces in the proper order, possibly using
 a different set of tools for each piece. If any dependencies change, you must
 repeat this process to avoid depending on stale binaries. For a codebase of even
 moderate size, this process quickly becomes tedious and error-prone.

 The compiler also doesn’t know anything about how to handle external
 dependencies, such as third-party `JAR` files in Java. Without a build system,
 you could manage this by downloading the dependency from the internet, sticking
 it in a `lib` folder on the hard drive, and configuring the compiler to read
 libraries from that directory. Over time, it's difficult to maintain the
 updates, versions, and source of these external dependencies.

 ### What about shell scripts?

 Suppose that your hobby project starts out simple enough that you can build it
 using just a compiler, but you begin running into some of the problems described
 previously. Maybe you still don’t think you need a build system and can automate
 away the tedious parts using some simple shell scripts that take care of
 building things in the correct order. This helps out for a while, but pretty
 soon you start running into even more problems:

 *   It becomes tedious. As your system grows more complex, you begin spending
     almost as much time working on your build scripts as on real code. Debugging
     shell scripts is painful, with more and more hacks being layered on top of
     one another.

 *   It’s slow. To make sure you weren’t accidentally relying on stale libraries,
     you have your build script build every dependency in order every time you
     run it. You think about adding some logic to detect which parts need to be
     rebuilt, but that sounds awfully complex and error prone for a script. Or
     you think about specifying which parts need to be rebuilt each time, but
     then you’re back to square one.

 *   Good news: it’s time for a release! Better go figure out all the arguments
     you need to pass to the jar command to make your final build. And remember
     how to upload it and push it out to the central repository. And build and
     push the documentation updates, and send out a notification to users. Hmm,
     maybe this calls for another script...

 *   Disaster! Your hard drive crashes, and now you need to recreate your entire
     system. You were smart enough to keep all of your source files in version
     control, but what about those libraries you downloaded? Can you find them
     all again and make sure they were the same version as when you first
     downloaded them? Your scripts probably depended on particular tools being
     installed in particular places—can you restore that same environment so that
     the scripts work again? What about all those environment variables you set a
     long time ago to get the compiler working just right and then forgot about?

 *   Despite the problems, your project is successful enough that you’re able to
     begin hiring more engineers. Now you realize that it doesn’t take a disaster
     for the previous problems to arise—you need to go through the same painful
     bootstrapping process every time a new developer joins your team. And
     despite your best efforts, there are still small differences in each
     person’s system. Frequently, what works on one person’s machine doesn’t work
     on another’s, and each time it takes a few hours of debugging tool paths or
     library versions to figure out where the difference is.

 *   You decide that you need to automate your build system. In theory, this is
     as simple as getting a new computer and setting it up to run your build
     script every night using cron. You still need to go through the painful
     setup process, but now you don’t have the benefit of a human brain being
     able to detect and resolve minor problems. Now, every morning when you get
     in, you see that last night’s build failed because yesterday a developer
     made a change that worked on their system but didn’t work on the automated
     build system. Each time it’s a simple fix, but it happens so often that you
     end up spending a lot of time each day discovering and applying these simple
     fixes.

 *   Builds become slower and slower as the project grows. One day, while waiting
     for a build to complete, you gaze mournfully at the idle desktop of your
     coworker, who is on vacation, and wish there were a way to take advantage of
     all that wasted computational power.

 You’ve run into a classic problem of scale. For a single developer working on at
 most a couple hundred lines of code for at most a week or two (which might have
 been the entire experience thus far of a junior developer who just graduated
 university), a compiler is all you need. Scripts can maybe take you a little bit
 farther. But as soon as you need to coordinate across multiple developers and
 their machines, even a perfect build script isn’t enough because it becomes very
 difficult to account for the minor differences in those machines. At this point,
 this simple approach breaks down and it’s time to invest in a real build system.
	Project: /_project.yaml
	Book: /_book.yaml

	# Why a Build System?

	{% include "_buttons.html" %}

	This page discusses what build systems are, what they do, why you should use a
	build system, and why compilers and build scripts aren't the best choice as your
	organization starts to scale. It's intended for developers who don't have much
	experience with a build system.

	## What is a build system?

	Fundamentally, all build systems have a straightforward purpose: they transform
	the source code written by engineers into executable binaries that can be read
	by machines. Build systems aren't just for human-authored code; they also allow
	machines to create builds automatically, whether for testing or for releases to
	production. In an organization with thousands of engineers, it's common that
	most builds are triggered automatically rather than directly by engineers.

	### Can't I just use a compiler?

	The need for a build system might not be immediately obvious. Most engineers
	don't use a build system while learning to code: most start by invoking tools
	like `gcc` or `javac` directly from the command line, or the equivalent in an
	integrated development environment (IDE). As long as all the source code is in
	the same directory, a command like this works fine:

	```posix-terminal
	javac *.java
	```

	This instructs the Java compiler to take every Java source file in the current
	directory and turn it into a binary class file. In the simplest case, this is
	all you need.

	However, as soon as code expands, the complications begin. `javac` is smart
	enough to look in subdirectories of the current directory to find code to
	import. But it has no way of finding code stored in _other parts_ of the
	filesystem (perhaps a library shared by several projects). It also only knows
	how to build Java code. Large systems often involve different pieces written in
	a variety of programming languages with webs of dependencies among those pieces,
	meaning no compiler for a single language can possibly build the entire system.

	Once you're dealing with code from multiple languages or multiple compilation
	units, building code is no longer a one-step process. Now you must evaluate what
	your code depends on and build those pieces in the proper order, possibly using
	a different set of tools for each piece. If any dependencies change, you must
	repeat this process to avoid depending on stale binaries. For a codebase of even
	moderate size, this process quickly becomes tedious and error-prone.

	The compiler also doesn’t know anything about how to handle external
	dependencies, such as third-party `JAR` files in Java. Without a build system,
	you could manage this by downloading the dependency from the internet, sticking
	it in a `lib` folder on the hard drive, and configuring the compiler to read
	libraries from that directory. Over time, it's difficult to maintain the
	updates, versions, and source of these external dependencies.

	### What about shell scripts?

	Suppose that your hobby project starts out simple enough that you can build it
	using just a compiler, but you begin running into some of the problems described
	previously. Maybe you still don’t think you need a build system and can automate
	away the tedious parts using some simple shell scripts that take care of
	building things in the correct order. This helps out for a while, but pretty
	soon you start running into even more problems:

	* It becomes tedious. As your system grows more complex, you begin spending
	almost as much time working on your build scripts as on real code. Debugging
	shell scripts is painful, with more and more hacks being layered on top of
	one another.

	* It’s slow. To make sure you weren’t accidentally relying on stale libraries,
	you have your build script build every dependency in order every time you
	run it. You think about adding some logic to detect which parts need to be
	rebuilt, but that sounds awfully complex and error prone for a script. Or
	you think about specifying which parts need to be rebuilt each time, but
	then you’re back to square one.

	* Good news: it’s time for a release! Better go figure out all the arguments
	you need to pass to the jar command to make your final build. And remember
	how to upload it and push it out to the central repository. And build and
	push the documentation updates, and send out a notification to users. Hmm,
	maybe this calls for another script...

	* Disaster! Your hard drive crashes, and now you need to recreate your entire
	system. You were smart enough to keep all of your source files in version
	control, but what about those libraries you downloaded? Can you find them
	all again and make sure they were the same version as when you first
	downloaded them? Your scripts probably depended on particular tools being
	installed in particular places—can you restore that same environment so that
	the scripts work again? What about all those environment variables you set a
	long time ago to get the compiler working just right and then forgot about?

	* Despite the problems, your project is successful enough that you’re able to
	begin hiring more engineers. Now you realize that it doesn’t take a disaster
	for the previous problems to arise—you need to go through the same painful
	bootstrapping process every time a new developer joins your team. And
	despite your best efforts, there are still small differences in each
	person’s system. Frequently, what works on one person’s machine doesn’t work
	on another’s, and each time it takes a few hours of debugging tool paths or
	library versions to figure out where the difference is.

	* You decide that you need to automate your build system. In theory, this is
	as simple as getting a new computer and setting it up to run your build
	script every night using cron. You still need to go through the painful
	setup process, but now you don’t have the benefit of a human brain being
	able to detect and resolve minor problems. Now, every morning when you get
	in, you see that last night’s build failed because yesterday a developer
	made a change that worked on their system but didn’t work on the automated
	build system. Each time it’s a simple fix, but it happens so often that you
	end up spending a lot of time each day discovering and applying these simple
	fixes.

	* Builds become slower and slower as the project grows. One day, while waiting
	for a build to complete, you gaze mournfully at the idle desktop of your
	coworker, who is on vacation, and wish there were a way to take advantage of
	all that wasted computational power.

	You’ve run into a classic problem of scale. For a single developer working on at
	most a couple hundred lines of code for at most a week or two (which might have
	been the entire experience thus far of a junior developer who just graduated
	university), a compiler is all you need. Scripts can maybe take you a little bit
	farther. But as soon as you need to coordinate across multiple developers and
	their machines, even a perfect build script isn’t enough because it becomes very
	difficult to account for the minor differences in those machines. At this point,
	this simple approach breaks down and it’s time to invest in a real build system.