Bazel CI Playbook

Status: Work in progress

This guide describes several maintenance workflows that have to be executed frequently.

Deploying new CI worker images

Our Linux and Windows CI workers run on GCE instances. The basic update process consists of the following two steps:

Run create_images to create new VM images. This step starts a temporary VM, configures it as a CI worker, and then saves its image in GCE before destroying the temporary VM. This step does not affect running builds.
Run create_instances to deploy instances with the new VM images. This step deletes the existing instances, then reads the configuration file to determine how many instances are needed, and finally creates new instances with the new images. As a result, any running builds will be interrupted.
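
The ordering of the two steps can be sketched as a small Python helper that only builds the command lines and runs nothing. The helper name update_commands is hypothetical; the script names, the --local_config flag, and the python3.6 interpreter are taken from the platform sections below.

```python
from typing import List


def update_commands(platforms: List[str], instance_groups: List[str],
                    python: str = "python3.6") -> List[List[str]]:
    """Return the two commands of the update process, in order.

    Hypothetical helper: it only builds argument lists and executes nothing.
    """
    if not platforms or not instance_groups:
        raise ValueError("need at least one platform and one instance group")
    return [
        # Step 1: build new VM images (does not affect running builds).
        [python, "create_images.py", *platforms],
        # Step 2: replace the instances (interrupts running builds).
        [python, "create_instances.py", "--local_config", *instance_groups],
    ]


if __name__ == "__main__":
    for cmd in update_commands(["bk-docker", "bk-trusted-docker"],
                               ["bk-docker", "bk-trusted-docker"]):
        print(" ".join(cmd))
```

Keeping image creation first means a failed image build never reaches the step that interrupts running builds.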

Note: Many changes to the Linux workers don't require these two steps since we run Docker containers on Linux. See below for a description of how to create and deploy new Docker images.

Prerequisites

All steps require git and the Google Cloud SDK to be installed on your machine.

Windows

You need a machine with a recent version of macOS and Microsoft Remote Desktop (10) installed.

First, create new images.
1. Clone the continuous-integration repository.
2. cd into the continuous-integration/buildkite directory.
3. Create new images by running create_images.py <platform1> <platform2> <...>. For Windows, this usually means passing bk-windows and bk-trusted-windows; the windows-playground platform is optional. Hint: You can see a list of available platforms by running the script without any arguments.
4. The script opens Microsoft Remote Desktop to establish a connection to the VM that is used for building the image. Accept any popups and log into the machine by pasting the password into the password field (the script has already copied it to the clipboard).
5. Run the setup script by executing \setup.ps1.
6. Wait until the script has finished. At one point the VM will be rebooted, so the script has to open the remote connection again. The whole process can take up to 30 minutes.
7. Log in to the Google Cloud Console and check that the created images are no longer busy. Make sure to select the project that matches the image (e.g. bazel-public for trusted images, bazel-untrusted for “normal” images).
8. If something fails, you can always run create_images again.
Then, deploy CI workers with the newly created images by running create_instances.py --local_config <instance_group1> <instance_group2> <...>. The available instance group names are listed in the configuration file; you can also run the script without any arguments to print them. For Windows, you would usually pass bk-windows bk-trusted-windows to the script.
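
The check in step 7 can also be done from the command line instead of the Cloud Console. The sketch below assumes that gcloud compute images list with --format=json returns entries that carry name and status fields, and that a status of READY means the image is no longer busy. The function names are hypothetical and not part of the repository.

```python
import json
import subprocess
from typing import List


def images_not_ready(images_json: str) -> List[str]:
    """Return the names of images whose status is not READY."""
    images = json.loads(images_json)
    return [img["name"] for img in images if img.get("status") != "READY"]


def list_images(project: str) -> str:
    """Fetch the project's own images as JSON (needs the Google Cloud SDK)."""
    result = subprocess.run(
        ["gcloud", "compute", "images", "list", "--project", project,
         "--no-standard-images", "--format=json"],
        check=True, stdout=subprocess.PIPE, universal_newlines=True)
    return result.stdout


if __name__ == "__main__":
    import sys
    # Example: python3.6 check_images.py bazel-untrusted
    if len(sys.argv) == 2:
        busy = images_not_ready(list_images(sys.argv[1]))
        print("still busy:", busy if busy else "none")
```

Run it once per project (bazel-public and bazel-untrusted), since trusted and "normal" images live in different projects.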

Linux

Most changes can be rolled out by creating and deploying new Docker images. This step requires that Docker is installed and set up, and you need permissions to access the container registry in our GCP project.

Clear your local Docker cache via docker builder prune -a -f.
Clone the continuous-integration repository.
cd into the continuous-integration/buildkite/docker directory.
Run build.sh.
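
The steps above can be written as a Python sketch that prints the commands by default and executes them only when dry_run=False. The clone URL and the bash build.sh invocation form are assumptions, since the text only names the repository and the script. The function and constant names are hypothetical.

```python
import subprocess
from typing import List, Optional, Tuple

# Assumed clone URL; the text above only names the repository.
REPO_URL = "https://github.com/bazelbuild/continuous-integration.git"

# Each step is (argv, working directory or None for the current directory).
Step = Tuple[List[str], Optional[str]]

STEPS: List[Step] = [
    (["docker", "builder", "prune", "-a", "-f"], None),
    (["git", "clone", REPO_URL], None),
    # Assumption: build.sh is run with bash from inside the docker directory.
    (["bash", "build.sh"], "continuous-integration/buildkite/docker"),
]


def rebuild_docker_images(dry_run: bool = True) -> List[str]:
    """Print each step, run it unless dry_run is set, and return the lines."""
    shown = []
    for argv, cwd in STEPS:
        line = f"(cd {cwd} && {' '.join(argv)})" if cwd else " ".join(argv)
        shown.append(line)
        print(line)
        if not dry_run:
            # check=True stops the sequence at the first failing command.
            subprocess.run(argv, cwd=cwd, check=True)
    return shown


if __name__ == "__main__":
    rebuild_docker_images(dry_run=True)
```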

If you need to create and deploy new VM images, you can follow these steps:

Clone the continuous-integration repository.
cd into the continuous-integration/buildkite directory.
Create new images by running python3.6 create_images.py <platform1> <platform2> <...>. For Linux, this usually means passing bk-docker and bk-trusted-docker. Hint: You can see a list of available platforms by running the script without any arguments.
Deploy CI workers with the newly created images by running python3.6 create_instances.py --local_config <instance_group1> <instance_group2> <...>. The available instance group names can be found in the configuration file. For Linux, you would usually pass bk-docker bk-trusted-docker to the script.

macOS

We are operating a number of physical Mac machines in our office. Please see go/bazel-ci-playbook if you're in the Google network.

Deploying a new Bazelisk version

Create a new Bazelisk release. This step has to be done on a Mac machine (due to cross-compilation problems), and requires permissions to create a release.
To deploy this release on macOS:
1. Update the Bazelisk Homebrew formula.
2. SSH into the machines and update them via Homebrew (see internal instructions for more details).
To deploy this release on Linux:
1. Update the Dockerfile.
2. Follow the instructions in the Linux section above to deploy new Docker images.
To deploy this release on Windows:
1. Create and deploy new VM images by following the instructions in the Windows section above. There is no need to update any files manually since the setup script always fetches the latest version of Bazelisk.