How to monitor for Bazel regressions

This guide explains how the Bazel build sheriff monitors the Bazel CI (Continuous Integration) projects and jobs.

The CI dashboard

URL: https://ci.bazel.build/view/Dashboard/

The dashboard gives a quick overview of the Bazel CI's health.
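
The same overview can be read programmatically. The sketch below assumes the instance exposes Jenkins's standard remote API (a view URL followed by `api/json`), which reports each job's state in a `color` field. The `DASHBOARD_API` constant and both function names are illustrative and are not part of the Bazel CI tooling.

```python
import json
from urllib.request import urlopen

# Assumption: the dashboard view answers Jenkins's standard JSON API.
DASHBOARD_API = "https://ci.bazel.build/view/Dashboard/api/json?tree=jobs[name,color]"

# Jenkins colors that indicate a failed, unstable, or aborted last build.
NOT_GREEN = ("red", "yellow", "aborted")


def non_green_jobs(view):
    """Return the names of jobs whose last completed build was not successful."""
    broken = []
    for job in view.get("jobs", []):
        # A "_anime" suffix only means a build is currently running;
        # strip it to look at the last completed result.
        color = job.get("color", "").split("_", 1)[0]
        if color in NOT_GREEN:
            broken.append(job["name"])
    return broken


def fetch_view(url=DASHBOARD_API):
    """Download and parse the view description (requires network access)."""
    with urlopen(url) as response:
        return json.load(response)
```

Calling `non_green_jobs(fetch_view())` would list the jobs that need attention. Note that Jenkins reports a successful job as `blue` by default, even when the UI renders it green.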

We monitor two groups of jobs: Bazel's own jobs and the jobs of other projects.

If Bazel's own jobs are not green, the Bazel team must:

  1. investigate
  2. fix as soon as possible

If the other projects are not green:

  1. report it to the project owners
  2. deactivate the project if it stays broken for more than a week
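
The two rules above can be sketched as a small decision function. This is only an illustration of the policy as written. The `JobStatus` type, its `owned_by_bazel` flag, and the `days_broken` counter are hypothetical names, and the 7-day threshold encodes "more than a week".

```python
from dataclasses import dataclass
from typing import List

# "more than a week" from the policy above.
DEACTIVATE_AFTER_DAYS = 7


@dataclass
class JobStatus:
    name: str
    green: bool
    owned_by_bazel: bool  # hypothetical flag: is this one of Bazel's own jobs?
    days_broken: int = 0  # hypothetical counter: days since the job was last green


def sheriff_actions(job: JobStatus) -> List[str]:
    """Return the actions the policy asks for, in order."""
    if job.green:
        return []
    if job.owned_by_bazel:
        return ["investigate", "fix as soon as possible"]
    actions = ["report to project owners"]
    if job.days_broken > DEACTIVATE_AFTER_DAYS:
        actions.append("deactivate project")
    return actions
```

For example, an external project that has been red for eight days gets both the report and the deactivation action, while one that has been red for exactly seven days only gets the report.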

Triaging failures

The build sheriff should monitor the outputs of these types of jobs:

Global tests

URLs:

When do these jobs run:

  • nightly: runs every night; it can also be re-run on demand with the Run button in Jenkins (you must be logged in to the Jenkins UI)
  • release: runs on every push and is always green for non-release pushes

How to investigate: see the user guide.

When global tests fail badly:

  1. file a bug against bazelbuild/bazel
  2. add the “breakage” label to the bug
  3. add the “release blocker” label if the breakage is on the release job

Postsubmits

These are all the other monitored jobs.

How to investigate: see the user guide.

When a postsubmit job fails:

  1. report to the project owner (e.g. Bazel team for “bazel-tests”)
  2. deactivate the job partially or totally if the failure persists for too long