How to monitor for Bazel regressions

This guide explains how the Bazel build sheriff monitors the Bazel CI (Continuous Integration) projects and jobs.

The CI dashboard

URL: http://ci.bazel.io/view/Dashboard/

The dashboard gives a quick overview of the Bazel CI's health.

We monitor:

If Bazel's own jobs are not green, the Bazel team must:

  1. investigate
  2. fix as soon as possible

If the other projects are not green:

  1. report it to the project owners
  2. deactivate the project if it stays broken for more than a week
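The rules above can be sketched as a small decision helper. This is only an illustration and not part of the Bazel CI tooling: the function name `next_action`, the `is_bazel_job` flag, and the `broken_since` timestamp are assumptions introduced here, and how you obtain them (for example by reading the dashboard) is left open.

```python
from datetime import datetime, timedelta
from typing import Optional

# Threshold taken from the policy above: a non-Bazel project is
# deactivated if it stays broken for more than a week.
MAX_BROKEN = timedelta(weeks=1)


def next_action(is_bazel_job: bool,
                broken_since: Optional[datetime],
                now: datetime) -> str:
    """Return the sheriff's next step for one job (hypothetical helper).

    broken_since is None when the job is currently green.
    """
    if broken_since is None:
        return "nothing to do"
    if is_bazel_job:
        # Bazel's own jobs are never deactivated; they get fixed.
        return "investigate and fix as soon as possible"
    if now - broken_since > MAX_BROKEN:
        return "deactivate the project"
    return "report to the project owners"
```

For instance, a non-Bazel project that has been red for nine days would yield "deactivate the project", while the same project red for one day would yield "report to the project owners".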

Triaging failures

The build sheriff should monitor the outputs of these types of jobs:

Global tests

URLs:

When these jobs run:

  • nightly: runs every night and can be re-run on demand using the Run button in Jenkins (you need to log in to the Jenkins UI)
  • release: runs on every push and is always green for non-release pushes

How to investigate: see the user guide.

When global tests fail badly:

  1. file a bug against bazelbuild/bazel
  2. add the “breakage” label to the bug
  3. add the “release blocker” label if the breakage is on the release job
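The labeling steps above can be written down as a short sketch. The helper name `bug_labels` is hypothetical, and it assumes the job is identified by the names used in this guide ("nightly" and "release"); it only computes the label list and does not file anything.

```python
from typing import List


def bug_labels(job_name: str) -> List[str]:
    """Labels for a global-test breakage bug (hypothetical helper)."""
    # Every global-test breakage gets the "breakage" label.
    labels = ["breakage"]
    # Assumption: the release job is identified by the name "release".
    if job_name == "release":
        labels.append("release blocker")
    return labels
```

A failure on the nightly job would get only the "breakage" label, while a failure on the release job would get both labels.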

Benchmark

URL: http://ci.bazel.io/job/benchmark

How to investigate: look at the output logs.

Postsubmits

These are all the other monitored jobs.

How to investigate: see the user guide.

When a postsubmit job fails:

  1. report to the project owner (e.g. Bazel team for “bazel-tests”)
  2. deactivate partially or totally, if a failure stays for too long