# Bazel CI Metrics Service
## SQL Setup
Run the following commands to initialize an empty database named `metrics`.

NOTE: Before running them, double-check that the statements match the output of `git grep "CREATE TABLE"`:
```sql
USE metrics;
CREATE TABLE aggregated_pipeline_performance (org VARCHAR(255), pipeline VARCHAR(255), build INT, scheduled DATETIME, total_time_seconds FLOAT, skipped_tasks VARCHAR(255), result VARCHAR(16), PRIMARY KEY(org, pipeline, build));
CREATE TABLE build_success (org VARCHAR(255), pipeline VARCHAR(255), build INT, linux VARCHAR(255), macos VARCHAR(255), windows VARCHAR(255), rbe VARCHAR(255), PRIMARY KEY(org, pipeline, build));
CREATE TABLE builds_per_change (org VARCHAR(255), pipeline VARCHAR(255), changelist INT, builds INT, PRIMARY KEY(org, pipeline, changelist));
CREATE TABLE cloud_build_status (timestamp DATETIME, build VARCHAR(255), source VARCHAR(255), success BOOL, PRIMARY KEY(timestamp, build));
CREATE TABLE critical_path (org VARCHAR(255), pipeline VARCHAR(255), build INT, wait_time_seconds FLOAT, run_time_seconds FLOAT, longest_task_name VARCHAR(255), longest_task_time_seconds FLOAT, result VARCHAR(255), PRIMARY KEY(org, pipeline, build));
CREATE TABLE flakiness (org VARCHAR(255), pipeline VARCHAR(255), build INT, target VARCHAR(255), passed_count INT, failed_count INT, PRIMARY KEY(org, pipeline, build, target));
CREATE TABLE mac_performance (org VARCHAR(255), pipeline VARCHAR(255), build INT, shard INT, wait_time_seconds FLOAT, run_time_seconds FLOAT, skipped BOOL, PRIMARY KEY(org, pipeline, build, shard));
CREATE TABLE pipeline_performance (org VARCHAR(255), pipeline VARCHAR(255), build INT, job VARCHAR(255), creation_time DATETIME, wait_time_seconds FLOAT, run_time_seconds FLOAT, skipped_tasks VARCHAR(255), PRIMARY KEY(org, pipeline, build, job));
CREATE TABLE platform_load (timestamp DATETIME, org VARCHAR(255), platform VARCHAR(255), waiting_jobs INT, running_jobs INT, PRIMARY KEY(org, timestamp, platform));
CREATE TABLE platform_significance (org VARCHAR(255), pipeline VARCHAR(255), total_builds INT, passing_builds INT, canceled_builds INT, setup_failed INT, linux_failures INT, macos_failures INT, windows_failures INT, rbe_failures INT, multi_platform_failures INT, PRIMARY KEY(org, pipeline));
CREATE TABLE platform_usage (org VARCHAR(255), pipeline VARCHAR(255), build INT, platform VARCHAR(255), usage_seconds FLOAT, PRIMARY KEY(org, pipeline, build, platform));
CREATE TABLE release_downloads (release_name VARCHAR(255), artifact VARCHAR(255), downloads INT, PRIMARY KEY(release_name, artifact));
CREATE TABLE worker_availability (timestamp DATETIME, org VARCHAR(255), platform VARCHAR(255), idle_count INT, busy_count INT, PRIMARY KEY(timestamp, org, platform));
CREATE TABLE zombie_instances (cloud_project VARCHAR(255), zone VARCHAR(255), instance VARCHAR(255), status VARCHAR(255), seconds_online FLOAT, timestamp DATETIME, PRIMARY KEY(cloud_project, zone, instance));
```
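The comparison requested in the NOTE above can be partly automated. The following is a minimal sketch, not part of the service: the helper name `extract_tables` is hypothetical, the paths in the usage comment are assumptions, and it assumes every statement uses the plain `CREATE TABLE name` form on a single line. It compares table names only, so column definitions still need a manual check.

```bash
#!/usr/bin/env bash
# Hypothetical helper: read text on stdin and print the sorted, de-duplicated
# table names that appear directly after "CREATE TABLE".
extract_tables() {
  grep -o 'CREATE TABLE [A-Za-z_][A-Za-z0-9_]*' | awk '{print $3}' | sort -u
}

# Usage, from the repository root (both paths are assumptions):
#   diff <(extract_tables < metrics/README.md) \
#        <(git grep -h "CREATE TABLE" -- 'metrics/*.go' | extract_tables)
# If diff prints nothing, both sources define the same set of tables.
```

Comparing names rather than whole statements keeps the check robust against formatting differences between the README and the source files.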
## PubSub Setup for Cloud Build Status
The `cloud_build_status` metric requires a Pub/Sub subscription to the `cloud-builds` topic in the `bazel-public` project.
The service account also needs the `Pub/Sub Subscriber` role in the `bazel-public` project.
Run the following commands to check whether a subscription already exists:
- `gcloud config set project bazel-public`
- `gcloud pubsub subscriptions list | grep build-status`
The output should contain `projects/bazel-public/subscriptions/build-status`. If it does not, create the subscription:
- `gcloud pubsub subscriptions create build-status --topic cloud-builds`
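The check-then-create steps above can be combined into one idempotent script. This is a sketch under stated assumptions: both function names are hypothetical, and it passes `--project` explicitly instead of changing the active `gcloud` configuration.

```bash
#!/usr/bin/env bash
# Hypothetical helper: succeed if one of the lines on stdin is exactly the
# full resource name of the build-status subscription.
has_build_status_subscription() {
  grep -qx 'projects/bazel-public/subscriptions/build-status'
}

# Hypothetical helper: create the subscription only if it is missing.
ensure_build_status_subscription() {
  if gcloud pubsub subscriptions list --project bazel-public \
      --format='value(name)' | has_build_status_subscription; then
    echo "Subscription build-status already exists."
  else
    gcloud pubsub subscriptions create build-status \
      --project bazel-public --topic cloud-builds
  fi
}
```

Calling `ensure_build_status_subscription` after sourcing the script performs the check and, if needed, the creation. Matching the whole line avoids a false positive from a similarly named subscription such as `build-status-old`, which a plain `grep build-status` would accept.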
## Service Deployment
Make sure you have access to the `staging.bazel-untrusted.appspot.com` GCS bucket, then deploy the service and tail its logs:
- `gcloud app deploy metrics/app.yaml --stop-previous-version`
- `gcloud app logs tail -s default`
## Running the service locally
The following steps allow you to run the service locally:
1. Ask an EngProd team member for access to a GCP service account.
2. Download the credentials for the service account (a JSON file).
3. Set the `GOOGLE_APPLICATION_CREDENTIALS` environment variable to point at the credentials: `export GOOGLE_APPLICATION_CREDENTIALS="path/to/file.json"`
4. Download the [Cloud SQL Proxy](https://cloud.google.com/sql/docs/mysql/sql-proxy).
5. Start the proxy via `./cloud_sql_proxy -instances="bazel-untrusted:europe-west1:metrics"=tcp:3306`
6. Run the app via `go run metrics/main.go metrics/settings.go --test=true`. The `test` flag means that all metrics are collected immediately and all results are written to stdout instead of Cloud SQL.
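Steps 3, 5 and 6 above can be wrapped in a small script. The following is a minimal sketch, assuming the proxy binary sits in the current directory and the commands are run from the repository root. The function names and the fixed two-second startup delay are assumptions, not part of the service.

```bash
#!/usr/bin/env bash
# Hypothetical helper: fail early if the variable from step 3 is unset or
# does not point at a readable file.
check_credentials() {
  if [ -z "${GOOGLE_APPLICATION_CREDENTIALS:-}" ]; then
    echo "GOOGLE_APPLICATION_CREDENTIALS is not set" >&2
    return 1
  fi
  if [ ! -r "${GOOGLE_APPLICATION_CREDENTIALS}" ]; then
    echo "Cannot read ${GOOGLE_APPLICATION_CREDENTIALS}" >&2
    return 1
  fi
}

# Hypothetical wrapper for steps 5 and 6: start the proxy in the background,
# run the service in test mode, then stop the proxy again.
run_locally() {
  check_credentials || return 1
  ./cloud_sql_proxy -instances="bazel-untrusted:europe-west1:metrics"=tcp:3306 &
  local proxy_pid=$!
  # Assumption: a short fixed delay is enough for the proxy to start listening.
  sleep 2
  go run metrics/main.go metrics/settings.go --test=true
  local status=$?
  kill "${proxy_pid}" 2>/dev/null
  return "${status}"
}
```

After sourcing the script, `run_locally` returns the exit status of the `go run` command, so a failing run is still visible to the caller even though the proxy is shut down afterwards.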
## Access via Cloud Shell
Open Cloud Shell for the `bazel-untrusted` project, then run these commands:
- `gcloud beta auth login`
- `gcloud beta sql connect metrics --user=root --quiet`
## Test Coverage
TODO(fweikert): Actually implement unit tests.
```bash
go test metrics/clients/buildkite_test.go metrics/clients/buildkite.go metrics/clients/buildkite_api.go
```
## TODOs
- Implement unit tests.
- All metrics should export typed `DataSet` implementations instead of `LegacyDataSet` objects, similar to `cloud_build_status`.
- Metrics should be organized as a graph (not just a list) so that dependencies between metrics are visible.