feat(lab8): Prometheus+Grafana golden signals + error-rate alert #1295

Open
blacktree-lab wants to merge 3 commits into inno-devops-labs:main from blacktree-lab:feature/lab8

blacktree-lab commented Jul 1, 2026

Goal

Add Prometheus + Grafana to the Lab 6 Compose stack, provision a dashboard covering the four golden signals for QuickNotes, and define one symptom-based alert (error ratio > 5%, sustained for 5 min) with a runbook.

Changes

  • Added monitoring/prometheus/prometheus.yml - scrapes QuickNotes (quicknotes:8080) every 15s
  • Added monitoring/prometheus/alerts.yml - HighErrorRate alert: 4xx+5xx ratio > 5% for: 5m, severity: page, runbook annotation
  • Added monitoring/grafana/provisioning/ - auto-provisions the Prometheus datasource + a 4-panel dashboard (Traffic, Errors, Latency-proxy, Saturation)
  • Extended compose.yaml - prometheus (:9090) and grafana (:3000) services, pinned images (prom/prometheus:v3.1.0, grafana/grafana:13.0.3), depends_on: service_healthy, no default Grafana creds
  • Added docs/runbook/high-error-rate.md - what it means, triage, mitigations, post-incident
  • Added submissions/lab8.md + screenshots
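
To make the HighErrorRate condition concrete, here is a minimal Python sketch of the quantity the rule compares against 5%: the share of 4xx+5xx responses among all responses over a window. The per-status counter snapshots and the `error_ratio` helper are hypothetical and are not taken from this PR; the actual rule is a PromQL expression in monitoring/prometheus/alerts.yml, and unlike Prometheus' `rate()` this sketch does not handle counter resets.

```python
from typing import Mapping


def error_ratio(start: Mapping[str, float], end: Mapping[str, float]) -> float:
    """Share of 4xx+5xx responses between two counter snapshots.

    Keys are HTTP status codes as strings, values are cumulative request
    counts (hypothetical data shape, assumed for illustration only).
    """
    # Per-status increase over the window; a status unseen at the start counts from 0.
    increases = {code: end[code] - start.get(code, 0.0) for code in end}
    total = sum(increases.values())
    if total == 0:
        return 0.0  # no traffic in the window, so no error ratio to report
    errors = sum(v for code, v in increases.items() if code[0] in "45")
    return errors / total


# Hypothetical snapshots taken at the start and end of a window.
start = {"200": 1000, "404": 40, "500": 10}
end = {"200": 1160, "404": 60, "500": 30}

ratio = error_ratio(start, end)
breaches_threshold = ratio > 0.05  # the 5% threshold from the alert rule
```

With these made-up numbers, 40 of 200 requests in the window are errors, so the ratio is well above the 5% threshold.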

Testing

  • curl .../api/v1/targets | jq '.data.activeTargets[].health' -> "up" (Prometheus scraping QuickNotes)
  • Grafana auto-loads "QuickNotes - Golden Signals" with all 4 panels populated after ~200 mixed requests (screenshot)
  • Under sustained ~20% error injection, HighErrorRate crosses 5% and enters pending (value 18.15%, severity: page, annotations resolved); screenshot captured. A single 4xx burst does not fire it, because the for: 5m gate requires a sustained breach
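
The pending-versus-firing behaviour described above can be sketched as a small state machine. This is a simplified model of how a `for:` clause gates an alert, not Prometheus' actual implementation; the 15 s evaluation interval is an assumption (the PR only states the scrape interval), and `alert_states` is a hypothetical helper.

```python
from typing import Iterable, List, Optional


def alert_states(
    ratios: Iterable[float],
    threshold: float = 0.05,  # 5% error ratio, from the alert rule
    hold_s: int = 300,        # for: 5m
    interval_s: int = 15,     # assumed evaluation interval
) -> List[str]:
    """Return the alert state after each evaluation of the error ratio."""
    states: List[str] = []
    breached_for: Optional[int] = None  # seconds of continuous breach, None when healthy
    for ratio in ratios:
        if ratio > threshold:
            # First breaching evaluation starts the clock at 0 seconds.
            breached_for = 0 if breached_for is None else breached_for + interval_s
            states.append("firing" if breached_for >= hold_s else "pending")
        else:
            # Any healthy evaluation resets the clock.
            breached_for = None
            states.append("inactive")
    return states


# A short burst of errors: goes pending, then recovers without firing.
burst = alert_states([0.0] * 5 + [0.2] * 3 + [0.0] * 5)

# A sustained breach: pending first, firing once 300 s have elapsed.
sustained = alert_states([0.18] * 25)
```

In this model the short burst only ever reaches pending, while the sustained breach turns firing at the 21st consecutive breaching evaluation (20 intervals of 15 s after the first breach).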

Checklist

  • Title is a clear sentence (<= 70 chars)
  • monitoring/ with prometheus config, alert rule, and grafana provisioning
  • Prometheus scrapes QuickNotes (up == 1)
  • Grafana dashboard provisioned with 4 golden-signal panels
  • docs/runbook/high-error-rate.md complete (4 sections)
  • submissions/lab8.md covers Task 1 + Task 2 with design answers a-g
  • Alert observed in FIRING state
  • Commits are signed

Signed-off-by: DJ Bubu <djbubu28@yahoo.com>
@blacktree-lab blacktree-lab changed the title "feat(lab8): Prometheus+Grafana golden signals + error-rate alert" feat(lab8): Prometheus+Grafana golden signals + error-rate alert Jul 1, 2026