SRE methodology skills for AI agents. Each skill packages one reliability workflow (investigating a live incident, handing over oncall, writing a postmortem) as a self-contained module your agent loads and runs.
Built and maintained by Anyshift.
Your agent already writes code and runs commands. It does not know how a seasoned SRE actually works an incident: which signals to correlate first, when a deploy is the prime suspect, when to stop digging and page a human. These skills encode that methodology so the agent follows a real playbook instead of improvising.
Every skill runs end-to-end with no Anyshift account and no external credentials. The methodology, the worked examples, and the replay tests all work offline against fixtures.
Each skill targets one real product and one job over it: audit an IAM policy, triage a Terraform plan, resolve an S3 bucket's effective access. It does that job end-to-end, offline, against fixtures. Not a wrapper that dumps the API response back: each one carries the judgment a senior engineer applies to that one source, the thresholds and known-bad combinations that separate signal from a clean-looking config.
Then it stops. A single source only knows itself. The moment a question needs a join (this role to everything it can actually reach, this queue to its producers and consumers, this plan to the running infrastructure it will move) the data runs out. Each skill names exactly where that happens and what's missing, so the boundary is explicit instead of a silent wrong answer. That boundary is the same one every time: the join across resources, across sources, or across time.
| Skill | Domain | What it does |
|---|---|---|
| sqs-queue-auditor | AWS | Audits redrive/DLQ wiring, maxReceiveCount, retention ordering against the DLQ, and a visibility timeout left at the risky default: the queue-side config that silently drops or re-delivers messages while every attribute reads as fine. |
| iam-deceptive-escalation-auditor | AWS | Resolves the effective permission set across every policy on a principal (Allow minus blanket Deny), flags the cross-statement escalation combos (PassRole+compute-launch, policy-rewrite-in-place, trust-policy rewrite) that no single statement reveals, and stays symmetric: it does not flag an escalation already killed by a Deny, a scoped wildcard, or a sealed Condition. |
| sg-deceptive-reachability-auditor | AWS | Builds a directed reachability graph from SG-to-SG references plus internet edges, composes the transitive closure from a named entry point, and reports the shortest path to the crown-jewel tier and the bridging hub SGs that a per-rule read misses, reporting clean when a segmented fleet has no reachable path. |
| s3-estate-calibration-auditor | AWS | Resolves each bucket's effective verdict by composing all four layers (Block Public Access, bucket policy, ACL, access points), then calibrates across an estate: it names the one bucket that is genuinely public or cross-account exposed without over-flagging the many siblings that read as exposed but are neutralised, and reports clean when nothing is live. |
| terraform-plan-risk-reporter | IaC | Ranks plan changes by blast risk, isolating destroys and force-replacements of stateful or irreplaceable resources from the harmless in-place updates they hide among. |
| github-actions-flake-reporter | CI/CD | Detects flaky jobs (pass-on-rerun on an unchanged SHA), clusters failures by cause, and flags duration regressions across run history, not just the last red run. |
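To make the sg-deceptive-reachability-auditor row concrete, here is a minimal sketch of the graph step it describes: a breadth-first search over directed "may connect to" edges, returning the shortest path from an entry point to a target tier, or nothing when the fleet is segmented. The function name, edge format, and SG names are hypothetical and chosen for illustration; this is not the skill's actual implementation.

```python
from collections import deque


def shortest_path(edges, entry, target):
    """Breadth-first search over directed edges; returns the shortest path or None."""
    graph = {}
    for src, dst in edges:
        graph.setdefault(src, []).append(dst)

    queue = deque([[entry]])
    seen = {entry}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == target:
            return path
        for nxt in graph.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None  # no reachable path: report clean


# Hypothetical fixture: an edge (a, b) means "a is allowed to connect to b".
edges = [
    ("internet", "sg-web"),
    ("sg-web", "sg-app"),
    ("sg-app", "sg-jump"),   # bridging hub that a per-rule read tends to miss
    ("sg-jump", "sg-db"),
    ("sg-batch", "sg-db"),
]

exposed = shortest_path(edges, "internet", "sg-db")
segmented = shortest_path(edges[:2], "internet", "sg-db")
print(exposed)    # ['internet', 'sg-web', 'sg-app', 'sg-jump', 'sg-db']
print(segmented)  # None
```

No single edge in the fixture connects the internet to the database tier. The exposure only appears once the edges are composed, which is the point of reading the graph rather than the rules one at a time.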
sqs-queue-auditor, iam-deceptive-escalation-auditor, sg-deceptive-reachability-auditor, and s3-estate-calibration-auditor are built out, each with fixture-based replay tests and a committed control-vs-treatment lift eval; the rest are planned. kubectl-investigator stays as the methodology-shaped reference template: it shows the directory shape, the worked-example format, and the fixture-based replay tests every skill above follows.
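As a rough illustration of the cross-statement check the iam-deceptive-escalation-auditor row describes, the sketch below resolves "Allow minus blanket Deny" across several statements and flags one escalation pair only when both actions survive. It is deliberately simplified: it ignores Resource scoping on Allow statements, Conditions on Allow, NotAction, permission boundaries, and SCPs. The function names and the chosen action pair are assumptions for illustration, not the skill's real logic.

```python
from fnmatch import fnmatchcase

# Assumed pair for illustration: pass a role, then launch compute that can use it.
ESCALATION_COMBO = ("iam:PassRole", "ec2:RunInstances")


def _matches(action, patterns):
    # IAM action names are case-insensitive and may contain * wildcards.
    return any(fnmatchcase(action.lower(), p.lower()) for p in patterns)


def effectively_allowed(action, statements):
    """True if some Allow covers the action and no blanket Deny removes it."""
    allows = [a for s in statements if s["Effect"] == "Allow" for a in s["Action"]]
    blanket_denies = [
        a
        for s in statements
        if s["Effect"] == "Deny" and s.get("Resource") == "*" and "Condition" not in s
        for a in s["Action"]
    ]
    return _matches(action, allows) and not _matches(action, blanket_denies)


def has_escalation(statements):
    return all(effectively_allowed(a, statements) for a in ESCALATION_COMBO)


# Hypothetical fixtures: statements gathered from every policy on one principal.
risky = [
    {"Effect": "Allow", "Action": ["iam:PassRole"], "Resource": "*"},
    {"Effect": "Allow", "Action": ["ec2:*"], "Resource": "*"},
]
neutralised = risky + [
    {"Effect": "Deny", "Action": ["iam:PassRole"], "Resource": "*"},
]

print(has_escalation(risky))        # True
print(has_escalation(neutralised))  # False
```

The second fixture is the symmetric case: the same two Allow statements are present, but a blanket Deny has already killed the combination, so nothing is flagged.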
These skills ship as a plugin in Anyshift's Claude Code marketplace. In a Claude Code session:
/plugin marketplace add anyshift-io/claude-plugins
/plugin install sre-skills@anyshift
The skills are now loaded. The agent reaches for the right one whenever you ask something that maps to an incident, a change review, an oncall handover, a postmortem, or a reliability audit. Pull new skills and versions later with /plugin marketplace update anyshift.
Clone the repo and point your agent at the skill you want. Each skill directory is self-contained: the methodology, the worked examples, and the fixture-based replay tests live together, so you can run the skill against the fixtures before pointing it at your own infrastructure.
git clone https://github.com/anyshift-io/sre-skills.git
Every skill also documents its failure modes: where it is likely to be wrong, and where the agent should escalate to a human instead of acting.
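For a sense of what running a skill against fixtures can look like, here is a minimal sketch of a fixture-based replay check using the SQS checks named in the table (DLQ wiring, retention ordering, visibility timeout at the default). The attribute names are real SQS queue attributes, and 30 seconds is the SQS default visibility timeout. The fixture layout, the finding identifiers, and the `audit_queue` and `replay` functions are hypothetical and do not reflect the repository's actual files.

```python
import json

DEFAULT_VISIBILITY_TIMEOUT = 30  # seconds; the SQS default


def audit_queue(attrs, dlq_attrs=None):
    """Return a list of finding ids for one queue's attributes (illustrative ids)."""
    findings = []
    # SQS returns RedrivePolicy as a JSON-encoded string.
    redrive = json.loads(attrs.get("RedrivePolicy", "{}"))
    if not redrive.get("deadLetterTargetArn"):
        findings.append("no-dlq")
    elif dlq_attrs is not None:
        source_retention = int(attrs["MessageRetentionPeriod"])
        dlq_retention = int(dlq_attrs["MessageRetentionPeriod"])
        if dlq_retention <= source_retention:
            findings.append("dlq-retention-not-longer-than-source")
    timeout = int(attrs.get("VisibilityTimeout", DEFAULT_VISIBILITY_TIMEOUT))
    if timeout == DEFAULT_VISIBILITY_TIMEOUT:
        findings.append("visibility-timeout-at-default")
    return findings


# Hypothetical fixture: recorded attributes plus the findings a replay should reproduce.
FIXTURE = {
    "queue": {
        "VisibilityTimeout": "30",
        "MessageRetentionPeriod": "1209600",  # 14 days
        "RedrivePolicy": json.dumps({
            "deadLetterTargetArn": "arn:aws:sqs:us-east-1:123456789012:orders-dlq",
            "maxReceiveCount": 5,
        }),
    },
    "dlq": {"MessageRetentionPeriod": "345600"},  # 4 days
    "expected": [
        "dlq-retention-not-longer-than-source",
        "visibility-timeout-at-default",
    ],
}


def replay(fixture):
    """Re-run the audit on recorded input and compare with the recorded findings."""
    return audit_queue(fixture["queue"], fixture["dlq"]) == fixture["expected"]


print(replay(FIXTURE))  # True
```

Because the input is recorded data rather than a live API call, a check like this needs no credentials and gives the same answer on every run.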
The skills work standalone. Two optional layers add infrastructure context:
- Anyshift MCP as a context primer. Skills can opt into richer context from the Anyshift MCP server. When the integration is wired up for a skill, that skill publishes a measured "with vs without" delta, so the added value is explicit rather than assumed.
- Annie, pre-loaded. Running Anyshift's Annie agent gives you these skills already loaded, with your Terraform state, cloud inventory (AWS / GCP / Azure), and recent deploys wired in.
Every skill ships with:
- Two worked examples drawn from real incidents or canonical scenarios.
- Fixture-based replay tests that run without external credentials.
- An explicit failure-modes section: where the skill is wrong, where the agent should escalate to a human.
For a curated index of SRE skills (ours and others), MCP servers, and reading, see anyshift-io/awesome-sre-skills.
Contributions to the vendor-neutral skills are welcome. See CONTRIBUTING.md.