The test-gaming detector for the agentic era.
Pragma reads the tests your AI assistant just wrote and refuses the ones that
pass without verifying anything: assert True, mocking the function under test,
a test name that promises an error path with a body that asserts nothing.
A Claude Code plugin and a small CLI, with three tiers of defense.
Ask an AI assistant to make the tests pass and it will. Sometimes by writing
real code. Sometimes by writing assert True, mocking the function under test,
or redefining the production class right there in the test file. Coverage is
green, CI is green, nothing is verified.
Static rules alone become whack-a-mole — a new evasion for every new rule. Pragma layers three tiers, each catching what the previous one misses:
- Tier 1 — AST classifier. Fast, deterministic, local. Parses the test
(never runs it) and flags known gaming shapes: tautological asserts, mocking
the symbol under test,
pytest.skipsmuggled into a body, name/body mismatches, and more. Always on. - Tier 2 — coverage-of-target gate. Runs the test under coverage
instrumentation and asks: did the production code's lines actually execute?
If not, the test isn't testing the target. Because this executes the test
file under audit, it is opt-in:
--with-coverageon the CLI, orPRAGMA_COVERAGE=1in the plugin hook. The subprocess runs with a secret-scrubbed allowlist environment (see Security). - Tier 3 — LLM judge. A model reads the test alongside the production source
and decides whether the test verifies behavior or just asserts on its own
mocks. DeepSeek by default; any OpenAI-compatible endpoint works. Opt-in:
--with-llmon the CLI, orPRAGMA_HOOK_WITH_LLM=1in the hook. Configured viaPRAGMA_LLM_API_KEY,PRAGMA_LLM_BASE_URL,PRAGMA_LLM_MODEL, andPRAGMA_LLM_TIMEOUT. This is the only tier that touches the network.
Pragma classifies Python (pytest), Vitest, and Jest test files. The expected
outcome (success / reject) is inferred from the test name and the production
target from the imports — zero config to start. See
docs/rules.md for every verdict.
Requires Python 3.11+.
From source (editable install with dev tooling):
pip install -e ".[dev]"Optional extras:
pip install -e ".[coverage]" # tier 2 helper (pytest-timeout)
pip install -e ".[llm]" # tier 3 LLM judge (openai SDK)This installs the pragma command. Verify it:
pragma --helpIn Claude Code, install the plugin from this repository:
/plugin install pragma@joncik91/pragma
The plugin's hooks scan edits to files matching test_*.py, */tests/*.py, and
*/tests/*/*.py. Tier 1 always runs; tiers 2 and 3 are opt-in via the
environment variables above. See docs/hooks.md.
Once installed, the plugin gates test-file edits automatically:
- PreToolUse inspects a full-file
Writebefore it lands and runs tier 1. - PostToolUse re-checks on disk after
Edit/MultiEdit, and is wherePRAGMA_COVERAGE=1(tier 2) andPRAGMA_HOOK_WITH_LLM=1(tier 3) take effect.
The hook blocks only the gaming an edit introduces — pre-existing gaming in a
file is not your problem to fix right now. Full semantics, exit codes, and
environment variables are in docs/hooks.md.
The CLI works on its own:
# tier 1 only — fast, deterministic
pragma verify tests path/to/test_login.py
# tier 1 + tier 2 — runs the test under coverage, requires target lines to execute
pragma verify tests path/to/test_login.py --with-coverage
# all three tiers
pragma verify tests path/to/test_login.py --with-coverage --with-llmDefault output is JSON; --human prints one path::test [kind] evidence line
per verdict. The command exits 1 when any blocking verdict fires, 0
otherwise.
$ pragma verify tests tests/fixtures/blocking/gamed_tautology.py --human
tests/fixtures/blocking/gamed_tautology.py::test_login_happy_path [python.tautological] `assert True` is a constant truthyWire the tier-1 classifier into pre-commit:
pragma init-precommitThis writes a .pre-commit-config.yaml that runs pragma verify tests on
staged test files. For the manual snippet, see docs/PRECOMMIT.md.
Full command and option reference, plus the plugin, rules, and security pages,
live under docs/:
Maintained by the Pragma contributors. See the repository for issues and pull requests.
PRs welcome. Useful directions:
- More languages. The verdict-table shape generalises — Go's
_test.go, Rust's#[test], Ruby's RSpec all have analogous gaming patterns. - New tier 1 verdicts for shapes the classifier doesn't yet catch (open an issue with a real-world test file showing the pattern).
- Replay corpora — anonymised test files where a tier got it wrong, so the classifier can be tuned.
Set up the dev environment with pip install -e ".[dev]" and run the suite with
pytest. See docs/PRECOMMIT.md for the existing
pre-commit integration.
MIT © Pragma contributors.