Most AI features fail not because the model is bad, but because the requirements were written for the wrong kind of system.
- The Problem
- What's in this repo
- How to use
- PRD Scorer Web App
- Built On
- Architecture
- Screenshots
- Run locally
- Run tests
- Deploy your own
- Why this is different from a standard PRD
## The Problem

A standard PRD tells you what a feature should do. That's fine when the system is deterministic. When the system is a language model, "what it should do" is the easy part. The hard parts are:
- What does a good output look like, in measurable terms, before you build?
- What happens when the model is slow, wrong, or uncertain?
- What data must exist before you start, and who's responsible for it?
- What are the failure modes, and who owns the mitigations?
Most teams answer these questions in production, not in requirements. This repo is a framework for answering them before engineering starts.
## What's in this repo

| Artifact | Description |
|---|---|
| `template/AI-FEATURE-PRD-TEMPLATE.md` | The 10-section PRD template with inline guidance for each field |
| `scorecard/PRD-READINESS-SCORECARD.md` | A 10-point checklist to score any AI PRD before engineering kickoff |
| `example/real-time-agent-assist-prd.md` | A fully worked example: real-time AI assist for a contact center |
| `assets/prd-vs-ai-prd-comparison.md` | Side-by-side: traditional PRD sections vs. their AI-native equivalents |
| `scorer/` | Next.js web app: upload a PRD, get an AI-generated score and downloadable report |
## How to use

- Start with the template. Copy `AI-FEATURE-PRD-TEMPLATE.md` and fill it in for your feature. Don't skip sections: the gaps are the signal.
- Score your draft. Upload it to the PRD Scorer or run it through `PRD-READINESS-SCORECARD.md` manually. If you score below 7, you have known gaps before engineering starts.
- See the worked example. `real-time-agent-assist-prd.md` shows every section filled in for a real-world AI feature. Use it to calibrate what "specific enough" looks like.
## PRD Scorer Web App

Live: https://ai-feature-prd-toolkit.vercel.app/

Upload any AI feature PRD (.md, .txt, .pdf, or .docx) and get an instant 10-point readiness score powered by Claude. Each dimension returns a finding and a fix recommendation, and the full result is available as a downloadable Markdown report. The app also offers a direct download of the PRD template so you can start from scratch without leaving the page.
```
Input: Your AI feature PRD (any format)

━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
  PRD Readiness Score                 7 / 10
  Address gaps before sprint planning
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Critical gaps:
  · No golden dataset owner or due date
  · Cost estimate missing
  · Confidence thresholds not specified

✓ Eval criteria defined and measurable  1/1
✗ Golden dataset ownership              0/1
    Finding: PRD mentions eval but no owner or dataset exists.
    Fix: Name the person creating the dataset and set a due date
         before engineering kickoff.
✓ Confidence thresholds specified       1/1
✓ Latency SLA defined                   1/1
✓ Fallback behavior documented          1/1
✓ Data readiness go/no-go criteria      1/1
✓ Human-in-the-loop checkpoint          1/1
✓ Failure mode pre-mortem               1/1
✗ Rollout cost estimate at scale        0/1
    Finding: No token/volume/cost calculation present.
    Fix: Estimate tokens × daily volume × price for each model call.
✓ Open questions log with owners        1/1
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
  [ Download report ]   [ Score again ]
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
```
The Anthropic API call happens server-side. Your document is never stored.
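The cost fix suggested in the sample report (tokens times daily volume times price, per model call) is plain arithmetic, and a short sketch makes it concrete. The function name, the field names, and every number below are hypothetical placeholders, not real prices and not code from this repo; substitute your provider's current per-token rates.

```javascript
// Hypothetical sketch: estimate daily spend for one type of model call.
// All names and numbers are illustrative assumptions, not real pricing.
function estimateDailyCostUsd({
  inputTokensPerCall,
  outputTokensPerCall,
  callsPerDay,
  inputPricePerMTokUsd, // assumed price per 1,000,000 input tokens
  outputPricePerMTokUsd, // assumed price per 1,000,000 output tokens
}) {
  const inputCost = (inputTokensPerCall / 1e6) * inputPricePerMTokUsd;
  const outputCost = (outputTokensPerCall / 1e6) * outputPricePerMTokUsd;
  return (inputCost + outputCost) * callsPerDay;
}

const daily = estimateDailyCostUsd({
  inputTokensPerCall: 4000,
  outputTokensPerCall: 1000,
  callsPerDay: 10000,
  inputPricePerMTokUsd: 2,
  outputPricePerMTokUsd: 10,
});
console.log(daily.toFixed(2)); // prints "180.00"
```

Running the same calculation once per model call in the feature, then summing, gives the at-scale figure the scorecard asks for.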
## Built On

| Tool | Why |
|---|---|
| Next.js 14 | Single repo for frontend and serverless API routes; no separate backend to deploy |
| React | UI only, no component library; the scorer page is a single file with inline styles |
| Claude (claude-sonnet-4-20250514) | Structured JSON output via the Messages API; strong instruction-following for strict per-dimension scoring |
| pdf.js | Client-side PDF parsing without bundling; injected dynamically only when a .pdf is uploaded |
| mammoth.js | Client-side .docx → plain text; same on-demand injection pattern as pdf.js |
| Vercel | Zero-config deployment; root directory set to scorer/ so only the app is deployed, not the markdown artifacts |
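The on-demand injection pattern mentioned for pdf.js and mammoth.js can be sketched as a memoized loader: a parser is fetched the first time a matching file is uploaded, and never for users who only upload plain text. This is a minimal sketch under stated assumptions, not the repo's implementation; `loadOnce`, `parserKeyFor`, and the `loader` callback are hypothetical names.

```javascript
// Hypothetical sketch of on-demand loading; not the repo's actual code.
const pending = new Map(); // key -> Promise, so each library loads at most once

function loadOnce(key, loader) {
  if (!pending.has(key)) {
    // In a browser, `loader` would append a <script> tag and resolve on load.
    pending.set(key, Promise.resolve().then(loader));
  }
  return pending.get(key);
}

// Decide which parser (if any) a file needs, based on its extension.
function parserKeyFor(filename) {
  const ext = filename.toLowerCase().split(".").pop();
  if (ext === "pdf") return "pdfjs";
  if (ext === "docx") return "mammoth";
  return null; // .md and .txt can be read directly with FileReader
}
```

A failed load stays cached in this sketch; a production version would probably evict the entry when the promise rejects so the user can retry.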
## Architecture

```
┌─────────────────────────────────────────────┐
│  Browser                                    │
│  pages/index.js                             │
│                                             │
│  File upload (.md / .txt / .pdf / .docx)    │
│  Client-side parsing (FileReader, pdf.js,   │
│  mammoth.js, loaded on demand)              │
│  Results UI + report download               │
└───────────────────┬─────────────────────────┘
                    │  POST /api/score
                    │  { prdText: string }
┌───────────────────▼─────────────────────────┐
│  Next.js API Route                          │
│  pages/api/score.js                         │
│                                             │
│  · Validates input (100–50,000 chars)       │
│  · Calls Anthropic API with scoring prompt  │
│  · Strips code fences, parses JSON response │
│  · Returns structured score object          │
│                                             │
│  ANTHROPIC_API_KEY: server-side only,       │
│  never exposed to the browser               │
└───────────────────┬─────────────────────────┘
                    │
┌───────────────────▼─────────────────────────┐
│  Anthropic API                              │
│  claude-sonnet-4-20250514                   │
│                                             │
│  Evaluates PRD against 10 dimensions        │
│  Returns structured JSON: scores, findings, │
│  fixes, top gaps, summary, verdict          │
└─────────────────────────────────────────────┘
```
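The API route steps in the diagram (validate input length, strip code fences, parse the JSON response) can be sketched as small pure functions. This is an illustrative sketch, not the code in `pages/api/score.js`: the function names and error messages are assumptions, and only the 100 to 50,000 character bounds come from the diagram (treated here as inclusive, which is also an assumption).

```javascript
// Hypothetical helpers mirroring the API route steps; names are assumptions.
const MIN_CHARS = 100; // bounds taken from the diagram, assumed inclusive
const MAX_CHARS = 50000;

// Built without a literal fence so this snippet can live inside Markdown.
const FENCE = "`".repeat(3);
const OPENING_FENCE = new RegExp("^" + FENCE + "[a-zA-Z]*\\s*");
const CLOSING_FENCE = new RegExp("\\s*" + FENCE + "$");

// Returns an error message, or null when the input is acceptable.
function validatePrdText(prdText) {
  if (typeof prdText !== "string") return "prdText must be a string";
  if (prdText.length < MIN_CHARS) return "prdText is too short";
  if (prdText.length > MAX_CHARS) return "prdText is too long";
  return null;
}

// Models sometimes wrap JSON in a fenced block; remove the fence if present.
function stripCodeFences(text) {
  return text.trim().replace(OPENING_FENCE, "").replace(CLOSING_FENCE, "");
}

// Parse the model output; return null instead of throwing on malformed JSON.
function parseScoreJson(modelText) {
  try {
    return JSON.parse(stripCodeFences(modelText));
  } catch {
    return null;
  }
}
```

Keeping these steps as pure functions means they can be unit-tested without calling the model.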
## Run locally

```bash
npm run install:scorer                           # install dependencies (first time)
cp scorer/.env.local.example scorer/.env.local   # add your ANTHROPIC_API_KEY
npm run dev                                      # copies template → public/, starts http://localhost:3000
```

## Run tests

```bash
npm run install:scorer   # if not already installed
cd scorer && npm test    # 38 tests: API route + pure functions
```

The test suite covers input validation, boundary conditions, and security properties: non-string inputs rejected, API key never present in responses, raw Anthropic envelope never forwarded, and malformed upstream JSON handled without leaking content.
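Two of the security properties in the test suite (raw Anthropic envelope never forwarded, malformed upstream JSON handled without leaking content) come down to whitelisting what leaves the server. The sketch below shows one way that could look. `toClientPayload` and the field names `total`, `dimensions`, and `verdict` are hypothetical; the repo's actual response shape and tests may differ. The only detail taken from the Messages API is that generated text arrives inside `content` blocks.

```javascript
// Hypothetical sketch: only whitelisted fields ever reach the browser.
const ALLOWED_FIELDS = ["total", "dimensions", "verdict"]; // assumed names

function toClientPayload(upstreamEnvelope) {
  // Generic failure: never echo upstream text back to the client.
  const failure = { status: 502, body: { error: "Scoring failed. Please try again." } };

  let parsed;
  try {
    // The Messages API puts generated text inside content blocks.
    parsed = JSON.parse(upstreamEnvelope?.content?.[0]?.text);
  } catch {
    return failure;
  }
  if (parsed === null || typeof parsed !== "object" || Array.isArray(parsed)) {
    return failure;
  }

  // Copy known fields only; ids, usage, and unexpected keys are dropped.
  const body = {};
  for (const field of ALLOWED_FIELDS) {
    if (field in parsed) body[field] = parsed[field];
  }
  return { status: 200, body };
}
```

A test for the "no leak" property can then pass a deliberately malformed upstream text and assert that none of it appears in the serialized response.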
## Deploy your own

- Fork this repo
- Go to vercel.com → Import project → select this repo → set the root directory to `scorer`
- Add `ANTHROPIC_API_KEY` as an environment variable in the Vercel project settings
- Deploy; Vercel handles the rest
## Why this is different from a standard PRD

- AI systems don't pass or fail; they degrade. Standard acceptance criteria assume binary correctness. This template forces you to define confidence tiers, fallback behavior, and what happens when the model is slow, wrong, or uncertain.
- The data is a dependency, not a detail. Most AI features stall in production because the data was assumed, not specified. The data readiness gates section treats data quality as a go/no-go criterion, because it is.
- Evaluation criteria have to be designed before the model is built. If you wait until the model is done to decide how to measure it, you'll end up measuring what's easy, not what matters. This template starts with eval criteria.
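Confidence tiers and fallback behavior, as described in the first point of this list, can be written down as a small routing rule before any model exists. The sketch below is one possible shape. The thresholds (0.85 and 0.6), the latency budget, and the action names are hypothetical values chosen for illustration; the template asks you to choose your own.

```javascript
// Hypothetical sketch of a degrade-gracefully policy; all values are assumptions.
const POLICY = {
  autoThreshold: 0.85, // at or above: show the suggestion directly
  reviewThreshold: 0.6, // at or above: show it, flagged for human review
  latencyBudgetMs: 1500, // slower than this: treat the call as failed
};

function decideAction({ confidence, latencyMs, error }, policy = POLICY) {
  // Slow or failed calls go to the documented fallback, not to the user.
  if (error || latencyMs > policy.latencyBudgetMs) return "fallback";
  // A missing or invalid confidence value is treated as a failure too.
  if (typeof confidence !== "number" || Number.isNaN(confidence)) return "fallback";
  if (confidence >= policy.autoThreshold) return "show";
  if (confidence >= policy.reviewThreshold) return "show_for_review";
  return "suppress";
}

console.log(decideAction({ confidence: 0.9, latencyMs: 200 })); // prints "show"
console.log(decideAction({ confidence: 0.7, latencyMs: 200 })); // prints "show_for_review"
console.log(decideAction({ confidence: 0.9, latencyMs: 3000 })); // prints "fallback"
```

Writing the policy as data makes each threshold a reviewable requirement with an owner, not a constant buried in application code.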
If you can't define what "good" looks like before you build, you'll know it when it's too late.


