amitgambhir/ai-feature-prd-toolkit

📋 AI Feature PRD Toolkit

Most AI features fail not because the model is bad, but because the requirements were written for the wrong kind of system.

MIT License Next.js Powered by Claude


Table of Contents

  1. The Problem
  2. What's in this repo
  3. How to use
  4. PRD Scorer Web App
  5. Built On
  6. Architecture
  7. Screenshots
  8. Run locally
  9. Run tests
  10. Deploy your own
  11. Why this is different from a standard PRD

The Problem

A standard PRD tells you what a feature should do. That's fine when the system is deterministic. When the system is a language model, "what it should do" is the easy part. The hard parts are:

  • What does a good output look like, in measurable terms, before you build?
  • What happens when the model is slow, wrong, or uncertain?
  • What data must exist before you start, and who's responsible for it?
  • What are the failure modes, and who owns the mitigations?

Most teams answer these questions in production, not in requirements. This repo is a framework for answering them before engineering starts.


What's in this repo

| Artifact | Description |
| --- | --- |
| template/AI-FEATURE-PRD-TEMPLATE.md | The 10-section PRD template with inline guidance for each field |
| scorecard/PRD-READINESS-SCORECARD.md | A 10-point checklist to score any AI PRD before engineering kickoff |
| example/real-time-agent-assist-prd.md | A fully worked example: real-time AI assist for a contact center |
| assets/prd-vs-ai-prd-comparison.md | Side-by-side: traditional PRD sections vs. their AI-native equivalents |
| scorer/ | Next.js web app: upload a PRD, get an AI-generated score and downloadable report |

How to use

  1. Start with the template. Copy AI-FEATURE-PRD-TEMPLATE.md and fill it in for your feature. Don't skip sections: the gaps are the signal.
  2. Score your draft. Upload it to the PRD Scorer or run it through the PRD-READINESS-SCORECARD.md manually. If you score below 7, you have known gaps before engineering starts.
  3. See the worked example. real-time-agent-assist-prd.md shows every section filled in for a real-world AI feature. Use it to calibrate what "specific enough" looks like.
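
The threshold in step 2 can be sketched as a small tally. This is a minimal illustration, assuming each of the 10 scorecard dimensions is scored 0 or 1 as in the sample report; the function name `readiness` and the object shape are hypothetical and not taken from the scorer's code.

```javascript
// Hypothetical helper: tally 0/1 dimension scores and flag drafts below 7.
const PASS_THRESHOLD = 7; // from "If you score below 7" in step 2

function readiness(dimensions) {
  const total = dimensions.reduce((sum, d) => sum + d.score, 0);
  const gaps = dimensions.filter((d) => d.score === 0).map((d) => d.name);
  return {
    total,
    outOf: dimensions.length,
    belowThreshold: total < PASS_THRESHOLD,
    gaps,
  };
}

// Usage: ten dimensions, the last three failing.
const sample = Array.from({ length: 10 }, (_, i) => ({
  name: `Dimension ${i + 1}`,
  score: i < 7 ? 1 : 0,
}));
console.log(readiness(sample));
```

Listing the failed dimensions by name alongside the total keeps the gaps visible even when the draft clears the threshold.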

PRD Scorer Web App

Live: https://ai-feature-prd-toolkit.vercel.app/

Upload any AI feature PRD (.md, .txt, .pdf, or .docx) and get an instant 10-point readiness score powered by Claude. Each dimension returns a finding and a fix recommendation, and the full result can be downloaded as a Markdown report. The app also offers a direct download of the PRD template so you can start from scratch without leaving the page.

Input: Your AI feature PRD (.md, .txt, .pdf, or .docx)

──────────────────────────────────────────────
  PRD Readiness Score    7 / 10
  Address gaps before sprint planning
──────────────────────────────────────────────
  Critical gaps:
  Β· No golden dataset owner or due date
  Β· Cost estimate missing
  Β· Confidence thresholds not specified

  βœ“  Eval criteria defined and measurable       1/1
  βœ—  Golden dataset ownership                   0/1
       Finding: PRD mentions eval but no owner or dataset exists.
       Fix: Name the person creating the dataset and set a due date
            before engineering kickoff.
  βœ“  Confidence thresholds specified            1/1
  βœ“  Latency SLA defined                        1/1
  βœ“  Fallback behavior documented               1/1
  βœ“  Data readiness go/no-go criteria           1/1
  βœ“  Human-in-the-loop checkpoint               1/1
  βœ“  Failure mode pre-mortem                    1/1
  βœ—  Rollout cost estimate at scale             0/1
       Finding: No token/volume/cost calculation present.
       Fix: Estimate tokens Γ— daily volume Γ— price for each model call.
  βœ“  Open questions log with owners             1/1
──────────────────────────────────────────────
  [ Download report ]   [ Score again ]
──────────────────────────────────────────────

The Anthropic API call happens server-side. Your document is never stored.
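
The cost fix suggested in the sample report (tokens × daily volume × price for each model call) is simple arithmetic, sketched below. The function name, the field names, and every number are placeholders chosen for illustration; they are not real pricing and not figures from this repo.

```javascript
// Hypothetical cost model: tokens x daily volume x price, summed over model calls.
// Prices are expressed per million tokens and are placeholders only.
function estimateDailyCost(calls) {
  return calls.reduce((total, c) => {
    const inputCost = c.dailyVolume * c.inputTokens * c.inputPricePerMTok;
    const outputCost = c.dailyVolume * c.outputTokens * c.outputPricePerMTok;
    // Divide once at the end to convert "per million tokens" into currency units.
    return total + (inputCost + outputCost) / 1e6;
  }, 0);
}

// Usage: one model call per scored document, 1,000 documents a day.
const daily = estimateDailyCost([
  {
    name: "score-prd",
    dailyVolume: 1000,
    inputTokens: 2000,
    outputTokens: 500,
    inputPricePerMTok: 3,
    outputPricePerMTok: 15,
  },
]);
console.log(`~$${daily.toFixed(2)} per day, ~$${(daily * 30).toFixed(2)} per 30 days`);
```

Listing each model call as its own entry matters when a feature makes several calls per request, since the estimate has to cover all of them.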


Built On

| Tool | Why |
| --- | --- |
| Next.js 14 | Single repo for frontend and serverless API routes; no separate backend to deploy |
| React | UI only, no component library; the scorer page is a single file with inline styles |
| Claude (claude-sonnet-4-20250514) | Structured JSON output via the Messages API; strong instruction-following for strict per-dimension scoring |
| pdf.js | Client-side PDF parsing without bundling; injected dynamically only when a .pdf is uploaded |
| mammoth.js | Client-side .docx → plain text; same on-demand injection pattern as pdf.js |
| Vercel | Zero-config deployment; root directory set to scorer/ so only the app is deployed, not the markdown artifacts |
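
For readers unfamiliar with the Messages API, the sketch below shows roughly what a server-side scoring request could look like. The endpoint, header names, and body fields follow Anthropic's public API; the function name `buildScoreRequest`, the prompt text, and the `max_tokens` value are assumptions made for illustration and are not copied from this repo.

```javascript
// Placeholder prompt; the real scoring prompt is not shown in this README.
const SCORING_PROMPT =
  "Score the PRD against the 10 readiness dimensions. Reply with JSON only.";

// Hypothetical helper: builds the fetch() arguments for one scoring call.
function buildScoreRequest(prdText, apiKey) {
  return {
    url: "https://api.anthropic.com/v1/messages",
    options: {
      method: "POST",
      headers: {
        "content-type": "application/json",
        "x-api-key": apiKey, // read from the server environment, never sent to the browser
        "anthropic-version": "2023-06-01",
      },
      body: JSON.stringify({
        model: "claude-sonnet-4-20250514",
        max_tokens: 2048, // assumed limit
        system: SCORING_PROMPT,
        messages: [{ role: "user", content: prdText }],
      }),
    },
  };
}

// Usage (server-side only):
//   const { url, options } = buildScoreRequest(text, process.env.ANTHROPIC_API_KEY);
//   const res = await fetch(url, options);
//   const data = await res.json(); // the model's text is in data.content[0].text
```

Separating request construction from the network call keeps the part that touches the API key easy to test without hitting the API.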

Architecture

┌──────────────────────────────────────────────┐
│  Browser                                     │
│  pages/index.js                              │
│                                              │
│  File upload (.md / .txt / .pdf / .docx)     │
│  Client-side parsing (FileReader, pdf.js,    │
│  mammoth.js; parsers loaded on demand)       │
│  Results UI + report download                │
└──────────────────┬───────────────────────────┘
                   │  POST /api/score
                   │  { prdText: string }
┌──────────────────▼───────────────────────────┐
│  Next.js API Route                           │
│  pages/api/score.js                          │
│                                              │
│  · Validates input (100–50,000 chars)        │
│  · Calls Anthropic API with scoring prompt   │
│  · Strips code fences, parses JSON response  │
│  · Returns structured score object           │
│                                              │
│  ANTHROPIC_API_KEY: server-side only,        │
│  never exposed to the browser                │
└──────────────────┬───────────────────────────┘
                   │
┌──────────────────▼───────────────────────────┐
│  Anthropic API                               │
│  claude-sonnet-4-20250514                    │
│                                              │
│  Evaluates PRD against 10 dimensions         │
│  Returns structured JSON: scores, findings,  │
│  fixes, top gaps, summary, verdict           │
└──────────────────────────────────────────────┘
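
Two of the API route steps in the diagram, input validation and code-fence stripping, can be sketched as pure functions. This is an illustrative sketch, not the contents of pages/api/score.js: the function names are hypothetical, and the 100 and 50,000 character bounds are assumed to be inclusive.

```javascript
const MIN_CHARS = 100;
const MAX_CHARS = 50000;
const FENCE = "`".repeat(3); // three backticks, built indirectly for readability

// Returns an error message, or null when the input is acceptable.
function validatePrdText(prdText) {
  if (typeof prdText !== "string") return "prdText must be a string";
  if (prdText.length < MIN_CHARS) return `prdText must be at least ${MIN_CHARS} characters`;
  if (prdText.length > MAX_CHARS) return `prdText must be at most ${MAX_CHARS} characters`;
  return null;
}

// Models sometimes wrap JSON in a fenced block; remove the wrapper if present.
function stripCodeFences(text) {
  let body = text.trim();
  if (body.startsWith(FENCE)) {
    const firstNewline = body.indexOf("\n");
    // Drop the whole opening fence line, including any language tag such as "json".
    body = firstNewline === -1 ? body.slice(FENCE.length) : body.slice(firstNewline + 1);
  }
  if (body.endsWith(FENCE)) body = body.slice(0, -FENCE.length);
  return body.trim();
}

// Usage: a fenced model reply becomes parseable JSON.
const reply = FENCE + 'json\n{"totalScore": 7}\n' + FENCE;
console.log(JSON.parse(stripCodeFences(reply)).totalScore);
```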

Screenshots

Upload screen

Score results

Downloadable report


Run locally

npm run install:scorer                           # install dependencies (first time)
cp scorer/.env.local.example scorer/.env.local   # add your ANTHROPIC_API_KEY
npm run dev                                      # copies template → public/, starts http://localhost:3000

Run tests

npm run install:scorer   # if not already installed
cd scorer && npm test    # 38 tests: API route + pure functions

The test suite covers input validation, boundary conditions, and security properties: non-string inputs rejected, API key never present in responses, raw Anthropic envelope never forwarded, and malformed upstream JSON handled without leaking content.
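
Two of these properties, not forwarding the raw upstream envelope and not leaking malformed upstream content, can be illustrated with a small response-shaping function. This is a hedged sketch of the idea rather than the repo's implementation: `toClientResult`, the status codes, and the field names (`totalScore`, `dimensions`, `topGaps`, `summary`, `verdict`) are all assumptions.

```javascript
// Hypothetical shaping step: parse the model's text and return only known fields.
function toClientResult(upstreamText) {
  let parsed;
  try {
    parsed = JSON.parse(upstreamText);
  } catch {
    // Generic message only: the unparseable text is never echoed to the client.
    return { status: 502, body: { error: "Scoring failed. Please try again." } };
  }
  if (parsed === null || typeof parsed !== "object") {
    return { status: 502, body: { error: "Scoring failed. Please try again." } };
  }
  // Copy an explicit allow-list of fields instead of forwarding the whole object.
  const { totalScore, dimensions, topGaps, summary, verdict } = parsed;
  return { status: 200, body: { totalScore, dimensions, topGaps, summary, verdict } };
}

// Usage: extra upstream fields are dropped, malformed text yields a generic error.
console.log(toClientResult('{"totalScore": 7, "id": "internal-123"}').body);
console.log(toClientResult("not json at all").body);
```

An allow-list is the safer default here because any field added upstream later stays out of the response until it is explicitly listed.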


Deploy your own

  1. Fork this repo
  2. Go to vercel.com → Import project → select this repo → set the root directory to scorer
  3. Add ANTHROPIC_API_KEY as an environment variable in the Vercel project settings
  4. Deploy. Vercel handles the rest

Why this is different from a standard PRD

  • AI systems don't pass or fail; they degrade. Standard acceptance criteria assume binary correctness. This template forces you to define confidence tiers, fallback behavior, and what happens when the model is slow, wrong, or uncertain.
  • The data is a dependency, not a detail. Most AI features stall in production because the data was assumed, not specified. The data readiness gates section treats data quality as a go/no-go criterion, because it is.
  • Evaluation criteria have to be designed before the model is built. If you wait until the model is done to decide how to measure it, you'll end up measuring what's easy, not what matters. This template starts with eval criteria.
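
To make "confidence tiers and fallback behavior" concrete, here is a minimal sketch of how such a requirement might translate into a decision rule. The tier names, both thresholds, and the latency budget are invented placeholders; a real PRD would derive them from eval data and the feature's SLA.

```javascript
// Placeholder values for illustration only.
const TIERS = { show: 0.85, review: 0.6 };
const LATENCY_SLA_MS = 2000;

// Decide what the product does with one model response.
function decide({ confidence, latencyMs }) {
  if (latencyMs > LATENCY_SLA_MS) return "fallback"; // too slow: use the non-AI path
  if (confidence >= TIERS.show) return "show";
  if (confidence >= TIERS.review) return "show_with_human_review";
  return "fallback"; // low confidence: do not surface the output
}

// Usage: the same function covers the slow, uncertain, and confident cases.
console.log(decide({ confidence: 0.92, latencyMs: 800 }));
console.log(decide({ confidence: 0.7, latencyMs: 800 }));
console.log(decide({ confidence: 0.92, latencyMs: 3500 }));
```

Writing the rule down this way forces the PRD to name a behavior for every branch, which is the point of specifying degradation instead of a pass/fail criterion.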

If you can't define what "good" looks like before you build, you'll know it when it's too late.

About

A framework of templates, scorecards, and a web app for writing AI-native feature requirements that answer the hard questions before engineering starts.
