A one-of-a-kind interactive roleplay bootcamp that teaches product people of all levels of proficiency how to build and evaluate production-ready AI systems — ✨ by actually doing it ✨
No slides. No videos. You clone this repo, open Claude Code, and it becomes your personal AI evals tutor: teaching one concept at a time, guiding you through hands-on exercises, and evaluating your product decisions.
⭐ Star this repo to save it to your GitHub profile for easy reference later.
This course is for product folks who want to ship AI features that actually work: reliably, at scale, and backed by evidence rather than gut feel.
Primary audience: Product Managers shipping AI features who want a systematic, repeatable way to know their product is actually working. Also great for:
- Associate and Group PMs transitioning into AI-focused roles
- Founders and solo builders who own both product and quality
- Product Leads overseeing AI teams and setting eval strategy
- Technical PMs who want to bridge engineering metrics and product decisions
If you've ever asked "how do I know if this AI is actually working?" — this course is for you.
- Tell if your AI is actually working — not just in demos, but in production, consistently, for all your users
- Find what's breaking before your users do — when AI behaves unexpectedly, you'll know exactly where to look and what to ask
- Write quality standards your team can build to — replace vague requests with clear, testable criteria before development starts
- Catch when AI fails some users more than others — spot whether certain customer groups are getting a worse experience before it becomes a problem
- Run AI experiments that actually tell you something — avoid the traps that make AI test results misleading
- Make launch calls with a framework, not a gut feel — a repeatable ship/hold process for every AI feature you own
- Hold your team and vendors accountable — ask the right questions in any AI review, regardless of how technical it gets
- Build a culture where quality is everyone's job — turn evals from a last-minute checkbox into a team-wide habit
- Multiple learning tracks — choose the use case that matches your level and context; new tracks are added regularly
- Hands-on exercises — every lesson includes an exercise where you do the analysis; no toy examples
- You do the thinking — Claude computes on request; you direct the analysis and draw the conclusions
- PM Decision Points — each lesson ends with you writing a recommendation or artifact; Claude evaluates it against a scoring rubric
- Adaptive tutoring — Claude matches your pace; experienced practitioners move fast, newcomers get more examples
- ~30–40 min per day — designed for working professionals; one focused lesson per day
- Progress saved locally in `progress/progress.json`, which is gitignored and never leaves your machine
Already set up? Skip ahead:
- Not sure if you have Node.js or Claude Code installed? → Step 1
- Have Node.js but not Claude Code? → Step 2
- Have Claude Code installed? → Step 3
- Have the files cloned? → Step 4
Open a terminal. This is where the course runs.
- Mac: Search "Terminal" in Spotlight, or press `Cmd+Space` and type Terminal
- Cursor: Go to View → Terminal, or press `` Ctrl+` `` (Windows) / `` Cmd+` `` (Mac)
- Windows: Search "PowerShell" in the Start menu
⚠️ Using Cursor? Claude Code is a separate tool — Cursor is your editor, Claude Code is what runs the course. Type commands in the terminal (View → Terminal), not Cursor's chat box.
Check if you have Node.js:

    node --version

If you see a version number, you have Node.js. If not, download it from nodejs.org (use the LTS version) before continuing.

Check if you have Claude Code:

    claude --version

If you see a version number, skip to Step 3. If not, continue to Step 2.
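The two checks above can be run in one pass. This is a minimal sketch, assuming a POSIX shell (macOS Terminal, Linux, or Git Bash on Windows; it will not run in PowerShell). `check_tool` is a hypothetical helper name, not something that ships with the course or with Claude Code. It only reports whether each command is on your PATH, and it also covers `npm` and `git`, which the later steps use.

```shell
# Hypothetical helper (not part of the course): reports whether a command is on PATH.
check_tool() {
  if command -v "$1" >/dev/null 2>&1; then
    echo "$1: found"
  else
    echo "$1: missing"
  fi
}

# Tools the setup steps rely on.
for tool in node npm git claude; do
  check_tool "$tool"
done
```

Anything reported as missing points you to the step that installs it. A tool reported as found can still be out of date, so the `--version` checks remain the authoritative test.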
Create an Anthropic account (free) at claude.ai if you don't have one — you'll need it to authenticate Claude Code.
    npm install -g @anthropic-ai/claude-code

Verify it worked:

    claude --version

If you see a version number, you're good. ✅
Permissions error? If you're on a managed or corporate laptop, download Node.js directly from nodejs.org instead of using npm — this bypasses most IT restrictions. Still stuck? You may need to ask IT to whitelist the install.
    git clone https://github.com/productfoundry101/ai-builders-bootcamp.git
    cd ai-builders-bootcamp

Don't have git? Download it from git-scm.com, then run the commands above.
If you're using Cursor: Go to File → Open Folder and select the ai-builders-bootcamp folder. Your course files — lessons, datasets, everything — will appear in the left sidebar. These are real files sitting on your computer; you can open the CSVs in Excel, Numbers, or Google Sheets anytime.
Make sure you're inside the course folder, then run:

    claude

You'll see a `>` prompt, which means it worked. Type `go` and your tutor will introduce itself and start Day 1.
Each time you come back to continue the course (the next day, or after any break), run these two commands from your terminal:

    cd ai-builders-bootcamp
    claude

Your progress is saved automatically after each lesson. The tutor will pick up exactly where you left off.
| Problem | Fix |
|---|---|
| `claude: command not found` | Run `npm install -g @anthropic-ai/claude-code` again, then restart your terminal |
| Permissions error during install | Download Node.js directly from nodejs.org instead |
| Blank screen after running `claude` | You're in. Just type `go` to start |
| Claude doesn't introduce itself as tutor | Make sure you ran claude from inside the ai-builders-bootcamp folder, not a parent directory |
| Claude asks to approve file writes | Type yes — it needs this to save your progress |
| Stuck mid-lesson | Type resume — the tutor will re-read your progress and pick up where you left off |
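Several of the fixes above come down to launching `claude` from the wrong directory. As a minimal sketch, assuming a POSIX shell and assuming that the presence of `CLAUDE.md` is a good enough marker for the course root (this README lists it at the top level of the repo), you can check where you are before starting. `in_course_folder` is a hypothetical helper, not something shipped with the course.

```shell
# Hypothetical helper (not part of the course): succeeds only when the given
# directory (default: the current one) contains CLAUDE.md, which this README
# describes as the course configuration Claude reads on startup.
in_course_folder() {
  [ -f "${1:-.}/CLAUDE.md" ]
}

if in_course_folder; then
  echo "Course folder detected: run claude, then type go"
else
  echo "CLAUDE.md not found here: cd into ai-builders-bootcamp first"
fi
```

The check only looks for the file; it does not confirm that the file is the course's own `CLAUDE.md`, so treat a positive result as a hint rather than proof.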
When you start the course, you'll choose a learning track. Each track has its own lessons, exercises, and PM decision points built around a specific real-world AI use case.
21 days. 3 weeks. One lesson per day.

Week 1

| Day | Lesson | Key Skills |
|---|---|---|
| D1 | Pipeline Mapping | Pipeline stages, non-determinism, reading traces |
| D2 | Failure Surface Mapping | Evaluation surface map, failure layers, coverage gaps |
| D3 | Error Analysis | Open coding, axial coding, saturation, triage |
| D4 | Thinking in Distributions | Shape before depth, pass@k, reliable@k, the consistency gap |
| D5 | Grader Types | Code-based, model-based, human graders; layering strategy |
| D6 | LLM-as-Judge | Calibration trap, Critique Shadowing, failure modes, meta-evaluation |
| D7 | Golden Datasets | Three sources, contamination, dataset lifecycle |

Week 2

| Day | Lesson | Key Skills |
|---|---|---|
| D8 | RAG Evaluation | Precision@k, faithfulness, answer relevance, context recall |
| D9 | Hallucination Detection | Detection strategies, grounding, citation evaluation |
| D10 | Release Criteria | Guardrail vs optimization metrics, ship/hold thresholds |
| D11 | Metric Design | Metric tradeoffs, evaluation cost, coverage strategy |
| D12 | Fairness & Subgroups | Subgroup slicing, disparity detection, fairness in practice |
| D13 | Eval-Driven Development | Evals as product specs, regression testing, eval cadence |
| D14 | Observability | Logging, tracing, what to instrument and why |

Week 3

| Day | Lesson | Key Skills |
|---|---|---|
| D15 | Agent Evaluation | Multi-step pipelines, tool use, trajectory evaluation |
| D16 | AI Experiments | LLM A/B testing, variance, confounds |
| D17 | Launch Readiness | Pre-launch checklist, drift detection, incident response |
| D18 | Red Teaming | Threat modeling, adversarial prompts, stress testing |
| D19 | Ship Decisions | Synthesizing eval signals into a go/no-go recommendation |
| D20 | Regulatory Context | AI Act, liability, what product people need to know |
| D21 | Eval Culture | Institutionalizing evals, team buy-in, eval as product practice |
New to AI evaluation? This track teaches the same eval fundamentals through the lens of a consumer AI product — a language learning assistant that holds conversations, gives feedback, and adapts to the learner's level.
Open-ended outputs, tricky quality definitions, and a use case most people intuitively understand — making it the ideal entry point if you're new to AI evals or working in consumer AI.
📝 Content in progress — lessons and exercises for this track are being added regularly. Watch the repo to get notified when new content drops.
    use-cases/
      menu-verification/     Intermediate track: menu verification at a food delivery company
        lessons/             Lesson content (D1–D21)
        exercises/           CSV datasets you'll analyse during exercises
        scoring-rubrics.md   PM Decision Point rubrics (used by Claude, not shown to you)
        meta.md              Track title, level, and description
      language-tutor/        Beginner track: building a conversational language tutor
        lessons/             (coming soon)
        exercises/           (coming soon)
    tutor/                   Session protocol: Claude's tutoring instructions
    progress/                Your local progress: gitignored, never leaves your machine
    CLAUDE.md                Course configuration: Claude reads this on startup
Found this course useful? Star the repo ⭐ — it saves it to your GitHub profile for easy reference, it helps others discover it, and it massively helps me.
This course is actively updated based on feedback from real learners — new lessons, new use cases, fixes, and improvements ship regularly. To get notified the moment an update drops, click Watch → Custom → Releases at the top of this page.
This course stands on the shoulders of practitioners who've shared their teachings publicly. If you want to go deeper, these are the sources that most shaped what you just learned:
- Hamel Husain — evals methodology, error analysis, LLM-as-judge
- Shreya Shankar — LLM judge calibration research
- Lenny's Newsletter — PM-specific evals framing ("Beyond vibe checks" and related pieces)
- Aman Khan — AI PM evals perspective
- Tal Raviv — practical PM evals examples
- AI Analyst Lab — inspiration for framing evals as a product-centric arc (rather than analyst-centric) and for treating error analysis as the foundation every other technique builds on
- RAGAS — RAG evaluation framework
- OWASP LLM Top 10 — adversarial attack taxonomy for LLM systems
- "Building AI Product Sense with a Custom Tutor" by Aman Khan — inspiration for implementing Claude Code as your AI tutor
CC BY-NC-SA 4.0 — Free to use and adapt for non-commercial purposes with attribution.