🧪 AI Builders Bootcamp

A one-of-a-kind interactive roleplay bootcamp that teaches product people of all levels of proficiency how to build and evaluate production-ready AI systems — ✨ by actually doing it ✨

No slides. No videos. You clone this repo, open Claude Code, and it becomes your personal AI evals tutor: teaching one concept at a time, guiding you through hands-on exercises, and evaluating your product decisions.

⭐ Star this repo to save it to your GitHub profile for easy reference later.

🙋 Who Should Take This Course

This course is for product folks who want to ship AI features that actually work — reliably, at scale, beyond gut-feel.

Primary audience: Product Managers shipping AI features who want a systematic, repeatable way to know their product is actually working. Also great for:

Associate and Group PMs transitioning into AI-focused roles
Founders and solo builders who own both product and quality
Product Leads overseeing AI teams and setting eval strategy
Technical PMs who want to bridge engineering metrics and product decisions

If you've ever asked "how do I know if this AI is actually working?" — this course is for you.

🎯 What You'll Learn

Tell if your AI is actually working — not just in demos, but in production, consistently, for all your users
Find what's breaking before your users do — when AI behaves unexpectedly, you'll know exactly where to look and what to ask
Write quality standards your team can build to — replace vague requests with clear, testable criteria before development starts
Catch when AI fails some users more than others — spot whether certain customer groups are getting a worse experience before it becomes a problem
Run AI experiments that actually tell you something — avoid the traps that make AI test results misleading
Make launch calls with a framework, not a gut feel — a repeatable ship/hold process for every AI feature you own
Hold your team and vendors accountable — ask the right questions in any AI review, regardless of how technical it gets
Build a culture where quality is everyone's job — turn evals from a last-minute checkbox into a team-wide habit

✨ Course Features

Multiple learning tracks — choose the use case that matches your level and context; new tracks are added regularly
Hands-on exercises — every lesson includes an exercise where you do the analysis; no toy examples
You do the thinking — Claude computes on request; you direct the analysis and draw the conclusions
PM Decision Points — each lesson ends with you writing a recommendation or artifact; Claude evaluates it against a scoring rubric
Adaptive tutoring — Claude matches your pace; experienced practitioners move fast, newcomers get more examples
~30–40 min per day — designed for working professionals; one focused lesson per day
Progress saved locally — tracked in progress/progress.json, gitignored and never leaves your machine

🚀 Quick Start

Already set up? Skip ahead:

Not sure if you have Node.js or Claude Code installed? → Step 1
Have Node.js but not Claude Code? → Step 2
Have Claude Code installed? → Step 3
Have the files cloned? → Step 4

Step 1 — Check your setup

Open a terminal. This is where the course runs.

Mac: Press Cmd+Space to open Spotlight, then type Terminal
Cursor: Go to View → Terminal, or press Ctrl+` (the same shortcut on Windows and Mac)
Windows: Search "PowerShell" in the Start menu

⚠️ Using Cursor? Claude Code is a separate tool — Cursor is your editor, Claude Code is what runs the course. Type commands in the terminal (View → Terminal), not Cursor's chat box.

Check if you have Node.js:

node --version

If you see a version number, you have Node.js. If not, download it from nodejs.org (use the LTS version) before continuing.

Check if you have Claude Code:

claude --version

If you see a version number, skip to Step 3. If not, continue to Step 2.
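The two checks in this step can be folded into one small script. This is only a sketch: `has_cmd` and `next_step` are hypothetical helper names, not part of the course, and the script assumes a POSIX shell (Terminal on Mac, or Git Bash or WSL on Windows). It will not run in PowerShell as written.

```shell
#!/bin/sh
# Decide which Quick Start step to begin at, based on which tools are on PATH.
# has_cmd and next_step are hypothetical helper names, not part of the course.

has_cmd() {
    # Succeeds when the named command can be found on PATH.
    command -v "$1" >/dev/null 2>&1
}

next_step() {
    if ! has_cmd node; then
        echo "Install Node.js (LTS) from nodejs.org, then continue to Step 2"
    elif ! has_cmd claude; then
        echo "Continue to Step 2: install Claude Code"
    else
        echo "Skip to Step 3: get the course files"
    fi
}

next_step
```

Paste it into your terminal, or save it to a file and run it with `sh`. It only reports where to start; it does not install anything.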

Create an Anthropic account (free) at claude.ai if you don't have one — you'll need it to authenticate Claude Code.

Step 2 — Install Claude Code

npm install -g @anthropic-ai/claude-code

Verify it worked:

claude --version

If you see a version number, you're good. ✅

Permissions error? If you're on a managed or corporate laptop, reinstall Node.js using the official installer from nodejs.org (npm is bundled with it), then run the install command again. This gets around many IT restrictions. Still stuck? You may need to ask IT to allow the install.

Step 3 — Get the course files

git clone https://github.com/productfoundry101/ai-builders-bootcamp.git
cd ai-builders-bootcamp

Don't have git? Download it from git-scm.com, then run the commands above.

If you're using Cursor: Go to File → Open Folder and select the ai-builders-bootcamp folder. Your course files — lessons, datasets, everything — will appear in the left sidebar. These are real files sitting on your computer; you can open the CSVs in Excel, Numbers, or Google Sheets anytime.

Step 4 — Start the course

Make sure you're inside the course folder, then run:

claude

You'll see a > prompt — that means it worked. Type go and your tutor will introduce itself and start Day 1.

🔄 Returning after your first session

Each time you come back to continue the course — the next day, or after any break — run these two commands from your terminal:

cd ai-builders-bootcamp
claude

Your progress is saved automatically after each lesson. The tutor will pick up exactly where you left off.

🔧 Troubleshooting

Problem	Fix
`claude: command not found`	Run `npm install -g @anthropic-ai/claude-code` again, then restart your terminal
Permissions error during install	Reinstall Node.js using the official installer from nodejs.org, then run the install command again
Blank screen after running `claude`	You're in — just type `go` to start
Claude doesn't introduce itself as tutor	Make sure you ran `claude` from inside the `ai-builders-bootcamp` folder, not a parent directory
Claude asks to approve file writes	Type `yes` — it needs this to save your progress
Stuck mid-lesson	Type `resume` — the tutor will re-read your progress and pick up where you left off
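If the tutor does not introduce itself, a quick way to confirm you are in the right folder is to look for the files Claude reads on startup. A minimal sketch, assuming a POSIX shell: `in_course_folder` is a hypothetical helper name, and the markers it checks (`CLAUDE.md` and the `tutor/` folder) are taken from the repo layout described in this README.

```shell
#!/bin/sh
# Check that a directory looks like the course root before launching Claude Code.
# in_course_folder is a hypothetical helper name, not part of the course.

in_course_folder() {
    # Defaults to the current directory when no argument is given.
    dir="${1:-.}"
    [ -f "$dir/CLAUDE.md" ] && [ -d "$dir/tutor" ]
}

if in_course_folder; then
    echo "Looks like the course folder: run claude"
else
    echo "Not the course folder: cd into ai-builders-bootcamp first"
fi
```

The check is deliberately loose. It only confirms that the two markers exist, not that the clone is complete or up to date.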

📅 Course Structure

When you start the course, you'll choose a learning track. Each track has its own lessons, exercises, and PM decision points built around a specific real-world AI use case.

🟡 Track 1 — Menu Verification at a Food Delivery Company (Intermediate)

21 days. 3 weeks. One lesson per day.

Week 1 — Your Eval Foundation (Days 1–7)

Day	Lesson	Key Skills
D1	Pipeline Mapping	Pipeline stages, non-determinism, reading traces
D2	Failure Surface Mapping	Evaluation surface map, failure layers, coverage gaps
D3	Error Analysis	Open coding, axial coding, saturation, triage
D4	Thinking in Distributions	Shape before depth, pass@k, reliable@k, the consistency gap
D5	Grader Types	Code-based, model-based, human graders; layering strategy
D6	LLM-as-Judge	Calibration trap, Critique Shadowing, failure modes, meta-evaluation
D7	Golden Datasets	Three sources, contamination, dataset lifecycle

Week 2 — Metrics and Measurement at Scale (Days 8–14)

Day	Lesson	Key Skills
D8	RAG Evaluation	Precision@k, faithfulness, answer relevance, context recall
D9	Hallucination Detection	Detection strategies, grounding, citation evaluation
D10	Release Criteria	Guardrail vs optimization metrics, ship/hold thresholds
D11	Metric Design	Metric tradeoffs, evaluation cost, coverage strategy
D12	Fairness & Subgroups	Subgroup slicing, disparity detection, fairness in practice
D13	Eval-Driven Development	Evals as product specs, regression testing, eval cadence
D14	Observability	Logging, tracing, what to instrument and why

Week 3 — Ship, Monitor, and Scale (Days 15–21)

Day	Lesson	Key Skills
D15	Agent Evaluation	Multi-step pipelines, tool use, trajectory evaluation
D16	AI Experiments	LLM A/B testing, variance, confounds
D17	Launch Readiness	Pre-launch checklist, drift detection, incident response
D18	Red Teaming	Threat modeling, adversarial prompts, stress testing
D19	Ship Decisions	Synthesizing eval signals into a go/no-go recommendation
D20	Regulatory Context	EU AI Act, liability, what product people need to know
D21	Eval Culture	Institutionalizing evals, team buy-in, eval as product practice

🟢 Track 2 — Building a Conversational Language Tutor (Beginner)

New to AI evaluation? This track teaches the same eval fundamentals through the lens of a consumer AI product — a language learning assistant that holds conversations, gives feedback, and adapts to the learner's level.

It combines open-ended outputs, tricky quality definitions, and a use case most people intuitively understand, which makes it an ideal entry point if you're new to AI evals or working in consumer AI.

📝 Content in progress — lessons and exercises for this track are being added regularly. Watch the repo to get notified when new content drops.

📁 What's in the Repo

use-cases/
  menu-verification/    Intermediate track — menu verification at a food delivery company
    lessons/            Lesson content (D1–D21)
    exercises/          CSV datasets you'll analyse during exercises
    scoring-rubrics.md  PM Decision Point rubrics (used by Claude, not shown to you)
    meta.md             Track title, level, and description
  language-tutor/       Beginner track — building a conversational language tutor
    lessons/            (coming soon)
    exercises/          (coming soon)
tutor/                  Session protocol — Claude's tutoring instructions
progress/               Your local progress — gitignored, never leaves your machine
CLAUDE.md               Course configuration — Claude reads this on startup
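If you want to confirm for yourself that your progress file stays out of version control, git can tell you whether a path matches an ignore rule. A minimal sketch, assuming you run it from inside the cloned `ai-builders-bootcamp` folder in a POSIX shell. `is_ignored` is a hypothetical helper name, and the path is the one this README gives for progress tracking.

```shell
#!/bin/sh
# Report whether git would ignore the progress file.
# is_ignored is a hypothetical helper name, not part of the course.

is_ignored() {
    # git check-ignore exits 0 when the path matches an ignore rule
    # and non-zero otherwise (including when run outside a git repo).
    git check-ignore -q "$1" 2>/dev/null
}

if is_ignored progress/progress.json; then
    echo "progress/progress.json is ignored by git"
else
    echo "progress/progress.json is NOT ignored (or this is not a git repo)"
fi
```

The file does not need to exist yet for this to work, since `git check-ignore` matches on the path rather than the file's contents.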

⭐ Stay Updated

Found this course useful? Star the repo ⭐ — it saves it to your GitHub profile for easy reference, it helps others discover it, and it massively helps me.

This course is actively updated based on feedback from real learners — new lessons, new use cases, fixes, and improvements ship regularly. To get notified the moment an update drops, click Watch → Custom → Releases at the top of this page.

📚 Further Reading & Acknowledgements

This course stands on the shoulders of practitioners who've shared their work publicly. If you want to go deeper, these are the sources that most shaped what the course teaches:

Hamel Husain — evals methodology, error analysis, LLM-as-judge
Shreya Shankar — LLM judge calibration research
Lenny's Newsletter — PM-specific evals framing ("Beyond vibe checks" and related pieces)
Aman Khan — AI PM evals perspective
Tal Raviv — practical PM evals examples
AI Analyst Lab — inspiration for framing evals as a product-centric arc (rather than analyst-centric) and for treating error analysis as the foundation every other technique builds on
RAGAS — RAG evaluation framework
OWASP LLM Top 10 — adversarial attack taxonomy for LLM systems
"Building AI Product Sense with a Custom Tutor" by Aman Khan — inspiration for implementing Claude Code as your AI tutor

📄 License

CC BY-NC-SA 4.0 — Free to use and adapt for non-commercial purposes with attribution.
