productfoundry101/ai-evals-bootcamp


🧪 AI Builders Bootcamp

A one-of-a-kind interactive roleplay bootcamp that teaches product people at every level how to build and evaluate production-ready AI systems ✨ by actually doing it ✨

No slides. No videos. You clone this repo, open Claude Code, and it becomes your personal AI evals tutor: teaching one concept at a time, guiding you through hands-on exercises, and evaluating your product decisions.

⭐ Star this repo to save it to your GitHub profile for easy reference later.


🙋 Who Should Take This Course

This course is for product folks who want to ship AI features that actually work — reliably, at scale, beyond gut-feel.

Primary audience: Product Managers shipping AI features who want a systematic, repeatable way to know their product is actually working. Also great for:

  • Associate and Group PMs transitioning into AI-focused roles
  • Founders and solo builders who own both product and quality
  • Product Leads overseeing AI teams and setting eval strategy
  • Technical PMs who want to bridge engineering metrics and product decisions

If you've ever asked "how do I know if this AI is actually working?" — this course is for you.


🎯 What You'll Learn

  • Tell if your AI is actually working — not just in demos, but in production, consistently, for all your users
  • Find what's breaking before your users do — when AI behaves unexpectedly, you'll know exactly where to look and what to ask
  • Write quality standards your team can build to — replace vague requests with clear, testable criteria before development starts
  • Catch when AI fails some users more than others — spot whether certain customer groups are getting a worse experience before it becomes a problem
  • Run AI experiments that actually tell you something — avoid the traps that make AI test results misleading
  • Make launch calls with a framework, not a gut feel — a repeatable ship/hold process for every AI feature you own
  • Hold your team and vendors accountable — ask the right questions in any AI review, regardless of how technical it gets
  • Build a culture where quality is everyone's job — turn evals from a last-minute checkbox into a team-wide habit

✨ Course Features

  • Multiple learning tracks — choose the use case that matches your level and context; new tracks are added regularly
  • Hands-on exercises — every lesson includes an exercise where you do the analysis; no toy examples
  • You do the thinking — Claude computes on request; you direct the analysis and draw the conclusions
  • PM Decision Points — each lesson ends with you writing a recommendation or artifact; Claude evaluates it against a scoring rubric
  • Adaptive tutoring — Claude matches your pace; experienced practitioners move fast, newcomers get more examples
  • ~30–40 min per day — designed for working professionals; one focused lesson per day
  • Progress saved locally — tracked in progress/progress.json, gitignored and never leaves your machine

🚀 Quick Start

Already set up? Skip ahead:

  • Not sure if you have Node.js or Claude Code installed? → Step 1
  • Have Node.js but not Claude Code? → Step 2
  • Have Claude Code installed? → Step 3
  • Have the files cloned? → Step 4

Step 1 — Check your setup

Open a terminal. This is where the course runs.

  • Mac: Press Cmd+Space to open Spotlight, then type Terminal
  • Cursor: Go to View → Terminal, or press Ctrl+` (the same shortcut on Windows and Mac)
  • Windows: Search "PowerShell" in the Start menu

⚠️ Using Cursor? Claude Code is a separate tool — Cursor is your editor, Claude Code is what runs the course. Type commands in the terminal (View → Terminal), not Cursor's chat box.

Check if you have Node.js:

node --version

If you see a version number, you have Node.js. If not, download it from nodejs.org (use the LTS version) before continuing.

Check if you have Claude Code:

claude --version

If you see a version number, skip to Step 3. If not, continue to Step 2.

Create an Anthropic account (free) at claude.ai if you don't have one — you'll need it to authenticate Claude Code.
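If you'd rather run both checks in one go, here is a minimal sketch. It assumes a POSIX-style shell (macOS Terminal, Linux, or Git Bash on Windows; it won't work as-is in PowerShell), and check_tool is a hypothetical helper name, not something shipped with the course.

```shell
#!/bin/sh
# Report whether a command-line tool is available on PATH.
# check_tool is a hypothetical helper name, not part of the course files.
check_tool() {
  if command -v "$1" >/dev/null 2>&1; then
    echo "$1: found"
  else
    echo "$1: missing"
  fi
}

check_tool node    # needed to install Claude Code via npm
check_tool claude  # the Claude Code CLI itself
```

If both lines say "found", you can go straight to Step 3. If claude is missing, continue to Step 2.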


Step 2 — Install Claude Code

npm install -g @anthropic-ai/claude-code

Verify it worked:

claude --version

If you see a version number, you're good. ✅

Permissions error? If you're on a managed or corporate laptop, install Node.js with the official installer from nodejs.org, then run the npm install command above again. This bypasses most IT restrictions. Still stuck? You may need to ask IT to whitelist the install.


Step 3 — Get the course files

git clone https://github.com/productfoundry101/ai-builders-bootcamp.git
cd ai-builders-bootcamp

Don't have git? Download it from git-scm.com, then run the commands above.

If you're using Cursor: Go to File → Open Folder and select the ai-builders-bootcamp folder. Your course files — lessons, datasets, everything — will appear in the left sidebar. These are real files sitting on your computer; you can open the CSVs in Excel, Numbers, or Google Sheets anytime.


Step 4 — Start the course

Make sure you're inside the course folder, then run:

claude

You'll see a > prompt — that means it worked. Type go and your tutor will introduce itself and start Day 1.
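Not sure whether you're in the right folder? The repo keeps CLAUDE.md at its top level, so checking for that file is a reasonable sanity test. A minimal sketch, assuming a POSIX-style shell; in_course_folder is a hypothetical helper name, not part of the course files.

```shell
#!/bin/sh
# Succeeds (exit status 0) when the given directory contains CLAUDE.md.
# Defaults to the current directory when no argument is passed.
# in_course_folder is a hypothetical helper name, not part of the course files.
in_course_folder() {
  [ -f "${1:-.}/CLAUDE.md" ]
}

if in_course_folder; then
  echo "Course folder detected; run: claude"
else
  echo "CLAUDE.md not found here; cd into ai-builders-bootcamp first"
fi
```

This only checks that a file named CLAUDE.md exists; it does not verify that it is the course's own configuration.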


🔄 Returning after your first session

Each time you come back to continue the course — the next day, or after any break — run these two commands from your terminal:

cd ai-builders-bootcamp
claude

Your progress is saved automatically after each lesson. The tutor will pick up exactly where you left off.
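To confirm that a previous session actually saved something, you can look for the progress file mentioned under Course Features. A minimal sketch, assuming a POSIX-style shell; progress_status is a hypothetical helper name, and since this README doesn't describe the file's contents, the sketch only checks that the file exists.

```shell
#!/bin/sh
# Print whether a saved progress file exists under the given course folder.
# Defaults to the current directory when no argument is passed.
# progress_status is a hypothetical helper; the path comes from this README.
progress_status() {
  file="${1:-.}/progress/progress.json"
  if [ -f "$file" ]; then
    echo "progress file found"
  else
    echo "no progress file yet"
  fi
}

progress_status .
```

Run it from inside the ai-builders-bootcamp folder. Seeing "no progress file yet" before you finish your first lesson is expected.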


🔧 Troubleshooting

| Problem | Fix |
| --- | --- |
| claude: command not found | Run npm install -g @anthropic-ai/claude-code again, then restart your terminal |
| Permissions error during install | Download Node.js directly from nodejs.org instead |
| Blank screen after running claude | You're in; just type go to start |
| Claude doesn't introduce itself as tutor | Make sure you ran claude from inside the ai-builders-bootcamp folder, not a parent directory |
| Claude asks to approve file writes | Type yes; it needs this to save your progress |
| Stuck mid-lesson | Type resume; the tutor will re-read your progress and pick up where you left off |

📅 Course Structure

When you start the course, you'll choose a learning track. Each track has its own lessons, exercises, and PM decision points built around a specific real-world AI use case.


🟡 Track 1 — Menu Verification at a Food Delivery Company (Intermediate)

21 days. 3 weeks. One lesson per day.

Week 1 — Your Eval Foundation (Days 1–7)

| Day | Lesson | Key Skills |
| --- | --- | --- |
| D1 | Pipeline Mapping | Pipeline stages, non-determinism, reading traces |
| D2 | Failure Surface Mapping | Evaluation surface map, failure layers, coverage gaps |
| D3 | Error Analysis | Open coding, axial coding, saturation, triage |
| D4 | Thinking in Distributions | Shape before depth, pass@k, reliable@k, the consistency gap |
| D5 | Grader Types | Code-based, model-based, human graders; layering strategy |
| D6 | LLM-as-Judge | Calibration trap, Critique Shadowing, failure modes, meta-evaluation |
| D7 | Golden Datasets | Three sources, contamination, dataset lifecycle |

Week 2 — Metrics and Measurement at Scale (Days 8–14)

| Day | Lesson | Key Skills |
| --- | --- | --- |
| D8 | RAG Evaluation | Precision@k, faithfulness, answer relevance, context recall |
| D9 | Hallucination Detection | Detection strategies, grounding, citation evaluation |
| D10 | Release Criteria | Guardrail vs optimization metrics, ship/hold thresholds |
| D11 | Metric Design | Metric tradeoffs, evaluation cost, coverage strategy |
| D12 | Fairness & Subgroups | Subgroup slicing, disparity detection, fairness in practice |
| D13 | Eval-Driven Development | Evals as product specs, regression testing, eval cadence |
| D14 | Observability | Logging, tracing, what to instrument and why |

Week 3 — Ship, Monitor, and Scale (Days 15–21)

| Day | Lesson | Key Skills |
| --- | --- | --- |
| D15 | Agent Evaluation | Multi-step pipelines, tool use, trajectory evaluation |
| D16 | AI Experiments | LLM A/B testing, variance, confounds |
| D17 | Launch Readiness | Pre-launch checklist, drift detection, incident response |
| D18 | Red Teaming | Threat modeling, adversarial prompts, stress testing |
| D19 | Ship Decisions | Synthesizing eval signals into a go/no-go recommendation |
| D20 | Regulatory Context | AI Act, liability, what product people need to know |
| D21 | Eval Culture | Institutionalizing evals, team buy-in, eval as product practice |

🟢 Track 2 — Building a Conversational Language Tutor (Beginner)

New to AI evaluation? This track teaches the same eval fundamentals through the lens of a consumer AI product — a language learning assistant that holds conversations, gives feedback, and adapts to the learner's level.

Open-ended outputs, tricky quality definitions, and a use case most people intuitively understand — making it the ideal entry point if you're new to AI evals or working in consumer AI.

📝 Content in progress — lessons and exercises for this track are being added regularly. Watch the repo to get notified when new content drops.


📁 What's in the Repo

use-cases/
  menu-verification/    Intermediate track — menu verification at a food delivery company
    lessons/            Lesson content (D1–D21)
    exercises/          CSV datasets you'll analyse during exercises
    scoring-rubrics.md  PM Decision Point rubrics (used by Claude, not shown to you)
    meta.md             Track title, level, and description
  language-tutor/       Beginner track — building a conversational language tutor
    lessons/            (coming soon)
    exercises/          (coming soon)
tutor/                  Session protocol — Claude's tutoring instructions
progress/               Your local progress — gitignored, never leaves your machine
CLAUDE.md               Course configuration — Claude reads this on startup

⭐ Stay Updated

Found this course useful? Star the repo ⭐: doing so saves it to your GitHub profile for easy reference, helps others discover it, and massively helps me.

This course is actively updated based on feedback from real learners — new lessons, new use cases, fixes, and improvements ship regularly. To get notified the moment an update drops, click Watch → Custom → Releases at the top of this page.


📚 Further Reading & Acknowledgements

This course stands on the shoulders of practitioners who've shared their teachings publicly. If you want to go deeper, these are the sources that most shaped what you just learned:


📄 License

CC BY-NC-SA 4.0 — Free to use and adapt for non-commercial purposes with attribution.

About

Learn to evaluate AI products for production — 21 hands-on lessons on evals, metrics, fairness, agents, red teaming, and release decisions for working PMs.
