4.7 (879)
Featured in
Lenny’s List
ML Engineer with 25 years of experience
ML Systems & Applied AI Evals Researcher

🏆 The most field-tested evals course available. We've refined this exact material over a year of cohorts with 4,500+ engineers and PMs from teams like OpenAI, Google, Meta, Amazon, and Microsoft, folding their feedback in every time.
More than a course: an ongoing system, tools, and community for shipping better AI.
🎮 A private Discord community for ongoing support, even after the course.
🤖 6 months of access to our AI Evals assistant.
♾️ Lifetime access to all recordings, materials, and future cohorts.
💬 10+ hours of live office hours to get your questions answered.
Do you catch yourself asking any of these while building AI applications?
How do I test outputs that need subjective judgment?
If I change the prompt, how do I know I am not breaking something else?
Where should I focus my efforts? Do I need to test everything?
What if I have no data or customers yet? Where do I start?
What should I measure, and what tools should I use?
Can I automate evaluation, and how do I trust it?
If so, this is for you. All sessions are live and recorded.
Build a real AI agent, find where it breaks, and improve it with evals you can trust, working the full loop hands-on.
Instrument a real agent so every run leaves a trace you can inspect.
Turn vague failures into specific, reproducible cases with a root cause.
Set up logging and observability that show what the agent actually did.
Replace random spot-checking with a repeatable way to read traces and spot failures.
Group and prioritize failure modes so you fix what matters first.
Learn how to analyze agentic systems, including tool calls and retrieval.
Design and validate LLM-as-judge and code-based evals that match expert judgment.
Learn when a metric is real and when it is noise no one should act on.
Align evaluators with the people who own the product, so the results stick.
Wire an agent into a test suite so prompt, model, and tool changes get checked before they ship.
Compare experiments consistently and keep datasets from overfitting.
Monitor agents in production and catch drift before users do.
Probe for prompt injection, jailbreaks, and unsafe tool calls.
Add guardrails and human checks that hold up under attack.
Map an agent's attack surface so you know where it can be pushed.
Run experiments that raise accuracy and lower latency and cost.
Show which change moved the metric, with numbers.
Optimize prompts, models, and retries without breaking what already works.
ML Engineer with 20 years of experience.


ML Systems Researcher Making AI Evaluation Work in Practice
Engineers and PMs who ship prompt changes and hope nothing breaks. (You'll learn to measure impact before and after every change.)
Teams still spot-checking AI outputs by hand instead of measuring systematically. (You'll learn how to build automated evals you can trust.)
Leaders who don't know where their AI is failing or where to invest resources. (You'll learn how to systematically find and prioritize issues.)
Live sessions
Learn directly from Hamel Husain & Shreya Shankar in a real-time, interactive format.
Lifetime Access to All Recordings & Materials
Revisit the materials and lectures anytime. Recordings and slides are made available to all students.
150+ Page Course Reader
We provide a course reader with detailed notes to supplement your learning and act as a future reference as you work on evals.
Lifetime Access To Discord Community
Private Discord for questions, job leads, and ongoing support from the community (1,000+ students and growing).
10+ Office Hour Q&As
Open office hours for questions and personalized feedback.
4 Homework Assignments With Solutions & Walkthroughs
Optional coding assignments & walkthrough videos so you can practice every concept.
Certificate of Completion
Share your new skills with your employer or on LinkedIn.
Detailed Vendor & Tools Workshops
Curated talks from industry experts working on evals, as well as workshops with vendors building eval tools.
Maven Guarantee
Your purchase is backed by the Maven Guarantee.
17 live sessions • 9 lessons
Sep 5 • Lecture 1: Building Agents, Foundations
Sep 9 • Lecture 2: Building Agents, Designing for Evaluability
Sep 12 • Lecture 3: Error Analysis, Finding Failures
Sep 10 • Office Hours
Sep 12 • Office Hours
Live sessions
3-5 hrs / week
Lectures are delivered live but also recorded so you can watch the materials at your own pace. We also provide over 10 hours of office hours and a community where you can ask questions (even after the course ends!).
Sat, Sep 5, 6:00 PM–7:00 PM (UTC)
Wed, Sep 9, 3:00 PM–4:00 PM (UTC)
Sat, Sep 12, 6:00 PM–7:00 PM (UTC)
Optional Homework Assignments
1-2 hrs / week
Optional coding homework assignments where you implement evals from scratch. We provide all students with solutions and walkthroughs.

Simon Willison

Harrison Chase

Eugene Yan

Charles Frye

Bryan Bischof

George Siemens
A one-hour conversation on error analysis.

See more testimonials at https://bit.ly/eval-reviews

https://x.com/ttorres/status/1933296711658815722

This is a special tool for students and not available for sale. Experimental and for learning only.
Maven for Teams
Reimbursement
Get your company to pay
Everything L&D needs: email template, receipts, and certificate of completion.
Team discount
Learn with your teammates
Save 20%+ when 2 or more teammates enroll in the same cohort.
Private cohort
Run a cohort for your org
A dedicated cohort with a custom schedule and curriculum, tailored to your team.
$4,200 USD