Featured in
Lenny’s List
ML Engineer with 20 years of experience
ML Systems & Applied AI Evals Researcher


🚨 The next cohort is January 2026. Enroll now and get immediate access to our course reader and community. 🚨
All students in this course get:
- 🗄️ Lifetime access to all materials!
- 🤖 6 months of unlimited access to our new AI Eval Assistant (more info below).
- 🧑‍🏫 8+ hours of office hours to maximize the value of live interaction.
- 🏫 Lifetime access to a Discord community with 2k+ students and instructors.
---
Do you catch yourself asking any of the following questions while building AI applications?
1. How do I test applications when the outputs are stochastic and require subjective judgements?
2. If I change the prompt, how do I know I'm not breaking something else?
3. Where should I focus my engineering efforts? Do I need to test everything?
4. If I have no data or customers, where do I start?
5. What metrics should I track? What tools should I use? Which models are best?
6. Can I automate testing and evaluation? If so, how do I trust it?
If you aren't sure about the answers to these questions, this course is for you.
This course uses a flipped classroom format: lectures are professionally recorded and edited, and live time is devoted to office hours and student interaction.
Learn proven approaches for quickly improving AI applications. Build AI that works better than the competition, regardless of the use case.
Understand instrumentation and observability for tracking system behavior.
Learn approaches for generating synthetic data to maximize error discovery and bootstrap product development.
Understand how to choose the right tools and vendors for you, with deep dives into the most popular solutions in the evals space.
Apply data analysis techniques to rapidly find systematic issues in your product regardless of the use case.
Master the processes and tools to annotate and analyze data quickly and efficiently.
Learn how to analyze agentic systems (tool calls, RAG, etc.) to quickly identify systematic patterns and errors.
Create evals that are customized to your product and provide immediate value, NOT generic off-the-shelf evals (which do not work).
Align evals with stakeholders & domain experts so that you can scientifically trust them.
Create high-quality LLM-as-a-judge and code-based evals with a systematic, iterative process.
Learn how to measure & debug RAG systems for retrieval relevance and factual accuracy.
Understand how to tame multi-step pipelines so you can quickly identify error propagation and the root causes of errors.
Master techniques that apply to multi-modal settings, including text, image, and audio interactions.
Learn how to set up automated evaluation gates in CI/CD pipelines.
Understand methods for consistent comparison across experiments, including how to prepare and maintain datasets to prevent overfitting.
Implement safety and quality control guardrails.
Develop a strong intuition for when to write an eval, and when NOT to write an eval.
Learn how to design interfaces to remove friction from reviewing data and collect higher quality data with less effort.
Learn how to avoid common pitfalls surrounding team organization, collaboration, responsibilities, tools, automation, and metrics.
Engineers & PMs building AI products who are interested in moving beyond proof-of-concepts.
Those interested in moving beyond vibe-checks to data-driven measurements you can trust, even when outputs are stochastic or subjective.
Founders and leaders who are unsure of the failure modes of their AI applications and where to allocate resources.
Live sessions
Learn directly from Hamel Husain & Shreya Shankar in a real-time, interactive format.
Lifetime Access to All Recordings & Materials
Revisit the materials and lectures anytime. Recordings and slides are made available to all students.
150+ Page Course Reader
We provide a course reader with detailed notes to supplement your learning and act as a future reference as you work on evals.
Lifetime Access To Discord Community
Private Discord for questions, job leads, and ongoing support from the community (1000+ students and growing).
8+ Office Hour Q&As
Open office hours for questions and personalized feedback.
4 Homework Assignments With Solutions & Walkthroughs
Optional coding assignments & walkthrough videos so you can practice every concept.
Certificate of Completion
Share your new skills with your employer or on LinkedIn.
Detailed Vendor & Tools Workshops
Curated talks from industry experts working on evals, as well as workshops with vendors building eval tools.
Maven Guarantee
This course is backed by the Maven Guarantee. Students are eligible for a full refund up until the halfway point of the course.
10 live sessions • 78 lessons
Jan 28 • Optional: Live Office Hours 1
Jan 30 • Optional: Live Office Hours 2
We'll show you why conventional approaches to product development break down when building AI and what to do instead.
Explore methods for creating evals that pinpoint where your AI is struggling, and how to prioritize improvements.
Learn to build AI products through iterative experiments rather than rigid roadmaps, with clear, measurable objectives.
Learn to break down complex AI capabilities into measurable stages that help you identify where to focus.
Live sessions
2-3 hrs / week
Lectures are professionally recorded & edited to save you time and cut out the fluff. We maximize live interaction through office hours and workshops.
Wed, Jan 28 • 4:30-5:30 PM (UTC)
Fri, Jan 30 • 4:30-5:30 PM (UTC)
Tue, Feb 3 • 4:30-5:30 PM (UTC)
Optional Homework Assignments
1-2 hrs / week
Optional coding homework assignments where you implement evals from scratch. We provide all students with solutions and accompanying walkthroughs.

Simon Willison

Harrison Chase

Eugene Yan

Charles Frye

Bryan Bischof

George Siemens
https://www.youtube.com/watch?v=BsWxPI9UM4c

See more testimonials at https://bit.ly/eval-reviews

https://x.com/ttorres/status/1933296711658815722

This is a special tool for students and is not available for sale. It is experimental and intended for learning only.