2 Weeks · Cohort-based Course
Create your first AI evals and analytics playbook. Learn how to plan, test, launch, and scale trustworthy AI products confidently.
Course overview
Go beyond building AI products. You’ll learn how to integrate AI evals directly into your current AI application and workflow, ensuring every feature you ship is measurable, reliable, and aligned with your product goals.
Our goal
You come in with a product use case and walk away with an actionable, clear plan to evaluate and continuously analyze your AI product.
Is this you?
This accelerated course is specifically crafted for senior PMs, product leaders, data scientists, and AI engineers who are already building with AI and looking to implement AI Evals and Analytics with a practical framework and playbook.
Course Plan
------------------------------------------------------
Lesson 1: The AI Evaluation Framework
- Understanding the AI evaluation framework
- Differentiating Evals, Analytics, and explainable AI (xAI)
- Integrating AI evaluation into your product lifecycle
- Exercise: Section 1 of your playbook
Lesson 2: How to Evaluate AI Products
- Human evaluation methodologies
- Using LLM-as-a-judge approaches
- Mapping AI evaluation work into the development cycle
- Designing rubrics and metrics for evaluation
- Exercise: Section 2 of your playbook
Lesson 3: Experiment Design and AI Evaluation Tools
- Evaluator model selection and usage
- Experiment design for AI evaluation
- Overview of popular eval tools: DeepEval, Ragas, LangSmith (a minimal sketch follows this lesson)
- Exercise: Sections 2 & 3 of your playbook
Lesson 4: Product Analytics & Build Your First Evals Team
- Key concepts in product analytics
- Product monitoring practices for AI products
- Understanding leading & lagging indicators
- Conducting post-launch reviews and analytics
- How to build your first AI Evals team
- Exercise: Sections 3 & 4 of your playbook
This is an accelerated course: we've condensed four weeks of material into two, so expect an intensive pace.
Course Highlights & Benefits
------------------------------------------------------
✅ Real-World Case Studies and Methods
Explore proven evaluation approaches from teams shipping production AI. Learn when to use LLM-as-a-judge vs. human reviewers, how to balance quality with velocity, and what actually matters for your specific use case.
✅ Battle-Tested Industry Expertise
Learn from instructors with 20+ years combined experience in data science and ML products, plus 2+ years pioneering AI Evals since the launch of ChatGPT.
✅ Complete AI Evaluation Playbook
Get lifetime access to templates, frameworks, rubrics, and an AI Evals glossary. Everything you need to implement immediately. Actionable. Application-focused. Practical.
✅ Practical, Project-Driven Workshops
Build your own evaluation playbook during 4 sessions (2 hours each) over 2 weeks. Work on your actual product while getting direct feedback on your specific challenges and methodology.
🎁 Founding Cohort Benefits
Join our first cohort and help shape the course. Receive personalized guidance, in-depth discussions on your evaluation challenges, and access to our professional community of product leaders, data practitioners, and AI builders.
Are you shipping AI products without knowing if they actually work—or if they're safe?
Time to work on AI Evals and Analytics.
-- Stella and Amy
01
Product leaders building AI products who need a clear playbook for evaluation frameworks, success metrics, and shipping with confidence.
02
Data scientists looking to level up with AI product skills. Learn how to do AI evals, LLM-as-a-judge validation, and AI-specific metrics.
03
Engineers building AI products who are ready to move beyond manual testing. Learn production-grade evaluation systems that let you ship fast and safely.
Build Your First AI Evals Team
Create clear ownership: who writes rubrics, validates metrics, and holds veto power. Keep Legal, Trust & Safety, and SMEs aligned without evaluation becoming a bottleneck.
Execute Comprehensive Pre-launch Testing
Run sniff tests, build quantitative evaluation pipelines, and design experiments that prove your AI beats the baseline. Know when to use human labels vs. LLM-as-a-judge.
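For a taste of the LLM-as-a-judge side of this, here is a minimal sketch: a rubric embedded in a judge prompt plus a scoring function. The call_llm helper is hypothetical; swap in your provider's SDK:

```python
import json

# Hypothetical helper; wire up your LLM provider's SDK here.
def call_llm(prompt: str) -> str:
    raise NotImplementedError

JUDGE_PROMPT = """You are grading an AI assistant's answer.
Rubric (score 1-5):
5 = accurate, complete, and on-topic
3 = partially correct or missing key details
1 = incorrect or off-topic

Question: {question}
Answer: {answer}

Respond as JSON: {{"score": <1-5>, "reason": "<one sentence>"}}"""

def judge(question: str, answer: str) -> dict:
    """Score one answer with an LLM judge; returns {"score": ..., "reason": ...}."""
    raw = call_llm(JUDGE_PROMPT.format(question=question, answer=answer))
    return json.loads(raw)
```

A judge like this should itself be validated against a small set of human labels before you trust it at scale.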
Design Experiments for AI Products
Handle stochastic outputs and subjective quality with proper experiment design. Choose the right methodology, set sample sizes, and define guardrails that protect business metrics.
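For a flavor of the sample-size question, here is a back-of-the-envelope sketch using the standard two-proportion normal approximation (an illustrative assumption, not the course's exact methodology):

```python
import math
from statistics import NormalDist

def samples_per_arm(p_baseline: float, p_new: float,
                    alpha: float = 0.05, power: float = 0.8) -> int:
    """Approximate n per arm to detect a pass-rate lift with a
    two-sided two-proportion z-test (normal approximation)."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)   # e.g. 1.96 for alpha = 0.05
    z_beta = z.inv_cdf(power)            # e.g. 0.84 for 80% power
    p_bar = (p_baseline + p_new) / 2
    numerator = (z_alpha * math.sqrt(2 * p_bar * (1 - p_bar))
                 + z_beta * math.sqrt(p_baseline * (1 - p_baseline)
                                      + p_new * (1 - p_new))) ** 2
    return math.ceil(numerator / (p_new - p_baseline) ** 2)

# Detecting a 70% -> 75% pass-rate lift needs roughly 1,250 graded outputs per arm.
print(samples_per_arm(0.70, 0.75))
```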
Monitor Product and Catch Issues Early
Set up leading indicators (retry rates, confidence scores) and lagging metrics (CSAT, cost). Build escalation procedures and run structured post-launch reviews at 15, 30, and 60 days.
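As one illustration of a leading indicator, here is a small rolling-window retry-rate monitor (a sketch; the window size and threshold are placeholder assumptions):

```python
from collections import deque

class RetryRateMonitor:
    """Leading indicator: alert when the retry rate over the last
    `window` requests exceeds `threshold` (placeholder values)."""

    def __init__(self, window: int = 500, threshold: float = 0.15):
        self.events = deque(maxlen=window)  # True = user retried the request
        self.threshold = threshold

    def record(self, was_retry: bool) -> None:
        self.events.append(was_retry)

    @property
    def retry_rate(self) -> float:
        return sum(self.events) / len(self.events) if self.events else 0.0

    def should_escalate(self) -> bool:
        # Wait for a full window to avoid noisy cold-start alerts.
        return len(self.events) == self.events.maxlen and self.retry_rate > self.threshold
```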
Walk Away with an Execution-Ready Playbook
Build your first AI Evals and Analytics playbook for your current product use case. Start shipping confidently and safely.
Live sessions
Learn directly from Stella Liu & Amy Chen in a real-time, interactive format.
Your First AI Evals and Analytics Playbook
Create your first AI Evals playbook and apply it on your current projects.
Glossary Sheet
Master the terminology with clear definitions and practical examples for every key concept in AI Evals.
Lifetime access
Go back to course content and recordings whenever you need to.
Community of peers
Stay accountable and share insights with like-minded professionals.
Certificate of completion
Share your new skills with your employer or on LinkedIn.
Maven Guarantee
This course is backed by the Maven Guarantee. Students are eligible for a full refund up until the halfway point of the course.
4 live sessions • 4 lessons • 5 projects
Session dates: Oct 27 · Oct 31 · Nov 3 · Nov 7
Learn practical skills from an industry AI Eval pioneer
Stella Liu is co-founder of AI Evals & Analytics and an AI Evaluation scientist and researcher, specializing in frameworks for large language models and AI-powered products.
Since 2023, she has led real-world AI evaluation projects in EdTech, where she established the first AI product evaluation framework for Higher Education and continues to advance research on the safe and responsible use of AI. Her work combines academic rigor with hands-on product experience, bringing proven evaluation methods into both enterprise and educational contexts.
Earlier in her career, Stella worked at Shopify and Carvana, where she built large-scale data-driven automation systems that powered product innovation and operational efficiency at scale.
She is a Top 1% Mentor in Data Science on ADPList.
Learn analytics skills from an industry AI and data practitioner
Amy Chen is co-founder of AI Evals & Analytics and an AI partner helping companies with AI engineering, product development, and go-to-market strategy. With over 10 years of experience spanning data science, product management, ML engineering, and GTM, she brings versatile expertise to startups at every stage.
She is a Top 1% Mentor in AI/ML Engineering and has mentored over 300 data scientists and analysts on ADPList. She posts regularly about AI and data science on LinkedIn, where she has over 9.5k followers.
Join an upcoming cohort
Cohort 1
$2,250
Dates
Payment Deadline
7-8 hours per week
Live Interactive Lectures + Workshop
Intensive sessions with frameworks, tactics, and exercises.