AI Evals and Analytics Playbook

New · 2 Weeks · Cohort-based Course

Create your first AI evals and analytics playbook. Learn how to plan, test, launch, and scale trustworthy AI products confidently.

Previously at: Shopify, Carvana, Harvard University, University of Washington, UCLA

Course overview

Build Your First AI Evals & Analytics Playbook

Go beyond building AI products. You’ll learn how to integrate AI evals directly into your current AI application and workflow, ensuring every feature you ship is measurable, reliable, and aligned with your product goals.


Our goal

You come in with a product use case and walk away with an actionable, clear plan to evaluate and continuously analyze your AI product.


Is this you?

This accelerated course is crafted for senior PMs, product leaders, data scientists, and AI engineers who are already building with AI and want a practical framework and playbook for AI Evals and Analytics.


Course Plan

------------------------------------------------------

Lesson 1: The AI Evaluation Framework

- Understanding the AI evaluation framework

- Differentiating Evals, Analytics, and Explainable AI (XAI)

- Integrating AI evaluation into your product lifecycle

- Exercise: Section 1 of your playbook


Lesson 2: How to Evaluate AI Products

- Human evaluation methodologies

- Using LLM-as-a-judge approaches

- Mapping AI evaluation work into the development cycle

- Designing rubrics and metrics for evaluation

- Exercise: Section 2 of your playbook


Lesson 3: Experiment Design and AI Evaluation Tools

- Evaluator model selection and usage

- Experiment design for AI evaluation

- Overview of popular eval tools: DeepEval, Ragas, and LangSmith

- Exercise: Sections 2 & 3 of your playbook


Lesson 4: Product Analytics & Build Your First Evals Team

- Key concepts in product analytics

- Product monitoring practices for AI products

- Leading and lagging indicators

- Post-launch reviews and analytics

- Building your first AI Evals team

- Exercise: Sections 3 & 4 of your playbook


This is an accelerated course. We've condensed four weeks of material into two weeks so you can learn everything at an intensive pace.


Course Highlights & Benefits

------------------------------------------------------

Real-World Case Studies and Methods

Explore proven evaluation approaches from teams shipping production AI. Learn when to use LLM-as-a-judge vs. human reviewers, how to balance quality with velocity, and what actually matters for your specific use case.


Battle-Tested Industry Expertise

Learn from instructors with 20+ years of combined experience in data science and ML products, plus 2+ years pioneering AI Evals since the launch of ChatGPT.


Complete AI Evaluation Playbook

Get lifetime access to templates, frameworks, rubrics, and an AI Evals glossary. Everything you need to implement immediately. Actionable. Application-focused. Practical.


Practical, Project-Driven Workshops

Build your own evaluation playbook during 4 sessions (2 hours each) over 2 weeks. Work on your actual product while getting direct feedback on your specific challenges and methodology.


🎁 Founding Cohort Benefits

Join our first cohort and help shape the course. Receive personalized guidance, in-depth discussions on your evaluation challenges, and access to our professional community of product leaders, data practitioners, and AI builders.



Are you shipping AI products without knowing if they actually work—or if they're safe?

Time to work on AI Evals and Analytics.

-- Stella and Amy

Who is this course for

01

Product leaders building AI products who need a clear playbook for evaluation frameworks, success metrics, and shipping with confidence.

02

Data scientists looking to level up with AI product skills. Learn how to run AI evals, validate with LLM-as-a-judge, and define AI-specific metrics.

03

Engineers building AI products who are ready to move beyond manual testing and learn production-grade evaluation systems that let you ship fast and safely.

What you’ll get out of this course

Build Your First AI Evals Team

Create clear ownership: who writes rubrics, validates metrics, and holds veto power. Keep Legal, Trust & Safety, and SMEs aligned without evaluation becoming a bottleneck.

Execute Comprehensive Pre-launch Testing

Run sniff tests, build quantitative evaluation pipelines, and design experiments that prove your AI beats the baseline. Know when to use human labels vs. LLM-as-a-judge.
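
To make this concrete, here is a minimal LLM-as-a-judge sketch of the kind you'll flesh out in the course. It assumes the `openai` Python client; the model name and rubric are illustrative placeholders, not the course's recommended defaults.

```python
# Minimal LLM-as-a-judge sketch: score one answer against a short rubric.
# Assumes the `openai` client and OPENAI_API_KEY in the environment.
from openai import OpenAI

client = OpenAI()

RUBRIC = (
    "Score the ANSWER to the QUESTION from 1 (poor) to 5 (excellent). "
    "Judge only factual accuracy and completeness. Reply with a single digit."
)

def judge(question: str, answer: str) -> int:
    resp = client.chat.completions.create(
        model="gpt-4o-mini",  # illustrative; use whatever judge model you validated
        temperature=0,        # keep the judge as deterministic as possible
        messages=[
            {"role": "system", "content": RUBRIC},
            {"role": "user", "content": f"QUESTION: {question}\nANSWER: {answer}"},
        ],
    )
    # Assumes the judge complies with the single-digit format in the rubric.
    return int(resp.choices[0].message.content.strip()[0])

print(judge("What is 2 + 2?", "4"))  # expect a high score such as 5
```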

Design Experiments for AI Products

Handle stochastic outputs and subjective quality with proper experiment design. Choose the right methodology, set sample sizes, and define guardrails that protect business metrics.
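
For a taste of the sample-size step, the sketch below runs a standard two-proportion power calculation with `statsmodels`; the pass rates, significance level, and power are illustrative assumptions, not course prescriptions.

```python
# Hedged sketch: rough per-arm sample size to detect a lift in eval pass rate,
# using a standard two-proportion power calculation from statsmodels.
from statsmodels.stats.proportion import proportion_effectsize
from statsmodels.stats.power import NormalIndPower

baseline, target = 0.70, 0.78  # illustrative pass rates: current system vs. candidate
effect = proportion_effectsize(target, baseline)  # Cohen's h for two proportions

n_per_arm = NormalIndPower().solve_power(
    effect_size=effect, alpha=0.05, power=0.80, alternative="two-sided"
)
print(f"~{round(n_per_arm)} labeled examples per arm")  # ≈235 for these rates
```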

Monitor Product and Catch Issues Early

Set up leading indicators (retry rates, confidence scores) and lagging metrics (CSAT, cost). Build escalation procedures and run structured post-launch reviews at 15, 30, and 60 days.
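
As a minimal illustration, a leading-indicator check can start as simply as the sketch below; the metric names and thresholds are hypothetical, chosen only to show the shape of the check.

```python
# Hedged sketch: a daily guardrail check on leading indicators.
# Metric names and thresholds are hypothetical examples.
from dataclasses import dataclass

@dataclass
class Indicator:
    name: str
    value: float    # today's observed rate
    ceiling: float  # alert when the rate exceeds this guardrail

def check(indicators: list[Indicator]) -> list[str]:
    """Return one alert message per indicator past its guardrail."""
    return [
        f"ALERT: {ind.name} = {ind.value:.1%} exceeds guardrail {ind.ceiling:.1%}"
        for ind in indicators
        if ind.value > ind.ceiling
    ]

today = [
    Indicator("retry_rate", 0.09, 0.05),            # users re-asking the model
    Indicator("low_confidence_share", 0.12, 0.20),  # outputs below a confidence cutoff
]
for alert in check(today):
    print(alert)  # wire this into your escalation procedure
```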

Walk Away with an Execution-ready Playbook

Build your first AI Evals and Analytics playbook for your current product use case. Start shipping confidently and safely.

What’s included

Live sessions

Learn directly from Stella Liu & Amy Chen in a real-time, interactive format.

Your First AI Evals and Analytics Playbook

Create your first AI Evals playbook and apply it to your current projects.

Glossary Sheet

Master the terminology with clear definitions and practical examples for every key concept in AI Evals.

Lifetime access

Go back to course content and recordings whenever you need to.

Community of peers

Stay accountable and share insights with like-minded professionals.

Certificate of completion

Share your new skills with your employer or on LinkedIn.

Maven Guarantee

This course is backed by the Maven Guarantee. Students are eligible for a full refund up until the halfway point of the course.

Course syllabus

4 live sessions • 4 lessons • 5 projects

Week 1

Oct 27—Nov 2

    Getting Ready for Session 1 (2 items)

    Session 1 - The AI Evaluation Framework
    Mon 10/27, 10:00 PM–12:00 AM (UTC)

    Getting Ready for Session 2 (2 items)

    Session 2 - How to Evaluate AI Products
    Fri 10/31, 10:00 PM–12:00 AM (UTC)

Week 2

Nov 3—Nov 9

    Getting Ready for Session 3 (2 items)

    Session 3 - Experiment Design and AI Evaluation Tools
    Mon 11/3, 11:00 PM–1:00 AM (UTC)

    Getting Ready for Session 4 (2 items)

    Session 4 - Product Monitoring & Analytics
    Fri 11/7, 11:00 PM–1:00 AM (UTC)

Post-course

    Submit your Playbook

    1 item

Bonus

Meet your instructors

Stella Liu

Learn practical skills from an industry AI Eval pioneer

Stella Liu is co-founder of AI Evals & Analytics and an AI Evaluation scientist and researcher, specializing in evaluation frameworks for large language models and AI-powered products.


Since 2023, she has led real-world AI evaluation projects in EdTech, where she established the first AI product evaluation framework for Higher Education and continues to advance research on the safe and responsible use of AI. Her work combines academic rigor with hands-on product experience, bringing proven evaluation methods into both enterprise and educational contexts.


Earlier in her career, Stella worked at Shopify and Carvana, where she built large-scale data-driven automation systems that powered product innovation and operational efficiency at scale.


She is a Top 1% Mentor in Data Science on ADPList.


Follow Stella on LinkedIn

Shopify · Carvana · Harvard University · Arizona State University

Amy Chen

Learn analytics skills from an industry AI and data practitioner

Amy Chen is co-founder of AI Evals & Analytics and an AI partner helping companies with AI engineering, product development, and go-to-market strategy. With over 10 years of experience spanning data science, product management, ML engineering, and GTM, she brings versatile expertise to startups at every stage.


She is a Top 1% Mentor in AI/ML Engineering and has mentored over 300 data scientists and analysts on ADPList. She posts regularly about AI and data science on LinkedIn and has over 9.5k followers.


Follow Amy on LinkedIn

System1 · Uptake · Seasalt.ai · UCLA · University of Washington

Join an upcoming cohort

AI Evals and Analytics Playbook

Cohort 1

$2,250

Dates: Oct 27–Nov 9, 2025

Payment Deadline: Oct 27, 2025

Course schedule

7-8 hours per week

  • Live Interactive Lectures + Workshop

    Intensive sessions with frameworks, tactics, and exercises.

