AI Evals for Product Development

Shane Butler

Principal Data Scientist, AI Evaluations

Master the data-driven workflow behind reliable, high-quality AI products.

This course equips you to measure, diagnose, and improve AI features in production. It is the evolution of product data science and analytics for a world where AI product behavior is probabilistic, multi-step, and harder to observe.

The focus is on measuring whether AI features actually deliver user value and business impact, and knowing what to do when they do not.

If you already have product analytics foundations, the course shows you how to adapt them for AI systems. You do not need to be an AI specialist, but this is not an introductory AI course either: you will get the most out of it if you are actively working on, or about to ship, AI features in a real product.

For a limited time, I'm opening 4 pilot seats for the April AI Evals for Product Development cohort. The cohort is usually $1,500; pilot seats pay $300 (80% off) in exchange for structured pre-cohort feedback from January through March.

Apply to join the pilot program

What you'll learn

Practical, hands-on methods for evaluating, measuring, and improving AI products with data science, so you can make better product decisions.

  • Specify what to log so AI behavior can be measured and debugged in practice

  • Design traces that connect prompts, context, retrieval, outputs, and UX events (see the trace sketch after this list)

  • Assess whether an AI feature is evaluable before investing in optimization

  • Run structured failure analysis on real usage data and traces

  • Identify which failures materially prevent users from completing their task

  • Prioritize fixes based on user harm and product impact rather than anecdotes

  • Translate critical failures into measurable quality and success signals

  • Validate whether AI metrics predict user behavior and business results

  • Use segmentation and driver analysis to explain where AI delivers value

  • Build evaluation workflows spanning offline tests, review, and online validation

  • Define release gates and regression checks that work for stochastic systems (see the gate sketch after this list)

  • Monitor drift and quality degradation before users are impacted

  • Convert evaluation signals into clear ship, iterate, pause, or rollback decisions

  • Balance trade-offs across quality, UX risk, cost, and business outcomes

  • Communicate evaluation-driven decisions clearly to product and engineering leaders
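To make the trace-design outcome above concrete, here is a minimal sketch of a trace record that links a prompt, its retrieved context, the model output, and downstream UX events under one ID. The schema and names (`EvalTrace`, `ux_events`, and so on) are illustrative assumptions, not a format prescribed by the course.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

@dataclass
class EvalTrace:
    """One end-to-end record of a single AI feature invocation.

    Illustrative schema: field names are assumptions, not a
    course-prescribed format.
    """
    trace_id: str                 # joins model spans to downstream UX events
    user_prompt: str              # what the user actually asked
    prompt_version: str           # pins the prompt template in use
    retrieved_context: list[str]  # documents or snippets fed to the model
    model_output: str             # raw response, before post-processing
    ux_events: list[dict] = field(default_factory=list)   # e.g. accepted, edited, abandoned
    created_at: datetime = field(default_factory=lambda: datetime.now(timezone.utc))
```

Logging one such record per invocation is what makes failure analysis tractable: you can filter traces by UX outcome and read back exactly what the model saw and produced.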
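And for the release-gate outcome, one hedged sketch of how a gate for a stochastic system can differ from a deterministic pass/fail check: compare aggregate scores over a fixed eval set against the production baseline, rather than judging single outputs. The function name and thresholds below are placeholders, not values from the course.

```python
import statistics

def passes_release_gate(candidate_scores: list[float],
                        baseline_scores: list[float],
                        min_mean: float = 0.80,
                        max_regression: float = 0.02) -> bool:
    """Gate on aggregate quality over an eval set, not on single runs.

    Both lists hold per-example quality scores in [0, 1] on the same
    eval set; the thresholds are illustrative placeholders.
    """
    candidate_mean = statistics.mean(candidate_scores)
    baseline_mean = statistics.mean(baseline_scores)
    # Ship only if absolute quality clears a floor AND the candidate has
    # not regressed materially against the current production version.
    return (candidate_mean >= min_mean
            and baseline_mean - candidate_mean <= max_regression)
```

Because individual outputs vary from run to run, the gate reads a distribution rather than a single output, which is what makes it workable for stochastic systems.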

Learn directly from Shane

Shane Butler

Principal Data Scientist, AI Evaluations at Ontra

Previously: Stripe, Nextdoor, PwC India, AppFolio

Who this course is for

  • Product Managers shipping AI-powered features

  • Data Scientists and Analysts responsible for measurement and evaluation

  • Engineers and ML Engineers building and maintaining AI systems

What's included


Live sessions

Learn directly from Shane Butler in a real-time, interactive format.

Lifetime access

Go back to course content and recordings whenever you need to.

Community of peers

Stay accountable and share insights with like-minded professionals.

Certificate of completion

Share your new skills with your employer or on LinkedIn.

Maven Guarantee

This course is backed by the Maven Guarantee. Students are eligible for a full refund through the second week of the course.

Course syllabus

Week 1: Foundations of AI Product Evaluation

Apr 6–Apr 12 · 5 items

Week 2: Instrumentation and Observability for AI Systems

Apr 13–Apr 19 · 5 items


Schedule

  • Live sessions: 2 hrs / week

  • Async content: 3–5 hrs / week

$1,500 USD

Apr 6–May 16