
With recent leaps in model capabilities, it is becoming clear that much of the AI evaluation work that previously required humans to build and run can be automated through agentic systems. This course teaches you how to build those systems, and where human judgment and input are still required.
The foundational work of designing eval frameworks, defining quality signals, and structuring failure analysis remains critical. But executing that work (running evaluations, scoring results, iterating on improvements, and surfacing what matters) can now be handled by agentic workflows that operate continuously and at scale.
This course equips you to build, test, and deploy automated evaluation systems using Claude Code. You will leave with working systems, not just conceptual understanding.
The focus is on trustworthy automation: systems that score their own output, flag uncertainty, and know when to surface decisions to a human rather than proceeding autonomously.
The foundational content on AI evals for product development is available free at aianalystlab.ai. If you are new to AI evals, start there.
Learn practical, hands-on methods to evaluate, measure, and improve AI products, using data science to make better product decisions.
Design multi-step workflows that execute your full evaluation pipeline without manual intervention
Structure agent prompts, tool access, and orchestration for reliable eval execution
Handle failure modes gracefully so workflows recover rather than break
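As a taste of the workflow-design material, here is a minimal Python sketch of a pipeline runner that recovers from step failures rather than breaking. All names and structures are illustrative, not the course's actual code:

```python
# Minimal sketch of a failure-tolerant eval pipeline runner.
# All names are illustrative; this is not the course's actual code.
import logging
from dataclasses import dataclass, field

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("eval_pipeline")

@dataclass
class StepResult:
    step: str
    ok: bool
    output: dict = field(default_factory=dict)
    error: str | None = None

def run_step(name, fn, payload):
    """Run one step; convert exceptions into a recoverable result."""
    try:
        return StepResult(step=name, ok=True, output=fn(payload))
    except Exception as exc:  # recover rather than break the workflow
        log.warning("step %s failed: %s", name, exc)
        return StepResult(step=name, ok=False, error=str(exc))

def run_pipeline(dataset, steps):
    """Execute steps in order; stop cleanly if one fails."""
    payload, results = {"dataset": dataset}, []
    for name, fn in steps:
        result = run_step(name, fn, payload)
        results.append(result)
        if not result.ok:
            break  # surface the failure instead of emitting garbage scores
        payload.update(result.output)
    return results
```

The key design choice is converting exceptions into data: the workflow can then log, retry, or escalate a failed step instead of crashing mid-run.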
Build scoring systems that assess evaluation results for statistical soundness, accuracy, and feasibility
Define rubrics your agents can apply consistently across hundreds of eval runs
Calibrate automated scores against human judgment to establish trust boundaries
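Calibration often starts with simple agreement statistics between the automated judge and human reviewers. A sketch, with the score data and thresholds purely illustrative:

```python
# Sketch: check agreement between automated and human rubric scores before
# trusting the automated judge at scale. All data here is illustrative.
from collections import Counter

def agreement_rate(auto, human):
    """Fraction of items where the automated score matches the human one."""
    return sum(a == h for a, h in zip(auto, human)) / len(human)

def cohens_kappa(auto, human):
    """Chance-corrected agreement across score categories."""
    n = len(human)
    observed = agreement_rate(auto, human)
    fa, fh = Counter(auto), Counter(human)
    expected = sum((fa[k] / n) * (fh[k] / n) for k in set(fa) | set(fh))
    return (observed - expected) / (1 - expected)

auto  = [5, 3, 3, 1, 5, 4, 2, 5]   # scores from the LLM judge
human = [5, 3, 2, 1, 5, 4, 2, 4]   # scores from human reviewers
print(f"agreement: {agreement_rate(auto, human):.2f}")
print(f"kappa:     {cohens_kappa(auto, human):.2f}")
```

A kappa well above chance (a common rule of thumb is 0.6 or higher, though the right bar is context-dependent) is a reasonable prerequisite before letting an automated judge run unattended.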
Build systems that evaluate results, identify deficiencies, generate improvements, and re-evaluate automatically
Set convergence criteria so the system knows when to stop iterating
Monitor loop behavior to detect degradation or drift over successive iterations
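One way to encode convergence criteria is a loop that stops when gains flatten or scores regress. A minimal sketch, where evaluate() and improve() stand in for agentic calls and every name and threshold is hypothetical:

```python
# Sketch of an evaluate -> improve -> re-evaluate loop with explicit stopping
# rules. evaluate() and improve() stand in for agentic calls; all names and
# thresholds are hypothetical.
def improvement_loop(system, evaluate, improve,
                     min_gain=0.01, patience=2, max_iters=10):
    history = [evaluate(system)]          # baseline score
    stalled = 0
    for _ in range(max_iters):
        system = improve(system, history[-1])
        score = evaluate(system)
        gain = score - history[-1]
        history.append(score)
        # Convergence: stop after `patience` consecutive sub-threshold gains.
        stalled = stalled + 1 if gain < min_gain else 0
        if stalled >= patience:
            break
        # Drift guard: dropping below the baseline means the loop is
        # degrading the system, not improving it.
        if score < history[0]:
            break
    return system, history
```

Returning the full score history, not just the final system, is what makes drift and degradation visible across iterations.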
Run dozens of evaluation experiments in parallel and cross-reference results for validity
Build experiment registries that track configurations, outcomes, and comparisons
Surface which approaches produce meaningful improvements versus noise
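An experiment registry can be as simple as an append-only log keyed by run id. One possible shape, where the file name and field names are assumptions for illustration:

```python
# Sketch of a flat-file experiment registry: one JSON record per run, so
# configurations and outcomes can be cross-referenced later. The file name
# and field names are assumptions for illustration.
import json
import time
import uuid
from pathlib import Path

REGISTRY = Path("experiments.jsonl")

def log_experiment(config: dict, metrics: dict) -> str:
    """Append one experiment record; returns its id for cross-referencing."""
    record = {
        "id": uuid.uuid4().hex[:8],
        "timestamp": time.time(),
        "config": config,
        "metrics": metrics,
    }
    with REGISTRY.open("a") as f:
        f.write(json.dumps(record) + "\n")
    return record["id"]

def best_runs(metric: str, top_n: int = 5) -> list[dict]:
    """Rank logged experiments by one metric to compare approaches."""
    if not REGISTRY.exists():
        return []
    records = [json.loads(line) for line in REGISTRY.read_text().splitlines()]
    records.sort(key=lambda r: r["metrics"].get(metric, 0), reverse=True)
    return records[:top_n]
```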
Connect automated eval systems to real product data, not just test fixtures
Define escalation paths for when automated evaluation surfaces ambiguous or high-stakes findings
Translate automated eval output into ship, iterate, pause, and rollback recommendations
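That last step, turning eval output into a recommendation, can be made explicit as a small decision function. The thresholds below are placeholders, not the course's recommended values:

```python
# Sketch: map eval summary statistics to a ship / iterate / pause / rollback
# recommendation, with an explicit escalation path for high-stakes findings.
# All thresholds are illustrative placeholders.
def recommend(pass_rate: float, regressed: bool, high_stakes_failures: int) -> str:
    if high_stakes_failures > 0:
        return "escalate"   # ambiguous or high-stakes: a human decides
    if regressed:
        return "rollback"   # worse than the version already shipped
    if pass_rate >= 0.95:
        return "ship"
    if pass_rate >= 0.80:
        return "iterate"
    return "pause"          # too unreliable to keep iterating blindly
```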

Shane Butler, Principal Data Scientist, AI Evaluations at Ontra

Product Managers: Want automated eval systems for your team? Learn what is possible, what is reliable, and where human oversight stays essential.
Data Scientists & Analysts: Spending more time running evals than learning from them? Build systems that handle execution so you can focus on decisions.
Engineers & ML Engineers: Eval bottlenecks slowing your release cycle? Automate repetitive work and ship with speed and confidence.

Live sessions
Learn directly from Shane Butler in a real-time, interactive format.
Lifetime access
Go back to course content and recordings whenever you need to.
Community of peers
Stay accountable and share insights with like-minded professionals.
Certificate of completion
Share your new skills with your employer or on LinkedIn.
Maven Guarantee
This course is backed by the Maven Guarantee. Students are eligible for a full refund up until the halfway point of the course.
Live sessions: 2 hrs / week
Async content: 3-5 hrs / week
Price: $2,000 USD
Cohorts: 2