
This course equips you to measure, diagnose, and improve AI features in production. It is the evolution of product data science and analytics for a world where AI product behavior is probabilistic, multi-step, and harder to observe.
The focus is on measuring whether AI features actually deliver user value and business impact, and knowing what to do when they do not.
If you already have product analytics foundations, the course shows how to adapt them for AI systems. You do not need to be an AI specialist, and anyone can learn this, but it is not an AI foundations course: you will get the most out of it if you are actively working on, or about to ship, AI features in a real product.
For a limited time, I'm opening 4 pilot seats for the April AI Evals for Product Development cohort. The cohort is usually $1,500; pilot seats pay $300 (80% off) in exchange for structured pre-cohort feedback from January through March.
Learn hands-on methods to evaluate, measure, and improve AI products, using data science to make better product decisions.
Specify what to log so AI behavior can be measured and debugged in practice
Design traces that connect prompts, context, retrieval, outputs, and UX events (see the first sketch after this list)
Assess whether an AI feature is evaluable before investing in optimization
Run structured failure analysis on real usage data and traces
Identify which failures materially prevent users from completing their task
Prioritize fixes based on user harm and product impact rather than anecdotes
Translate critical failures into measurable quality and success signals
Validate whether AI metrics predict user behavior and business results
Use segmentation and driver analysis to explain where AI delivers value
Build evaluation workflows spanning offline tests, review, and online validation
Define release gates and regression checks that work for stochastic systems (see the second sketch after this list)
Monitor drift and quality degradation before users are impacted
Convert evaluation signals into clear ship, iterate, pause, or rollback decisions
Balance trade-offs across quality, UX risk, cost, and business outcomes
Communicate evaluation-driven decisions clearly to product and engineering leaders
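
To make two of the outcomes above concrete, here are minimal Python sketches. First, a trace record that ties prompt, context, retrieval, output, and the downstream UX event together in one loggable unit. The schema and field names are illustrative assumptions, not the course's canonical format.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Optional
import json
import uuid

@dataclass
class TraceRecord:
    """One end-to-end record of a single AI feature invocation.
    Field names are hypothetical; adapt them to your own stack."""
    trace_id: str = field(default_factory=lambda: str(uuid.uuid4()))
    timestamp: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )
    user_prompt: str = ""            # what the user asked
    system_context: str = ""         # instructions / app state injected
    retrieved_docs: list[str] = field(default_factory=list)  # retrieval step, if any
    model_output: str = ""           # raw model response
    ux_event: Optional[str] = None   # e.g. "accepted", "edited", "abandoned"

    def to_log_line(self) -> str:
        """Serialize to one JSON line for a log pipeline."""
        return json.dumps(self.__dict__)
```

Second, a release gate for a stochastic system. Rather than comparing point estimates, it ships only if a bootstrapped lower bound on the candidate's eval pass rate clears the baseline minus a tolerated margin. The thresholds and defaults are placeholder assumptions, not prescriptions from the course.

```python
import random

def passes_release_gate(
    candidate_pass: int,
    candidate_total: int,
    baseline_rate: float,
    min_margin: float = 0.02,   # tolerated regression; an assumed default
    n_boot: int = 10_000,
    seed: int = 0,
) -> bool:
    """Ship only if we are confident the candidate's pass rate is not
    meaningfully worse than baseline, accounting for eval noise."""
    rng = random.Random(seed)
    outcomes = [1] * candidate_pass + [0] * (candidate_total - candidate_pass)
    # Bootstrap the candidate pass rate to account for run-to-run variance.
    boots = sorted(
        sum(rng.choices(outcomes, k=candidate_total)) / candidate_total
        for _ in range(n_boot)
    )
    lower_bound = boots[int(0.05 * n_boot)]  # one-sided 95% lower bound
    return lower_bound >= baseline_rate - min_margin

# Example: 183 of 200 eval cases pass against a 90% baseline.
# print(passes_release_gate(183, 200, baseline_rate=0.90))
```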

Principal Data Scientist, AI Evaluations at Ontra

Product Managers shipping AI-powered features
Data Scientists and Analysts responsible for measurement and evaluation
Engineers and ML Engineers building and maintaining AI systems

Live sessions
Learn directly from Shane Butler in a real-time, interactive format.
Lifetime access
Go back to course content and recordings whenever you need to.
Community of peers
Stay accountable and share insights with like-minded professionals.
Certificate of completion
Share your new skills with your employer or on LinkedIn.
Maven Guarantee
This course is backed by the Maven Guarantee. Students are eligible for a full refund through the second week of the course.
Live sessions
2 hrs / week
Async content
3-5 hrs / week
$1,500
USD