Redesign Your Product Metrics for AI Evals

Hosted by Shane Butler

Wed, Dec 17, 2025

9:00 PM UTC (30 minutes)

Virtual (Zoom)

Free to join

What you'll learn

Understand How Product Data Science Traditionally Evaluates Features

Learn how teams use metrics, funnels, and experiments to measure feature impact in deterministic products.

Identify Why Traditional Methods Break for AI Features

See how probabilistic outputs, ambiguous correctness, and multi-step pipelines undermine traditional analytics.

Apply a Simple New Mental Model for Evaluating AI Features

Learn a practical model for evaluating AI across inputs, context, model behavior, output quality, and user value.
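To make those five dimensions concrete ahead of the session, here is a minimal, hypothetical Python sketch of what logging and summarizing an AI interaction along them could look like. All names below are illustrative assumptions, not the framework Shane will present.

from dataclasses import dataclass
from typing import Optional

# Hypothetical sketch: log each AI interaction along the five dimensions
# named above. Field and function names are illustrative assumptions,
# not the model presented in the session.
@dataclass
class AIEvalRecord:
    user_input: str                          # inputs: what the user asked for
    retrieved_context: list[str]             # context: documents or state passed to the model
    model_version: str                       # model behavior: which model/prompt produced this
    model_output: str                        # the raw response
    output_quality: Optional[float] = None   # output quality: rubric or judge score in [0, 1]
    user_value: Optional[bool] = None        # user value: accepted vs. rejected downstream

def summarize(records: list[AIEvalRecord]) -> dict:
    """Report each dimension separately instead of one funnel conversion rate."""
    scored = [r for r in records if r.output_quality is not None]
    judged = [r for r in records if r.user_value is not None]
    return {
        "interactions": len(records),
        "avg_output_quality": (sum(r.output_quality for r in scored) / len(scored)) if scored else None,
        "acceptance_rate": (sum(r.user_value for r in judged) / len(judged)) if judged else None,
    }

The point of the sketch is the separation: quality of the model's output and value delivered to the user are tracked as distinct signals rather than collapsed into a single conversion metric.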

Why this topic matters

AI features break the assumptions behind traditional product metrics. Funnels, experiments, and success metrics often give misleading signals because AI outputs are probabilistic, correctness is ambiguous, and results flow through multi-step pipelines. Understanding these gaps helps PMs, engineers, and data teams avoid misreads, design better evaluation workflows, and make more confident product decisions.

You'll learn from

Shane Butler

Principal Data Scientist, AI Evaluations at Ontra

Shane Butler is a Principal Data Scientist at Ontra, where he leads evaluation strategy for AI product development in the legal tech domain. He has more than ten years of experience in product data science and causal inference, with prior roles at Stripe, Nextdoor, and PwC. His current work focuses on practical, end-to-end methods for evaluating AI features in production. Shane is also the co-host of the AI podcast Data Neighbor, where he interviews product, data, and engineering leaders who are pioneering the next generation of data science and analytics in an AI-driven landscape.

Previously at Stripe, Nextdoor, PwC

Sign up to join this lesson
