Design Experiments for AI Features

Hosted by Shane Butler

Thu, Jan 29, 2026

8:00 PM UTC (1 hour)

Virtual (Zoom)

Free to join


Go deeper with a course

AI Analytics for Builders
Shane Butler, Sravya Madipalli, and Hai Guan

What you'll learn

Where experiments fit in the AI impact chain

See why A/B tests are the right tool for linking quality changes to user behavior.

Define exposure and randomization correctly

Learn how to define eligibility, exposure, and the unit of randomization so that "treatment" means users actually experienced the AI.

Know when results are decision-grade

Interpret lift, variance, and segments well enough to decide whether to ship, ramp, hold, or roll back with confidence (see the sketch after this list).
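To preview the kind of analysis these outcomes point at, here is a minimal sketch of an exposure-triggered lift calculation in Python. The data layout is assumed, not taken from the lesson: a hypothetical events table with columns user_id, arm ("control" or "treatment"), exposed (whether the user actually saw the AI feature), and converted. The normal-approximation interval and the ship/hold/rollback rule are deliberate simplifications.

import numpy as np
import pandas as pd
from scipy.stats import norm


def lift_with_ci(events: pd.DataFrame, alpha: float = 0.05) -> dict:
    """Estimate conversion lift (treatment minus control) among exposed users,
    with a normal-approximation confidence interval and a simple decision rule."""
    # Exposure filter: keep only users who actually experienced the AI feature,
    # so "treatment" reflects real exposure rather than mere assignment.
    exposed = events[events["exposed"]]

    # Collapse to one row per user per arm (unit of randomization = user).
    per_user = exposed.groupby(["arm", "user_id"])["converted"].max()
    rates = per_user.groupby("arm").agg(["mean", "count"])

    p_t, n_t = rates.loc["treatment", "mean"], rates.loc["treatment", "count"]
    p_c, n_c = rates.loc["control", "mean"], rates.loc["control", "count"]

    lift = p_t - p_c
    se = np.sqrt(p_t * (1 - p_t) / n_t + p_c * (1 - p_c) / n_c)
    z = norm.ppf(1 - alpha / 2)
    ci = (lift - z * se, lift + z * se)

    # Illustrative decision rule: ship if the interval clears zero,
    # roll back if it sits entirely below zero, otherwise hold.
    if ci[0] > 0:
        decision = "ship"
    elif ci[1] < 0:
        decision = "rollback"
    else:
        decision = "hold"
    return {"lift": lift, "ci": ci, "decision": decision}

The key design choice in this sketch is filtering to exposed users before computing lift, which is what makes "treatment" mean actual exposure rather than assignment.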

Why this topic matters

A/B tests are often treated as the default proof of AI impact. For AI features, they only work under specific conditions: correct exposure, correct randomization, and the right question. This lesson shows where experiments are strong in the impact chain and what you need for results you can trust.

You'll learn from

Shane Butler

Principal Data Scientist, AI Evaluations at Ontra

Shane Butler is a Principal Data Scientist at Ontra, where he leads evaluation strategy for AI product development in the legal tech domain. He has more than ten years of experience in product data science and causal inference, with prior roles at Stripe, Nextdoor, and PwC. His current work focuses on practical, end-to-end methods for evaluating AI features in production. Shane is also the co-host of the AI podcast Data Neighbor, where he interviews product, data, and engineering leaders who are pioneering the next generation of data science and analytics in an AI-driven landscape.

Previously at Stripe, Nextdoor, PwC


Sign up to join this lesson
