Design an AI Evaluation Plan

Hosted by Shane Butler

Thu, Mar 5, 2026

6:00 PM UTC (1 hour)

Virtual (Zoom)

Free to join


Go deeper with a course

AI Evals for Product Development
Shane Butler
View syllabus

What you'll learn

Break an AI feature into evaluable steps

Decompose a real AI feature into inputs, context, outputs, UX, and outcomes, and list what to evaluate at each step.

Define evidence that supports a ship decision

For each step, specify what evidence would justify a ship, iterate, or stop decision, and which “evidence” is misleading.

Identify gaps that block confident evaluation

Spot what’s missing in logging, annotation, or metrics, then capture it as clear next steps in the plan.
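To make the three goals above concrete, here is a minimal Python sketch of how an evaluation plan could be captured as a simple data structure: each step records what to evaluate, what evidence would support shipping, what “evidence” would mislead, and the gaps still blocking trust. The EvalStep/EvalPlan classes, the example feature, and every criterion are illustrative assumptions, not material from the session.

    from dataclasses import dataclass, field

    # Illustrative sketch only: names, thresholds, and criteria are assumptions.

    @dataclass
    class EvalStep:
        name: str                    # e.g. "inputs", "outputs", "outcomes"
        what_to_evaluate: str        # the question this step answers
        ship_evidence: str           # evidence that would justify shipping
        misleading_evidence: str     # signals that look convincing but aren't
        gaps: list[str] = field(default_factory=list)  # logging/annotation/metric gaps

    @dataclass
    class EvalPlan:
        feature: str
        steps: list[EvalStep]

        def open_gaps(self) -> list[str]:
            """Gaps to close before results are trustworthy."""
            return [f"{s.name}: {g}" for s in self.steps for g in s.gaps]

    plan = EvalPlan(
        feature="contract clause summarizer",  # hypothetical feature
        steps=[
            EvalStep(
                name="outputs",
                what_to_evaluate="Are summaries faithful to the source clause?",
                ship_evidence="Annotated sample shows faithfulness above an agreed bar",
                misleading_evidence="High average rating on cherry-picked examples",
                gaps=["No annotation guideline for 'faithful' yet"],
            ),
            EvalStep(
                name="outcomes",
                what_to_evaluate="Do users accept the summary without heavy edits?",
                ship_evidence="Acceptance rate and edit distance tracked per session",
                misleading_evidence="Overall usage growth that predates the feature",
                gaps=["Acceptance events not logged"],
            ),
        ],
    )

    print(plan.open_gaps())

One benefit of writing the plan down in a form like this is that the gap list becomes an explicit next step rather than an afterthought.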

Join the AI Builders Slack community: bit.ly/ai-connect

Learn Agentic Analytics workflows and AI Eval design. Get feedback on what you’re building from other AI builders.

Why this topic matters

Most teams jump into metrics or cherry-picked examples before they agree on what they’re evaluating and what evidence would support a decision. This session teaches a practical way to design an evaluation plan for one AI feature: what to evaluate end-to-end, what evidence matters, and what gaps you must close before results are trustworthy.

You'll learn from

Shane Butler

Principal Data Scientist at Ontra

Shane Butler is a Principal Data Scientist at Ontra, where he leads evaluation strategy for AI product development in the legal tech domain. He has more than ten years of experience in product data science and causal inference, with prior roles at Stripe, Nextdoor, and PwC. Shane is also the co-host of the AI podcast Data Neighbor, where he interviews product, data, and engineering leaders who are pioneering the next generation of data science and analytics in an AI-driven landscape.

Previously at Stripe, Nextdoor, PwC


Sign up to join this lesson
