Does Your AI Actually Work? Build the Evals That Prove It.

Free Lesson

Hosted by Will Lowrey

99 students

In this video

What you'll learn

Learn how to build evals, not a vibe check

The three moves: assert what's deterministic, judge the rest against a rubric, and calibrate it to a human.

Write a rubric that catches what matters

Good vs. bad rubrics on a real example (scoring a "tell me about yourself" answer), plus the red-team cases that expose bias.

Use other models to pressure-test your prompts

It's not enough to let the model that built the prompt also test the prompt. Pit Claude vs. Codex vs. Gemini vs. Grok.
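
To make the three moves (assert, judge, calibrate) concrete, here is a minimal Python sketch. It is not the session's actual code. Every name in it (Case, deterministic_checks, stub_judge, agreement) is hypothetical, the 150-word limit is an assumed threshold, and stub_judge is a simple digit check standing in for a real model call that would score the answer against a written rubric.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Case:
    """One test case: a candidate answer plus a human reviewer's verdict."""
    answer: str
    human_pass: bool


def deterministic_checks(answer: str, max_words: int = 150) -> list[str]:
    """Move 1: assert what is deterministic. Returns the names of failed checks."""
    failures = []
    if not answer.strip():
        failures.append("empty answer")
    if len(answer.split()) > max_words:  # assumed limit, tune for your feature
        failures.append("too long")
    if "[" in answer and "]" in answer:
        failures.append("unfilled placeholder")
    return failures


def stub_judge(answer: str) -> bool:
    """Move 2: judge the rest against a rubric.

    Placeholder only: passes an answer if it contains a digit, as a crude
    proxy for "cites a concrete result". A real judge would send the answer
    and the rubric to a model and parse its verdict.
    """
    return any(ch.isdigit() for ch in answer)


def agreement(cases: list[Case], judge: Callable[[str], bool]) -> float:
    """Move 3: calibrate to a human. Fraction of cases where judge and human agree."""
    matches = sum(judge(c.answer) == c.human_pass for c in cases)
    return matches / len(cases)


# Illustrative cases with made-up human labels.
CASES = [
    Case("I led a team of 5 and grew revenue 20%.", True),
    Case("I am a hard worker who loves challenges.", False),
    Case("I rebuilt our onboarding flow and doubled activation.", True),
    Case("Results matter to me.", False),
]

if __name__ == "__main__":
    for case in CASES:
        print(deterministic_checks(case.answer), stub_judge(case.answer))
    print("judge/human agreement:", agreement(CASES, stub_judge))
```

The third case is where calibration earns its keep: the human passes it, the stub judge fails it because the result is stated without a number, and that disagreement tells you the rubric (or the judge) needs revising before you trust its scores.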

Why this topic matters

You shipped the AI feature and everyone's using it. But is it actually good? Most people are checking complaints and trusting their gut. If you want to really know, you need to build evals: a rubric for what "good" means, test cases built to break it, and a judge calibrated to your own judgment. This session shows you how, live, on an example you're familiar with. Build better AI products!

You'll learn from

Will Lowrey

20 years in product leadership. Indeed, Bazaarvoice, startups. 200+ PMs coached.

Will spent a decade in product leadership at Bazaarvoice (0-to-1 products to $20M+) and Indeed (Head of Product, Seen by Indeed, ML matching foundation tied to $500M+ in revenue). He has coached 200+ PMs through career transitions and the shift to working with AI, runs enterprise AI workshops for teams at companies like Afiniti, and runs his own business on AI systems he built. He is not the furthest ahead in AI. He is the person who is really good at getting you there.
