Evals in Action With Arize

Hosted by Laurie Voss

Fri, Feb 27, 2026

5:00 PM UTC (45 minutes)

Virtual (Zoom)

Free to join

What you'll learn

Build your first LLM-as-a-Judge evaluator

Write an eval that detects hallucinations in <10 minutes using Arize Phoenix's templates and your own custom criteria.
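
As a preview, here is a minimal sketch of that kind of evaluator, built on Phoenix's prebuilt hallucination template. It assumes the arize-phoenix-evals package and an OpenAI API key; exact parameter and column names may differ slightly between Phoenix versions.

```python
# LLM-as-a-Judge hallucination eval sketch using phoenix.evals.
# Assumes: pip install arize-phoenix-evals openai pandas, and OPENAI_API_KEY is set.
import pandas as pd
from phoenix.evals import (
    HALLUCINATION_PROMPT_RAILS_MAP,
    HALLUCINATION_PROMPT_TEMPLATE,
    OpenAIModel,
    llm_classify,
)

# Each row pairs a user question, the retrieved reference text, and the model's answer.
df = pd.DataFrame(
    {
        "input": ["What is the capital of France?"],
        "reference": ["Paris is the capital and largest city of France."],
        "output": ["The capital of France is Marseille."],  # deliberately wrong
    }
)

results = llm_classify(
    dataframe=df,
    model=OpenAIModel(model="gpt-4o-mini"),                # the judge model
    template=HALLUCINATION_PROMPT_TEMPLATE,                # Phoenix's built-in criteria
    rails=list(HALLUCINATION_PROMPT_RAILS_MAP.values()),   # allowed labels, e.g. factual / hallucinated
    provide_explanation=True,                              # ask the judge to justify its label
)
print(results[["label", "explanation"]])
```

Swapping the built-in template for your own prompt is how you layer custom criteria on top of what Phoenix ships.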

Trace your AI system end-to-end

Add observability to any LLM application so you can see exactly what's happening at every step, from input to output.
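
To give a sense of the setup, here is a sketch of wiring OpenTelemetry tracing from an OpenAI-backed app into a locally running Phoenix instance. The package names and register/instrument calls reflect the arize-phoenix and openinference-instrumentation-openai packages and may shift between releases.

```python
# Tracing sketch: launch Phoenix locally and auto-instrument OpenAI calls.
# Assumes: pip install arize-phoenix openinference-instrumentation-openai openai
import phoenix as px
from openai import OpenAI
from openinference.instrumentation.openai import OpenAIInstrumentor
from phoenix.otel import register

px.launch_app()  # local Phoenix UI, by default at http://localhost:6006

# Point an OpenTelemetry tracer provider at Phoenix, then instrument the OpenAI client.
tracer_provider = register(project_name="evals-in-action")
OpenAIInstrumentor().instrument(tracer_provider=tracer_provider)

client = OpenAI()
response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "Explain LLM evals in one sentence."}],
)
print(response.choices[0].message.content)
# The call above now appears as a trace in Phoenix: inputs, outputs, latency, and token counts.
```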

Choose the right evaluator for each failure mode

Learn when to use code-based checks, LLM judges, or human annotations based on what you're trying to catch.
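
A rough illustration of the trade-off (the helper functions below are hypothetical, not part of any Arize API): code-based checks handle failure modes with exact answers, while an LLM judge, like the hallucination eval above, handles fuzzier criteria.

```python
# Code-based checks: cheap, deterministic, no model call.
# Reserve LLM judges (or human annotation) for criteria code can't express,
# such as factual grounding or tone. These helpers are illustrative only.
import json

def output_is_valid_json(output: str) -> bool:
    """Check that the model returned parseable JSON."""
    try:
        json.loads(output)
        return True
    except json.JSONDecodeError:
        return False

def has_required_fields(output: str, fields: list[str]) -> bool:
    """Check that a structured response contains every expected key."""
    try:
        data = json.loads(output)
    except json.JSONDecodeError:
        return False
    return isinstance(data, dict) and all(f in data for f in fields)

# Run the cheap checks first; escalate to an LLM judge only when they pass
# and the remaining question is one of meaning, not format.
print(output_is_valid_json('{"answer": "Paris"}'))             # True
print(has_required_fields('{"answer": "Paris"}', ["answer"]))  # True
```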

Why this topic matters

You've learned why evals matter and what to measure. Now you need to actually build them. Most teams get stuck here because the gap between "understanding evals" and "shipping evals" feels enormous. This hands-on session bridges that gap with live code, real tools, and templates you can steal. You'll leave with working evaluators, not just concepts.

You'll learn from

Laurie Voss

Head of DevRel at Arize; co-founder of npm, Inc

Laurie Voss is Head of Developer Relations at Arize AI, where he helps teams build better AI applications through observability and evaluation. Previously, he was VP of Developer Relations at LlamaIndex, Senior Data Analyst at Netlify, and co-founded npm, Inc (acquired by GitHub), where he served as COO and CTO. With 20+ years in developer tools and data analysis, Laurie brings a practical, code-first approach to AI evaluation.

Sign up to join this lesson
