Evals for Voice AI: Learnings from Google Evals Team

Hosted by Ravin Kumar

213 students

In this video

What you'll learn

Multimodal Evals Challenges

Gain practical insights for building, benchmarking, or researching evals in voice and other multimodal contexts

Design Voice AI Evals

Learn from practical experience of evals for real-world voice-based conversational AI product like NotebookLM

Evaluation Driven Development

Use evals for real-world conversational quality, setting evals metrics and guardrails in voice systems.

Why this topic matters

How to do AI evals for voice-based products? How do evaluation frameworks shape the development, reliability, and user experience of emerging voice AI systems. Using Google’s NotebookLM as a case study, we’ll dive into practical insights on designing evals that capture real-world conversational quality, reasoning depth, and responsiveness in multimodal and voice-based contexts.

You'll learn from

Ravin Kumar

Sr. Researcher at Google Deepmind

Ravin is Sr. Researcher at Google Deepmind. He worked on LLMs, Trust and Safety, and LLM Trust and Safety.

Ravin has over a decade of experience in applying math, statistics, and computation to solve challenging problems in various domains, such as aerospace, food, and open source.

Ravin is also a core contributor to PyMC, the leading platform for statistical data science. He is passionate about making math accessible and useful for everyone. I have written a book and published several papers on this topic.

ravinkumar.com

Google DeepMind
Google
sweetgreen
SpaceX

Go deeper with a course

AI Evals and Analytics Playbook
Stella Liu and Amy Chen
View syllabus