Part 3: Building Robust Evaluations for AI Agents

Free Lesson

Part of From Automation to Multi-Agent Architectures

Hosted by Hamza Farooq and Gabriela de Queiroz

154 students

What you'll learn

Define What “Good” Looks Like for AI Agents

Identify success metrics for agent reasoning, actions, and outcomes that go well beyond simple accuracy scores.

Design Practical, Real-World Agent Evaluations

Build task-based, behavioral, and regression-style evals that reflect how agents actually operate in production.

Use Evaluations to Ship with Confidence

Apply eval results to debug failures, compare agent versions, and iterate safely without breaking existing behavior.
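To make the regression-style idea concrete, here is a minimal sketch in Python. It is not taken from the lesson: the `EvalCase` structure, the `run_suite` and `find_regressions` helpers, and the two toy agents are all hypothetical names invented for illustration, and it assumes an agent can be treated as a plain function from a task prompt to a final action label.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Assumption: an "agent" is any callable that maps a task prompt to an output string.
Agent = Callable[[str], str]


@dataclass
class EvalCase:
    """One task-based eval: a prompt plus a check on the agent's output."""
    name: str
    prompt: str
    check: Callable[[str], bool]  # returns True when the output is acceptable


def run_suite(agent: Agent, cases: List[EvalCase]) -> Dict[str, bool]:
    """Run every case and record pass/fail; a crash counts as a failure."""
    results: Dict[str, bool] = {}
    for case in cases:
        try:
            results[case.name] = bool(case.check(agent(case.prompt)))
        except Exception:
            results[case.name] = False
    return results


def find_regressions(baseline: Dict[str, bool], candidate: Dict[str, bool]) -> List[str]:
    """Return cases the baseline passed that the candidate no longer passes."""
    return sorted(
        name for name, passed in baseline.items()
        if passed and not candidate.get(name, False)
    )


# Hypothetical toy agents standing in for two versions of a real system.
def agent_v1(prompt: str) -> str:
    return "escalate_to_human" if "refund" in prompt else "answer_directly"


def agent_v2(prompt: str) -> str:
    return "answer_directly"  # the newer version lost the escalation behavior


CASES = [
    EvalCase("refund_is_escalated", "Customer asks for a refund",
             lambda out: out == "escalate_to_human"),
    EvalCase("faq_answered", "What are your opening hours?",
             lambda out: out == "answer_directly"),
]

if __name__ == "__main__":
    baseline = run_suite(agent_v1, CASES)
    candidate = run_suite(agent_v2, CASES)
    print("regressions:", find_regressions(baseline, candidate))
```

The design choice in this sketch is to compare versions case by case instead of by aggregate pass rate, so a candidate that improves overall but breaks a previously passing behavior is still flagged.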

Why this topic matters

AI agents often seem to work until real users, edge cases, and scale expose their failures. Without proper evaluation, teams ship systems they can't trust or improve. Robust evals turn agent systems from impressive demos into reliable, measurable, production-ready products.

You'll learn from

Hamza Farooq

Founder & Adjunct Professor | 15+ years | Google | Stanford | UCLA

I am the founder of Traversaal.ai, an LLM-based startup dedicated to building scalable, customizable, and cost-efficient AI solutions for enterprises. My work focuses on practical, production-ready AI, far from the flashy, overhyped demos that fail in real-world environments.

With over 15 years of experience in machine learning, my career has spanned three continents and seven countries, working across industries like tech, telecommunications, finance, and retail. As a former Senior Research Manager at Google and Walmart Labs, I have led teams specializing in optimization, NLP, recommender systems, and time series forecasting.

Currently, I serve as an adjunct professor at Stanford Continuing Studies and UCLA, where I bridge the gap between academic research and real-world AI applications. My passion lies in educating the next generation of AI practitioners and enabling organizations to build enterprise-grade AI solutions that are scalable and efficient.

Beyond academia, I have been recognized as a Top AI Voice on LinkedIn multiple times, with my work even being featured in Times Square. Last year, Meta awarded me a $100K AI Grant for my contributions to the field—further validating my commitment to advancing AI research and deployment.

I regularly speak at conferences, lead training sessions, and consult on AI strategy, covering topics like large language models, deep learning, cloud computing, and AI deployment at scale. My mission is simple: help organizations cut through the noise and build AI solutions that actually work in production.

Gabriela de Queiroz

Ex-Microsoft & IBM AI leader | AI Advisor for Startups

Gabriela de Queiroz is the Founder of f02 labs, where she delivers AI Strategy and Developer Advocacy as a Service to help startups accelerate visibility, product adoption, and market awareness. Previously Director of AI at Microsoft, she advised hundreds of startups on building with AI and driving product adoption, and earlier led AI strategy and open-source initiatives at IBM.

In addition to her industry leadership, Gabriela has taught for Coursera, edX, and DataCamp, where her courses have reached over 300k learners worldwide. She also founded R-Ladies and AI Inclusive, global communities empowering over 200,000 members.
