How to Setup Evals For Agents

Lightning Lessons

Practical skills & tools to accelerate your career

How to Setup Evals For Agents

Hosted by Harrison Chase and Hamel Husain

424 students

What you'll learn

Strategies for measuring dynamic systems

Agents don't have a fixed behavior. We'll show you analytical tools you can use to tame them.

Evaluation Metrics For Agents

Principles and guidance for defining metrics for evaluating agents.

How to use evals to iterate & improve upon agents

Leverage evals to drive real improvements in your system.

Why this topic matters

This matters because it reflects a fundamental shift in how we build and maintain AI systems—especially autonomous or semi-autonomous agents like AI copilots, task bots, or decision-making systems. 1. Agents Don’t Have a Fixed Behavior 2. Agents Interact With the Real World 3. Evaluation Metrics Are Complex 4. Iterative Improvements Drive Performance 5. 5. Helps Build Trust and Reliability

You'll learn from

Harrison Chase

Co-Founder and CEO at LangChain

Harrison is the co-founder and CEO of LangChain, an open-source framework designed to help developers build applications powered by large language models (LLMs). With a background in machine learning and deep learning systems, Harrison previously worked at Robust Intelligence and Google. He launched LangChain in 2022 to simplify the creation of intelligent agents and context-aware applications, rapidly becoming a leading figure in the emerging AI tooling ecosystem. His work focuses on enabling more powerful, composable, and production-ready LLM applications.

Hamel Husain

ML Engineer with 20 years of experience

Hamel is a machine learning engineer with over 20 years of experience. He has worked with innovative companies such as Airbnb and GitHub, which included early LLM research used by OpenAI, for code understanding. He has also led and contributed to numerous popular open-source machine-learning tools. Hamel is currently an independent consultant helping companies build AI products.

ML Systems Researcher Making AI Evaluation Work in Practice

Shreya Shankar is an experienced ML Engineer who is currently a PhD candidate in computer science at UC Berkeley, where she builds systems that help people use AI to work with data effectively. Her research focuses on developing practical tools and frameworks for building reliable ML systems, with recent groundbreaking work on LLM evaluation and data quality. She has published influential papers on evaluating and aligning LLM systems, including "Who Validates the Validators?" which explores how to systematically align LLM evaluations with human preferences.

Prior to her PhD, Shreya worked as an ML engineer in industry and completed her BS and MS in computer science at Stanford. Her work appears in top data management and HCI venues including SIGMOD, VLDB, and UIST. She is currently supported by the NDSEG Fellowship and has collaborated extensively with major tech companies and startups to deploy her research in production environments. Her recent projects like DocETL and SPADE demonstrate her ability to bridge theoretical frameworks with practical implementations that help developers build more reliable AI systems.

Share this lesson

424 students

Share this lesson

424 students

Go deeper with a course

Featured in Lenny’s List

AI Evals For Engineers & PMs