What Makes a Good Search Agent?
Mon, Jul 13, 2026
6:00 PM UTC (1 hour)
Virtual (Zoom)
Free to join
Go deeper with a course
Featured in Lenny’s List
AI Evals For Engineers & PMs

Hamel Husain and Shreya Shankar
ML Engineer with 20+ years of experience; ML Systems Researcher Making AI Evaluation Work in Practice
What you'll learn
Build benchmarks that test search agents
How to construct benchmarks that hold up, using BrowseComp-Plus and MAST to evaluate agents on hard, agentic browsing tasks.
Generate training data on a budget
How ORBIT synthesizes multi-constraint queries cheaply, so you can train search agents that generalize beyond one narrow domain.
Read agent trajectories to find what's wrong
Why a search agent's trajectories explain its performance, with a demo of the Hawkeye tool for analyzing them.
Why this topic matters
Search agents query, retrieve, and reason over external sources to answer knowledge-heavy questions. The hard part is knowing whether one is any good. Nandan will show you how to build benchmarks that measure search-agent quality, how to synthesize training data on a budget, and how to use trajectory analysis to debug these agents.
You'll learn from
Nandan Thakur
Creator of the BEIR and MIRACL benchmarks; PhD, University of Waterloo
Nandan Thakur is an information-retrieval and RAG researcher who completed his PhD at the University of Waterloo in 2026, advised by Jimmy Lin. He created the BEIR and MIRACL benchmarks, now standard for evaluating retrieval across domains and languages, and his recent work focuses on building and evaluating search agents. He has interned at Google, Vectara, and Databricks, and collaborated with Snowflake, Microsoft, and Huawei.
Hamel Husain
ML Engineer with 20+ years of experience
Hamel Husain is an ML Engineer with 20+ years of experience. He has worked with innovative companies such as Airbnb and GitHub, where his work included early LLM research that OpenAI used for code understanding. He has also led and contributed to numerous popular open-source machine-learning tools. Hamel is currently an independent consultant helping companies build AI products.