Evaluating AI Agents

Free Lesson

Part of Building Production AI Systems

Hosted by Amir Feizpour and Samuel Dion-Girardeau

1,447 students

What you'll learn

Measuring AI in Uncertain Domains

Evaluating AI in financial markets is hard because no single "right" answer exists, so success depends on designing metrics that fit the domain.

Going From Accuracy to Market Relevance

Moving beyond F1 scores to market-centric metrics such as correlation helps align AI behavior with financial realities.

How to Use Meta-Evaluation and Practical Tools

Defining "good enough" with tools like MLflow and LangChain/LangGraph matters, and the metrics themselves must stand up to scrutiny.

Why this topic matters

An AI agent was built to surface the most relevant companies for any given theme, but evaluating its performance turned out to be tricky. This talk explores the challenge of designing a solid evaluation framework. Early benchmarks based on F1 scores and MSE often punished good picks, so the approach was refined, leading to stronger evaluations and more confidence in the results.
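
To make the point about F1 punishing good picks concrete, here is a minimal Python sketch. It is not the speakers' actual evaluation code: the tickers, the return series, the fixed reference list, and the equal-weighted basket are all hypothetical assumptions chosen for illustration. It compares an F1 score against a reference list with the correlation between the returns of the agent's picks and the returns of the reference basket.

```python
from math import sqrt

def f1_score(picked: set[str], reference: set[str]) -> float:
    """F1 of the agent's picks against a fixed reference list."""
    true_pos = len(picked & reference)
    if true_pos == 0:
        return 0.0
    precision = true_pos / len(picked)
    recall = true_pos / len(reference)
    return 2 * precision * recall / (precision + recall)

def pearson(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation between two equally long series."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

def basket_returns(tickers: set[str], returns: dict[str, list[float]]) -> list[float]:
    """Equal-weighted return of a basket in each period (an assumed weighting)."""
    periods = len(next(iter(returns.values())))
    return [sum(returns[t][i] for t in tickers) / len(tickers) for i in range(periods)]

# Hypothetical per-period returns for four made-up tickers.
returns = {
    "AAA": [0.02, -0.01, 0.03, -0.02],
    "BBB": [0.01, -0.02, 0.02, -0.01],
    "CCC": [0.02, -0.02, 0.03, -0.01],
    "DDD": [0.015, -0.005, 0.02, -0.02],
}

reference = {"AAA", "BBB"}  # hand-labelled "right" answers for the theme
picked = {"CCC", "DDD"}     # the agent's picks: different names, similar behavior

f1 = f1_score(picked, reference)
corr = pearson(basket_returns(picked, returns), basket_returns(reference, returns))

print(f"F1 vs. reference list: {f1:.2f}")
print(f"Return correlation with reference basket: {corr:.2f}")
```

In this toy data the picks share no names with the reference list, so F1 is zero, yet their returns track the reference basket closely. Whether correlation is the right signal for a given theme is itself something the evaluation has to justify.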

You'll learn from

Amir Feizpour

Founder @ Aggregate Intellect

Amir Feizpour is the founder, CEO, and Chief Scientist at Aggregate Intellect, which is building a generative business brain for service- and science-based companies. Amir has built and grown a global community of 5000+ AI practitioners and researchers gathered around topics in AI research, engineering, product development, and responsibility. Prior to this, Amir was an NLP Product Lead at Royal Bank of Canada. Amir held a research position at the University of Oxford, conducting experiments on quantum computing that resulted in high-profile publications and patents. Amir holds a PhD in Physics from the University of Toronto. Amir also serves the AI ecosystem as an advisor at MaRS Discovery District, works with several startups as a fractional chief AI officer, and engages with a wide range of community audiences (from business executives to hands-on developers) through training and educational programs. Amir leads Aggregate Intellect’s R&D via several academic collaborations.

Samuel Dion-Girardeau

CTO at Tilt

After graduating in Linguistics and Computer Science at ULaval, Samuel worked at Nuance (acq. Microsoft), building a platform for researchers and devs to train and evaluate custom models for natural language understanding, speech recognition and voice biometrics. At Delphia, he worked on turning user data into real-life assets, with various data products, an AI-based robo-advisor, and even a crypto information game. Now CTO at Tilt, he's building an agent that automates end-to-end thematic investing, from theme discovery to actual trades.

Go deeper with a course

Agentic Buildcamp - A Cognitive Gym for Building AI Agents — using AI Agents