Generative Benchmarking

Hosted by Jason Liu and Kelly Hong

Wed, May 21, 2025

5:00 PM UTC (1 hour)

Virtual (Zoom)

Free to join

388 students

Go deeper with a course

Systematically Improving RAG Applications
Jason Liu

What you'll learn

AI Evaluation Challenges

Discover why AI systems need specialized benchmarking beyond traditional testing methods.

Benchmark Limitations

Identify the shortcomings of public benchmarks, including clean datasets and potential training data contamination.

Representativeness in Testing

Apply techniques to generate benchmark tests that accurately reflect real-world user queries and production conditions.
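
As a rough, hedged sketch of what this kind of technique can look like in practice (not the session's actual materials): the example below assumes the openai and chromadb Python packages and an OPENAI_API_KEY, uses an LLM to generate one realistic query per document chunk, then checks whether retrieval returns the chunk each query came from (recall@k). All chunk text, prompts, and model names are illustrative assumptions.

```python
# Minimal generative-benchmarking sketch, assuming `pip install openai chromadb`
# and OPENAI_API_KEY set. Chunks, prompt wording, and model are illustrative.
import chromadb
from openai import OpenAI

llm = OpenAI()

# Real document chunks from your own corpus stand in for "production conditions".
chunks = [
    "Refunds are issued within 5-7 business days after the return is received.",
    "Premium plans include priority support with a 4-hour response SLA.",
    "Orders over $50 ship free within the continental US.",
]

def generate_query(chunk: str) -> str:
    """Ask the LLM for the kind of question a real user might type that
    this chunk should answer."""
    resp = llm.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{
            "role": "user",
            "content": (
                "Write one short, realistic user question that the following "
                f"passage answers. Return only the question.\n\n{chunk}"
            ),
        }],
    )
    return resp.choices[0].message.content.strip()

# Build the benchmark: (generated query, id of the chunk it should retrieve).
ids = [f"chunk-{i}" for i in range(len(chunks))]
benchmark = [(generate_query(c), cid) for c, cid in zip(chunks, ids)]

# Index the same chunks in a vector store and measure recall@k.
collection = chromadb.Client().create_collection("docs")
collection.add(documents=chunks, ids=ids)

k = 3
hits = 0
for query, expected_id in benchmark:
    retrieved = collection.query(query_texts=[query], n_results=k)["ids"][0]
    hits += expected_id in retrieved

print(f"recall@{k}: {hits / len(benchmark):.2f}")
```

Because the queries are generated from your own documents rather than pulled from a public dataset, the resulting benchmark reflects your corpus and avoids training-data contamination; in practice you would also filter or review the generated queries for realism.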

Why this topic matters

Effective AI evaluation is critical as systems move from labs to production. Understanding generative benchmarking helps you build AI that performs well on real-world tasks, not just academic tests. This knowledge bridges the gap between theoretical capabilities and practical performance, giving you a competitive edge in developing AI solutions that deliver genuine value to users.

You'll learn from

Jason Liu

Consultant at the intersection of Information Retrieval and AI

Jason has built search and recommendation systems for the past six years. Over the last year, he has consulted for and advised dozens of startups on improving their RAG systems. He is the creator of the Instructor Python library.

Kelly Hong

Researcher at Chroma

Previously at

Stitch Fix
Meta
University of Waterloo
New York University
