Building AI-Native Products

Understanding Embedding Performance through Generative Evals

Hosted by Jason Liu and Kelly Hong

1,042 students

What you'll learn

AI Evaluation Challenges

Discover why AI systems need specialized benchmarking beyond traditional testing methods.

Benchmark Limitations

Identify the shortcomings of public benchmarks, such as overly clean datasets and potential training-data contamination.

Representativeness in Testing

Apply techniques to generate benchmark tests that accurately reflect real-world user queries and production conditions.
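To make the idea concrete, here is a minimal, illustrative sketch of generative benchmarking (not the course's exact method): synthesize a query from each of your own document chunks, then measure how often an embedding model retrieves the source chunk for its query. The sentence-transformers model name and the generate_query placeholder are assumptions; in practice the query would come from an LLM prompted to mimic real user traffic.

```python
# Illustrative sketch: generate synthetic queries from your own chunks,
# then score the embedding model with recall@k against those pairs.

import numpy as np
from sentence_transformers import SentenceTransformer  # assumption: any embedding model works here

chunks = [
    "Returns are accepted within 30 days of purchase with a valid receipt.",
    "Premium subscribers get priority support with a 4-hour response window.",
    "Data exports are available in CSV and JSON formats from the settings page.",
]

def generate_query(chunk: str) -> str:
    """Placeholder query generator (hypothetical).

    In practice you would prompt an LLM with the chunk plus context about your
    users (tone, typos, ambiguity) so the query resembles production traffic.
    """
    return f"What should I know about {chunk.split('.')[0].lower()}?"

queries = [generate_query(c) for c in chunks]

model = SentenceTransformer("all-MiniLM-L6-v2")  # assumption: swap in the model under test
chunk_vecs = model.encode(chunks, normalize_embeddings=True)
query_vecs = model.encode(queries, normalize_embeddings=True)

def recall_at_k(q_vecs: np.ndarray, c_vecs: np.ndarray, k: int = 1) -> float:
    """Fraction of queries whose source chunk appears in the top-k results."""
    sims = q_vecs @ c_vecs.T                   # cosine similarity (vectors are normalized)
    top_k = np.argsort(-sims, axis=1)[:, :k]   # indices of the k most similar chunks
    hits = [i in top_k[i] for i in range(len(q_vecs))]
    return float(np.mean(hits))

print(f"Recall@1: {recall_at_k(query_vecs, chunk_vecs, k=1):.2f}")
```

Because the queries are generated from your own corpus rather than a public benchmark, the resulting scores reflect your data distribution and are not inflated by training-data contamination.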

Why this topic matters

Effective AI evaluation is critical as systems move from labs to production. Understanding generative benchmarking helps you build AI that performs well on real-world tasks, not just academic tests. This knowledge bridges the gap between theoretical capabilities and practical performance, giving you a competitive edge in developing AI solutions that deliver genuine value to users.

You'll learn from

Jason Liu

Consultant at the intersection of Information Retrieval and AI

Jason has built search and recommendation systems for the past six years. Over the last year he has consulted for and advised dozens of startups on improving their RAG systems. He is the creator of the Instructor Python library.

Kelly Hong

Researcher at Chroma

Previously at

Stitch Fix
Meta
University of Waterloo
New York University