Building AI-Native Products
Understanding Embedding Performance through Generative Evals
Hosted by Jason Liu and Kelly Hong
What you'll learn
AI Evaluation Challenges
Discover why AI systems need specialized benchmarking beyond traditional testing methods.
Benchmark Limitations
Identify the shortcomings of public benchmarks, including overly clean datasets and potential training-data contamination.
Representativeness in Testing
Apply techniques to generate benchmark tests that accurately reflect real-world user queries and production conditions.
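To make the idea concrete, here is a rough sketch of one way a generative benchmark can work, under some assumptions not taken from the session itself: one synthetic query is produced per document (in practice via an LLM prompt; hand-written here to keep the sketch self-contained), and the embedding model is scored by whether it retrieves each query's source document. The sentence-transformers model name is illustrative, not necessarily the speakers' setup.

```python
# Minimal sketch of generative benchmarking for an embedding model.
# Assumes the sentence-transformers library is installed; the queries
# below stand in for LLM-generated ones.
import numpy as np
from sentence_transformers import SentenceTransformer

corpus = [
    "Refunds are processed within 5-7 business days of receiving the returned item.",
    "Our API rate limit is 100 requests per minute per key.",
    "Two-factor authentication can be enabled under Account > Security.",
]

# In practice, an LLM is prompted to write, for each document, a query a
# real user might plausibly type; hand-written here for self-containment.
queries = [
    "how long do refunds take",
    "what is the api rate limit",
    "enable 2fa on my account",
]

model = SentenceTransformer("all-MiniLM-L6-v2")
doc_emb = model.encode(corpus, normalize_embeddings=True)
query_emb = model.encode(queries, normalize_embeddings=True)

k = 1
similarities = query_emb @ doc_emb.T  # cosine similarity (embeddings are normalized)

hits = 0
for i, sims in enumerate(similarities):
    top_k = np.argsort(-sims)[:k]
    if i in top_k:  # query i was generated from document i, so that is the relevant hit
        hits += 1

print(f"recall@{k}: {hits / len(queries):.2f}")
```

Because the queries come from your own documents and are phrased the way your users actually write, the resulting recall numbers reflect production conditions rather than a public leaderboard.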
Why this topic matters
Effective AI evaluation is critical as systems move from labs to production. Understanding generative benchmarking helps you build AI that performs well on real-world tasks, not just academic tests. This knowledge bridges the gap between theoretical capabilities and practical performance, giving you a competitive edge in developing AI solutions that deliver genuine value to users.
You'll learn from
Jason Liu
Consultant at the intersection of Information Retrieval and AI
Jason has built search and recommendation systems for the past 6 years. Over the last year he has consulted for and advised dozens of startups on improving their RAG systems. He is the creator of the Instructor Python library.
Kelly Hong
Researcher at Chroma
Go deeper with a course