How Evals Made GitHub Copilot Work

Lightning Lessons

Practical skills & tools to accelerate your career

Hosted by John Berryman and Hamel Husain

Mon, May 12, 2025

10:00 PM UTC (30 minutes)

Virtual (Zoom)

Free to join

223 students

Go deeper with a course

AI Evals For Engineers & PMs

Hamel Husain and Shreya Shankar

View syllabus

What you'll learn

Build more reliable LLM-as-judge systems

See how the Copilot team improved their automated evaluation by validating the judges

Learn the three-part eval taxonomy that drove success

Understand the differences between algorithmic, subjective, and verifiable evaluation approaches

Avoid the "ratchet effect" trap in A/B testing

Learn how GitHub's team hit local maxima with their metrics and the techniques they developed to overcome them

Why this topic matters

GitHub Copilot stands as one of the first commercially successful generative AI applications. To make the product work, the Copilot team had to invent evaluation methodologies with no existing blueprint. John shares candid insights from these pioneering efforts, helping you avoid repeating their mistakes and improve your own evaluation practices.

You'll learn from

John Berryman

ML Researcher, Software Engineer, and Author; Worked on GitHub Copilot

John Berryman is the founder of Arcturus Labs. His journey through AI and search technologies includes contributing to GitHub Copilot's early development, where he worked on the team that brought AI-assisted coding from concept to reality. Throughout his career, John has helped build search and recommendation systems that millions use daily – from GitHub's code search infrastructure to Eventbrite's discovery platform and the US Patent Office's next-generation search system. This blend of experience in both foundational search technologies and cutting-edge AI applications gives him unique insight into building practical, powerful LLM applications.

John shares his expertise through two books: Relevant Search, which reveals the art and science of building search applications, and Prompt Engineering for LLMs, which guides developers through the emerging practice of language model application development.

Hamel Husain

ML Engineer with 20 years of experience

Hamel is a machine learning engineer with over 20 years of experience. He has worked with innovative companies such as Airbnb and GitHub, where his work included early LLM research used by OpenAI for code understanding. He has also led and contributed to numerous popular open-source machine-learning tools. Hamel is currently an independent consultant helping companies build AI products.
