Evaluating LLMs for Your Applications

Free Lesson

Part of The AI Evaluation Handbook

Hosted by Mahesh Yadav

1,026 students

What you'll learn

Framework for choosing the right LLMs

Identify the best GenAI model for your needs based on factors such as budget, latency, and team expertise.

Setting Evaluation Criteria

Learn how to set clear, actionable metrics despite challenges with benchmarks and real-world examples.

Case Study: Contract Processing Application

We'll use a contract processing application to illustrate how to apply the principles you just learned.
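
As a rough illustration of how the selection framework and evaluation criteria might come together for a contract processing application, here is a minimal Python sketch. It is not taken from the lesson. The model names, cost and latency figures, contract fields, and the `field_accuracy` and `shortlist` helpers are all hypothetical, and exact-match accuracy is only one possible metric.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    name: str                # hypothetical model label
    cost_per_1k_docs: float  # illustrative figure, in USD
    p95_latency_s: float     # illustrative figure, in seconds
    predictions: list        # one dict of extracted fields per contract


def field_accuracy(predictions, gold):
    """Fraction of gold fields whose extracted value matches exactly."""
    total = 0
    correct = 0
    for pred, truth in zip(predictions, gold):
        for field, expected in truth.items():
            total += 1
            if pred.get(field) == expected:
                correct += 1
    return correct / total if total else 0.0


def shortlist(candidates, gold, max_cost, max_latency_s):
    """Drop candidates that break a constraint, then rank the rest by accuracy."""
    eligible = [
        c for c in candidates
        if c.cost_per_1k_docs <= max_cost and c.p95_latency_s <= max_latency_s
    ]
    return sorted(
        eligible,
        key=lambda c: field_accuracy(c.predictions, gold),
        reverse=True,
    )


# Hand-labeled reference answers for two made-up contracts.
gold = [
    {"party": "Acme Corp", "term_months": 12},
    {"party": "Globex", "term_months": 24},
]

# Made-up candidates; every number here is for illustration only.
candidates = [
    Candidate("model_a", 40.0, 2.0, [
        {"party": "Acme Corp", "term_months": 12},
        {"party": "Globex", "term_months": 36},
    ]),
    Candidate("model_b", 120.0, 6.0, [
        {"party": "Acme Corp", "term_months": 12},
        {"party": "Globex", "term_months": 24},
    ]),
    Candidate("model_c", 10.0, 0.8, [
        {"party": "Acme", "term_months": 12},
        {"party": "Globex"},
    ]),
]

ranked = shortlist(candidates, gold, max_cost=100.0, max_latency_s=3.0)
ranked_names = [c.name for c in ranked]
for c in ranked:
    print(c.name, field_accuracy(c.predictions, gold))
```

In this sketch, budget and latency act as hard constraints and accuracy is used only to rank the models that survive them. A team could just as reasonably fold all three into a single weighted score, and the right choice depends on which requirements are truly non-negotiable.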

Why this topic matters

Evaluating GenAI applications is crucial for product managers because it ensures that the chosen AI model meets not only technical specifications but also business goals and user expectations. Understanding how to select the right model is the first step toward defining model-specific evaluation metrics, which allow teams to iterate quickly and create impactful solutions.

You'll learn from

Mahesh Yadav

GenAI Product Lead at Google, previously at Meta, Amazon, and Microsoft

Mahesh Yadav is a Product Leader on the Google GenAI team. He is one of the world's top AI executives and an award-winning AI Product Educator. His work on AI has been featured at the Nvidia GTC conference, at Microsoft Build, and in Meta blogs.

Mahesh has 20 years of experience building products on the Meta, Microsoft, and AWS AI teams. He has worked in all layers of the AI stack, from AI chips to LLMs, and has a deep understanding of how GenAI companies ship value to customers.

Currently, he leads AI agents at Google Cloud, where they are used extensively for Gemini and other key Google products.
