“Does it work?”... “Is it good enough?”... “Can we ship it?”...
How do you answer these questions for AI products? You’re responsible for “running evals,” but what does that mean?
How do you choose the right metrics, interpret fuzzy results, and make a confident decision?
This course gives you a framework to do just that.
Map user value to evaluation (eval) objectives so your metrics aren’t abstract. Define success, then translate it into measurable criteria.
Choose metrics you can actually maintain: capability, safety, UX friction, latency, cost, and “does this reduce support tickets or increase activation?”
Set ship/no-ship thresholds you can defend to leadership.
Build lightweight workflows that work in real teams: human review where it matters, automation where it lasts, documentation that drives decisions.
Consider domain constraints (e.g., healthcare safety) and know what to avoid: silent failures, misleading proxy metrics, and tests that don’t reflect production.
Tie everything to ROI: impact vs unit cost, eval coverage vs reliability, and the minimum viable monitoring you need post-launch.
Experience AI evals through a case-based approach with a real AI product that we evaluate together.
Develop a critical skill for product managers who are leading or contributing to AI products.
Learn a repeatable framework for deciding when an AI feature is ready to launch.
Tie decisions to user value, business goals, and measurable evaluation criteria.
Turn fuzzy product goals into concrete eval objectives and measurable success criteria.
Define “good enough” in plain language before choosing metrics or tools.
Use a PM-friendly menu of metrics to avoid misleading proxies and anchor on business value.
Balance capability, latency, UX friction, and cost without being an ML engineer.
Create ship/no-ship thresholds tied to KPIs, risk, and user impact.
Know when to stop tweaking prompts and when to pause a launch.
Learn what to automate, what to review manually, and how to design sustainable processes.
Produce datasets, golden examples, and error taxonomies your team can reuse.
Understand risks in sensitive domains like healthcare and finance.
Avoid silent failures, weak proxies, and tests that don’t reflect production.
PMs leading AI features, growth, or platform initiatives
PMs who partner with ML teams and want to set evaluation standards
PMs who need to make clear “ship or hold” calls without doing the engineering
Live sessions
Learn directly from your instructors in a real-time, interactive format.
Lifetime access
Go back to course content and recordings whenever you need to.
Community of peers
Stay accountable and share insights with like-minded professionals.
Certificate of completion
Share your new skills with your employer or on LinkedIn.
Maven Guarantee
This course is backed by the Maven Guarantee. Students are eligible for a full refund up until the halfway point of the course.
Live sessions
3 hrs / week
Thu, Dec 4 · 5:00 PM–6:30 PM (UTC)
Tue, Dec 9 · 5:00 PM–6:30 PM (UTC)
Thu, Dec 11 · 5:00 PM–6:30 PM (UTC)
Projects
2 hrs / week
Async content
1 hr / week
Save 25% until Monday
$2,000 USD