Debug Your AI Product: Private Team Workshop

Hamel Husain

ML Engineer with 20+ years of experience

Shreya Shankar

ML Systems Researcher

Find and prioritize your AI's biggest failures in 2 days — using your own data.

This isn't a course — we roll up our sleeves and work on your product with your data.


Most teams building AI products are stuck in the same loop: ship a change, hope it helps, manually spot-check a few outputs, repeat. Meanwhile, users hit failures nobody on the team has even seen.

In this private workshop, we break that cycle. You bring real traces and interaction logs from your product. Together, we systematically uncover where your AI is breaking down, build a catalog of failure modes specific to your use case, and leave with a prioritized plan to fix what matters most.

This methodology has been refined across 4,000+ practitioners from 500+ companies, and is part of what we teach in our full-length course. In our experience, it's the highest-ROI activity in AI product development.

Each workshop can accommodate a team of up to 6 participants. We can help you identify which team members to bring (trust us, you don't want more than 6).

Below is a sample agenda; it can be customized to your needs and schedule.

Workshop agenda

  • Day 1, 12:00 PM EDT

    Discovery: Import and explore your data

    Import your real traces and interaction logs. We'll slice the data together, review representative samples, and start spotting patterns in how your AI responds to different inputs.


  • Day 2, 12:00 PM EDT

    Map failure patterns in your product

    Systematically review your traces to identify and catalog every way your AI fails. Build a taxonomy of failure modes specific to your product and quantify their frequency and impact.


  • Day 2, 1:00 PM EDT

    Prioritize fixes and build your roadmap

    Rank failure modes by business impact and effort to fix. Build a concrete "fix first / fix next / don't bother" roadmap tied to your product metrics, with clear owners and timelines. A minimal sketch of this tag, count, and rank loop follows the agenda.
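
To make that last step concrete, here is a minimal sketch of the tag, count, and rank loop in Python. The file name, the fields ("failure_mode", "impact", "effort"), and the example failure modes and estimates are all illustrative assumptions rather than a prescribed schema; during the workshop we adapt this to whatever your logging stack actually produces.

    import json
    from collections import Counter

    # Load traces annotated during review: one JSON object per line.
    # The file name and field names are illustrative, not a required format.
    with open("traces_annotated.jsonl") as f:
        traces = [json.loads(line) for line in f]

    # Count how often each failure mode appears across the reviewed traces.
    freq = Counter(t["failure_mode"] for t in traces if t.get("failure_mode"))

    # Impact/effort estimates assigned during the prioritization session
    # (1 = low, 5 = high). These names and numbers are hypothetical.
    estimates = {
        "hallucinated_citation": {"impact": 5, "effort": 2},
        "ignored_user_constraint": {"impact": 4, "effort": 3},
        "truncated_answer": {"impact": 2, "effort": 1},
    }

    # Simple priority score: frequency times impact, discounted by effort.
    def priority(mode):
        e = estimates[mode]
        return freq[mode] * e["impact"] / e["effort"]

    for mode in sorted(estimates, key=priority, reverse=True):
        print(f"{mode}: seen {freq[mode]}x, priority {priority(mode):.1f}")

The score is deliberately crude; in practice, frequency times impact divided by effort is usually enough to separate "fix first" from "don't bother".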

Learn directly from Hamel & Shreya

Hamel Husain

Trained 4,000+ AI practitioners from 500+ companies on AI evaluation.

Airbnb
GitHub
DataRobot

Shreya Shankar

ML Systems Researcher making AI evaluation work in practice

Google
Stanford University
UC Berkeley

Who this workshop is for

  • Engineering teams with a live AI product that has real users.

  • Product managers responsible for AI features who need a data-driven improvement plan instead of guessing which changes will move the needle.

  • Technical leaders who want to stop relying on manual QA and spot-checks, and instead build a systematic approach to improving AI quality.

Prerequisites

  • A live AI product or feature with real users

    This workshop is hands-on with your data. You need a deployed product generating real interactions we can analyze together.

  • Access to recent traces or interaction logs

    We work directly with your production data. You'll receive a prep guide with instructions on what to collect and how to format it; an illustrative record format follows this list.

  • A dedicated team of 2-6 participants

    Engineers, PMs, and domain experts who know the product. The best results come from cross-functional teams who can act on findings immediately.
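
For reference, here is one plausible shape for a single trace record, written as a Python snippet. Every field name below is an assumption chosen for illustration; the prep guide spells out exactly what to collect and how to format it for your product.

    import json

    # One plausible shape for a single trace record. All field names are
    # illustrative assumptions; the prep guide defines the exact schema.
    example_trace = {
        "trace_id": "abc-123",
        "timestamp": "2025-05-01T14:32:00Z",
        "user_input": "Summarize my last three support tickets.",
        "model_output": "Here are your last three tickets: ...",
        "metadata": {"model": "your-model-name", "feature": "support-summary"},
    }

    # One JSON object per line (JSONL) keeps large exports easy to stream.
    with open("traces.jsonl", "a") as f:
        f.write(json.dumps(example_trace) + "\n")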

What's included

Live sessions

Learn directly from Hamel Husain & Shreya Shankar in a real-time, interactive format.

2-day deep dive on your production stack

We investigate your real system end to end: data flows, prompts, evals, and guardrails. We reproduce real failures so every recommendation is grounded in what your users actually see.

Ranked failure list with real user traces

We surface your highest-impact failure modes, each tied to concrete examples from your own logs. You leave with a ranked backlog of issues actually hurting trust, revenue, or support load right now.

Executive debrief and written findings

We end with a focused exec readout plus a written report summarizing risks, wins, and next steps — so leadership can make resourcing decisions without the full technical deep dive.

Maven Guarantee

Your purchase is backed by the Maven Guarantee.

See what our students have to say (reviews from our full course).

See more reviews at bit.ly/eval-reviews



$23,500 USD · May 11 · 2 cohorts