ML Engineer with 20+ years experience
ML Systems Researcher


This isn't a course — we roll up our sleeves and work on your product with your data.
You bring your real traces and interaction logs. Over 2 days, we systematically find where your AI is breaking down, why users are losing trust, and what to fix first.
Most teams building AI products are stuck in the same loop: ship a change, hope it helps, manually spot-check a few outputs, repeat. Meanwhile, users hit failures nobody on the team has even seen.
In this private workshop, we break that cycle. You bring real traces and interaction logs from your product. Together, we systematically uncover where your AI is breaking down, build a catalog of failure modes specific to your use case, and leave with a prioritized plan to fix what matters most.
This methodology has been refined across 4,000+ practitioners from 500+ companies, and is part of what we teach in our full-length course. It's the highest-ROI activity in AI product development.
Each workshop can accommodate a team of up to 6 participants. We can help you identify which team members to bring (trust us, you don't want more than 6).
Below is a sample agenda; it can be customized to your needs and schedule.
Import your real traces and interaction logs. We'll slice the data together, review representative samples, and start spotting patterns in how your AI responds to different inputs.
Systematically review your traces to identify and catalog every way your AI fails. Build a taxonomy of failure modes specific to your product and quantify their frequency and impact.
Rank failure modes by business impact and effort to fix. Build a concrete "fix first / fix next / don't bother" roadmap tied to your product metrics with clear owners and timelines.
Engineering teams with a live AI product that has real users.
Product managers responsible for AI features who need a data-driven improvement plan instead of guessing which changes will move the needle.
Technical leaders who want to stop relying on manual QA and spot-checks, and instead build a systematic approach to improving AI quality.
This workshop is hands-on with your data. You need a deployed product generating real interactions we can analyze together.
We work directly with your production data. You'll receive a prep guide with instructions on what to collect and how to format it.
Engineers, PMs, and domain experts who know the product. The best results come from cross-functional teams who can act on findings immediately.
Live sessions
Learn directly from Hamel Husain & Shreya Shankar in a real-time, interactive format.
2-day deep dive on your production stack
We investigate your real system end to end: data flows, prompts, evals, and guardrails. We reproduce real failures so every recommendation is grounded in what your users actually see.
Ranked failure list with real user traces
We surface your highest-impact failure modes, each tied to concrete examples from your own logs. You leave with a ranked backlog of issues actually hurting trust, revenue, or support load right now.
Executive debrief and written findings
We end with a focused exec readout plus a written report summarizing risks, wins, and next steps — so leadership can make resourcing decisions without the full technical deep dive.
Maven Guarantee
Your purchase is backed by the Maven Guarantee.

See more reviews at bit.ly/eval-reviews

$23,500 USD
2 cohorts