
Hamel Husain
ML Engineer with 20+ years of experience

Shreya Shankar
ML Systems Researcher Making AI Evaluation Work in Practice
This isn't a lecture — we roll up our sleeves and work on your product with your data.
You bring your real traces and interaction logs. Over 1–2 focused days, we systematically find where your AI is breaking down, why users are losing trust, and what to fix first.
Is this right for your team?
This workshop is built for teams who:
Have a live AI product or feature with real users
Have access to recent traces or interaction logs
Have a clear owner (PM or engineering lead) and dedicated engineering time
Feel stuck — lots of ideas for improvement but no clear evidence of what will actually move the needle
What happens in the workshop
We import and slice your logs to expose hidden failure patterns
We build a catalog of failure modes specific to your product
We quantify where users are losing trust and where the model is wasting effort
We design metrics and automated checks so you can measure the impact of every fix
We build a prioritized "fix first / fix next / don't bother" roadmap
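To make the steps above concrete, here is a minimal Python sketch of slicing reviewed logs and ranking failure modes. It is an illustration under stated assumptions, not the workshop's actual tooling: the trace format, the `segment` field, and the failure labels such as `wrong_tool_call` are all hypothetical, and real traces would first need manual review to assign labels.

```python
from collections import Counter

# Hypothetical annotated traces (assumed format): each has an id, a segment
# to slice by, and failure-mode labels assigned during manual review.
traces = [
    {"id": "t1", "segment": "billing", "failures": ["wrong_tool_call"]},
    {"id": "t2", "segment": "billing", "failures": ["hallucinated_policy", "wrong_tool_call"]},
    {"id": "t3", "segment": "onboarding", "failures": []},
    {"id": "t4", "segment": "onboarding", "failures": ["hallucinated_policy"]},
    {"id": "t5", "segment": "billing", "failures": ["wrong_tool_call"]},
]

def rank_failure_modes(traces):
    """Return (label, trace count, share of all traces), most frequent first."""
    # set() ensures a label is counted at most once per trace.
    counts = Counter(label for t in traces for label in set(t["failures"]))
    total = len(traces)
    return [(label, n, n / total) for label, n in counts.most_common()]

def failure_rate_by_segment(traces):
    """Fraction of traces in each segment with at least one labeled failure."""
    totals, failed = Counter(), Counter()
    for t in traces:
        totals[t["segment"]] += 1
        if t["failures"]:
            failed[t["segment"]] += 1
    return {seg: failed[seg] / totals[seg] for seg in totals}

ranked = rank_failure_modes(traces)
by_segment = failure_rate_by_segment(traces)

for label, n, share in ranked:
    print(f"{label}: {n} traces ({share:.0%})")
for seg, rate in by_segment.items():
    print(f"{seg}: {rate:.0%} of traces have a failure")
```

The ranked counts are one possible input to a "fix first / fix next / don't bother" ordering; in practice, frequency would likely be weighed against severity and the cost of each fix.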
What you walk away with
A ranked list of your top 20–40 failure modes with real examples from your own logs
A 30–60 day implementation plan tied to your product metrics
A measurement plan — what to track and how to know changes actually worked
Clarity on whether a deeper engagement makes sense for your situation
Before the workshop, your team will need to:
Choose one specific AI product or feature to focus on
Collect recent traces or interaction logs (we provide guidance on format)
Identify 2–5 team members to participate (engineers, PMs, domain experts)
This methodology has been refined across 4,000+ practitioners from 500+ companies. It's the same systematic approach used in our advisory engagements — compressed into an intensive, hands-on format focused entirely on your product.
Note: A discount is available for teams willing to make the session and data publicly available as teaching materials.
$23,500 USD
We use your real data to find exactly where your AI is failing, prioritize fixes, and build a measurement plan.