Why Your AI Product Feels Slow Even When the Model Is Fast
Fri, Jul 3, 2026
6:00 PM UTC (45 minutes)
Virtual (Zoom)
Free to join
Go deeper with a course
Featured in Lenny’s List
AI Evals For Engineers & PMs

Hamel Husain and Shreya Shankar
ML Engineer with 20+ years of experience; ML Systems Researcher Making AI Evaluation Work in Practice
What you'll learn
Trace where the latency comes from
Follow a request from arrival to first token to last token, and see where the time gets lost: queueing, batching, and prefill versus decode.
Pick the right lever for the bottleneck
Match the fix to the problem: KV cache management, chunked prefill, quantization, or speculative decoding, and the compute-bound versus memory-bound tradeoff each one addresses.
Make it feel faster without changing the model
Use streaming and time-to-first-token to improve perceived speed, and batching and concurrency to improve throughput and cost.
Why this topic matters
A capable model can still feel sluggish in production. The lag usually lives in the inference path: how requests queue and batch, how time splits between prefill and decode, how the KV cache is managed, and whether you stream tokens. Abi does this for a living. She'll show you where the latency comes from and share patterns that make agentic systems feel faster while costing less. There's a hands-on exercise so you can see it yourself.
You'll learn from
Abi Aryan
Founder of Abide AI; author of O'Reilly's LLMOps
Abi Aryan is a machine learning research engineer with about a decade of experience and the founder of Abide AI. She wrote O'Reilly's LLMOps: Managing Large Language Models in Production and the forthcoming GPU Engineering for AI Systems, and she was a visiting research scholar at UCLA's Cognitive Systems Lab under Judea Pearl. She teaches the Maven course AI Inference Engineering & Systems Design and reviews for NeurIPS, ICML, and ICLR.
Hamel Husain
ML Engineer with 20+ years of experience
Hamel Husain is an ML Engineer with 20+ years of experience. He has worked with innovative companies such as Airbnb and GitHub, work that included early LLM research for code understanding used by OpenAI. He has also led and contributed to numerous popular open-source machine-learning tools. Hamel is currently an independent consultant helping companies build AI products.