Why Your AI Product Feels Slow Even When the Model Is Fast

Free Lesson


Part of AI Product Engineering • Hosted by Abi Aryan and Hamel Husain

686 students


What you'll learn

Trace where the latency comes from

Follow a request from arrival to first token to last token, and see where the time gets lost: queueing, batching, and prefill versus decode.

Pick the right lever for the bottleneck

Match the fix to the problem: KV cache management, chunked prefill, quantization, or speculative decoding, and the compute-bound versus memory-bound tradeoff each one addresses.

Make it feel faster without changing the model

Use streaming and time-to-first-token to improve perceived speed, and batching and concurrency to improve throughput and cost.
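To make the gap between time-to-first-token and total latency concrete, here is a minimal Python sketch that timestamps a token stream. It is an illustration under stated assumptions, not material from the lesson: `measure_stream` and `fake_stream` are hypothetical names, and `fake_stream` stands in for a real streaming client by sleeping for a made-up prefill delay and a made-up per-token decode delay.

```python
import time
from typing import Callable, Iterable, Iterator, Optional


def measure_stream(
    tokens: Iterable[str],
    clock: Callable[[], float] = time.perf_counter,
) -> dict:
    """Consume a token stream; report time-to-first-token and total latency."""
    start = clock()
    first_token_at: Optional[float] = None
    count = 0
    for _ in tokens:
        if first_token_at is None:
            # The first token marks the end of queueing plus prefill.
            first_token_at = clock()
        count += 1
    end = clock()

    ttft = None if first_token_at is None else first_token_at - start
    # Average gap between tokens after the first one (the decode phase).
    per_token = None
    if first_token_at is not None and count > 1:
        per_token = (end - first_token_at) / (count - 1)
    return {
        "tokens": count,
        "ttft_s": ttft,
        "total_s": end - start,
        "per_token_s": per_token,
    }


def fake_stream(n_tokens: int, prefill_s: float, per_token_s: float) -> Iterator[str]:
    """Hypothetical stand-in for a streaming model client (assumed delays)."""
    time.sleep(prefill_s)  # simulated wait before the first token
    for i in range(n_tokens):
        if i > 0:
            time.sleep(per_token_s)  # simulated decode step
        yield f"tok{i}"


if __name__ == "__main__":
    print(measure_stream(fake_stream(20, prefill_s=0.05, per_token_s=0.01)))
```

With streaming, the user starts reading after roughly `ttft_s`; without it, they wait for `total_s`. Comparing the two numbers on a real endpoint is one way to see whether a slow-feeling product is losing its time before the first token or during decoding.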

Why this topic matters

A capable model can still feel sluggish in production. The lag usually lives in the inference path: how requests queue and batch, prefill versus decode, the KV cache, and whether you stream tokens. Abi does this for a living. She'll show you where the latency comes from, and patterns that make agentic systems feel faster while costing less. There's a hands-on exercise so you can see it yourself.

You'll learn from

Abi Aryan

Founder of Abide AI; author of O'Reilly's LLMOps

Abi Aryan is a machine learning research engineer with about a decade of experience and the founder of Abide AI. She wrote O'Reilly's LLMOps: Managing Large Language Models in Production and the forthcoming GPU Engineering for AI Systems, and she was a visiting research scholar at UCLA's Cognitive Systems Lab under Judea Pearl. She teaches the Maven course AI Inference Engineering & Systems Design and reviews for NeurIPS, ICML, and ICLR.

Hamel Husain

ML Engineer with 20+ years of experience

Hamel Husain is an ML engineer with 20+ years of experience. He has worked with companies such as Airbnb and GitHub, where his work included early LLM research on code understanding that was used by OpenAI. He has also led and contributed to numerous popular open-source machine learning tools. Hamel is currently an independent consultant helping companies build AI products.


