Latency First: How to Actually Make RAG & Agents Fast

Hosted by Jason Liu and Aarush Sah

690 students

What you'll learn

Measure RAG & Agent Performance Effectively

Students will learn to distinguish between time to first token (TTFT), tokens per second (TPS), and step latency to benchmark AI systems.
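As a rough illustration of how these metrics differ, here is a minimal Python sketch that measures TTFT and TPS over any token stream. It is not taken from the course material: the names `measure_stream` and `StreamMetrics` are hypothetical, and it assumes TPS means decode throughput, counted from the first token to the last. Other definitions of TPS exist.

```python
import time
from typing import Callable, Iterable, NamedTuple


class StreamMetrics(NamedTuple):
    ttft_s: float   # time to first token, in seconds
    tps: float      # tokens per second between first and last token (assumed definition)
    total_s: float  # time from request start to last token
    n_tokens: int


def measure_stream(
    tokens: Iterable[str],
    clock: Callable[[], float] = time.perf_counter,
) -> StreamMetrics:
    """Consume a token stream and record when tokens arrive.

    `clock` is injectable so the measurement can be tested deterministically.
    """
    start = clock()
    first = None
    last = start
    n = 0
    for _ in tokens:
        last = clock()
        if first is None:
            first = last  # arrival time of the first token
        n += 1
    if first is None:
        raise ValueError("stream produced no tokens")
    decode_s = last - first
    # With a single token there is no decode interval, so report 0.0.
    tps = (n - 1) / decode_s if n > 1 and decode_s > 0 else 0.0
    return StreamMetrics(first - start, tps, last - start, n)
```

Passing a streaming response iterator as `tokens` gives both numbers from a single request. A system can have a low TTFT and a poor TPS, or the reverse, which is why the two are tracked separately.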

Identify Latency Bottlenecks in AI Pipelines

Students will learn to diagnose slowdown points in RAG workflows and multi-step agents.
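One simple way to find a slowdown point is to time each pipeline stage separately and compare the totals. The Python sketch below is illustrative only and is not the course's own tooling. `StageTimer` and the stage names in the usage note are hypothetical.

```python
import time
from contextlib import contextmanager
from typing import Callable, Dict, Iterator


class StageTimer:
    """Accumulates wall-clock time per named pipeline stage."""

    def __init__(self, clock: Callable[[], float] = time.perf_counter) -> None:
        self.clock = clock
        self.durations: Dict[str, float] = {}

    @contextmanager
    def stage(self, name: str) -> Iterator[None]:
        start = self.clock()
        try:
            yield
        finally:
            # Add to any earlier time so repeated agent steps accumulate.
            elapsed = self.clock() - start
            self.durations[name] = self.durations.get(name, 0.0) + elapsed

    def slowest(self) -> str:
        """Return the stage name with the largest accumulated time."""
        if not self.durations:
            raise ValueError("no stages recorded")
        return max(self.durations, key=lambda k: self.durations[k])
```

In use, each step of a request is wrapped in its own block, for example `with timer.stage("retrieve"): ...` around the retrieval call and `with timer.stage("generate"): ...` around the model call. `timer.slowest()` then names the stage to look at first.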

Apply Practical Optimization Techniques

Students will master stack-agnostic strategies to reduce response times while maintaining high-quality AI outputs.

Why this topic matters

Latency is the silent killer of AI adoption. Users abandon systems that make them wait, regardless of accuracy. By mastering performance optimization, you'll deliver solutions people actually use, overcome the primary barrier to production AI success, and develop a professional edge that distinguishes you in a market fixated on capability rather than usability.

You'll learn from

Jason Liu

Consultant at the intersection of Information Retrieval and AI

Jason has built search and recommendation systems for the past 6 years. In the last year, he has consulted for and advised dozens of startups on improving their RAG systems. He is the creator of the Instructor Python library.