What happens when you make an LLM call?

Hosted by Abi Aryan

Mon, Feb 2, 2026

6:00 PM UTC (45 minutes)

Virtual (Zoom)

Free to join

Go deeper with a course

Inference Engineering for AI Systems
Abi Aryan
View syllabus

What you'll learn

Run through the Full LLM Call Stack

Understand the end‑to‑end journey from a user’s HTTP/API request down through inference engines, runtimes, and hardware
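
As a rough illustration of that first hop, here is a minimal sketch of the client-side request, assuming an OpenAI-compatible HTTP endpoint (for example, one served locally by vLLM); the URL and model name below are placeholders, not a prescribed setup:

```python
import requests

# Hypothetical local, OpenAI-compatible inference server (e.g. vLLM's API server).
resp = requests.post(
    "http://localhost:8000/v1/completions",
    json={
        "model": "my-model",  # placeholder model name
        "prompt": "What happens when you make an LLM call?",
        "max_tokens": 64,
    },
    timeout=30,
)
resp.raise_for_status()
# The text in the response is produced by the engine, runtime, and hardware beneath this call.
print(resp.json()["choices"][0]["text"])
```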

Dissect the Inference Process Step-by-Step

We'll break down how prompts are tokenized, transformed, and processed, including the core mechanics of prefill vs. decode
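
To make the prefill/decode split concrete, here is a rough sketch using Hugging Face transformers (gpt2 is only a small example checkpoint): one forward pass over the whole prompt builds the KV cache (prefill), then generation proceeds one token at a time while reusing that cache (decode):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")            # example small model
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

ids = tok("What happens when you make an LLM call?", return_tensors="pt").input_ids

with torch.no_grad():
    # Prefill: a single forward pass over the full prompt populates the KV cache.
    out = model(ids, use_cache=True)
    past = out.past_key_values
    next_id = out.logits[:, -1].argmax(dim=-1, keepdim=True)

    generated = [next_id]
    # Decode: one token per step, feeding back only the last token plus the cache.
    for _ in range(20):
        out = model(next_id, past_key_values=past, use_cache=True)
        past = out.past_key_values
        next_id = out.logits[:, -1].argmax(dim=-1, keepdim=True)
        generated.append(next_id)

print(tok.decode(torch.cat(generated, dim=-1)[0]))
```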

Systems-Level Thinking in Action

We'll work through a case study: diagnosing bottlenecks, choosing an inference engine, and reasoning about latency, throughput, and scaling
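
As a taste of that kind of diagnosis, here is a toy sketch of the two numbers the case study keeps returning to: time-to-first-token (dominated by prefill) and steady-state decode tokens per second. `generate_stream` is a hypothetical streaming client, standing in for whichever engine you are measuring:

```python
import time

def profile(generate_stream, prompt):
    """Measure time-to-first-token and decode throughput for a streaming client."""
    start = time.perf_counter()
    first_token_at = None
    n_tokens = 0
    for _token in generate_stream(prompt):        # assumed to yield one token at a time
        if first_token_at is None:
            first_token_at = time.perf_counter()
        n_tokens += 1
    end = time.perf_counter()

    ttft = first_token_at - start                 # prefill-bound latency
    tps = (n_tokens - 1) / (end - first_token_at) if n_tokens > 1 else 0.0
    return {"time_to_first_token_s": ttft, "decode_tokens_per_s": tps}
```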

Why this topic matters

We will go behind the scenes of an LLM call to understand exactly what happens from prompt to output, working from both an architecture diagram and code. If you have been curious about how Hugging Face, PyTorch, vLLM, Ray, and CUDA all fit together, you'll enjoy this session. We will look at the stack, the bottlenecks, and the system-level trade-offs that every ML engineer needs to understand.
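
For a flavour of how thin the top of that stack can look, here is a minimal offline-inference sketch with vLLM, which in turn drives PyTorch and CUDA kernels underneath; the checkpoint name is only an illustrative small model:

```python
from vllm import LLM, SamplingParams

# vLLM loads a Hugging Face checkpoint and handles batching, KV-cache paging,
# and the CUDA kernels for you; the model here is just an example.
llm = LLM(model="facebook/opt-125m")
params = SamplingParams(temperature=0.8, max_tokens=64)

outputs = llm.generate(["What happens when you make an LLM call?"], params)
for out in outputs:
    print(out.prompt, "->", out.outputs[0].text)
```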

You'll learn from

Abi Aryan

Lead Research Engineer @ Abide AI

Abi Aryan is the founder and lead research engineer at Abide AI, a deep tech company developing neurosymbolic models for reasoning in agents. She has a decade of experience as an ML engineer building production-scale AI systems and is the author of two books:

  • LLMOps (O'Reilly Media)
  • GPU Engineering for AI Systems (upcoming title from Packt Publishing, releasing Autumn 2026)

Sign up to join this lesson
