What happens when you make an LLM call?
Hosted by Abi Aryan
What you'll learn
Run through the Full LLM Call Stack
Understand the end‑to‑end journey from a user’s HTTP/API request down through inference engines, runtimes, and hardware
Let's Dissect the Inference Process Step by Step
We'll break down how prompts are tokenized, transformed, and processed, including the core mechanics of prefill vs. decode (a minimal code sketch follows this list)
Systems-Level Thinking in Action
We'll do a case study: diagnosing bottlenecks, choosing inference engines, and reasoning about latency, throughput, and scaling
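To make the prefill vs. decode split concrete before the session, here is a minimal sketch using Hugging Face transformers. The "gpt2" checkpoint, the prompt, and greedy decoding are illustrative choices, not the session's material: one forward pass over the prompt builds the KV cache (prefill), then each new token comes from a single-token forward pass that reuses that cache (decode).

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# "gpt2" is an illustrative small checkpoint; any causal LM works the same way.
tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2").eval()

prompt = "What happens when you make an LLM call?"
inputs = tokenizer(prompt, return_tensors="pt")  # tokenization: text -> token IDs

with torch.no_grad():
    # Prefill: one forward pass over the whole prompt; builds the KV cache.
    out = model(**inputs, use_cache=True)
    past = out.past_key_values
    next_token = out.logits[:, -1, :].argmax(dim=-1, keepdim=True)

    generated = [next_token]
    # Decode: one token per step, each a single-token forward pass that
    # reuses the cached keys/values instead of reprocessing the prompt.
    for _ in range(16):
        out = model(input_ids=next_token, past_key_values=past, use_cache=True)
        past = out.past_key_values
        next_token = out.logits[:, -1, :].argmax(dim=-1, keepdim=True)
        generated.append(next_token)

print(prompt + tokenizer.decode(torch.cat(generated, dim=-1)[0]))
```

The two phases have very different cost profiles: prefill is one large, compute-heavy pass (it dominates time to first token), while decode is many small, memory-bandwidth-bound passes (it dominates per-token latency), which is why the session treats them separately.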
Why this topic matters
We will go behind the scenes of an LLM call to understand exactly what happens from prompt to output, using an architecture diagram as well as code. If you have been curious about how Hugging Face, PyTorch, vLLM, Ray, and CUDA are all orchestrated together, you'll love this session.
We will look into the stack, the bottlenecks, and the system-level trade-offs that every ML engineer needs to understand.
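As a small taste of the serving-engine layer of that stack, here is a minimal sketch using vLLM's offline API. The "facebook/opt-125m" model, the batch size, and the token budget are illustrative assumptions; the point is the kind of latency and throughput measurement the case study reasons about.

```python
import time
from vllm import LLM, SamplingParams

# Illustrative choices: a tiny model, a batch of 8 prompts, 64 new tokens each.
llm = LLM(model="facebook/opt-125m")            # the engine layer above PyTorch/CUDA
params = SamplingParams(temperature=0.0, max_tokens=64)
prompts = ["What happens when you make an LLM call?"] * 8

start = time.perf_counter()
outputs = llm.generate(prompts, params)          # prefill + batched decode inside the engine
elapsed = time.perf_counter() - start

new_tokens = sum(len(o.outputs[0].token_ids) for o in outputs)
print(f"latency: {elapsed:.2f} s   throughput: {new_tokens / elapsed:.1f} generated tok/s")
```

Varying the batch size, prompt length, and output length in a sketch like this is the quickest way to see the latency-versus-throughput trade-off the session explores.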
You'll learn from
Abi Aryan
Lead Research Engineer @ Abide AI
Abi Aryan is the founder and lead research engineer at Abide AI, a deep tech company developing neurosymbolic models for reasoning in agents. With a decade of experience as an ML engineer building production-scale AI systems, she is also the author of two books:
- LLMOps (O'Reilly Publications)
- GPU Engineering for AI Systems (upcoming title from Packt Publishing, releasing Autumn 2026)
Go deeper with a course
AI Systems Design & Inference Engineering

Abi Aryan
Founder and Research Engineering Lead @ Abide AI