6 Weeks
·Cohort-based Course
The class where you go from Hugging Face Transformers to building and deploying a full inference stack that real companies pay millions for
Author of the LLMOps and GPU Engineering for AI Systems books
Course overview
If you’re an AI engineer who can already prompt or fine-tune models but you’ve never been able to answer questions like:
- Why is my 70B model using 120 GB of VRAM and still slow?
- How do I serve 500 concurrent users on 4×H100s without going broke?
- What actually happens inside FlashAttention / PagedAttention / tensor parallelism?
- How do I make money (or save my company millions) running open models in production?
… then this is the course you’ve been waiting for.
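Questions like the first one come down to arithmetic you can do on a napkin. As a taste of the kind of reasoning the course teaches, here is a minimal sketch of a VRAM estimate for a dense 70B model; the shapes are LLaMA-3-70B-like assumptions (80 layers, 8 KV heads via GQA, head dim 128), not measured numbers.

```python
# Back-of-envelope VRAM estimate for serving a dense 70B model.
def weight_memory_gb(params_billions, bytes_per_param):
    """Memory needed just to hold the weights."""
    return params_billions * 1e9 * bytes_per_param / 1e9

def kv_cache_gb(layers, kv_heads, head_dim, seq_len, batch, bytes_per_elt=2):
    """KV cache: 2 tensors (K and V) per layer, per sequence."""
    return 2 * layers * kv_heads * head_dim * seq_len * batch * bytes_per_elt / 1e9

# Assumed LLaMA-3-70B-like shapes: 80 layers, 8 KV heads (GQA), head_dim 128
w = weight_memory_gb(70, 2)                  # fp16 weights: ~140 GB
kv = kv_cache_gb(80, 8, 128, 8192, 16)      # 16 sequences at 8K context: ~43 GB
print(f"weights ≈ {w:.0f} GB, KV cache ≈ {kv:.1f} GB")
```

Weights alone at fp16 already exceed a single H100; add the KV cache for a modest batch and the "120 GB and still slow" mystery starts to explain itself.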
In 6 intensive weeks you will build, profile, optimize, distribute, and ship a complete LLM inference system that can profitably serve LLaMA-3-70B, Mixtral, DeepSeek, or any 70B–405B-class model, including stateful, tool-calling workloads.
You’ll leave with:
- A live, public (or internal) inference API that beats most commercial providers on price/performance
- Hard numbers you can quote in interviews or to your boss
- A portfolio project that gets you hired at the top inference companies (Fireworks, Together, Groq, OctoAI, etc.)
- No prior CUDA experience required. Just 4×2-hour live sessions.
01
This course is for AI engineers, ML infrastructure engineers, and backend developers who want systems-level mastery
02
AI/ML engineers who know PyTorch/Transformers but feel stuck at the research to production gap
03
Founders building LLM apps who are tired of burning money on OpenAI, and engineers who own inference cost and need to cut it 10–50×
A live, production-grade LLM inference service running 70B+ (or MoE) models
Projects and hard metrics that you can quote in interviews or share with your team:
Optimize LLM inference across the full stack
From kernel-level tuning to distributed execution, you’ll learn how to speed up every layer of the LLM serving pipeline. This includes batching and caching strategies, load balancing, inference engine configuration, and parallelism techniques (data, tensor, pipeline, and expert).
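One scheduling idea you'll implement is continuous (in-flight) batching, the approach behind engines like vLLM: new requests join the batch as soon as a slot frees up, instead of waiting for the whole batch to drain. A minimal toy sketch (request handling only, with a token counter standing in for the model):

```python
# Toy continuous-batching scheduler: requests are (id, tokens_to_generate).
from collections import deque

def serve(requests, max_batch=8):
    waiting = deque(requests)
    running, finished = [], []
    while waiting or running:
        # Admit new requests whenever a slot frees up,
        # rather than waiting for the current batch to finish.
        while waiting and len(running) < max_batch:
            running.append(list(waiting.popleft()))
        # One decode step for every running request.
        for req in running:
            req[1] -= 1
        finished += [r[0] for r in running if r[1] == 0]
        running = [r for r in running if r[1] > 0]
    return finished

print(serve([("a", 2), ("b", 1), ("c", 3)]))  # short requests finish first: b, a, c
```

In a real engine each loop iteration is one forward pass over the batch; the payoff is that short requests no longer wait behind long ones, which is where most of the throughput gain comes from.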
A permanent LLM Cost Calculator
Predict real-world cost and throughput for any model/hardware/parallelism combo to within ~10% accuracy
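The core of such a calculator is a one-line cost model. A hedged sketch (the GPU price and throughput below are illustrative assumptions, not benchmarks):

```python
# Dollar cost to generate 1M tokens on a given deployment.
def cost_per_million_tokens(gpu_hourly_usd, num_gpus, tokens_per_sec):
    tokens_per_hour = tokens_per_sec * 3600
    return gpu_hourly_usd * num_gpus / tokens_per_hour * 1e6

# e.g. 4×H100 at an assumed $2.50/GPU-hr sustaining 2,500 tok/s aggregate
print(f"${cost_per_million_tokens(2.50, 4, 2500):.2f} per 1M tokens")
```

The full calculator you build in the course layers real throughput estimates (model size, parallelism, batch size) on top of this skeleton.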
Battle-tested profiling skills
You'll be able to take any slow inference setup, find the real bottleneck in under 15 minutes, and fix it
Deep, intuitive understanding of key concepts
You'll be able to explain prefill vs decode, the KV cache, tensor/pipeline/sequence parallelism, PagedAttention, and FlashAttention-2, and when each actually matters
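For example, one intuition the course builds is why decode is memory-bandwidth-bound: each decode step must stream all the model weights from HBM, so single-stream decode speed is roughly bandwidth divided by weight bytes. A sketch with illustrative numbers (H100 bandwidth figure is an assumption, not a benchmark):

```python
# Roofline-style upper bound on single-stream decode speed:
# every decode step reads all weights once from HBM.
def decode_tokens_per_sec_upper_bound(weight_gb, hbm_bw_gb_s):
    return hbm_bw_gb_s / weight_gb

# 70B at fp16 (~140 GB of weights) sharded across 4×H100 (~3,350 GB/s each)
print(decode_tokens_per_sec_upper_bound(140, 4 * 3350))  # ≈ 96 tok/s ceiling
```

This is exactly why batching matters so much: the weight read is amortized across every sequence in the batch, while prefill is compute-bound and obeys different math.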
Confidence to choose the right inference engine
Learning how to choose between vLLM, TensorRT-LLM, DeepSpeed-FastGen, TGI and hardware (H100 vs B200 vs 4090 vs Groq) before spending a single dollar

Live sessions
Learn directly from Abi Aryan in a real-time, interactive format.
Guest Speakers
Learn from industry professionals and their experiences.
Lifetime access
Go back to course content and recordings whenever you need to.
Community of peers
Stay accountable and share insights with like-minded professionals.
Certificate of completion
Share your new skills with your employer or on LinkedIn.
Maven Guarantee
This course is backed by the Maven Guarantee. Students are eligible for a full refund up until the halfway point of the course.
6 live sessions • 18 lessons • 5 projects
Jan
17
Jan
24
Jan
31
Feb
7
Feb
14
Feb
21
Join an upcoming cohort
$1,999
Dates
Payment Deadline
Abi Aryan is the founder and lead research engineer at Abide AI, a deep tech startup working to integrate causal reasoning into AI systems, where she is developing the decision-making stack for agentic applications in enterprise.
She is also the author of the O'Reilly book LLMOps and is writing her next book, GPU Engineering for AI Systems, with Packt.
Active hands-on learning
This course builds on live workshops and hands-on projects
Interactive and project-based
You’ll be interacting with other learners through breakout rooms and project teams
Learn with a cohort of peers
Join a community of like-minded people who want to learn and grow alongside you