AI Systems Engineering

New
·

6 Weeks

·

Cohort-based Course

The class where you go from Hugging Face Transformers to building and deploying a full inference stack that real companies pay millions for

Author of the LLMOps and GPU Engineering for AI Systems books

O'Reilly Media
@Packtpub

Course overview

From AI Engineer to AI Systems Engineer

If you’re an AI engineer who can already prompt or fine-tune models but you’ve never been able to answer questions like:


- Why is my 70B model using 120 GB of VRAM and still slow?

- How do I serve 500 concurrent users on 4×H100s without going broke?

- What actually happens inside FlashAttention / PagedAttention / tensor parallelism?

- How do I make money (or save my company millions) running open models in production?


… then this is the course you’ve been waiting for.
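As a taste of the back-of-the-envelope systems arithmetic the course drills, here is a minimal sketch of the math behind that first VRAM question. The numbers are illustrative, not course material:

```python
def weight_memory_gb(n_params: float, bytes_per_param: float) -> float:
    """Approximate memory needed just to hold the model weights."""
    return n_params * bytes_per_param / 1e9

# A 70B-parameter model in fp16 (2 bytes/param) needs ~140 GB for
# weights alone -- before any KV cache or activations. Quantizing
# to int8 halves that, which is how "120 GB and still slow" happens.
fp16 = weight_memory_gb(70e9, 2)   # ~140 GB
int8 = weight_memory_gb(70e9, 1)   # ~70 GB
print(f"fp16: {fp16:.0f} GB, int8: {int8:.0f} GB")
```

Memory alone doesn't explain "still slow" — that usually comes down to batching, KV-cache growth, and memory bandwidth, all covered in the course.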


In 6 intensive weeks you will build, profile, optimize, distribute, and ship a complete LLM inference system that can profitably serve LLaMA-3-70B, Mixtral, DeepSeek, or any 70B–405B-class model, including stateful, tool-calling workloads.


You’ll leave with:

- A live, public (or internal) inference API that beats most commercial providers on price/performance

- Hard numbers you can quote in interviews or to your boss

- A portfolio project that gets you hired at the top inference companies (Fireworks, Together, Groq, OctoAI, etc.)

- No prior CUDA experience required. Just six 2-hour live sessions.

Who is this course for

01

This course is for AI engineers, ML infrastructure engineers, and backend developers who want systems-level mastery


02

AI/ML engineers who know PyTorch/Transformers but feel stuck at the research-to-production gap

03

Founders building LLM apps who are tired of burning money on OpenAI / Engineers who own inference cost and need to cut it by 10–50×

Prerequisites

  • You have shipped at least one non-trivial Python project

  • You can already load and run an LLM using HF Transformers

  • You’re comfortable with basic terminal/SSH, git, and YAML/JSON files

What you’ll get out of this course

A live, production-grade LLM inference service running 70B+ (or MoE) models

Projects and hard metrics that you can quote in interviews or share with your team:


  • “I served LLaMA-3-70B at 108 tokens/s and $0.32 per million output tokens.”
  • “My system handled 500+ concurrent stateful, tool-calling requests with <900 ms first-token latency.”

Optimize LLM inference across the full stack

From kernel-level tuning to distributed execution, you’ll learn how to speed up every layer of the LLM serving pipeline. This includes batching and caching strategies, load balancing, inference engine configuration, and parallelism techniques (data, tensor, pipeline, and expert).

A permanent LLM Cost Calculator

You’ll be able to predict real-world cost and throughput for any model/hardware/parallelism combo to within ~10% accuracy.
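The core of any such calculator is a single ratio: what the hardware costs per hour versus how many tokens it emits in that hour. A minimal sketch, with hypothetical rental and throughput numbers:

```python
def cost_per_million_tokens(gpu_hourly_usd: float, tokens_per_sec: float) -> float:
    """Steady-state serving cost in dollars per 1M output tokens."""
    tokens_per_hour = tokens_per_sec * 3600
    return gpu_hourly_usd / tokens_per_hour * 1e6

# Hypothetical: a 4xH100 node rented at $12/hr sustaining an
# aggregate 10,000 tokens/s across all concurrent requests.
print(round(cost_per_million_tokens(12.0, 10_000), 2))  # 0.33
```

The real calculator built in the course layers model size, parallelism, and utilization on top of this ratio, but every estimate bottoms out in this division.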

Battle-tested profiling skills

You'll be able to take any slow inference setup, find the real bottleneck in under 15 minutes, and fix it

Deep, intuitive understanding of key concepts

You’ll be able to explain prefill vs. decode, the KV cache, tensor/pipeline/sequence parallelism, PagedAttention, and FlashAttention-2, and when each actually matters
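To see why the KV cache dominates serving memory, here is a sketch of the per-token accounting, using a LLaMA-3-70B-style configuration (80 layers, 8 grouped-query KV heads, head dim 128 — figures assumed for illustration):

```python
def kv_cache_bytes_per_token(n_layers: int, n_kv_heads: int,
                             head_dim: int, bytes_per_elem: int = 2) -> int:
    """Per-token KV cache: one K and one V tensor per layer (fp16 default)."""
    return 2 * n_layers * n_kv_heads * head_dim * bytes_per_elem

per_token = kv_cache_bytes_per_token(80, 8, 128)
print(per_token)  # 327680 bytes, i.e. ~320 KB per token

# An 8k-token context therefore pins ~2.7 GB of KV cache per sequence --
# which is exactly the problem PagedAttention's block-based allocation solves.
print(per_token * 8192 / 1e9)
```

Multiply that per-sequence figure by hundreds of concurrent users and the motivation for paged KV-cache management becomes obvious.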

Confidence to choose the right inference engine

You’ll know how to choose between vLLM, TensorRT-LLM, DeepSpeed-FastGen, and TGI, and between hardware options (H100 vs. B200 vs. 4090 vs. Groq), before spending a single dollar

What’s included


Live sessions

Learn directly from Abi Aryan in a real-time, interactive format.

Guest Speakers

Learn from industry professionals and their experiences.

Lifetime access

Go back to course content and recordings whenever you need to.

Community of peers

Stay accountable and share insights with like-minded professionals.

Certificate of completion

Share your new skills with your employer or on LinkedIn.

Maven Guarantee

This course is backed by the Maven Guarantee. Students are eligible for a full refund up until the halfway point of the course.

Course syllabus

6 live sessions • 18 lessons • 5 projects

Week 1

Jan 14—Jan 18

    Foundations of LLM Inference

    5 items

    Jan

    17

    Session 1

    Sat 1/17, 5:00 PM—7:00 PM (UTC)
    Optional

Week 2

Jan 19—Jan 25

    GPU-Level Optimization & Profiling

    5 items

    Jan

    24

    Session 2

    Sat 1/24, 5:00 PM—7:00 PM (UTC)
    Optional

Week 3

Jan 26—Feb 1

    Parallelism & Distributed Inference

    5 items

    Jan

    31

    Session 3

    Sat 1/31, 5:00 PM—7:00 PM (UTC)
    Optional

Week 4

Feb 2—Feb 8

    Building Production-Ready, Stateful Inference Systems

    5 items

    Feb

    7

    Session 4

    Sat 2/7, 5:00 PM—7:00 PM (UTC)
    Optional

Week 5

Feb 9—Feb 15

    Guest Lectures from Industry Experts

    2 items

    Feb

    14

    Session 5

    Sat 2/14, 5:00 PM—7:00 PM (UTC)
    Optional

Week 6

Feb 16—Feb 22

    Demo Day: Design & Optimize Your Own Scalable LLM Inference Stack

    1 item

    Feb

    21

    Session 6

    Sat 2/21, 5:00 PM—7:00 PM (UTC)
    Optional

Join an upcoming cohort

AI Systems Engineering


$1,999

Dates

Jan 14—Feb 22, 2026

Payment Deadline

Jan 15, 2026
Get reimbursed

Meet your instructor

Abi Aryan

Abi Aryan is the founder and lead research engineer at Abide AI, a deep-tech startup integrating causal reasoning into AI systems, where she is building the decision-making stack for enterprise agentic applications.


She is also the author of the O'Reilly book LLMOps and is writing her next book, GPU Engineering for AI Systems, with Packt.

Learning is better with cohorts

Active hands-on learning

This course builds on live workshops and hands-on projects

Interactive and project-based

You’ll be interacting with other learners through breakout rooms and project teams

Learn with a cohort of peers

Join a community of like-minded people who want to learn and grow alongside you

What people are saying

“Most people throw in hardware because it is easy instead of optimizing the processes. This is a very important topic!”

Maria Vechtomova

Databricks MVP, MLOps Tech Lead and LLMOps Instructor

Frequently Asked Questions

