Scale Billion Parameter Models Without Burning a Single GPU Hour

New · 5 Weeks · Cohort-based Course

Learn to train 100B+ parameter models efficiently from the engineers who built the leading frameworks and techniques

This course is popular

23 people enrolled last week.

Hosted by

Zachary Mueller + 14 Guest Experts

🤗 accelerate Technical Lead with a decade of experience

With speakers from

Hugging Face
Anyscale
PyTorch
Snowflake
Unsloth AI

Course overview

Your system will scale. Make sure you're ready.

🚨 The Problem

Distributed training is full of invisible traps.

Get it wrong, and months of team velocity are gone, along with tens or hundreds of thousands of dollars in unoptimized compute.

Most teams guess their way through ZeRO configs, DDP setups, and pipeline logic. That leads to stalled releases and blown budgets.
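
To make the trap concrete: a handful of config fields like these decide whether a run fits in memory or crawls through CPU offload. The sketch below is illustrative only, not course material; every value is a placeholder, and the "right" numbers depend on your model size, GPU memory, and interconnect.

```python
# Illustrative only: a DeepSpeed-style ZeRO stage-2 config.
# Every value here is a placeholder; choosing them by guesswork is exactly the trap above.
ds_config = {
    "train_micro_batch_size_per_gpu": 4,
    "gradient_accumulation_steps": 8,
    "bf16": {"enabled": True},
    "zero_optimization": {
        "stage": 2,                               # shard optimizer state and gradients
        "overlap_comm": True,                     # overlap reduction with backward compute
        "offload_optimizer": {"device": "cpu"},   # saves GPU memory, costs PCIe bandwidth
    },
}
# Typically handed to deepspeed.initialize(model=model, config=ds_config, ...)
```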


The Solution

This course is the only live training that teaches modern distributed AI workflows directly from people who have scaled training across tens of thousands of GPUs and built the foundations the AI ecosystem relies on today.

You’ll walk away knowing what to use, when to use it, and why it works, across real production-scale scenarios.


🔥 Built for High-Impact Teams

Engineers: Train large models across 8 to 1,000+ GPUs with precision

Tech leads and CTOs: Make confident decisions on systems, tools, and costs

Founders and startups: Build smarter AI infrastructure that scales without waste


🧩 What Makes This Course Different

* Learn directly from a curated list of 14 world-class experts in the field from Meta, Ray, Snowflake, Hugging Face, and more

* Get five focused workshops with matching hands-on labs for real skills

* Gain access to a private alumni network for continued learning and hiring

* Use $500+ in real compute credits to apply your skills immediately


📝 Guest Experts and Case Studies

14 guest speakers from top AI teams. Each session ties directly to a problem you will need to solve when designing a production-scale training run.


Applied Track:

Robert Nishihara (Ray)

How to orchestrate GPU training across thousands of nodes effectively

Sami Jaghouar (Prime Intellect)

Building decentralized training systems at a global scale

Tunji Ruwase (Snowflake)

Training long-context models efficiently without exploding memory

Prince Canuma

Running LLMs directly on Apple Silicon for local-first development


Pretraining Track:

Phuc Nguyen (Hugging Face)

Mastering FP8 precision training

Elie Bakouch (Hugging Face)

Advanced MoE and parallelism strategies

Daniel Han (UnslothAI)

How Triton kernels can be an easy optimization win, and other modern practices


Distributed Technique Track:

Sylvain Gugger (Jane Street)

Overview of the ZeRO algorithm

Wanchao Liang (Thinking Machines)

How DTensor helps new engineers get up to speed on distributed training faster (see the sketch after this track list)

Ferdinand Mom (Hugging Face)

How you should stack parallelism strategies to maximize your training capacity

Less Wright (Meta)

Why Async Tensor Parallelism is necessary to train across clusters of thousands of GPUs

Matej Sirovatka (Hugging Face)

Why Expert Parallelism is a necessity when training MoE models at scale

Marc Sun (Hugging Face)

Why we need new strategies for deploying large models at scale, and how to get there
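
Several of these sessions build on PyTorch's DTensor abstraction. As a taste, here is a rough sketch (assuming PyTorch 2.4+ and a `torchrun` launch; the tensor size, mesh shape, and sharding choice are placeholders) of sharding one logical tensor across a device mesh:

```python
import os
import torch
import torch.distributed as dist
from torch.distributed.device_mesh import init_device_mesh
from torch.distributed.tensor import distribute_tensor, Shard

torch.cuda.set_device(int(os.environ["LOCAL_RANK"]))          # one GPU per process (set by torchrun)
dist.init_process_group(backend="nccl")
mesh = init_device_mesh("cuda", (dist.get_world_size(),))     # 1-D mesh over all GPUs

weight = torch.randn(4096, 4096)                               # one logical tensor...
dweight = distribute_tensor(weight, mesh, placements=[Shard(0)])  # ...sharded row-wise across ranks

print(dweight.shape, dweight.to_local().shape)                 # global shape vs. this rank's local shard
dist.destroy_process_group()
```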

Built for the People Scaling What’s Next

01

CTOs who need to make fast, informed decisions on how to scale LLMs.

02

Team leads who want confidence that engineers can execute at scale without burning time.

03

Senior engineers tired of piecing together half-solutions from publications, frameworks, and more.

Prerequisites

  • This course is for engineers already comfortable training models using PyTorch or Hugging Face Transformers.

  • Trusted by top builders at Hugging Face, Modal, Snowflake, and Meta.

  • "Zach is one of the key people making distributed training accessible" - Mark Saroufim (Software Engineer at Meta)

What You'll Achieve

Train 100B+ models across 8–1,000 GPUs efficiently

You’ll understand the core problems teams face during large-scale training and how to avoid them using proven methods.

Build real-world experience with modern training techniques

You won’t just watch; you’ll train models using DDP, ZeRO, pipeline parallelism, and more, each applied in code.
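
For a flavor of what "applied in code" means, here is a minimal DDP training loop of the kind the labs build on. It is a sketch, not a course notebook: it assumes a `torchrun --nproc_per_node=8 train_ddp.py` launch, and the model and data are placeholders.

```python
import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

def main():
    dist.init_process_group(backend="nccl")        # torchrun sets rank/world-size env vars
    local_rank = int(os.environ["LOCAL_RANK"])     # which GPU this process owns
    torch.cuda.set_device(local_rank)

    model = torch.nn.Linear(1024, 1024).to(local_rank)   # placeholder model
    model = DDP(model, device_ids=[local_rank])           # wraps the model for gradient sync
    optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)

    for step in range(10):                                 # placeholder data loop
        batch = torch.randn(32, 1024, device=local_rank)
        loss = model(batch).pow(2).mean()
        loss.backward()                                    # gradients all-reduced across ranks here
        optimizer.step()
        optimizer.zero_grad()

    dist.destroy_process_group()

if __name__ == "__main__":
    main()
```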

Understand which training methods to use and when

You’ll learn how to match technique to context. Whether it’s model size, hardware limits, or team constraints, you’ll know what fits.

Be ready before training becomes your bottleneck

Most teams wait too long to prepare for scale. This course makes sure you’re ready before your current training setup stops working.

Go from scattered tutorials to production-ready training skills

You’ll connect theory with practice and walk away with working knowledge you can apply in real systems.

Personalized Instruction

Generous office hours ensure that students can ask questions about their specific issues, interests, and needs.

What’s included

Zachary Mueller + 14 Guest Experts

Live sessions

Learn directly from Zachary Mueller + 14 Guest Experts in a real-time, interactive format.

Lifetime access

Go back to course content and recordings whenever you need to, and get access to all future cohorts.

Generous office hours

Bring your blockers to office hours and leave with answers. Get feedback, debug help, and real support when you need it.

Community of peers

Stay accountable and share insights with like-minded professionals.

Certificate of completion

Share your new skills with your employer or on LinkedIn.

Course notebooks

Detailed course notebooks with meticulous notes to walk you through the material and help you learn along the way.

Compute Credits

$500 in Modal compute credits, 6 months of Hugging Face Pro

Maven Guarantee

This course is backed by the Maven Guarantee. Students are eligible for a full refund up until the halfway point of the course.

Course syllabus

Week 1

Sep 1—Sep 7

    Course Introduction and `nbdistributed`: A Jupyter framework for interactive distributed PyTorch

    Tue 9/2, 6:00 PM—7:00 PM (UTC)

    Distributed Data Parallelism From Scratch

    Thu 9/4, 6:00 PM—7:00 PM (UTC)

Week 2

Sep 8—Sep 14

    ZeRO: Stage 1 & 2

    Tue 9/9, 6:00 PM—7:00 PM (UTC)

Week 3

Sep 15—Sep 21

    ZeRO: Stage 3 and Efficient ZeRO Strategies

    Tue 9/16, 6:00 PM—7:30 PM (UTC)

Week 4

Sep 22—Sep 28

    Pipeline Parallelism and Tensor Parallelism

    Tue 9/23, 6:00 PM—7:30 PM (UTC)

    Efficient Strategies for Distributed Inference

    Thu 9/25, 6:00 PM—7:00 PM (UTC)

Week 5

Sep 29—Oct 3

    2D Parallelism

    Tue 9/30, 6:00 PM—7:00 PM (UTC)

    3D Parallelism (Guest Speaker)

    Thu 10/2, 6:00 PM—7:00 PM (UTC)
Free resource

Free Access to Part of Lesson 1

Hi there! To give you a good sense of how the course is oriented and what some of the content looks like, I'm sharing an exclusive preview of the course webpage and how some of the material is shaped. I've worked hard to make sure Quarto and Jupyter help me create educational material that will wow you, so let me know if it does!


(Note: this material preview may change as the course develops, but only for additive purposes)

Get access to the webpage


The instructor is a recognized expert with hands-on experience

        Zach is my go to person on anything dealing with distributed training. He has maintained the most popular library in the world that helps developers with this problem, which means he’s familiar with all of the issues mere mortals have while tackling this problem. Zach is the best person to teach this subject. I am taking this course.
Hamel Husain

Founder, Parlance Labs | Evals, evals, evals
        Zach is one of the key people in the world making distributed machine learning more accessible. He has firsthand experience building some incredibly popular tools like huggingface/accelerate. If you're GPU poor but considering moving to the GPU middle class then I can't think of a better instructor.
Mark Saroufim

Software Engineer at Meta | Co-founder, GPU MODE
        As a long time maintainer of HF Accelerate, Zach has had to master not only a deep understanding of ML scaling methods, but also to integrate them into a cohesive API for the masses to use. I've seen Zach consistently deliver robust, well-integrated solutions with a deep system-level understanding. You will be in good hands with Zach at the helm.
Stas Bekman

Senior Machine Learning Engineer, Snowflake
        Zach's stewardship of Accelerate and managing the intricacies of multiple distributed technologies (while abstracting it into an easy to use API) make Zach the preeminent leader in distributed training. Zach has shown deep understanding of everything from fundamentals to implementation, and is the first person that would come to mind to teach this
Wing Lian

Founder, Axolotl
        Zach is truly one in a million. I've never met anyone who puts so much time and thought into crafting deep learning code. With his background and experience, learning from him is an invaluable opportunity.
Radek Osmulski

Senior Data Scientist, NVIDIA
        Zach has a strong grasp of the fundamentals of fastai, but what really sets him apart is his ability to teach. He mixes in practical topics throughout his lessons, making every video engaging and worthwhile. With a proven track record of creating high-quality content, I’m confident that any course Zach produces will be worth your time and attention
Kevin Bird

Co-Founder, Problem Solvers Guild
        Zach and I used to work together at HuggingFace; since then and through today, he’s been building foundational tools for the open ML community to use and learn distributed training techniques. I’ve personally used his tools for years to train models such as OLMo and Tülu, along with benefiting from his knowledge to better understand what is going on.
Dr. Nathan Lambert

LLM Post Training Lead, Ai2

Join an upcoming cohort

Scale Billion Parameter Models Without Burning a Single GPU Hour

Cohort 1

$2,400

Dates

Sep 1—Oct 3, 2025

Payment Deadline

Aug 31, 2025
Get reimbursed
