Scratch to Scale: Large-Scale Training in the Modern World

New · 5 Weeks · Cohort-based Course

Learn the techniques used today from world-class researchers and engineers at Meta, Anyscale, Hugging Face, and more


Hosted by

Zachary Mueller

🤗 accelerate Technical Lead with a decade of experience

With speakers from

Hugging Face
Anyscale
PyTorch
Snowflake
Unsloth AI

Course overview

Build the skills to answer the call when it's time to take your models to scale

Master distributed training and real-world scale techniques from top engineers.

Whether you're an ML engineer looking to move beyond single-GPU experiments, or a product leader seeking to understand the language your AI team speaks, this course will give you the hands-on skills and conceptual clarity to operate confidently at scale.


What Makes This Course Different

This started as a distributed training course. It organically grew into an all-encompassing learning event with world-class speakers who bring deep, real-world experience in modern large-scale training.

The distributed training curriculum remains intact: five hands-on workshops covering today’s core scale-up methods.

Now, each week features hand-tailored guest lectures from top engineers at Hugging Face, Meta, Snowflake, and more.

These expert sessions are aligned to each workshop’s topic, helping you bridge theory with modern production practices.

All course materials and recordings are available after enrollment, and you’ll get free lifetime access to future cohorts.


💡 What’s included:


5 core workshops

15+ guest talks across 3 curated tracks

Weekly live office hours

Class discord with lifetime access

Over $500 in credits from Modal and Hugging Face

100% money-back guarantee (within 14 days of course completion)


🏕️ Fireside Chats


Hear from the experts about their real experiences taking models to scale: the challenges and the discoveries


Yuxiang Wei (Meta FAIR)


📣 Conference Talks


Applied Track

Hear how industry leaders are solving real-world scale problems:


Robert Nishihara (Ray, Anyscale): Scaling across thousands of GPUs with Ray

Sami Jaghouar (Prime Intellect): Decentralized global-scale training

Tunji Ruwase (Snowflake): Efficient long-context training with Arctic

Prince Canuma: Local ML workloads using Apple Silicon + MLX


Pretraining Track

Deep dives into LLM pretraining at scale:


Phuc Nguyen (Hugging Face): A practitioner's guide to FP8

Elie Bakouch (Hugging Face): Hyper-optimizing LLMs with MoE, MLA & more

Daniel Han (UnslothAI): Speeding up training with Triton & custom kernels


🧠 Distributed Training Course

Learn the foundations and modern techniques used in real-world LLM scaleups across hundreds or thousands of GPUs.

5 instructor-led workshops:

DDP from scratch and avoiding data bottlenecks

ZeRO (Part 1): How model sharding enables scale

ZeRO (Part 2): Efficiency tradeoffs and stage comparison

Pipeline & Tensor Parallelism: Solving communication slowdowns

Multi-Dimensional Parallelism: Combining all methods for throughput
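To give a flavor of the first workshop ("DDP from scratch"), here is a minimal, hypothetical sketch of the core idea: every rank keeps a full copy of the model, and gradients are averaged with an all-reduce before each optimizer step. The toy model, data, and script name are illustrative only; the actual workshops go far deeper.

```python
# Launch with, e.g.: torchrun --nproc_per_node=2 ddp_from_scratch.py
# (script name and toy model are hypothetical, not the course's code)
import torch
import torch.distributed as dist
from torch import nn

def main():
    dist.init_process_group(backend="gloo")   # "nccl" on GPU nodes
    rank, world_size = dist.get_rank(), dist.get_world_size()

    torch.manual_seed(0)                       # identical weights on every rank
    model = nn.Linear(16, 1)                   # toy stand-in for a real network
    optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
    loss_fn = nn.MSELoss()

    torch.manual_seed(rank + 1)                # each rank sees different data
    for step in range(10):
        x, y = torch.randn(8, 16), torch.randn(8, 1)
        optimizer.zero_grad()
        loss_fn(model(x), y).backward()

        # The heart of data parallelism: average gradients across ranks so
        # every replica applies an identical optimizer update.
        for p in model.parameters():
            dist.all_reduce(p.grad, op=dist.ReduceOp.SUM)
            p.grad /= world_size
        optimizer.step()

    dist.destroy_process_group()

if __name__ == "__main__":
    main()
```

The later workshops (ZeRO, pipeline, tensor, and multi-dimensional parallelism) build on this same pattern of coordinating ranks with collective communication.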


Guest Lectures include:

Sylvain Gugger (Jane Street): Overview of ZeRO

Wanchao Liang (TorchTitan): DTensor and large-scale pretraining

Wing Lian (Axolotl): 2D Parallelism with Axolotl

Ferdinand Mom (Hugging Face): Multi-dimensional parallelism with nanotron

Less Wright (Meta): Async TensorParallelism

Matej Sirovatka (Hugging Face): Expert Parallelism for MoE

Marc Sun (Hugging Face): Deployment strategies at scale


🚀 Free Compute & Tools

Get hands-on with real-scale training from Day 1.

We’re proud to be sponsored by:

🤗 Hugging Face — 6 months Pro access

⚙️ Modal — $500 in compute credits

More partnerships coming soon


✅ Guarantee

If you're not satisfied, we offer a 100% refund up to 14 days after the course ends. No risk, just learning.

Built for the People Scaling What’s Next

01

Beginner to intermediate MLEs who want to make sure their skills are relevant in today's market

02

Senior engineers tired of piecing together half-solutions from publications, frameworks, and more.

03

Team leads who want confidence that engineers can execute at scale without burning time.

04

CTOs who need to make fast, informed decisions on how to scale LLMs.

Prerequisites

  • Train any model, at least once

    I don't expect you to be an expert, but you should have some experience training a model of some kind, whether in PyTorch, TensorFlow, or similar (see the short sketch after this list)

  • Understand basic high-school algebra

    I'm not here to teach you matrix calculus, and we won't go that advanced, but some core math is still needed

  • Familiarity with Python coding

    PyTorch is a Python library and the entire course is written in Python, so some experience with the language will serve you well
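As promised above, here is a purely illustrative single-device training loop; if you have written something like it before, in any framework, you meet the first prerequisite.

```python
# A purely illustrative single-device loop: synthetic data, tiny model.
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset

data = TensorDataset(torch.randn(256, 10), torch.randint(0, 2, (256,)))
loader = DataLoader(data, batch_size=32, shuffle=True)

model = nn.Sequential(nn.Linear(10, 32), nn.ReLU(), nn.Linear(32, 2))
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

for epoch in range(3):
    for x, y in loader:
        optimizer.zero_grad()
        loss = loss_fn(model(x), y)
        loss.backward()
        optimizer.step()
```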

What You'll Achieve

Train 100B+ models across 8–1,000 GPUs efficiently

You’ll understand the core problems teams face during large-scale training and how to avoid them using proven methods.

Build real-world experience with modern training techniques

You won’t just watch; you’ll train models using DDP, ZeRO, pipeline parallelism, and more, with each technique applied in code.

Understand which training methods to use and when

You’ll learn how to match technique to context. Whether it’s model size, hardware limits, or team constraints, you’ll know what fits.

Be ready before training becomes your bottleneck

Most teams wait too long to prepare for scale. This course makes sure you’re ready before your current training setup stops working.

Go from scattered tutorials to production-ready training skills

You’ll connect theory with practice and walk away with working knowledge you can apply in real systems.

Personalized Instruction

Generous office hours ensure that students can ask questions about their specific issues, interests, and needs.

What’s included

Zachary Mueller

Live sessions

Learn directly from Zachary Mueller in a real-time, interactive format.

Lifetime access

Go back to course content and recordings whenever you need to, and have access to all future cohorts

Generous office hours

Bring your blockers to office hours and leave with answers. Get feedback, debug help, and real support when you need it.

Community of peers

Stay accountable and share insights with like-minded professionals.

Certificate of completion

Share your new skills with your employer or on LinkedIn.

Course notebooks

Detailed course notebooks and materials with meticulous notes to walk you through the content and help you learn along the way

Compute Credits

$500 in Modal compute credits, 6 months of Hugging Face Pro

Maven Guarantee

This course is backed by the Maven Guarantee. Students are eligible for a full refund up until the halfway point of the course.

Course syllabus

Week 1

Sep 1—Sep 7

    Tue, Sep 2 · 5:00 PM—6:00 PM (UTC) · Course Introduction and `nbdistributed`: A Jupyter framework for interactive distributed PyTorch

    Wed, Sep 3 · 6:00 PM—7:00 PM (UTC) · Fireside Chat with Yuxiang

    Thu, Sep 4 · 5:00 PM—6:00 PM (UTC) · Distributed Data Parallelism From Scratch

    Fri, Sep 5 · 5:00 PM—6:00 PM (UTC) · Guest Speaker: Robert Nishihara (Ray, Anyscale)

Week 2

Sep 8—Sep 14

    Tue, Sep 9 · 5:00 PM—6:00 PM (UTC) · An Overview of ZeRO with Sylvain Gugger

    Tue, Sep 9 · 6:00 PM—7:00 PM (UTC) · ZeRO: Stage 1 & 2

    Wed, Sep 10 · 5:00 PM—6:00 PM (UTC) · A Practitioner's Guide to FP8 Training (Phuc Nguyen)

    Thu, Sep 11 · 5:00 PM—6:00 PM (UTC) · Speeding Up Training with Triton and Custom Kernels (Daniel Han)

    Thu, Sep 11 · 6:30 PM—7:30 PM (UTC) · Hyper-optimizing LLMs with MoE, MLA, and More (Elie Bakouch)

Week 3

Sep 15—Sep 21

    Tue, Sep 16 · 6:00 PM—7:30 PM (UTC) · ZeRO: Stage 3 and Efficient ZeRO Strategies

    Wed, Sep 17 · 5:00 PM—6:00 PM (UTC) · DTensor and Large-Scale Pretraining (Wanchao Liang)

    Wed, Sep 17 · 6:30 PM—7:30 PM (UTC) · Parallelizing parallel programming of parallel processors with Modal (Charles Frye)

    Thu, Sep 18 · 5:00 PM—6:00 PM (UTC) · Efficient Long-Context Training with Arctic (Tunji Ruwase)

Week 4

Sep 22—Sep 28

    Tue, Sep 23 · 5:00 PM—6:30 PM (UTC) · Pipeline Parallelism and Tensor Parallelism

    Wed, Sep 24 · 5:00 PM—6:00 PM (UTC) · Async TensorParallelism (Less Wright)

    Wed, Sep 24 · 6:30 PM—7:30 PM (UTC) · Decentralized Global-Scale Training (Sami Jaghouar)

    Thu, Sep 25 · 6:00 PM—7:00 PM (UTC) · Efficient Strategies for Distributed Inference (Marc Sun)

Week 5

Sep 29—Oct 3

    Tue, Sep 30 · 6:00 PM—7:00 PM (UTC) · 2D Parallelism with Wing Lian

    Thu, Oct 2 · 5:00 PM—6:00 PM (UTC) · Guest Speaker: Prince Canuma

    Thu, Oct 2 · 6:30 PM—7:30 PM (UTC) · 3D Parallelism with Ferdinand Mom

Free resource

Distributed Training Lexicon

The Distributed Training Lexicon is a free resource covering 49 distributed training terms, each with a paired definition and, where helpful, a visualization. The goal is a quick cheat sheet you can consult whenever you need a reminder of what a particular method is.

Download it for free

Free resource

Free Access to Part of Lesson 1

Hi there! To give you a feel for how the course is structured and what the content looks like, I'm sharing an exclusive preview of the course webpage and a sample of the material. I've worked hard to make sure Quarto and Jupyter help me create educational material that will wow you, so let me know if it does!


(Note: this preview may change as the course develops, but only additively; nothing will be removed.)

Get access to the webpage


Instructor is a recognized expert with hands-on experience

"Zach is my go-to person on anything dealing with distributed training. He has maintained the most popular library in the world that helps developers with this problem, which means he's familiar with all of the issues mere mortals have while tackling it. Zach is the best person to teach this subject. I am taking this course."
Hamel Husain, Founder, Parlance Labs | Evals, evals, evals

"Zach is one of the key people in the world making distributed machine learning more accessible. He has firsthand experience building some incredibly popular tools like huggingface/accelerate. If you're GPU poor but considering moving to the GPU middle class, I can't think of a better instructor."
Mark Saroufim, Software Engineer at Meta | Co-founder, GPU MODE

"As a long-time maintainer of HF Accelerate, Zach has had to master not only a deep understanding of ML scaling methods, but also how to integrate them into a cohesive API for the masses to use. I've seen Zach consistently deliver robust, well-integrated solutions with a deep system-level understanding. You will be in good hands with Zach at the helm."
Stas Bekman, Senior Machine Learning Engineer, Snowflake

"Zach's stewardship of Accelerate and management of the intricacies of multiple distributed technologies (while abstracting them into an easy-to-use API) make him the preeminent leader in distributed training. Zach has shown deep understanding of everything from fundamentals to implementation, and he's the first person who would come to mind to teach this."
Wing Lian, Founder, Axolotl

"Zach is truly one in a million. I've never met anyone who puts so much time and thought into crafting deep learning code. With his background and experience, learning from him is an invaluable opportunity."
Radek Osmulski, Senior Data Scientist, NVIDIA

"Zach has a strong grasp of the fundamentals of fastai, but what really sets him apart is his ability to teach. He mixes in practical topics throughout his lessons, making every video engaging and worthwhile. With a proven track record of creating high-quality content, I'm confident that any course Zach produces will be worth your time and attention."
Kevin Bird, Co-Founder, Problem Solvers Guild

"Zach and I used to work together at Hugging Face; since then and through today, he's been building foundational tools for the open ML community to use and learn distributed training techniques. I've personally used his tools for years to train models such as OLMo and Tülu, along with benefiting from his knowledge to better understand what is going on."
Dr. Nathan Lambert, LLM Post Training Lead, Ai2

Join an upcoming cohort

Scratch to Scale: Large-Scale Training in the Modern World

Cohort 1

$1,500

Dates

Sep 1—Oct 3, 2025

Payment Deadline

Aug 31, 2025
Get reimbursed