5 Weeks · Cohort-based Course
Learn the techniques used today from world-class researchers and engineers from Meta, Ray, Hugging Face, and more
This course is popular
17 people enrolled last week.
Hosted by
Zachary Mueller
🤗 accelerate Technical Lead with a decade of experience
With speakers from
Course overview
Master distributed training and real-world scale techniques from top engineers.
Whether you're an ML engineer looking to move beyond single-GPU experiments, or a product leader seeking to understand the language your AI team speaks, this course will give you the hands-on skills and conceptual clarity to operate confidently at scale.
What Makes This Course Different
This started as a distributed training course. It organically grew into an all-encompassing learning event with world-class speakers who bring deep, real-world experience in modern large-scale training.
The distributed training curriculum remains intact: five hands-on workshops covering today’s core scale-up methods.
Now, each week features hand-tailored guest lectures from top engineers at Hugging Face, Meta, Snowflake, and more.
These expert sessions are aligned to each workshop’s topic, helping you bridge theory with modern production practices.
All course materials and recordings are available after enrollment, and you’ll get free lifetime access to future cohorts.
💡 What’s included:
5 core workshops
15+ guest talks across 3 curated tracks
Weekly live office hours
Class discord with lifetime access
Over $500 in credits from Modal and Hugging Face
100% money-back guarantee (within 14 days of course completion)
🏕️ Fireside Chats
Hear from experts about their real experiences taking models to scale: the challenges and the discoveries.
Yuxiang Wei (Meta FAIR)
📣 Conference Talks
Applied Track
Hear how industry leaders are solving real-world scale problems:
Robert Nishihara (Ray, Anyscale): Scaling across thousands of GPUs with Ray
Sami Jaghouar (Prime Intellect): Decentralized global-scale training
Tunji Ruwase (Snowflake): Efficient long-context training with Arctic
Prince Canuma: Local ML workloads using Apple Silicon + MLX
Pretraining Track
Deep dives into LLM pretraining at scale:
Phuc Nguyen (Hugging Face): A practitioner's guide to FP8
Elie Bakouch (Hugging Face): Hyper-optimizing LLMs with MoE, MLA & more
Daniel Han (UnslothAI): Speeding up training with Triton & custom kernels
🧠 Distributed Training Course
Learn the foundations and modern techniques used in real-world LLM scale-ups across hundreds or thousands of GPUs.
5 instructor-led workshops:
DDP from scratch and avoiding data bottlenecks (see the sketch after this list)
ZeRO (Part 1): How model sharding enables scale
ZeRO (Part 2): Efficiency tradeoffs and stage comparison
Pipeline & Tensor Parallelism: Solving communication slowdowns
Multi-Dimensional Parallelism: Combining all methods for throughput
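To give a flavor of where the first workshop starts, here is a minimal, illustrative DDP sketch (my own example, not course material). It assumes a torchrun launch and a toy dataset; each rank trains on its own data shard while gradients are synchronized automatically.

```python
# Hypothetical minimal DDP example (illustrative only, not course code).
# Launch with: torchrun --nproc_per_node=2 ddp_minimal.py
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP
from torch.utils.data import DataLoader, TensorDataset
from torch.utils.data.distributed import DistributedSampler

def main():
    # torchrun sets RANK, LOCAL_RANK, and WORLD_SIZE for us
    dist.init_process_group(backend="gloo")  # use "nccl" on GPU nodes
    rank = dist.get_rank()

    # Toy dataset; DistributedSampler gives each rank a disjoint shard
    data = TensorDataset(torch.randn(256, 10), torch.randn(256, 1))
    loader = DataLoader(data, batch_size=32, sampler=DistributedSampler(data))

    model = DDP(torch.nn.Linear(10, 1))  # gradients are all-reduced across ranks
    opt = torch.optim.SGD(model.parameters(), lr=0.01)

    for epoch in range(2):
        loader.sampler.set_epoch(epoch)  # reshuffle shards each epoch
        for x, y in loader:
            opt.zero_grad()
            loss = torch.nn.functional.mse_loss(model(x), y)
            loss.backward()  # DDP overlaps gradient all-reduce with backward
            opt.step()
        if rank == 0:
            print(f"epoch {epoch} done, loss {loss.item():.4f}")

    dist.destroy_process_group()

if __name__ == "__main__":
    main()
```

The first workshop builds this behavior up from scratch, rather than leaning on the DDP wrapper shown here, so you see exactly where the communication and data bottlenecks come from.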
Guest Lectures include:
Sylvain Gugger (Jane Street): Overview of ZeRO
Wanchao Liang (TorchTitan): DTensor and large-scale pretraining
Wing Lian (Axolotl): 2D Parallelism with Axolotl
Ferdinand Mom (Hugging Face): Multi-dimensional parallelism with nanotron
Less Wright (Meta): Async TensorParallelism
Matej Sirovatka (Hugging Face): Expert Parallelism for MoE
Marc Sun (Hugging Face): Deployment strategies at scale
🚀 Free Compute & Tools
Get hands-on with real-scale training from Day 1.
We’re proud to be sponsored by:
🤗 Hugging Face — 6 months Pro access
⚙️ Modal — $500 in compute credits
More partnerships coming soon
✅ Guarantee
If you're not satisfied, we offer a 100% refund up to 14 days after the course ends. No risk, just learning.
01
Beginner to intermediate MLEs wanting to make sure they have skills that are relevant in today's market
02
Senior engineers tired of piecing together half-solutions from publications, frameworks, and more.
03
Team leads who want confidence that engineers can execute at scale without burning time.
04
CTOs who need to make fast, informed decisions on how to scale LLMs.
You don't need to be an expert, but you should have some experience training a model of some kind, whether in PyTorch, TensorFlow, or similar
I'm not here to teach you matrix calculus, and we won't go that advanced, but some core math is still needed
PyTorch is written in Python, so some experience with the language will serve you well since the whole course uses it
Train 100B+ models across 8–1,000 GPUs efficiently
You’ll understand the core problems teams face during large-scale training and how to avoid them using proven methods.
Build real-world experience with modern training techniques
You won’t just watch; you’ll train models using DDP, ZeRO, pipeline parallelism, and more, applying each one in code.
Understand which training methods to use and when
You’ll learn how to match technique to context. Whether it’s model size, hardware limits, or team constraints, you’ll know what fits.
Be ready before training becomes your bottleneck
Most teams wait too long to prepare for scale. This course makes sure you’re ready before your current training setup stops working.
Go from scattered tutorials to production-ready training skills
You’ll connect theory with practice and walk away with working knowledge you can apply in real systems.
Personalized Instruction
Generous office hours ensure that students can ask questions about their specific issues, interests, and needs.
Live sessions
Learn directly from Zachary Mueller in a real-time, interactive format.
Lifetime access
Go back to course content and recordings whenever you need to, and get access to all future cohorts.
Generous office hours
Bring your blockers to office hours and leave with answers. Get feedback, debug help, and real support when you need it.
Community of peers
Stay accountable and share insights with like-minded professionals.
Certificate of completion
Share your new skills with your employer or on LinkedIn.
Course notebooks
Detailed course notebooks and materials with meticulous notes to walk you through the content and help you learn along the way.
Compute Credits
$500 in Modal compute credits, 6 months of Hugging Face Pro
Maven Guarantee
This course is backed by the Maven Guarantee. Students are eligible for a full refund up until the halfway point of the course.
Course schedule: Sep 2, 3, 4, 5, 9, 10, 11, 16, 17, 18, 23, 24, 25, 30, and Oct 2.
Distributed Training Lexicon
The Distributed Training Lexicon is a free resource of 49 distributed training terms with paired definitions and accompanying visualizations. The goal is a quick cheatsheet you can check whenever you need a reminder of what a particular method is.
Download it for free
Free Access to Part of Lesson 1
Hi there! To help you get a good grasp of how the course will be oriented and an idea of what some of the content looks like, I'm sharing an exclusive preview of the course webpage and how some of the material is shaped. I've worked hard to make sure Quarto and Jupyter help me create educational material that will wow you, so let me know if it does!
(Note: this material preview may change as the course develops, but only for additive purposes.)
Get access to the webpage
Hamel Husain
Mark Saroufim
Stas Bekman
Wing Lian
Radek Osmulski
Kevin Bird
Dr. Nathan Lambert
Join an upcoming cohort
Cohort 1
$1,500
Dates
Payment Deadline