5 Weeks · Cohort-based Course
Master the journey from prototype to production with large-scale model training, scaling, and deployment.
This course is popular: 21 people enrolled last week.
Course overview
Master distributed training and real-world scale techniques from top engineers.
Whether you're an ML engineer looking to move beyond single-GPU experiments, or a product leader seeking to understand the language your AI team speaks, this course will give you the hands-on skills and conceptual clarity to operate confidently at scale.
What Makes This Course Different
This started as a distributed training course. It organically grew into an all-encompassing learning event with world-class speakers who bring deep, real-world experience in modern large-scale training.
The distributed training curriculum remains intact: five hands-on workshops covering today’s core scale-up methods.
Now, each week features hand-tailored guest lectures from top engineers at Hugging Face, Meta, Snowflake, and more.
These expert sessions are aligned to each workshop’s topic, helping you bridge theory with modern production practices.
All course materials and recordings are available after enrollment, and you’ll get free lifetime access to future cohorts.
💡 What’s included:
5 core workshops
14+ guest talks across 3 curated tracks
Weekly live office hours
Community collaboration
Compute credits from Hugging Face & Modal
100% money-back guarantee (within 14 days of course completion)
📣 Conference Talks
Applied Track
Hear how industry leaders are solving real-world scale problems:
Robert Nishihara (Ray, Anyscale): Scaling across thousands of GPUs with Ray
Sami Jaghouar (Prime Intellect): Decentralized global-scale training
Tunji Ruwase (Snowflake): Efficient long-context training with Arctic
Prince Canuma: Local ML workloads using Apple Silicon + MLX
Pretraining Track
Deep dives into LLM pretraining at scale:
Phuc Nguyen (Hugging Face): A practitioner's guide to FP8
Elie Bakouch (Hugging Face): Hyper-optimizing LLMs with MoE, MLA & more
Daniel Han (UnslothAI): Speeding up training with Triton & custom kernels
🧠 Distributed Training Course
Learn the foundations and modern techniques used in real-world LLM scale-ups across hundreds or thousands of GPUs.
5 instructor-led workshops:
DDP from scratch and avoiding data bottlenecks (see the first sketch after this list)
ZeRO (Part 1): How model sharding enables scale (see the second sketch after this list)
ZeRO (Part 2): Efficiency tradeoffs and stage comparison
Pipeline & Tensor Parallelism: Solving communication slowdowns
Multi-Dimensional Parallelism: Combining all methods for throughput
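To give you a taste of the first workshop, here is a minimal sketch of the core idea behind DDP, assuming one process per GPU launched with torchrun; the model and data are toy stand-ins, not course material:

```python
import os
import torch
import torch.distributed as dist

# Assumes launch via `torchrun --nproc_per_node=N ddp_sketch.py`,
# which sets RANK/LOCAL_RANK/WORLD_SIZE for us.
dist.init_process_group(backend="nccl")
torch.cuda.set_device(int(os.environ["LOCAL_RANK"]))

model = torch.nn.Linear(512, 10).cuda()
opt = torch.optim.SGD(model.parameters(), lr=1e-3)

for step in range(10):
    # In real DDP each rank reads a different shard of the dataset;
    # random tensors stand in for that here.
    x = torch.randn(32, 512, device="cuda")
    y = torch.randint(0, 10, (32,), device="cuda")
    loss = torch.nn.functional.cross_entropy(model(x), y)
    loss.backward()
    # The heart of DDP: average gradients across ranks so every
    # replica applies an identical update.
    for p in model.parameters():
        dist.all_reduce(p.grad, op=dist.ReduceOp.AVG)
    opt.step()
    opt.zero_grad()

dist.destroy_process_group()
```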
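And for the ZeRO workshops, a toy sketch of stage 1's idea, optimizer-state sharding; this is purely illustrative, not the DeepSpeed or PyTorch API:

```python
import os
import torch
import torch.distributed as dist

dist.init_process_group(backend="nccl")
rank, world = dist.get_rank(), dist.get_world_size()
torch.cuda.set_device(int(os.environ["LOCAL_RANK"]))

model = torch.nn.Sequential(
    torch.nn.Linear(512, 512), torch.nn.ReLU(), torch.nn.Linear(512, 10)
).cuda()
params = list(model.parameters())

# Round-robin ownership: each rank keeps optimizer state (Adam moments)
# for only its share of the parameters. Assumes world <= len(params).
mine = [p for i, p in enumerate(params) if i % world == rank]
opt = torch.optim.AdamW(mine, lr=1e-3)

x = torch.randn(32, 512, device="cuda")
loss = model(x).sum()
loss.backward()
for p in params:
    dist.all_reduce(p.grad, op=dist.ReduceOp.AVG)  # same as DDP
opt.step()  # each rank updates only the parameters it owns...
for i, p in enumerate(params):
    dist.broadcast(p.data, src=i % world)  # ...then shares the result

dist.destroy_process_group()
```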
Guest Lectures include:
Sylvain Gugger (Jane Street): Overview of ZeRO
Wanchao Liang (TorchTitan): DTensor and large-scale pretraining
Ferdinand Mom (Hugging Face): Multi-dimensional parallelism
Less Wright (Meta): Async Tensor Parallelism
Matej Sirovatka (Hugging Face): Expert Parallelism for MoE
Marc Sun (Hugging Face): Deployment strategies at scale
🚀 Free Compute & Tools
Get hands-on with real-scale training from Day 1.
We’re proud to be sponsored by:
🤗 Hugging Face — 6 months Pro access
⚙️ Modal — $500 in compute credits
More partnerships coming soon
✅ Guarantee
If you're not satisfied, we offer a 100% refund up to 14 days after the course ends. No risk, just learning.
01. Recent graduates and beginner machine learning engineers who want to learn the tools of the trade used for modern model training
02. Senior ML engineers dropped into the world of LLMs who need to know where to focus when modernizing their stack
03. Project managers leading ML teams who need to speak the language of scale, efficiency, and delivery in today's AI training world
You should generally be familiar with core tensor operations and how a model is built with PyTorch.
We'll be using lots of operations from torch.distributed. I'll teach them to you, but you should already know the core operations for tensors.
You need to understand how model training works on a single GPU and the full flow (data -> outputs -> gradients -> backprop); a minimal sketch of that flow follows.
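To calibrate, here's that single-GPU flow in a minimal, illustrative sketch (toy model and data, nothing course-specific):

```python
import torch

# data -> outputs -> loss -> gradients (backprop) -> update
model = torch.nn.Linear(512, 10).cuda()
opt = torch.optim.AdamW(model.parameters(), lr=1e-3)

x = torch.randn(32, 512, device="cuda")              # data
y = torch.randint(0, 10, (32,), device="cuda")
outputs = model(x)                                   # outputs
loss = torch.nn.functional.cross_entropy(outputs, y)
loss.backward()                                      # gradients via backprop
opt.step()                                           # parameter update
opt.zero_grad()
```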
Listen to top experts in the field
This conference brings together over a dozen world experts in distributed training for deep learning and machine learning, all in one place just for you.
Understand not just what distributed training is, but become an expert in it
I don't want you to finish this course thinking, "okay, I think I get what's happening here." I want you to walk away knowledgeable enough that if someone said, "here's 1,000 GPUs for a day, do something," you could move into action immediately.
Deep understanding of different parallelization strategies
This won't be a surface-level course teaching you "how to use torch.distributed.FSDP". We're going to understand it from the ground up.
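For contrast, here's roughly what that surface-level usage looks like (a minimal sketch using PyTorch's torch.distributed.fsdp module; the course rebuilds what this wrapper actually does underneath):

```python
import os
import torch
import torch.distributed as dist
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP

dist.init_process_group(backend="nccl")
torch.cuda.set_device(int(os.environ["LOCAL_RANK"]))

model = torch.nn.Linear(512, 10).cuda()
# One line shards parameters, gradients, and optimizer state across
# ranks; understanding what it does is the point of the course.
model = FSDP(model)
```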
Train a few models on multiple GPUs
Above all, I'm going to make sure everyone gets hands-on experience training at least one model in a distributed fashion by the end of this course, through the homework.
Hands-On Exercises, Examples, and Code
This is not a course where I bore you with a slide deck the entire time (though a few slides might be needed). Instead, we get down in the weeds of the code, and you implement it along with me.
Personalized Instruction
Generous office hours ensure that students can ask questions about their specific issues, interests, and needs.
Live sessions
Learn directly from Zachary Mueller in a real-time, interactive format.
Lifetime access
Go back to course content and recordings whenever you need to.
Compute Credits
$500 in Modal compute credits, 6 months of Hugging Face Pro
Course notebooks
Detailed course notebooks and materials with meticulous notes to walk you through the content and help you learn along the way
Community of peers
Stay accountable and share insights with like-minded professionals.
Certificate of completion
Share your new skills with your employer or on LinkedIn.
Generous office hours
I'll be making myself available to you for feedback, questions, and anything else I can help with
Maven Guarantee
This course is backed by the Maven Guarantee. Students are eligible for a full refund up until the halfway point of the course.
Session dates: Sep 2, Sep 4, Sep 9, Sep 16, Sep 23, Sep 25, Sep 30, Oct 2
Free Access to Part of Lesson 1
Hi there! To give you a good grasp of how the course will be oriented and an idea of what some of the content looks like, I'm sharing an exclusive preview of the course webpage and how some of the content is shaped. I've worked hard to make sure Quarto and Jupyter help me create educational material that will wow you, so let me know if it does!
(Note: this material preview may change as the course develops, but only for additive purposes)
Get access to the webpage
Hamel Husain
Mark Saroufim
Stas Bekman
Wing Lian
Radek Osmulski
Kevin Bird
Dr. Nathan Lambert
I've been in the field for almost a decade now. I first started in the fast.ai community, quickly learning how modern-day training pipelines are built and operated. Then I moved to Hugging Face, where I'm the Technical Lead on the accelerate project and manage the transformers Trainer.
I've written numerous blogs, courses, and given talks on distributed training and PyTorch throughout my career.
Through this experience, I've condensed almost a decade of learning into this course, and I'm excited to bring you all along on the journey.
Join an upcoming cohort
Cohort 1
$1,500
Dates
Payment Deadline