The Distributed Training Taxonomy: DP, PP, and more

Hosted by Zach Mueller

87 students

What you'll learn

Just what are these weird names?

Pipeline parallelism? Tensor parallelism? ZeRO? We'll learn at a high level what these all are.

Focus on the idea, not the code

I'm not here to overwhelm you with code implementations and applications. This is a brief introduction to the ideas behind them.

When should I use what?

Figuring out the best topology of strategies is complicated. I'll help guide that decision for you.

Why this topic matters

In the modern training world, terms like "Pipeline Parallelism", "ZeRO", and "Context Parallelism" get thrown around as though everyone already knows them. But what even are they? We're going to cover at a high level what each of them does, how they differ, and which situations call for each of them.

You'll learn from

Zach Mueller

Instructor, Technical Lead at Hugging Face

I've been in the field for almost a decade now. I first started in the fast.ai community, quickly learning how modern-day training pipelines are built and operated. Then I moved to Hugging Face, where I'm the Technical Lead on the accelerate project and manage the transformers Trainer.


I've written numerous blogs and courses, and given talks on distributed training and PyTorch throughout my career.
