The Distributed Training Taxonomy: DP, PP, and more

Hosted by Zach Mueller

Wed, Jul 16, 2025

3:00 PM UTC (45 minutes)

Virtual (Zoom)

Free to join

Go deeper with a course

From Scratch to Scale: Large Scale Training in the Modern World
Zachary Mueller

What you'll learn

Just what are these weird names?

Pipeline parallelism? Tensor parallelism? ZeRO? We'll learn at a high level what each of these is.

Focus on the idea, not the code

I'm not here to overwhelm you with code implementations and applications. This is a brief introduction to the ideas behind these strategies.

When should I use what?

Figuring out the best topology of strategies is complicated. I'll help guide that decision for you.

Why this topic matters

In the modern training world, terms like "Pipeline Parallelism", "ZeRO", and "Context Parallelism" get thrown around as if everyone already knows them. But... what even are they? We're going to cover at a high level what each of them does, how they differ, and which situation calls for each of them.
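
As a concrete anchor for the simplest of these strategies, here is a minimal, hypothetical sketch of data parallelism (the "DP" in the title) using PyTorch's DistributedDataParallel. It is not taken from the talk, which stays code-free: every GPU holds a full replica of the model, each sees a different slice of the data, and gradients are averaged across GPUs during the backward pass.

    # Illustrative data-parallel (DP) sketch -- not from the talk.
    # Launch with: torchrun --nproc_per_node=<num_gpus> train.py
    import os
    import torch
    import torch.distributed as dist
    from torch.nn.parallel import DistributedDataParallel as DDP

    dist.init_process_group(backend="nccl")      # one process per GPU
    local_rank = int(os.environ["LOCAL_RANK"])   # set by torchrun
    torch.cuda.set_device(local_rank)

    model = torch.nn.Linear(1024, 1024).cuda()   # full model replica on every GPU
    model = DDP(model, device_ids=[local_rank])  # syncs gradients across ranks
    optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)

    batch = torch.randn(32, 1024).cuda()         # each rank would get a different data shard
    loss = model(batch).sum()
    loss.backward()                              # gradient all-reduce happens here
    optimizer.step()

    dist.destroy_process_group()

The other strategies change what each GPU holds rather than replicating the whole model: pipeline parallelism assigns contiguous layers to different GPUs, tensor parallelism splits individual weight matrices across them, and ZeRO shards optimizer state, gradients, and eventually the parameters themselves.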

You'll learn from

Zach Mueller

Instructor, Technical Lead at Hugging Face

I've been in the field for almost a decade. I got my start in the fast.ai community, quickly learning how modern-day training pipelines are built and operated. I then moved to Hugging Face, where I'm the Technical Lead on the accelerate project and maintain the transformers Trainer.


I've written numerous blogs and courses, and have given talks on distributed training and PyTorch throughout my career.

Hugging Face
Accenture

Learn directly from Zach Mueller
