nbdistributed: Native torch.distributed in Jupyter Notebooks
Hosted by Zach Mueller
What you'll learn
Learn why torch.distributed and Jupyter are a tricky combination
Understand *why* Jupyter has historically not been the best interface for training distributed models
Understand why Jupyter is useful
"But notebooks are bad..." Are they? Can we make them more useful? Especially in this context
Understand how to use `nbdistributed`
Most importantly: you will learn how `nbdistributed` helps you write distributed training loops iteratively
Why this topic matters
Using Jupyter Notebooks for quick POCs and debugging is the norm in Data Science. Being able to apply this to distributed PyTorch training is a critical unlock: tricky errors get solved faster, and tighter feedback loops help you get off the ground sooner. Historically, solutions like Hugging Face `accelerate`'s `notebook_launcher` have existed, but what if we could find a better way?
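For reference, here is a minimal sketch of the existing `notebook_launcher` workflow mentioned above, assuming a toy training function (the model, data, and hyperparameters are illustrative placeholders, not part of any real pipeline). Note how the entire loop must be wrapped in a single function before launching, which is the friction an iterative, cell-by-cell workflow aims to avoid.

```python
import torch
from accelerate import Accelerator, notebook_launcher

def training_loop():
    # One worker process per device is spawned by notebook_launcher;
    # Accelerator handles device placement and gradient synchronization.
    accelerator = Accelerator()
    model = torch.nn.Linear(10, 2)          # placeholder model
    optimizer = torch.optim.SGD(model.parameters(), lr=1e-2)
    model, optimizer = accelerator.prepare(model, optimizer)

    for step in range(10):
        inputs = torch.randn(8, 10, device=accelerator.device)   # dummy batch
        targets = torch.randint(0, 2, (8,), device=accelerator.device)
        loss = torch.nn.functional.cross_entropy(model(inputs), targets)
        accelerator.backward(loss)
        optimizer.step()
        optimizer.zero_grad()

# The whole loop runs only when launched as one unit, so you cannot
# iterate on it cell by cell the way you normally would in a notebook.
notebook_launcher(training_loop, args=(), num_processes=2)
```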
You'll learn from
Zach Mueller
Technical Lead @ Hugging Face
I've been in the field for almost a decade now. I got my start in the fast.ai community, quickly learning how modern-day training pipelines are built and operated. From there I moved to Hugging Face, where I'm the Technical Lead on the accelerate project and manage the transformers Trainer.
I've written numerous blogs, courses, and given talks on distributed training and PyTorch throughout my career.
Go deeper with a course