nbdistributed: Native torch.distributed in Jupyter Notebooks
Hosted by Zach Mueller
What you'll learn
Learn why torch.distributed and Jupyter are a tricky combination
Understand *why* Jupyter has historically not been the best interface for training distributed models
Understand why Jupyter is useful
"But notebooks are bad..." Are they? Can we make them more useful? Especially in this context
Understand how to use `nbdistributed`
Most importantly: you will learn how `nbdistributed` helps you write distributed training loops iteratively
Why this topic matters
Using Jupyter Notebooks for quick POCs and debugging is the norm in Data Science. Being able to apply this to distributed PyTorch training is a critical unlock: tricky errors get solved faster, and tighter feedback loops help you get off the ground sooner. Historically, solutions like Hugging Face `accelerate`'s `notebook_launcher` have existed, but what if we could find a better way?
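For reference, here is a minimal sketch of the existing `notebook_launcher` workflow mentioned above, assuming a toy training function (the model, data, and hyperparameters are illustrative placeholders, not part of any real pipeline). Note how the entire loop must be wrapped in a single function before launching, which is the friction an iterative, cell-by-cell workflow aims to avoid.

```python
import torch
from accelerate import Accelerator, notebook_launcher

def training_loop():
    # One worker process per device is spawned by notebook_launcher;
    # Accelerator handles device placement and gradient synchronization.
    accelerator = Accelerator()
    model = torch.nn.Linear(10, 2)          # placeholder model
    optimizer = torch.optim.SGD(model.parameters(), lr=1e-2)
    model, optimizer = accelerator.prepare(model, optimizer)

    for step in range(10):
        inputs = torch.randn(8, 10, device=accelerator.device)   # dummy batch
        targets = torch.randint(0, 2, (8,), device=accelerator.device)
        loss = torch.nn.functional.cross_entropy(model(inputs), targets)
        accelerator.backward(loss)
        optimizer.step()
        optimizer.zero_grad()

# The whole loop runs only when launched as one unit, so you cannot
# iterate on it cell by cell the way you normally would in a notebook.
notebook_launcher(training_loop, args=(), num_processes=2)
```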
You'll learn from
Zach Mueller
Technical Lead @ Hugging Face
I've been in the field for almost a decade now. I got my start in the fast.ai community, quickly learning how modern-day training pipelines are built and operated. From there I moved to Hugging Face, where I'm the Technical Lead on the accelerate project and manage the transformers Trainer.
I've written numerous blogs, courses, and given talks on distributed training and PyTorch throughout my career.
Go deeper with a course