Building AI-Native Products

Build Mixture-of-Experts for LLMs with PyTorch from Scratch

Hosted by Damien Benveniste, PhD

Fri, Jun 6, 2025

4:30 PM UTC (1 hour)

Virtual (Zoom)

Free to join


Go deeper with a course

Build Production-Ready LLMs From Scratch
Damien Benveniste

What you'll learn

Demystify Mixture‑of‑Experts (MoE) Routing

Map gating math to PyTorch and see how tokens pick top experts to scale models efficiently.
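
For a quick preview of the gating step, here is a minimal PyTorch sketch: each token scores every expert with a linear router, keeps its top-k scores, and renormalizes them into gate weights. The expert count, dimensions, and the top_k_gate name are illustrative choices, not the exact code from the session.

```python
import torch
import torch.nn.functional as F

def top_k_gate(tokens, router_weight, top_k=2):
    # tokens: (num_tokens, d_model); router_weight: (d_model, num_experts)
    logits = tokens @ router_weight                  # router score per expert
    probs = F.softmax(logits, dim=-1)                # gating probabilities
    top_p, top_idx = probs.topk(top_k, dim=-1)       # each token's top-k experts
    gates = top_p / top_p.sum(dim=-1, keepdim=True)  # renormalize over chosen experts
    return gates, top_idx, probs

tokens = torch.randn(8, 16)                          # 8 tokens, d_model = 16
router_weight = torch.randn(16, 4)                   # 4 experts
gates, top_idx, _ = top_k_gate(tokens, router_weight)
print(top_idx)                                       # which experts each token picked
```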

Build a Sparse MoE Layer From Scratch

Code Top‑K routing and scatter‑gather dispatch, then merge expert outputs back into place, all in one CUDA‑ready block.
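
As a rough shape of that layer, the sketch below wires together a router, a list of expert MLPs, a per-expert gather of routed tokens, and a gate-weighted scatter of expert outputs back into place. The SparseMoE name, expert MLP sizes, and the plain Python loop over experts are assumptions kept simple for illustration, not the session's exact implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SparseMoE(nn.Module):
    def __init__(self, d_model, num_experts=4, top_k=2):
        super().__init__()
        self.top_k = top_k
        self.router = nn.Linear(d_model, num_experts, bias=False)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, 4 * d_model), nn.GELU(),
                          nn.Linear(4 * d_model, d_model))
            for _ in range(num_experts)
        )

    def forward(self, x):                                 # x: (num_tokens, d_model)
        probs = F.softmax(self.router(x), dim=-1)
        top_p, top_idx = probs.topk(self.top_k, dim=-1)
        top_p = top_p / top_p.sum(dim=-1, keepdim=True)   # gate weights per token

        out = torch.zeros_like(x)
        for e, expert in enumerate(self.experts):
            # (token, slot) pairs routed to expert e
            token_ids, slot_ids = (top_idx == e).nonzero(as_tuple=True)
            if token_ids.numel() == 0:
                continue
            expert_out = expert(x[token_ids])             # gather this expert's tokens
            out.index_add_(0, token_ids,                  # scatter weighted outputs back
                           top_p[token_ids, slot_ids, None] * expert_out)
        return out

moe = SparseMoE(d_model=16)
y = moe(torch.randn(8, 16))
print(y.shape)                                            # torch.Size([8, 16])
```

Production kernels fuse the gather and scatter for speed; the explicit loop here just keeps the routing logic easy to read and debug.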

Add Load‑Balancing Loss for Stable Training

Integrate the auxiliary loss that equalizes expert usage to prevent stragglers during large‑scale runs.
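
For reference, one common form of this loss (the Switch Transformer variant) multiplies each expert's fraction of routed tokens by its mean router probability and sums over experts, which is minimized when both are uniform. The sketch below assumes top-1 assignments and an illustrative 0.01 coefficient when adding it to the main loss; the session's exact formulation may differ.

```python
import torch
import torch.nn.functional as F

def load_balancing_loss(router_probs, expert_index, num_experts):
    # router_probs: (num_tokens, num_experts) softmax outputs of the router
    # expert_index: (num_tokens,) chosen expert per token (top-1 here)
    one_hot = F.one_hot(expert_index, num_experts).float()
    tokens_per_expert = one_hot.mean(dim=0)          # fraction of tokens routed to each expert
    prob_per_expert = router_probs.mean(dim=0)       # mean router probability per expert
    return num_experts * torch.sum(tokens_per_expert * prob_per_expert)

probs = F.softmax(torch.randn(8, 4), dim=-1)
aux = load_balancing_loss(probs, probs.argmax(dim=-1), num_experts=4)
total_loss = 0.01 * aux                              # typically added to the LM loss with a small weight
```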

Why this topic matters

Implementing Mixture‑of‑Experts reveals how sparse routing expands model capacity while controlling compute. You’ll translate gating math, dispatch logic, and load‑balancing loss into PyTorch, gaining skills to adapt, debug, and scale Transformers on modest hardware. Leave with runnable code and insight into frontier architectures.

You'll learn from

Damien Benveniste, PhD

Former Meta ML Tech Lead, CEO @ AiEdge

Welcome! My name is Damien Benveniste. After a Ph.D. in theoretical physics, I started my career in Machine Learning more than 10 years ago.

I have been a Data Scientist, Machine Learning Engineer, and Software Engineer. I have led Machine Learning projects in diverse industry sectors such as AdTech, market research, financial advising, cloud management, online retail, marketing, credit score modeling, data storage, healthcare, and energy valuation. Previously, I was a Machine Learning Tech Lead at Meta, working on automating model optimization at scale for Ads ranking.

I am now training the next generation of Machine Learning engineers.

Previously at

Meta
Medallia
Rackspace Technology
Bluestem Brands
Dell
