Build Production-Ready LLMs From Scratch

New · 6 Weeks · Cohort-based Course

A 6‑week live bootcamp for ML engineers to architect, fine‑tune, and deploy scalable LLM applications through six real‑world projects.

Previously at Meta, Medallia, Rackspace Technology, Bluestem Brands, and Dell

Course overview

From Prototype to Production: Ship Scalable LLM Systems in 6 Weeks

The Real-World LLM Engineering Roadblocks You Face Today


👋 Transitioning from General ML to LLM Specialization: You’ve built recommendation engines or classifier models, but moving into Transformer‑centric development feels like learning a whole new discipline—no clear roadmap exists.

👋 Lack of LLM‑Specific Career Path: You see “LLM Engineer” roles popping up on LinkedIn, but your current CV only shows “Data Scientist” or “ML Engineer.” You need hands‑on projects and artifacts to credibly make the jump.

👋 Career Stalled by “Academic” Skillset: You can recite Transformer papers, but when asked, “Have you shipped an LLM feature end‑to‑end?” you have no answer—and no portfolio to prove it!

👋 Prototype Meltdown Under Production Load: You’ve fine‑tuned a small model locally, but when you go from 1 to 100 concurrent requests, GPU memory spikes and inference grinds to a halt, because you’ve never applied continuous batching, KV caching, or paged attention in a live setting.

👋 RAG Integration Headaches: Turning a standalone model into a live Retrieval‑Augmented Generation service becomes a multi‑week integration nightmare.


How this course will help you


We’ve packaged every stage of the LLM lifecycle, from career transition to production rollout, into a six‑week bootcamp that:

Guides Your Career Pivot: You’ll emerge with six polished GitHub projects, a deployment playbook, and RAG demos that transform your resume from “ML generalist” to “LLM Specialist.”

Attacks Each Pain Point Head‑On: Six job‑mirroring projects (from scratch → RLHF → scaling → deployment → RAG) mean you never waste time on dead‑end tutorials.

Live Code‑Along Workshops & Office Hours: Tackle your own fine‑tuning bugs, scaling hiccups, and deployment errors alongside Damien in dedicated sessions, so you get hands‑on fixes for the exact issues you’ll face on the job.

Ready‑to‑Use Repos & Playbooks: Grab our curated starter code, development scripts, deployment templates, and debugging checklists, so you can plug them straight into your next project without reinventing the wheel.

A Portfolio of Six Production‑Grade Projects: Leave with six end‑to‑end deliverables, from a Transformer built from scratch to a live RAG API, ready to showcase on GitHub, in performance reviews, or to hiring managers.


No more scattered blog-hopping or generic bootcamps: this is the only cohort where you’ll master Transformer internals and ship production‑grade LLM systems while making the career leap you’ve been aiming for.


What You’ll Actually Build and Ship


Across six hands‑on projects, you’ll deliver deployable LLM components and applications. No fluff, just job‑ready code:

A Modern Transformer Architecture From Scratch: Implement sliding‑window multi‑head attention to cut attention cost from O(N²) to O(N·w), RoPE for relative positional encoding, and a Mixture‑of‑Experts architecture for improved performance, all in PyTorch.
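
To give a taste of Project 1, here is a minimal sliding‑window attention sketch in PyTorch. It is an illustrative toy, not the course solution: it still materializes the full score matrix and only masks it, whereas a production kernel would compute just the O(N·w) band.

```python
import torch
import torch.nn.functional as F

def sliding_window_attention(q, k, v, window: int):
    # q, k, v: (batch, heads, seq_len, head_dim)
    n = q.size(-2)
    i = torch.arange(n).unsqueeze(1)   # query positions, column vector
    j = torch.arange(n).unsqueeze(0)   # key positions, row vector
    # Causal band: query i attends to key j iff i - window < j <= i
    visible = (j <= i) & (j > i - window)
    scores = q @ k.transpose(-2, -1) / q.size(-1) ** 0.5
    scores = scores.masked_fill(~visible, float("-inf"))
    return F.softmax(scores, dim=-1) @ v

q = k = v = torch.randn(1, 4, 16, 32)  # toy shapes: 1 batch, 4 heads, 16 tokens
print(sliding_window_attention(q, k, v, window=4).shape)  # torch.Size([1, 4, 16, 32])
```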

Instruction‑Tuned LLM: Fine‑tune a model for instruction following with supervised fine‑tuning, RLHF, DPO, and ORPO, then compare performance gains on a real benchmark.
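
As a preview of the preference‑tuning math, here is a minimal sketch of the DPO objective (a paraphrase of the published loss, not the course code; beta and the log‑prob values below are illustrative): push the policy’s log‑ratio for the chosen answer above the rejected one, relative to a frozen reference model.

```python
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps, policy_rejected_logps,
             ref_chosen_logps, ref_rejected_logps, beta: float = 0.1):
    # Each input: summed log-probs of a full response, shape (batch,)
    chosen_ratio = policy_chosen_logps - ref_chosen_logps
    rejected_ratio = policy_rejected_logps - ref_rejected_logps
    # -log(sigmoid(beta * margin)): minimized when chosen outscores rejected
    return -F.logsigmoid(beta * (chosen_ratio - rejected_ratio)).mean()

loss = dpo_loss(torch.tensor([-12.0]), torch.tensor([-15.0]),
                torch.tensor([-13.0]), torch.tensor([-14.0]))
print(loss.item())
```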

Scalable Training Pipeline: Containerize a multi‑GPU job with DeepSpeed ZeRO on SageMaker to maximize throughput and minimize cost.
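
For a sense of what the scaling work looks like, here is a minimal DeepSpeed ZeRO stage‑2 setup (the batch sizes and flags are illustrative placeholders, not the course’s SageMaker job settings):

```python
import torch
import deepspeed

model = torch.nn.Linear(512, 512)  # toy stand-in for the real Transformer

ds_config = {
    "train_micro_batch_size_per_gpu": 4,
    "gradient_accumulation_steps": 8,
    "bf16": {"enabled": True},
    "zero_optimization": {
        "stage": 2,                   # shard optimizer state and gradients across ranks
        "overlap_comm": True,         # overlap all-reduce with the backward pass
        "contiguous_gradients": True,
    },
}

# Run under a distributed launcher, e.g. `deepspeed train.py`
model_engine, optimizer, _, _ = deepspeed.initialize(
    model=model, model_parameters=model.parameters(), config=ds_config
)
```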

Extended‑Context Model: Modify RoPE scaling, apply 4/8‑bit quantization, and inject LoRA adapters to double your context window.
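
Here is a minimal sketch of that recipe with Hugging Face transformers and peft (the checkpoint name, scaling factor, and LoRA hyperparameters are placeholders, not the course settings):

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model

bnb = BitsAndBytesConfig(load_in_4bit=True, bnb_4bit_compute_dtype=torch.bfloat16)
model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-hf",                      # placeholder checkpoint
    quantization_config=bnb,                          # 4-bit weights
    rope_scaling={"type": "linear", "factor": 2.0},   # stretch RoPE to 2x context
)
lora = LoraConfig(r=16, lora_alpha=32, task_type="CAUSAL_LM",
                  target_modules=["q_proj", "v_proj"])
model = get_peft_model(model, lora)                   # train only the adapters
model.print_trainable_parameters()
```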

Multi‑Mode Deployment: Stand up a Hugging Face endpoint, a vLLM streaming API, and an OpenAI‑compatible server, all Dockerized and optimized for low latency.
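
Once the server is up (e.g. with `vllm serve <model>`), any OpenAI client can stream from it. A minimal sketch, with the URL and model id as placeholders:

```python
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")
stream = client.chat.completions.create(
    model="meta-llama/Llama-2-7b-chat-hf",  # placeholder model id
    messages=[{"role": "user", "content": "Explain KV caching in one sentence."}],
    stream=True,                            # tokens arrive as they are generated
)
for chunk in stream:
    delta = chunk.choices[0].delta.content
    if delta:
        print(delta, end="", flush=True)
```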

End‑to‑End RAG Chat App: Build a FastAPI backend with conversational memory and a Streamlit UI for live Retrieval‑Augmented Generation.
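
The shape of that backend, in a minimal sketch (retrieve_docs and generate are hypothetical stand-ins for the real vector-store lookup and LLM call):

```python
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()
history: dict[str, list[str]] = {}  # toy in-memory conversational memory

class ChatRequest(BaseModel):
    session_id: str
    question: str

def retrieve_docs(query: str) -> list[str]:
    return ["<retrieved passage>"]   # stand-in for a vector-store search

def generate(prompt: str) -> str:
    return "<model answer>"          # stand-in for an LLM call

@app.post("/chat")
def chat(req: ChatRequest):
    context = "\n".join(retrieve_docs(req.question))
    past = "\n".join(history.get(req.session_id, []))
    answer = generate(f"{past}\nContext:\n{context}\nQ: {req.question}\nA:")
    history.setdefault(req.session_id, []).append(f"Q: {req.question}\nA: {answer}")
    return {"answer": answer}
```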


By the end of Week 6, you won’t just know these techniques: you’ll have shipped six production‑grade artifacts, each reflecting the exact pipelines, optimizations, and deployment routines you’ll use on the job.


Live & Recorded Content: Reinforce, Deepen, Accelerate


12 Interactive Live Workshops (3 hrs each): Each session follows the Concept → Code flow. I’ll introduce the day’s core topic (e.g., self‑attention, LoRA, vLLM optimizations), and we’ll implement the features step by step in code so you see exactly how theory maps to practice. Bring your questions!

10+ Hours of On‑Demand Deep‑Dive Lectures: Short videos (10–20 min) on Transformer internals, fine-tuning tricks, and deployment optimizations. Watch before each project to hit the ground running, and step through every line of code at your own pace; perfect for review or catching up if you miss a live session. Includes downloadable slide decks, annotated notebooks, and cheat sheets you’ll reference long after graduation.


Why This Matters: Live workshops turn recorded concepts into actionable skills. You’ll see how theory maps directly onto code, get instant feedback, and internalize best practices. Then, recorded lectures become your asynchronous safety net, letting you revisit tricky topics, prepare for upcoming labs, and solidify your understanding on demand.

Who is this course for

01

Senior ML Engineers upgrading to LLMs, frustrated by prototypes that crumble under real‑world load and runaway inference costs.

02

Data Scientists moving into LLM workflows, stalled by opaque fine-tuning pipelines and multi‑week integration headaches.

03

Recent CS Graduates specializing in LLMs, eager to move beyond theory but lacking hands‑on experience.

What you’ll get out of this course

Deliver Six Production‑Grade LLM Artifacts

  • Implement a modern Transformer architecture from scratch
  • Fine-tune LLMs for instruction following
  • Build a scalable distributed training pipeline
  • Fine-tune an LLM to extend its context window with RoPE scaling and QLoRA
  • Deploy production-ready LLM API endpoints
  • Build an end‑to‑end RAG chat app

Tackle Real‑World LLM Challenges

Spend dedicated lab sessions wrestling with production‑scale puzzles (OOMs, tail‑latency spikes, integration snags) until you ship battle‑tested solutions.

Master Core Concepts in Live Code‑Along Workshops

12 interactive sessions where you’ll implement each module live, ask questions, troubleshoot your project code, and internalize battle‑tested best practices.

Reinforce with On‑Demand Deep‑Dive Lectures

10+ hours of short videos covering Transformer math, ZeRO strategies, vLLM optimizations, and more. Complete with annotated notebooks and slides.

Peer & Instructor Code Reviews

GitHub‑style feedback rounds on your project submissions.

Lifetime Access & Private Community

Revisit recordings anytime, join a Slack of LLM engineers, share new tactics, and continue growing long after the cohort wraps.

What’s included


Live sessions

Learn directly from Damien Benveniste in a real-time, interactive format.

Lifetime access

Go back to course content and recordings whenever you need to.

Community of peers

Stay accountable and share insights with like-minded professionals.

Certificate of completion

Share your new skills with your employer or on LinkedIn.

Maven Guarantee

This course is backed by the Maven Guarantee. Students are eligible for a full refund up until the halfway point of the course.

Course syllabus

12 live sessions • 64 lessons • 6 projects

Week 1

Jul 12—Jul 13

    The Transformer Architecture

    • Jul 12: Architecture Basics
      Sat 7/12, 9:00 PM—12:00 AM (UTC)
    • Jul 13: Implement The Transformer Architecture From Scratch
      Sun 7/13, 9:00 PM—12:00 AM (UTC)
    17 more items

    Project 1

    1 item

Week 2

Jul 14—Jul 20

    Training LLMs to Follow Instructions

    • Jul 19: Pre-training - Supervised Fine-Tuning - RLHF
      Sat 7/19, 9:00 PM—12:00 AM (UTC)
    • Jul 20: Training Implementation
      Sun 7/20, 9:00 PM—12:00 AM (UTC)
    8 more items

    Project 2

    1 item

Week 3

Jul 21—Jul 27

    How to Scale Model Training

    • Jul 26: CPU vs GPU vs TPU - The GPU Architecture - Distributed Training
      Sat 7/26, 9:00 PM—12:00 AM (UTC)
    • Jul 27: Data Parallelism - Model Parallelism - Zero Redundancy Optimizer Strategy
      Sun 7/27, 9:00 PM—12:00 AM (UTC)
    8 more items

    Project 3

    1 item

Week 4

Jul 28—Aug 3

    How to Fine-Tune LLMs

    • Aug 2: The Different Fine-Tuning Learning Tasks - Catastrophic Forgetting
      Sat 8/2, 9:00 PM—12:00 AM (UTC)
    • Aug 3: LoRA and QLoRA Adapters
      Sun 8/3, 9:00 PM—12:00 AM (UTC)
    11 more items

    Project 4

    1 item

Week 5

Aug 4—Aug 10

    How to Deploy LLMs

    • Aug 9: The Deployment Strategies - Multi-LoRA - The Text Generation Layer
      Sat 8/9, 9:00 PM—12:00 AM (UTC)
    • Aug 10: Streaming Applications - Continuous Batching - KV-Caching - Paged Attention - vLLM
      Sun 8/10, 9:00 PM—12:00 AM (UTC)
    9 more items

    Project 5

    1 item

Week 6

Aug 11—Aug 17

    Building the Application Layer

    • Aug 16: Implementing and Optimizing a Retrieval-Augmented Generation (RAG) Pipeline
      Sat 8/16, 9:00 PM—12:00 AM (UTC)
    • Aug 17: Productionizing a RAG Pipeline with FastAPI
      Sun 8/17, 9:00 PM—12:00 AM (UTC)
    11 more items

    Project 6

    1 item

Meet your instructor

Damien Benveniste

Welcome, my name is Damien Benveniste! After a Ph.D. in theoretical physics, I started my career in Machine Learning more than 10 years ago.


I have been a Data Scientist, Machine Learning Engineer, and Software Engineer. I have led Machine Learning projects across industry sectors such as AdTech, market research, financial advising, cloud management, online retail, marketing, credit score modeling, data storage, healthcare, and energy valuation. Previously, I was a Machine Learning Tech Lead at Meta, where I worked on automating model optimization at scale for Ads ranking.


I am now training the next generation of Machine Learning engineers.



Join an upcoming cohort

Build Production-Ready LLMs From Scratch

Cohort 2

$1,500

Dates

July 12—Aug 18, 2025

Payment Deadline

July 11, 2025

Course schedule

4-6 hours per week

  • Saturdays & Sundays

    2:00pm - 5:00pm PST

    Live sessions - every session will be recorded and available after

  • Weekly projects

    2 hours per week

    Hands-on projects with community support for maximum learning

Free resource

Attention Is All You Need: The Full Guide

A complete guide to the original Transformer architecture and attention mechanisms: 40 pages of every detail you need to know to start your journey into the world of Large Language Models.

Get the Guide

Learning is better with cohorts

Active hands-on learning

This course builds on live workshops and hands-on projects

Interactive and project-based

You’ll be interacting with other learners through breakout rooms and project teams

Learn with a cohort of peers

Join a community of like-minded people who want to learn and grow alongside you

