9 Days · Cohort-based Course
End-to-end: orchestrate and deploy agentic Retrieval-Augmented Generation with LangGraph, FastAPI, and a React frontend in 2 weeks.
Course overview
The Real-World AI Engineering Roadblocks You Face Today
👋 Prototype → Production Gap — Moving from a notebook demo to a secure, observable, multi-tenant service requires orchestration, evals, guardrails, and ops most teams lack.
👋 “Easy RAG” vs “Reliable RAG” — Anyone can retrieve-then-generate; making answers faithful, fresh, fast, and cost-controlled under real traffic is the hard part.
👋 Framework Overload — The ecosystem is noisy; you need clear criteria (maturity, extensibility, latency, cost) and reference patterns to choose confidently.
👋 It’s Software Engineering First — Success hinges on clean interfaces, tests, typed configs, tracing, CI/CD, and change management—not just prompts and models.
👋 From Laptop to 1M Users — Scaling demands streaming, batching, caching, autoscaling, and SLOs, or your p95 latency explodes and costs spiral.
How this course will help you
✅ Ship a real Agentic RAG app, not a demo — Stand up an end-to-end stack (LangGraph → FastAPI → React) that runs locally today and deploys via a clean, fork-and-ship monorepo.
✅ Make retrieval dependable, not lucky — Adopt schema-aware chunking, strong dense embeddings with sensible metadata filters, and context packing with citations so answers stay faithful, fresh, and concise.
✅ Harden agentic workflows — Design a typed LangGraph state and build nodes for rewrite → retrieve → rerank → synthesize → cite → safety-check, with retries and timeouts so plans don’t loop or stall.
✅ Scale the experience, not the headaches — Enable server-streaming in FastAPI, cap top-k, trim context budgets, and add early-exit rules; deploy with autoscaling so you can serve real traffic without infra fuss.
✅ See enough to fix things fast — Bake in structured logs (no vendor tracing), per-step timing counters, and UI breadcrumbs/citations to follow query → context → answer and spot common failure patterns quickly.
✅ Choose frameworks with confidence — Follow an opinionated reference architecture plus a simple choice rubric (maturity, extensibility, latency, cost, swap effort) so you know when to stick—and how to swap components without rewrites.
✅ Write maintainable RAG code — Use clean module boundaries (ingest / retrieve / rerank / synthesize), typed configs (Pydantic Settings), and sensible secrets/env management so your team can extend it safely.
You’ll walk away with
✨ A running Agentic RAG app (LangGraph + FastAPI + React) in a fork-and-ship monorepo.
✨ An ingestion/indexing pipeline with metadata, hybrid retrieval, and optional re-ranking.
✨ A chat UI with citations, source previews, and conversation memory that behaves.
✨ Deploy scripts and env templates to go live right after class.
✨ A framework choice memo + adapters to swap models/vector stores without starting over.
Bottom line: this isn’t a vitamin, it’s a blueprint you can put in production.
01
ML/LLM engineers ready to ship Agentic RAG: LangGraph + FastAPI + React, GCP deploy, citations, history-safe chat.
02
Full-stack/platform engineers adding RAG fast: opinionated stack, streaming FastAPI, React UI with citations, GCP deploy.
03
Founder/solutions architect proving value: grounded chat over your repos/docs, agentic flows, deploy to production.
Python (basics): The backend and LangGraph nodes are Python; basics help you focus on RAG patterns instead of syntax or environment setup.
JavaScript (basics): Used for the minimal web UI and examples; JS fluency helps you send requests, handle streaming responses, and add simple interactions.
Orchestrate complex RAG pipelines with LangGraph and OpenAI API
Build a typed LangGraph that routes rewrite → retrieve → rerank → synthesize → cite → self-check with retries, timeouts, early-exit rules, and real tool calls—exposed as a clean HTTP API.
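To make the node flow concrete, here is a minimal sketch of a typed LangGraph state wired into a shortened rewrite → retrieve → synthesize pipeline. The node bodies are hypothetical stubs (not the course's implementation), and the rerank, cite, and self-check steps are omitted for brevity.

```python
# Minimal sketch: typed state + linear LangGraph pipeline. Node bodies are stubs.
from typing import List, TypedDict

from langgraph.graph import StateGraph, END


class RAGState(TypedDict):
    question: str   # original user query
    rewritten: str  # query after rewriting
    docs: List[str] # retrieved context chunks
    answer: str     # final, cited answer


def rewrite(state: RAGState) -> dict:
    # Placeholder: a real node would call an LLM to rewrite the query.
    return {"rewritten": state["question"].strip()}


def retrieve(state: RAGState) -> dict:
    # Placeholder: a real node would query a vector store with metadata filters.
    return {"docs": [f"stub chunk for: {state['rewritten']}"]}


def synthesize(state: RAGState) -> dict:
    # Placeholder: a real node would pack context, call an LLM, and enforce citations.
    return {"answer": f"Answer based on {len(state['docs'])} chunk(s) [1]"}


graph = StateGraph(RAGState)
graph.add_node("rewrite", rewrite)
graph.add_node("retrieve", retrieve)
graph.add_node("synthesize", synthesize)
graph.set_entry_point("rewrite")
graph.add_edge("rewrite", "retrieve")
graph.add_edge("retrieve", "synthesize")
graph.add_edge("synthesize", END)

app = graph.compile()
result = app.invoke({"question": "How do I deploy the API?", "rewritten": "", "docs": [], "answer": ""})
print(result["answer"])
```

Each node returns a partial state update that LangGraph merges into the typed state, which is what makes retries, timeouts, and conditional early exits easy to bolt on per node.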
Build scalable asynchronous applications with FastAPI
Ship async FastAPI endpoints with server-streaming, well-typed request/response models, input validation, and sensible timeouts—ready to run locally and deploy to production.
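For a flavor of what that looks like, here is a small illustrative sketch (not the course repo) of an async FastAPI endpoint with a typed, validated request model and a server-streamed response; the fake token generator stands in for the real LangGraph call.

```python
# Minimal sketch: async FastAPI endpoint with a typed request model and streaming response.
import asyncio

from fastapi import FastAPI
from fastapi.responses import StreamingResponse
from pydantic import BaseModel, Field

app = FastAPI()


class ChatRequest(BaseModel):
    question: str = Field(..., min_length=1, max_length=2000)
    top_k: int = Field(5, ge=1, le=20)  # cap top-k at the API boundary


async def fake_token_stream(question: str):
    # Placeholder generator: a real app would stream LLM/LangGraph output here.
    for token in ["This ", "is ", "a ", "streamed ", "answer."]:
        yield token
        await asyncio.sleep(0.05)


@app.post("/chat")
async def chat(req: ChatRequest):
    return StreamingResponse(fake_token_stream(req.question), media_type="text/plain")
```

Streaming keeps perceived latency low because the UI can render tokens as they arrive instead of waiting for the full answer.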
Implement chatbot interfaces with React
Create a chat UI that shows citations and source previews, lets users scope queries, preserves safe chat history, and handles transient API errors gracefully.
Mitigate hallucinations with LLM judges, structured output, and context engineering
Cut errors via schema-aware chunking, dedupe and budgeted context packing, plus lightweight LLM checks and schema-constrained outputs to verify claims and enforce citations before responding.
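For the structured-output piece, a minimal sketch along these lines is shown below; the CitedAnswer schema and the validation rules are illustrative assumptions, not the course's exact code.

```python
# Minimal sketch: schema-constrained answer plus a citation check before responding.
from typing import List

from pydantic import BaseModel


class CitedAnswer(BaseModel):
    answer: str
    citations: List[int]  # indices into the retrieved chunks


def validate(raw_json: str, num_chunks: int) -> CitedAnswer:
    parsed = CitedAnswer.model_validate_json(raw_json)
    if not parsed.citations:
        raise ValueError("Refusing to answer without citations")
    if any(i < 0 or i >= num_chunks for i in parsed.citations):
        raise ValueError("Citation points outside the retrieved context")
    return parsed
```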
Design effective LLM prompts for high-level control over generation output
Write prompts that steer behavior: system prompts, task decomposition, Pydantic/JSON-schema constraints, and clear rules for tone, citations, and safe refusals.
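One illustrative way (not the course's exact prompt) to encode such rules is a short system prompt kept alongside the schema constraints:

```python
# Illustrative system prompt encoding tone, citation, and refusal rules.
SYSTEM_PROMPT = """You are a documentation assistant.
Rules:
- Answer only from the provided context chunks.
- Cite every claim with the chunk index in square brackets, e.g. [2].
- If the context does not contain the answer, say so and ask for clarification.
- Keep answers under 150 words; use a neutral, technical tone."""
```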
Develop end-to-end RAG applications using software engineering best practices
Produce a maintainable codebase: clean module boundaries (ingest/retrieve/rerank/synthesize), typed configs, secrets/env management, reproducible local dev, and deploy that mirrors local.
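As an example of the typed-config idea, here is a minimal sketch using pydantic-settings; the field names and env prefix are assumptions, not the course repo's actual settings.

```python
# Minimal sketch: typed configuration with pydantic-settings; values come from env vars / .env.
from pydantic_settings import BaseSettings, SettingsConfigDict


class Settings(BaseSettings):
    model_config = SettingsConfigDict(env_file=".env", env_prefix="RAG_")

    openai_api_key: str  # read from RAG_OPENAI_API_KEY; never hard-coded
    vector_store_url: str = "http://localhost:6333"
    top_k: int = 5
    request_timeout_s: float = 30.0


settings = Settings()  # raises a validation error if required values are missing
```

Because values are loaded from the environment and validated at startup, secrets stay out of the repo and misconfiguration fails fast instead of at the first request.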
Live sessions
Learn directly from Damien Benveniste in a real-time, interactive format.
Lifetime access
Go back to course content and recordings whenever you need to.
Community of peers
Stay accountable and share insights with like-minded professionals.
Certificate of completion
Share your new skills with your employer or on LinkedIn.
Maven Guarantee
This course is backed by the Maven Guarantee. Students are eligible for a full refund up until the halfway point of the course.
Live session dates: Sep 27, Sep 28, Oct 4, and Oct 5
Welcome, my name is Damien Benveniste! After a Ph.D. in theoretical Physics, I started my career in Machine Learning more than 10 years ago.
I have been a Data Scientist, Machine Learning Engineer, and Software Engineer. I have led Machine Learning projects across diverse industry sectors such as AdTech, market research, financial advising, cloud management, online retail, marketing, credit score modeling, data storage, healthcare, and energy valuation. Previously, I was a Machine Learning Tech Lead at Meta, working on automating model optimization at scale for Ads ranking.
I am now training the next generation of Machine Learning engineers.
Join an upcoming cohort
Cohort 1
$349
Dates
Payment Deadline
4-6 hours per week
Tuesdays & Thursdays
1:00pm - 2:00pm EST
Weekly projects
2 hours per week
Active hands-on learning
This course builds on live workshops and hands-on projects
Interactive and project-based
You’ll be interacting with other learners through breakout rooms and project teams
Learn with a cohort of peers
Join a community of like-minded people who want to learn and grow alongside you