3 Weeks · Cohort-based Course
Gain the key skills for designing effective agents and optimizing their performance. Dive deep into evaluations, tools, MCP, and RL.
Course overview
Modern teams are under pressure to ship LLM agents that hold up in production, but it's difficult to cut through the noise and determine what actually works. There's an ever-growing number of "agent frameworks" promising great results, yet their abstractions are often opaque and hard to optimize. Blog posts and one-off repos explain pieces of the puzzle, but AI is moving faster than ever.
Many engineers struggle to:
- Choose the right agent pattern for their use case
- Incorporate reliable tool use into agentic workflows
- Evaluate where and why agents fail
- Deploy agents that balance intelligence, cost, and latency
- Understand when and how to improve agent performance with fine-tuning and RL
We keep hearing that 2025 is the Year of the Agents. Everyone's talking about MCP, A2A, and GRPO, but no one seems to agree on when you should use them. Agentic interactions are becoming table-stakes consumer features, and investors are eager to see that you're keeping up with the times.
Popular agent products like Deep Research, Devin, and Manus are built by companies who don’t want to share their tricks. Open-source alternatives often underperform or are complex to understand and adapt. Textbooks don’t exist yet, and sifting through every new paper is basically a full-time job. The latest API models can make for powerful agents, but costs get out of control quickly. Few people outside of the big AI labs have hands-on expertise in optimizing LLM agents using reinforcement learning. Will and Kyle happen to be two of them.
---
What to expect:
Beyond core principles, this course emphasizes hands-on practice for building production-ready agents, including:
- How to integrate MCP tools for popular services like Notion, Linear, and Slack into your agent applications (see the client sketch after this list)
- How to build your own MCP servers for custom APIs and data
- How to scaffold and prompt agents for complex tool workflows
- How to evaluate and interactively refine agents with human-in-the-loop prompting
- How to use rule-based and LLM-based evaluations as reward signals for RL or synthetic data filtering
- How to use GRPO to train agents that outperform models like o3 at a fraction of the cost
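To make the client side concrete, here's a minimal sketch of connecting to a local MCP server and calling a tool with the official Python SDK (`pip install mcp`). The server script and tool name are hypothetical stand-ins, not a specific integration from the course:

```python
import asyncio

from mcp import ClientSession, StdioServerParameters
from mcp.client.stdio import stdio_client

# Launch a local MCP server as a subprocess and talk to it over stdio.
# "notes_server.py" is a hypothetical server script used for illustration.
server = StdioServerParameters(command="python", args=["notes_server.py"])

async def main() -> None:
    async with stdio_client(server) as (read, write):
        async with ClientSession(read, write) as session:
            await session.initialize()

            # Discover the tools the server exposes...
            tools = await session.list_tools()
            print([tool.name for tool in tools.tools])

            # ...then invoke one by name with JSON arguments.
            result = await session.call_tool("add_note", {"text": "ship the agent"})
            print(result.content)

asyncio.run(main())
```

The same session object is what you'd hand to an agent loop, so the model can choose from `list_tools()` and route its calls through `call_tool()`.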
The course will have 2x weekly lectures for 3 weeks, plus additional sessions for office hours (see schedule below). Lecture videos will be available to watch asynchronously, and we'll also have a Discord chat for offline discussions.
Lectures will incorporate live coding/prompting with tools like Cursor, Claude Code, and Jupyter notebooks. Familiarity with Python, high-level AI/ML concepts, and LLM APIs is assumed.
---
Course schedule:
Lecture 1 (6/17)
Agent Patterns and Principles
- ReAct, MemGPT, Agentic RAG, Multi-Agent (A2A)
- Hands-on demos with HF smolagents + other frameworks (see the sketch below)
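For a taste of those demos, a minimal ReAct-style agent in smolagents looks roughly like the sketch below. Treat it as a sketch: class names such as `InferenceClientModel` have shifted across smolagents releases, and the question is just an example.

```python
from smolagents import CodeAgent, DuckDuckGoSearchTool, InferenceClientModel

# A ReAct-style agent that writes and runs Python code to call its tools.
model = InferenceClientModel()  # defaults to a hosted model on the HF Inference API
agent = CodeAgent(tools=[DuckDuckGoSearchTool()], model=model)

# The agent alternates reasoning and tool calls until it returns an answer.
agent.run("Summarize the latest release notes for the smolagents library.")
```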
Lecture 2 (6/19)
Model Context Protocol: When and Why
- Client/Server architectures for tool calls
- Approaches to auth
- Hands-on agentic MCP flow demos with Claude Desktop, Claude Code, and more (see the server sketch below)
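As a preview of the server side, here's a minimal custom MCP server built with the official Python SDK's FastMCP helper; the ticket data is a made-up stand-in for a real API like Linear's:

```python
from mcp.server.fastmcp import FastMCP

# A tiny MCP server exposing one tool; clients such as Claude Desktop or
# Claude Code can connect to it over the stdio transport.
mcp = FastMCP("ticket-lookup")

# Hypothetical in-memory "database" standing in for a real ticketing API.
TICKETS = {"ENG-42": "Fix flaky auth test", "ENG-43": "Add MCP server for Linear"}

@mcp.tool()
def get_ticket(ticket_id: str) -> str:
    """Return the title of a ticket by its ID."""
    return TICKETS.get(ticket_id, f"No ticket found with id {ticket_id}")

if __name__ == "__main__":
    mcp.run()  # stdio transport by default
```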
Lecture 3 (6/24)
Evals for Agents
- Extending eval techniques to agentic workflows
- Rule-based vs LLM-as-judge
- Filtering rollouts for synthetic data collection
- Brief demo of SFT on filtered rollouts (see the sketch below)
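To illustrate how one eval can serve double duty, here's a toy sketch of a rule-based check used both as a 0/1 reward and as a filter for collecting SFT data. The rollout format and the citation rule are invented for illustration:

```python
import json
import re

# Rule-based check: did the final answer include a [n]-style citation?
# Returning 0.0/1.0 lets the same function act as an RL reward signal.
def citation_reward(final_answer: str) -> float:
    return 1.0 if re.search(r"\[\d+\]", final_answer) else 0.0

# Keep only rollouts that pass the check; survivors become SFT examples.
def filter_rollouts(rollouts: list[dict]) -> list[dict]:
    return [r for r in rollouts if citation_reward(r["answer"]) == 1.0]

rollouts = [
    {"prompt": "Summarize the doc.", "answer": "Agents need evals [1]."},
    {"prompt": "Summarize the doc.", "answer": "Trust me, it's fine."},
]

with open("sft_data.jsonl", "w") as f:
    for r in filter_rollouts(rollouts):
        f.write(json.dumps(r) + "\n")
```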
Lecture 4 (6/26)
Reinforcement Learning for Busy Engineers
- Crash course in RL fundamentals without the math
- GRPO vs DPO vs PPO
- Demo of GRPO for training a reasoning model (via HF TRL; sketched below)
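As a rough preview of the TRL demo, GRPO training can be set up along these lines. This matches recent TRL releases but may shift between versions, and the brevity reward is deliberately toy:

```python
from datasets import Dataset
from trl import GRPOConfig, GRPOTrainer

# Toy prompt dataset; GRPO samples several completions per prompt and
# reinforces those with above-average reward within each group.
dataset = Dataset.from_dict({"prompt": ["Explain MCP in one sentence."] * 64})

# Deliberately silly reward function: prefer shorter completions.
def brevity_reward(completions, **kwargs):
    return [-float(len(c)) for c in completions]

trainer = GRPOTrainer(
    model="Qwen/Qwen2.5-0.5B-Instruct",
    reward_funcs=brevity_reward,
    args=GRPOConfig(output_dir="grpo-demo"),
    train_dataset=dataset,
)
trainer.train()
```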
Lecture 5 (7/1)
Formulating Business Problems as RL Tasks
- How to think about reward/rubric design for real-world tasks
- Environment = Tasks + Tools + Verifiers
- Walkthrough of problem formulation for email search (via ART; see the sketch below)
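One way to read the "Environment = Tasks + Tools + Verifiers" framing in code is sketched below. The class names and the partial-credit rubric are illustrative only, not the ART API:

```python
from dataclasses import dataclass, field
from typing import Callable

@dataclass
class Task:
    prompt: str    # what the agent is asked to do
    expected: str  # ground truth, e.g. the ID of the email to find

@dataclass
class Environment:
    tasks: list[Task]
    tools: dict[str, Callable] = field(default_factory=dict)

    def verify(self, task: Task, answer: str) -> float:
        # Rubric for email search: full credit for an exact match,
        # partial credit if the right ID appears anywhere in the answer.
        if answer.strip() == task.expected:
            return 1.0
        return 0.5 if task.expected in answer else 0.0
```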
Lecture 6 (7/3)
Training Agents with GRPO
- Deep dive into RL experimentation for agent workflows (via ART)
- Broader ecosystem: other RL trainers + integrations with existing agent/tool frameworks (smolagents, MCP)
Who this course is for:
01. A Senior SWE turned AI Engineer at a Series D SaaS company who's eager to replace brittle pipelines with highly-optimized agents
02. A Founder + CTO of a Series A startup who wants to offer a best-in-class agentic AI experience to discerning customers
03. A Technical Director at a Fortune 500 company responsible for evaluating the best approaches and vendors for agentic AI solutions
What you'll learn:
- Understand key concepts and patterns underlying modern LLM agents, and how to choose the right approach for your use case
- Build portable, reliable tools for your agents and data using Model Context Protocol (MCP)
- Implement your own Research agents, incorporating custom format instructions and data access
- Learn the fundamentals of Reinforcement Learning (RL) and how it applies to agents
- Formulate your agentic tasks as RL problems, with evaluation metrics that enable learning from reward feedback
- Use RL algorithms like Group-Relative Policy Optimization (GRPO) to train agents that outperform frontier models on your tasks
The result: a holistic understanding of modern principles and techniques for designing production-ready agents and optimizing them with RL
This course includes:
- 9 interactive live sessions
- Lifetime access to course materials
- 6 in-depth lessons
- Direct access to instructors
- Projects to apply learnings
- Guided feedback & reflection
- Private community of peers
- Course certificate upon completion
Maven Satisfaction Guarantee
This course is backed by Maven’s guarantee. You can receive a full refund within 14 days after the course ends, provided you meet the completion criteria in our refund policy.
Production-Ready Agent Engineering: From MCP to RL
Jun 17: Lesson 1
Jun 19: Lesson 2
Jun 20: Office hours
Jun 24: Lesson 3
Jun 26: Lesson 4
Jun 27: Office hours
Jul 1: Lesson 5
Jul 2: Office hours
Jul 3: Lesson 6
Will is a Research Lead at Prime Intellect, working on advancing the frontier of open-source agentic RL. He was previously a Machine Learning Researcher at Morgan Stanley and an Applied Scientist at AWS, and completed a PhD in Computer Science at Columbia University focused on multi-agent learning.
Kyle is the CTO of OpenPipe, the RL post-training company. Through OpenPipe, he has helped dozens of companies of all sizes train custom models optimized for their tasks. He has previous ML experience at Y Combinator and Google.
Join an upcoming cohort
Cohort 1: $1,000
Dates: June 17 - July 3
Live sessions: Tuesdays & Thursdays, 5:00pm - 6:30pm ET
Time commitment: 4-6 hours per week
- 2x weekly lectures and at least 1x weekly office hours with instructors
- Weekly projects (2 hours per week): take-home exercises for more hands-on exposure to the week's topics