Reward Hacking 101: Keeping Your Agent Honest

Hosted by Kyle Corbitt

Mon, Jun 23, 2025

5:00 PM UTC (30 minutes)

Virtual (Zoom)

Free to join

Go deeper with a course

Production-Ready Agent Engineering: From MCP to RL
Will Brown and Kyle Corbitt

What you'll learn

Spot Reward Hacking Early

Learn to recognize and diagnose reward hacking, both in RL systems and in everyday incentive structures.

Trace-Driven Debugging Skills

Learn to inspect rollout traces with tools like ART or Langfuse to surface and explain unexpected exploit strategies.

Fix & Realign Incentives Fast

Apply quick reward tweaks or auxiliary checks to neutralize hacks and steer models back toward desired objectives.
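
As a rough illustration of what an auxiliary check might look like, suppose a coding agent is rewarded on its unit-test pass rate. The sketch below is not taken from the session or from any library: the `Rollout` class, its fields, and both reward functions are hypothetical names chosen for the example. It shows how a pass-rate-only reward can be gamed by tampering with the tests, and how a single extra check on the trace removes the incentive.

```python
from dataclasses import dataclass


@dataclass
class Rollout:
    """One agent attempt. All fields are hypothetical, for illustration only."""
    answer: str
    tests_passed: int
    tests_total: int
    edited_test_files: bool  # auxiliary signal assumed to come from the trace


def naive_reward(r: Rollout) -> float:
    # Rewards the pass rate only, so rewriting the tests scores perfectly.
    return r.tests_passed / r.tests_total if r.tests_total else 0.0


def guarded_reward(r: Rollout) -> float:
    # Auxiliary check: a rollout that tampered with the tests earns nothing.
    if r.edited_test_files:
        return 0.0
    return naive_reward(r)


honest = Rollout("fix bug", tests_passed=8, tests_total=10, edited_test_files=False)
hacker = Rollout("delete asserts", tests_passed=10, tests_total=10, edited_test_files=True)

for name, r in [("honest", honest), ("hacker", hacker)]:
    print(name, naive_reward(r), guarded_reward(r))
```

Under the naive reward the tampering rollout outscores the honest one; with the guard in place it scores zero. A real setup would likely need several such checks, since closing one loophole often exposes the next.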

Why this topic matters

RL now powers systems ranging from chatbots to trading agents, but a mis-specified reward can push an agent to optimize the wrong thing, sometimes with costly or dangerous outcomes. Understanding reward hacking equips builders to spot misalignment early, patch loopholes quickly, and ship trustworthy AI systems.

You'll learn from

Kyle Corbitt

Founder at OpenPipe

Kyle Corbitt is the co-founder and CEO of OpenPipe, the RL post-training company. OpenPipe has trained thousands of customer models for both enterprises and tech-forward startups.

© 2025 Maven Learning, Inc.