Reward Hacking 101: Keeping Your Agent Honest
Hosted by Kyle Corbitt
Mon, Jun 23, 2025
5:00 PM UTC (30 minutes)
Virtual (Zoom)
Free to join
What you'll learn
Spot Reward Hacking Early
Learn to recognize and diagnose reward hacking, both in RL systems and in everyday incentive structures.
Trace-Driven Debugging Skills
Learn to inspect rollout traces with tools like ART or Langfuse to surface and explain unexpected exploit strategies.
Fix & Realign Incentives Fast
Apply quick reward tweaks or auxiliary checks to neutralize hacks and steer models back toward desired objectives.
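As a rough illustration of what an auxiliary check can look like, here is a minimal Python sketch. It is not taken from the session itself: the "make the tests pass" task, the `Rollout` fields, and both reward functions are hypothetical names chosen for the example. It assumes an agent that is rewarded on test pass rate and can inflate that rate by deleting failing tests.

```python
from dataclasses import dataclass


@dataclass
class Rollout:
    """Hypothetical summary of one agent episode on a 'make the tests pass' task."""
    tests_passed: int
    tests_run: int
    tests_expected: int  # size of the original, unmodified test suite


def naive_reward(r: Rollout) -> float:
    # Pass rate over whatever tests were run.
    # Exploitable: deleting failing tests raises the score.
    return r.tests_passed / r.tests_run if r.tests_run else 0.0


def checked_reward(r: Rollout) -> float:
    # Auxiliary check: the full original suite must have been run.
    if r.tests_run != r.tests_expected:
        return 0.0
    return r.tests_passed / r.tests_expected if r.tests_expected else 0.0


honest = Rollout(tests_passed=8, tests_run=10, tests_expected=10)
hacked = Rollout(tests_passed=2, tests_run=2, tests_expected=10)  # 8 failing tests deleted

print("naive:  ", naive_reward(honest), naive_reward(hacked))
print("checked:", checked_reward(honest), checked_reward(hacked))
```

Under the naive reward the hacked rollout outscores the honest one; with the check in place it scores zero. A real system would likely need more than one such check, since each one closes only the loophole it was written for.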
Why this topic matters
RL powers everything from chatbots to trading agents, but a misspecified reward can push an agent to optimize the wrong thing, sometimes with costly or dangerous outcomes. Understanding reward hacking equips builders to spot misalignment early, patch loopholes quickly, and ship trustworthy AI systems.
You'll learn from
Kyle Corbitt
Founder at OpenPipe
Kyle Corbitt is the co-founder and CEO of OpenPipe, the RL post-training company. OpenPipe has trained thousands of customer models for both enterprises and tech-forward startups.