Reward Hacking 101: Keeping Your Agent Honest

Hosted by Kyle Corbitt

128 students

What you'll learn

Spot Reward Hacking Early

Recognize and diagnose reward hacking, both in RL systems and in everyday incentive structures.

Trace-Driven Debugging Skills

Learn to inspect rollout traces with tools like ART or Langfuse to surface and explain unexpected exploit strategies.

Fix & Realign Incentives Fast

Apply quick reward tweaks or auxiliary checks to neutralize hacks and steer models back toward desired objectives.
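
As a rough illustration of an auxiliary check, here is a minimal sketch that is not taken from the lesson itself. It assumes a toy task in which an answer is rewarded for mentioning expected keywords; the function names, the 30-word cap, and the 0.5 stuffing threshold are all hypothetical choices made for this example.

```python
def naive_reward(answer: str, keywords: list[str]) -> float:
    """Fraction of expected keywords that appear in the answer."""
    text = answer.lower()
    return sum(k.lower() in text for k in keywords) / len(keywords)


def guarded_reward(answer: str, keywords: list[str], max_words: int = 30) -> float:
    """Naive reward, zeroed out when an auxiliary check fails."""
    words = answer.lower().split()
    # Auxiliary check 1: reject empty or overly long answers.
    if not words or len(words) > max_words:
        return 0.0
    # Auxiliary check 2: reject answers that are mostly bare keywords.
    keyword_set = {k.lower() for k in keywords}
    stuffed = sum(w.strip(".,") in keyword_set for w in words) / len(words)
    if stuffed > 0.5:
        return 0.0
    return naive_reward(answer, keywords)


keywords = ["refund", "invoice", "deadline"]
honest = "You can request a refund before the deadline shown on your invoice."
hack = "refund invoice deadline refund invoice deadline"

# The naive reward cannot tell the two answers apart.
print(naive_reward(honest, keywords), naive_reward(hack, keywords))
# The guarded reward keeps the honest score and zeroes out the keyword-stuffed one.
print(guarded_reward(honest, keywords), guarded_reward(hack, keywords))
```

A check like this only closes the one loophole it targets, so in practice it would be paired with continued inspection of rollout traces for the next exploit.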

Why this topic matters

RL is powering everything from chatbots to trading agents, but mis-specified rewards can make these systems optimize the wrong thing, sometimes with costly or dangerous outcomes. Understanding reward hacking equips builders to spot misalignment early, patch loopholes quickly, and ship trustworthy AI systems.

You'll learn from

Kyle Corbitt

Founder at OpenPipe

Kyle Corbitt is the co-founder and CEO of OpenPipe, the RL post-training company. OpenPipe has trained thousands of customer models for both enterprises and tech-forward startups.

Go deeper with a course

Production-Ready Agent Engineering: From MCP to RL