Collaborative AI Evals with Human Feedback

Hosted by Rogério Chaves

Tue, Mar 3, 2026

4:00 PM UTC (30 minutes)

Virtual (Zoom)

Free to join


Go deeper with a course

AI Evals and Analytics Playbook
Stella Liu and Amy Chen
View syllabus

What you'll learn

Why collaboration is essential to AI evals

Running evals is a team sport: PMs, data scientists, and SMEs all need a seat at the table, not just the engineers.

Human feedback loops before scaling LLM-as-a-judge

Collecting and calibrating human annotations before handing off judgment to automated evaluators at scale.
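To make the "calibrating" step concrete, here is a minimal Python sketch (not LangWatch's API; the labels and the agreement threshold are illustrative assumptions) that measures how well an LLM-as-a-judge agrees with human annotations on a small labeled sample before the judge is trusted at scale:

    # Minimal sketch: compare human pass/fail labels with an LLM judge's
    # labels on the same outputs before scaling the judge. The data and the
    # ~0.7 kappa threshold are illustrative, not from the session.
    from sklearn.metrics import cohen_kappa_score

    # Human reviewers grade a sample of model outputs (1 = pass, 0 = fail).
    human_labels = [1, 0, 1, 1, 0, 1, 0, 1, 1, 0]

    # The LLM judge grades the same outputs (stubbed here as fixed values).
    judge_labels = [1, 0, 1, 0, 0, 1, 0, 1, 1, 1]

    # Chance-corrected agreement between the judge and the humans.
    kappa = cohen_kappa_score(human_labels, judge_labels)
    print(f"Human vs. judge agreement (Cohen's kappa): {kappa:.2f}")

Only once agreement like this holds on a held-out sample does it make sense to let the automated judge take over routine grading from human reviewers.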

Design evals and simulations for non-technical teammates

Create, review, and iterate on evals without writing code or digging into technical details.

Real-world customer stories from LangWatch

Hear stories from LangWatch customers on why cross-functional feedback is key to successful evals.

Why this topic matters

Most teams automate their evals too early. Human feedback loops are what make automated eval scores trustworthy. When developers, PMs, and SMEs can't easily collaborate on evals, teams end up optimizing for metrics that don't reflect real product quality. This session shows you how to bridge that gap with practical workflows and a live demo of tools designed for the whole team.

You'll learn from

Rogério Chaves

CTO & Cofounder at LangWatch

Rogério Chaves is CTO and cofounder of LangWatch. He is an expert in LLM evaluation, agent testing, and LLMOps. He created the Agent Testing Pyramid, which covers unit tests for AI evals, probabilistic evals, and end-to-end agent simulation.


LangWatch
Booking.com
Thoughtworks

Sign up to join this lesson
