Optimize Your Dev Setup For Evals w/ Cursor Rules & MCP

Hosted by Isaac Flath, Hamel Husain, and Shreya Shankar

Mon, Jul 14, 2025

6:00 PM UTC (30 minutes)

Virtual (Zoom)

Free to join

484 students

Go deeper with a course

Featured in Lenny’s List
AI Evals For Engineers & PMs
Hamel Husain and Shreya Shankar

What you'll learn

How to use MCP context for AI evaluation frameworks

Configure MCP servers to intelligently pull llms.txt documentation or data from your eval system to automate data analysis and debugging.
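
For illustration, here is a minimal sketch of a .cursor/mcp.json file that registers two such servers. The server names, commands, and arguments are placeholders, not the specific servers covered in the session:

```json
{
  "mcpServers": {
    "llms-txt-docs": {
      "command": "uvx",
      "args": ["your-llms-txt-mcp-server", "--url", "https://example.com/llms.txt"]
    },
    "eval-data": {
      "command": "python",
      "args": ["-m", "your_eval_mcp_server", "--db", "evals.sqlite"]
    }
  }
}
```

With servers like these registered, the coding agent can fetch current documentation or query eval results on demand instead of working from stale context.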

Cursor rules for Phoenix, Braintrust, and Inspect

Customize your dev environment for the specific tool you are using and your preferences. Isaac will share his recipes.
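
As a rough sketch of the format (not Isaac's actual recipes), a Cursor project rule lives in .cursor/rules/ as an .mdc file whose frontmatter controls when it is attached; the rule text below is purely illustrative:

```
---
description: Conventions for eval code in this repo
globs: evals/**/*.py
alwaysApply: false
---

- Log every LLM call with its inputs, output, and score so failures trace back to a single run ID.
- Keep labeled test datasets checked into the repo; do not rely on ad-hoc inline examples.
- When adding a new scorer, include at least one known-pass and one known-fail case.
```

A rule like this is attached whenever matching files are in context, so the model can follow your eval conventions without being reminded in every prompt.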

Use AI for evaluation development and debugging

Greatly reduce the friction of setting up evals by automating away the tedious bits.

Why this topic matters

AI evaluations are complex, and the right context is what lets AI coding tools help you. We will cover different approaches and strategies for giving coding models the context they need, and show the most robust way to curate that information. You will see and learn my process for creating Cursor rules for common AI evaluation tools such as Phoenix, Braintrust, and Inspect, which will make you significantly faster at building evals.

You'll learn from

Isaac Flath

AI Engineer & Fullstack Developer

Isaac is a data scientist focused on AI applications. While that often means machine learning and deep learning, it just as often means web app development and other work, because AI is only one component of a successful AI application.

I am currently building out Gallery.FastHT.ML (https://gallery.fastht.ml/) and generally developing the FastHTML ecosystem.

My primary hobby is dance. I used to teach ballroom dance full time, which is where I met my partner. My partner runs her own dance instruction business here in D.C.

Hamel Husain

ML Engineer with 20 years of experience

Hamel is a machine learning engineer with over 20 years of experience. He has worked with innovative companies such as Airbnb and GitHub, where his work included early LLM research for code understanding that was used by OpenAI. He has also led and contributed to numerous popular open-source machine learning tools. Hamel is currently an independent consultant helping companies build AI products.

Shreya Shankar

ML Systems Researcher Making AI Evaluation Work in Practice

Shreya is an experienced ML Engineer who is currently a PhD candidate in computer science at UC Berkeley, where she builds systems that help people use AI to work with data effectively. Her research focuses on developing practical tools and frameworks for building reliable ML systems, with recent groundbreaking work on LLM evaluation and data quality. She has published influential papers on evaluating and aligning LLM systems, including "Who Validates the Validators?" which explores how to systematically align LLM evaluations with human preferences.

Prior to her PhD, Shreya worked as an ML engineer in industry and completed her BS and MS in computer science at Stanford. Her work appears in top data management and HCI venues including SIGMOD, VLDB, and UIST. She is currently supported by the NDSEG Fellowship and has collaborated extensively with major tech companies and startups to deploy her research in production environments. Her recent projects like DocETL and SPADE demonstrate her ability to bridge theoretical frameworks with practical implementations that help developers build more reliable AI systems.

Learn directly from Isaac Flath, Hamel Husain, and Shreya Shankar
