Build Your Own Eval Tools With Notebooks!

Hosted by Vincent D. Warmerdam, Hamel Husain, and Shreya Shankar

Thu, Jun 26, 2025

6:00 PM UTC (45 minutes)

Virtual (Zoom)

Free to join


Go deeper with a course

AI Evals For Engineers & PMs
Hamel Husain and Shreya Shankar
View syllabus

What you'll learn

How to use modern widgets in your Python notebooks

We're going to focus on techniques made available by marimo, but many of these tools can also be applied in Jupyter.

How to rethink notebooks for production/rapid prototyping.

Stop using notebooks solely as a throwaway scratchpad and start treating them as a tool for writing actual software.

Leverage notebooks to find data worth looking at faster

How to use notebooks to quickly explore data in unique ways not afforded by other tools.
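
The last point, using notebooks to find data worth looking at faster, can be sketched with a tiny triage helper of the kind you might build for yourself inside a notebook. Everything here (the `flag_for_review` function, the `records` list, the `score` field) is a hypothetical illustration, not an API from the talk, from marimo, or from Jupyter:

```python
# Hypothetical sketch: surface eval records worth a closer look,
# worst-first, so a notebook widget or table view can page through them.

def flag_for_review(records, max_score=0.5):
    """Return records whose eval score is at or below max_score,
    sorted worst-first. Field names are illustrative assumptions."""
    flagged = [r for r in records if r["score"] <= max_score]
    return sorted(flagged, key=lambda r: r["score"])

records = [
    {"id": 1, "score": 0.9, "output": "good answer"},
    {"id": 2, "score": 0.2, "output": "hallucinated citation"},
    {"id": 3, "score": 0.4, "output": "partially correct"},
]

for r in flag_for_review(records):
    print(r["id"], r["score"])
```

In a notebook you would typically feed the flagged subset straight into an interactive table or widget rather than printing it, which is exactly the workflow the session explores.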

Why this topic matters

The biggest impediment to enterprises getting value from AI today is that we've somehow managed to convince the world that tweaking prompts and trying out new models is the prestigious pinnacle of high-value work, while curating a high-quality dataset is tedious grunt work to be outsourced, automated, and kept small at all times. This talk is about tools that you can (and should!) make for yourself.

You'll learn from

Vincent D. Warmerdam

Engineer @ marimo

Vincent is a senior data professional who has worked as an engineer, researcher, team lead, and educator. You might know him from tech talks in which he tries to defend common sense over hype in the data space. He is especially interested in understanding algorithmic systems so that failure can be prevented. As such, he has always preferred to keep calm and check the dataset before flowing tonnes of tensors.

Hamel Husain

ML Engineer with 20 years of experience

Hamel is a machine learning engineer with over 20 years of experience. He has worked with innovative companies such as Airbnb and GitHub, including early LLM research for code understanding that was used by OpenAI. He has also led and contributed to numerous popular open-source machine-learning tools. Hamel is currently an independent consultant helping companies build AI products.

Shreya Shankar

ML Systems Researcher Making AI Evaluation Work in Practice

Shreya is an experienced ML Engineer who is currently a PhD candidate in computer science at UC Berkeley, where she builds systems that help people use AI to work with data effectively. Her research focuses on developing practical tools and frameworks for building reliable ML systems, with recent groundbreaking work on LLM evaluation and data quality. She has published influential papers on evaluating and aligning LLM systems, including "Who Validates the Validators?" which explores how to systematically align LLM evaluations with human preferences.

Prior to her PhD, Shreya worked as an ML engineer in industry and completed her BS and MS in computer science at Stanford. Her work appears in top data management and HCI venues including SIGMOD, VLDB, and UIST. She is currently supported by the NDSEG Fellowship and has collaborated extensively with major tech companies and startups to deploy her research in production environments. Her recent projects like DocETL and SPADE demonstrate her ability to bridge theoretical frameworks with practical implementations that help developers build more reliable AI systems.



© 2025 Maven Learning, Inc.