Building Effective AI-Powered Data Pipelines

Hosted by Shreya Shankar

Thu, Dec 4, 2025

6:00 PM UTC (1 hour)

Virtual (Zoom)

Free to join

What you'll learn

Architect Semantic Data Pipelines

Transform unstructured documents into structured data by chaining semantic operators and classifying data at scale.

Optimize via Semantic Rewrites

Improve accuracy by decomposing complex operators (e.g., converting a "Map" into "Split-Map-Reduce").
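
To make the "Split-Map-Reduce" idea concrete, here is a minimal Python sketch, not DocETL's actual API. It assumes the rewrite means: break a long document into chunks, run the per-chunk operation on each, then merge the partial results. The names `split`, `split_map_reduce`, `extract_names`, and `merge_unique` are hypothetical, and `extract_names` is a toy stand-in for an LLM call.

```python
from typing import Callable, List


def split(document: str, chunk_size: int) -> List[str]:
    """Split a document into chunks of at most `chunk_size` words."""
    words = document.split()
    return [
        " ".join(words[i:i + chunk_size])
        for i in range(0, len(words), chunk_size)
    ]


def split_map_reduce(
    document: str,
    chunk_size: int,
    map_fn: Callable[[str], List[str]],
    reduce_fn: Callable[[List[List[str]]], List[str]],
) -> List[str]:
    """Run `map_fn` on each chunk, then combine the partial results."""
    chunks = split(document, chunk_size)
    partials = [map_fn(chunk) for chunk in chunks]
    return reduce_fn(partials)


def extract_names(chunk: str) -> List[str]:
    """Toy stand-in for an LLM extraction call (hypothetical):
    treats every capitalized word as a name."""
    return [w.strip(".,") for w in chunk.split() if w[:1].isupper()]


def merge_unique(partials: List[List[str]]) -> List[str]:
    """Reduce step: merge per-chunk results, dropping duplicates
    while keeping first-seen order."""
    merged: List[str] = []
    for partial in partials:
        for item in partial:
            if item not in merged:
                merged.append(item)
    return merged


if __name__ == "__main__":
    doc = "alice met Bob today. later Carol and Bob left."
    print(split_map_reduce(doc, 4, extract_names, merge_unique))
```

In a real pipeline the map step would be a model call on each chunk, and the reduce step might itself be a model call that reconciles conflicting partial answers.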

Slash Costs with Task Cascades

Learn how to route easy queries to cheaper models and hard ones to more expensive models without sacrificing quality.
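
One simple way to picture this routing is a confidence-threshold cascade, sketched below in Python. This is an illustrative assumption, not necessarily the task-cascade method covered in the lesson: the cheap model answers first, and the query is escalated only when its confidence falls below a threshold. The functions `cascade`, `toy_cheap_model`, and `toy_expensive_model`, and the 0.8 threshold, are all hypothetical.

```python
from typing import Callable, Tuple

# (label, confidence in [0, 1])
Prediction = Tuple[str, float]


def cascade(
    query: str,
    cheap_model: Callable[[str], Prediction],
    expensive_model: Callable[[str], Prediction],
    threshold: float = 0.8,
) -> Tuple[str, str]:
    """Return (label, model_used). Accept the cheap model's answer
    when it is confident enough; otherwise escalate."""
    label, confidence = cheap_model(query)
    if confidence >= threshold:
        return label, "cheap"
    label, _ = expensive_model(query)
    return label, "expensive"


def toy_cheap_model(query: str) -> Prediction:
    """Hypothetical cheap model: confident only on short inputs."""
    if len(query.split()) <= 5:
        return ("simple", 0.95)
    return ("unsure", 0.4)


def toy_expensive_model(query: str) -> Prediction:
    """Hypothetical expensive model: always answers confidently."""
    return ("complex", 0.99)


if __name__ == "__main__":
    for q in ["refund status?", "summarize every clause that limits liability here"]:
        print(q, "->", cascade(q, toy_cheap_model, toy_expensive_model))
```

The threshold controls the cost-quality trade-off: raising it sends more queries to the expensive model, lowering it saves money but accepts more low-confidence answers.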

Why this topic matters

The true potential of LLMs isn't just in chatbots, but in reasoning over massive amounts of unstructured data like legal transcripts and reports. Shreya Shankar demonstrates how to architect pipelines that extract, classify, and aggregate structure from chaos accurately and efficiently.

You'll learn from

Shreya Shankar

ML Systems Researcher Making AI Evaluation Work in Practice

Shreya builds open-source systems for AI-powered data processing. She is a final-year PhD student at UC Berkeley and the creator of DocETL, an open-source system for analyzing unstructured text at scale. DocETL has been deployed across journalism, law, medicine, policy, finance, and urban planning. Her research has been published at top computer science venues including VLDB, SIGMOD, and UIST (including a Best Paper award). Before her PhD, Shreya worked as a machine learning and data engineer at startups. She holds a BS in Computer Science from Stanford University.
