Scaling Late Interaction to Billions of Documents

Hosted by Marek Galovic, Hamel Husain, and Isaac Flath

Thu, Jul 9, 2026

6:00 PM UTC (45 minutes)

Virtual (Zoom)

Free to join

What you'll learn

Why single-vector retrieval loses detail

Understand what gets collapsed when each document is represented as one vector, and why that hurts long-tail queries.

What late interaction changes

See how token- and patch-level representations preserve more semantic detail than single-vector retrieval.

Why late interaction is expensive

Learn where the 10-100x storage and compute overhead comes from.

How we scaled it to billions of documents

Learn the production methods used to keep late interaction retrieval fast at billion-document scale.

How production constraints shape retrieval

See how sub-100ms p99 latency, filtering, and online index updates affect the architecture.

Why this topic matters

Single-vector retrieval is cheap, but it throws away detail that matters for hard queries. Late interaction keeps more of that detail, but the production cost is large. This lesson shows how we scaled it to billions of documents while keeping latency, filtering, and index updates practical.
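
To make the trade-off concrete, here is a minimal sketch in plain Python. It is not the production system covered in the lesson; it assumes the common MaxSim formulation of late interaction, uses mean pooling as a stand-in for a single-vector encoder, and all names and the two-dimensional toy vectors are hypothetical.

```python
from typing import List, Sequence

Vector = Sequence[float]


def dot(a: Vector, b: Vector) -> float:
    """Plain dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def mean_pool(vectors: List[Vector]) -> List[float]:
    """Collapse many token vectors into one (stand-in for a single-vector encoder)."""
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def single_vector_score(query_tokens: List[Vector], doc_tokens: List[Vector]) -> float:
    """One dot product per document, computed on pooled vectors."""
    return dot(mean_pool(query_tokens), mean_pool(doc_tokens))


def maxsim_score(query_tokens: List[Vector], doc_tokens: List[Vector]) -> float:
    """Late interaction (MaxSim): each query token takes its best-matching
    document token, and the per-token maxima are summed."""
    return sum(max(dot(q, d) for d in doc_tokens) for q in query_tokens)


# Hypothetical toy data: the query asks about two distinct "concepts".
query_tokens = [[1.0, 0.0], [0.0, 1.0]]
doc_a = [[1.0, 0.0], [0.0, 1.0]]  # covers both concepts
doc_b = [[1.0, 0.0], [1.0, 0.0]]  # covers only the first concept

pooled_a = single_vector_score(query_tokens, doc_a)
pooled_b = single_vector_score(query_tokens, doc_b)
late_a = maxsim_score(query_tokens, doc_a)
late_b = maxsim_score(query_tokens, doc_b)
```

In this toy case the pooled scores tie, while MaxSim ranks doc_a above doc_b. The cost is visible too: the sketch keeps one vector per token instead of one per document, and scoring a document takes len(query_tokens) * len(doc_tokens) dot products instead of one.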

You'll learn from

Marek Galovic

CEO, Co-Founder @TopK. ex-Pinecone, ex-Shopify

Marek is the CEO and co-founder of TopK, an AI-native search engine. Before founding TopK, Marek led data/control plane engineering teams at Pinecone and worked on fraud detection and financial forecasting at Shopify. He holds a degree in computer science and artificial intelligence from CTU Prague, where he researched game theory and adversarial machine learning algorithms applied to computer security (published at NeurIPS).

Hamel Husain

ML Engineer with 25+ years of experience

Hamel Husain is an ML Engineer with over 20 years of experience. He has worked with innovative companies such as Airbnb and GitHub. His work there included early LLM research used by OpenAI for code understanding. He has also led and contributed to numerous popular open-source machine-learning tools. Hamel is currently an independent consultant helping companies build AI products.

Isaac Flath

AI product engineer, 10 years of experience in AI.

I’m an AI and product engineer building systems that work with private knowledge and support real workflows. I’ve taught people how to use AI, from a Boot.dev RAG course to live courses on AI-assisted development. I’ve also helped teams improve AI products, tools, and workflows from AnkiHub (collaborative learning tools) and SpecStory (agentic software) to enterprise companies like Travel + Leisure and General Mills.

These days I focus on context-first AI systems. In practice, that means helping teams see and improve the parts of the system that decide what the system can use: retrieval, memory, tool use, evals, traces, harnesses, and the product interface around them. I help teams find where the process bottlenecks, whether the problem is search, agent behavior, workflow design, or the human interface, and then fix that layer.

Previously at

Pinecone
GitHub
Shopify.com
Airbnb