Scaling Late Interaction to Billions of Documents

Free Lesson

Part of AI Product Engineering

Hosted by Marek Galovic and Hamel Husain

879 students

What you'll learn

Why single-vector retrieval loses detail

Understand what gets collapsed when each document is represented as one vector, and why that hurts long-tail queries.

What late interaction changes

See how token- and patch-level representations preserve more semantic detail than single-vector retrieval.

Why late interaction is expensive

Learn where the 10-100x storage and compute overhead comes from.

How we scaled it to billions of documents

Learn the production methods used to keep late interaction retrieval fast at billion-document scale.

How production constraints shape retrieval

See how sub-100ms p99 latency, filtering, and online index updates affect the architecture.

Why this topic matters

Single-vector retrieval is cheap, but it throws away detail that matters for hard queries. Late interaction keeps more of that detail, but the production cost is large. This lesson shows how we scaled it to billions of documents while keeping latency, filtering, and index updates practical.
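To make the trade-off concrete, here is a minimal sketch of the two scoring styles. It assumes the common MaxSim form of late interaction (for each query token, take the best-matching document token, then sum), and it uses mean pooling as a stand-in for a single-vector encoder. Both choices, along with the function names and the toy 2-dimensional embeddings, are illustrative assumptions and not necessarily what the lesson or TopK uses.

```python
from typing import List

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    """Plain dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def mean_pool(vectors: List[Vector]) -> Vector:
    """Collapse token vectors into one vector (assumed stand-in for a single-vector encoder)."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def single_vector_score(query_tokens: List[Vector], doc_tokens: List[Vector]) -> float:
    """Score with one pooled vector per query and per document."""
    return dot(mean_pool(query_tokens), mean_pool(doc_tokens))


def maxsim_score(query_tokens: List[Vector], doc_tokens: List[Vector]) -> float:
    """Late interaction (MaxSim): sum over query tokens of the best document-token match."""
    return sum(max(dot(q, d) for d in doc_tokens) for q in query_tokens)


# Toy embeddings: the query has two distinct "concepts".
query = [[1.0, 0.0], [0.0, 1.0]]
doc_a = [[1.0, 0.0], [0.0, 1.0]]  # matches each concept exactly
doc_b = [[0.5, 0.5], [0.5, 0.5]]  # only a blurry average of both

# Pooling maps both documents to the same vector, so they tie.
pooled_a = single_vector_score(query, doc_a)
pooled_b = single_vector_score(query, doc_b)

# Token-level scoring keeps them apart.
late_a = maxsim_score(query, doc_a)
late_b = maxsim_score(query, doc_b)
```

In this toy case the pooled scores are identical while MaxSim ranks `doc_a` above `doc_b`, which is the kind of detail a single vector loses. The cost side is visible too: each document stores one vector per token instead of one vector overall, and scoring compares every query token against every document token, which is where overhead on the order of the 10-100x mentioned above can come from.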

You'll learn from

Marek Galovic

CEO, Co-Founder @TopK. ex-Pinecone, ex-Shopify

Marek is the CEO and co-founder of TopK, an AI-native search engine. Before founding TopK, Marek led data/control plane engineering teams at Pinecone and worked on fraud detection and financial forecasting at Shopify. He holds a degree in computer science and artificial intelligence from CTU Prague, where he researched game theory and adversarial machine learning algorithms applied to computer security (published at NeurIPS).

Hamel Husain

ML Engineer with 20+ years of experience

Hamel Husain is an ML Engineer with 20+ years of experience. He has worked with innovative companies such as Airbnb and GitHub, work that included early LLM research on code understanding used by OpenAI. He has also led and contributed to numerous popular open-source machine-learning tools. Hamel is currently an independent consultant helping companies build AI products.