How to choose an OCR Model

Hosted by Joe Barrow, Isaac Flath, and Hamel Husain

Tue, Jul 7, 2026

5:00 PM UTC (45 minutes)

Virtual (Zoom)

Free to join

What you'll learn

When an OCR API is enough

Understand when AWS Textract, Datalab, or a similar API is the right choice because you want OCR without managing infrastructure

What self-hosted OCR actually requires

See the basic pieces of an OCR stack: model serving, document ingestion, batching, storage, observability, and output

What open models make possible

Learn where open models like LightOnOCR, Chandra, and DotsOCR fit and how they can improve outputs

How output format changes model selection

Compare document structure extraction, plain text, markdown, and layout-aware outputs

What you gain by owning the stack

See what you gain from hosting your own scale-to-zero infrastructure (for example, on Modal)

Why this topic matters

OCR can start as a simple API call, and that is often the right choice. But teams hit limits when they need better structure, custom outputs, lower cost, or more control. Joe Barrow will show when to use a managed OCR API, what a manageable OCR stack looks like, and what open models make possible.

You'll learn from

Joe Barrow

Senior Research Scientist at Adobe Research

Joe Barrow is a Senior Research Scientist at Adobe Research, working on training and efficiently serving VLMs for document tasks. He previously led the machine learning team at Pattern Data, building document processing pipelines for hundreds of millions of pages. His open source ML projects include commonforms (making PDF forms fillable), tinyhnsw (building a vector database for fun and no profit), and LambdaNet (Haskell deep learning framework).

Isaac Flath

AI product engineer, 10 years of experience in AI.

I’m an AI and product engineer building systems that work with private knowledge and support real workflows. I’ve taught people how to use AI, from a Boot.dev RAG course to live courses on AI-assisted development. I’ve also helped teams improve AI products, tools, and workflows, from AnkiHub (collaborative learning tools) and SpecStory (agentic software) to enterprise companies like Travel + Leisure and General Mills.


These days I focus on context-first AI systems. In practice, that means helping teams see and improve the parts of the system that decide what the system can use: retrieval, memory, tool use, evals, traces, harnesses, and the product interface around them. I help teams find where the process bottlenecks, whether the problem is search, agent behavior, workflow design, or the human interface, and then fix that layer.

Hamel Husain

ML Engineer with 25+ years of experience

Hamel Husain is an ML engineer with over 20 years of experience. He has worked with innovative companies such as Airbnb and GitHub, where his work included early LLM research for code understanding that was used by OpenAI. He has also led and contributed to numerous popular open-source machine-learning tools. Hamel is currently an independent consultant helping companies build AI products.
