Build Voice AI Applications That Listen and Act in Real-Time

Cohort-based Course

Voice AI is about to have its own ChatGPT moment. Learn how to build applications that listen and act.

Hosted by

Nicolay Gerold and Ivan Leo

Learn from the engineers who've shipped what you're trying to build.

Course overview

Voice AI is here, but it is freaking hard to build

Latency budgets, desktop permissions, SIP rules, plus the AI minefield—hallucinations, accent-biased transcriptions, token-burst rate limits, and model drift—turn “just add voice” into months of painful debugging even for senior engineers. WebRTC jitter ruins timing, loose prompts spark off-brand replies, and a single bad transcript cascades into wrong actions and angry users.

Over four weeks you’ll build three voice products—desktop, browser, and telephony—side-by-side with the maintainers of Vapi, Pipecat, LiveKit, and Whisper, plus Ivan and Nicolay, who’ve shipped these systems in production. You’ll master the shared scaffolding (stream segmentation, prompt stitching, cost/latency meters) and the AI guardrails (real-time validation, confidence scoring, speculative decoding) that keep voice assistants responsive, factual, and customer-safe across every channel and model.
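A latency budget is easiest to enforce when each conversational turn is measured stage by stage, from the end of user speech to the first audio played back. The TypeScript sketch below shows one way to build such a latency meter. The `TurnLatencyMeter` class, the `asr`/`llm`/`tts` stage names, the 800 ms budget, and the timings are illustrative assumptions, not part of any vendor SDK or the course repos.

```typescript
// Illustrative stage names for one voice turn (assumption, not an SDK type).
type Stage = "asr" | "llm" | "tts";

// Hypothetical per-turn meter: accumulates stage durations and checks them
// against a latency budget in milliseconds.
class TurnLatencyMeter {
  private readonly durations = new Map<Stage, number>();

  constructor(private readonly budgetMs: number) {}

  // Add time spent in a stage; repeated calls for the same stage accumulate.
  record(stage: Stage, ms: number): void {
    this.durations.set(stage, (this.durations.get(stage) ?? 0) + ms);
  }

  // Sum of all recorded stage durations.
  total(): number {
    return Array.from(this.durations.values()).reduce((sum, ms) => sum + ms, 0);
  }

  // Stages ordered slowest first, so you know where to optimize.
  breakdown(): Array<[Stage, number]> {
    return Array.from(this.durations.entries()).sort((a, b) => b[1] - a[1]);
  }

  // True when the turn took longer than the budget.
  overBudget(): boolean {
    return this.total() > this.budgetMs;
  }
}

// Example turn with an assumed 800 ms budget and made-up stage timings.
const meter = new TurnLatencyMeter(800);
meter.record("asr", 250);
meter.record("llm", 420);
meter.record("tts", 180);
console.log(meter.total()); // 850
console.log(meter.overBudget()); // true
```

In a real pipeline the durations would come from timestamps emitted by your ASR, LLM, and TTS callbacks; sorting the breakdown slowest-first points at the stage worth optimizing first.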

But don't just take our word that Voice AI is here to stay and about to have its "ChatGPT moment."

"Humans interact with businesses in many ways, but one way hasn't changed much in almost 100 years—and that's phone calls. Today, over a trillion calls exist between a business and a customer....new voice models and conversational LLMs are now incredibly good ... startups are ... making voice AI bots that are indistinguishable from humans." - Gustaf Alströmer (YC), in a call for Voice AI startups

"For enterprises, AI directly replaces human labor with technology. It’s cheaper, faster, more reliable — and often outperforms humans. Voice agents also allow businesses to be available to their customers 24/7 to answer questions, schedule appointments, or complete purchases...For consumers, we believe voice will be the first — and perhaps the primary — way people interact with AI." - Olivia Moore (a16z), on AI voice for consumers

There are already meeting bots that talk to you over Zoom, language coaches that help you learn Spanish in a web app, and ambient assistants that sit on your laptop, listen, and step in when you need input. "Voice UX" will keep spreading to new surfaces: cars, AR glasses, smart speakers, and places we haven't even considered yet.

Each channel breaks in its own way—latency on the web, permissions on desktop, SIP rules on telephony. One demo can’t teach you all of that.

That's why we build three separate voice AI applications in this course: a web app, a native macOS app, and a telephony app.

1. Native macOS Meeting Assistant – records your mic locally, takes live notes, pushes tasks to Notion, and pings you in Slack before deadlines. Learn how Granola, the most successful app of its kind to date, does it. Manual note-taking in back-to-back calls burns 6 h/week, and important information still slips through.

2. Web-based AI Sales Coach – simulates tough customers, scores every response, and shows real-time coaching tips without breaking flow. Learn how to update the UX live as a conversation unfolds. New reps take 6 months to hit quota; live coaching is expensive.

3. Telephony Booking Bot – calls clients, confirms appointments, handles DTMF/silence, and writes results straight into your CRM. Learn how to place calls reliably and handle diverse accents. Staff spend hours calling clients; no-show rate ~45 %.
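Handling "DTMF/silence" in a booking bot mostly comes down to deciding what to do with each caller event: a keypad digit, a transcribed utterance, or a stretch of silence. The sketch below shows that decision logic in TypeScript under assumed rules (press 1 to confirm, 2 to reschedule; keep listening through short pauses; re-prompt on long silence; hang up after two unanswered re-prompts). The `CallerEvent` and `Action` types, the thresholds, and `BookingCallPolicy` are hypothetical; wiring them to Twilio, Vapi, or another provider is left out.

```typescript
// Hypothetical caller events; a real provider delivers DTMF digits,
// transcripts, and silence notifications through its own webhooks or SDK.
type CallerEvent =
  | { kind: "dtmf"; digit: string }
  | { kind: "speech"; text: string }
  | { kind: "silence"; ms: number };

type Action = "confirm" | "reschedule" | "reprompt" | "wait" | "hangup";

const SILENCE_TIMEOUT_MS = 5000; // assumed: 5 s of silence counts as "no answer"
const MAX_REPROMPTS = 2; // assumed: give up after two unanswered re-prompts

class BookingCallPolicy {
  private reprompts = 0;

  handle(event: CallerEvent): Action {
    if (event.kind === "dtmf") {
      // Keypad input is unambiguous: 1 confirms, 2 reschedules.
      if (event.digit === "1") return "confirm";
      if (event.digit === "2") return "reschedule";
      return this.reprompt();
    }
    if (event.kind === "speech") {
      // Naive keyword matching for illustration; a production bot would
      // more likely use an intent classifier or an LLM here.
      const text = event.text.toLowerCase();
      if (/\b(yes|confirm)\b/.test(text)) return "confirm";
      if (/\b(reschedule|another time)\b/.test(text)) return "reschedule";
      return this.reprompt();
    }
    // Remaining case: silence. Short pauses are normal, so keep listening.
    if (event.ms < SILENCE_TIMEOUT_MS) return "wait";
    return this.reprompt();
  }

  // Re-prompt until the limit is reached, then end the call.
  private reprompt(): Action {
    if (this.reprompts >= MAX_REPROMPTS) return "hangup";
    this.reprompts += 1;
    return "reprompt";
  }
}

const call = new BookingCallPolicy();
console.log(call.handle({ kind: "silence", ms: 1200 })); // wait
console.log(call.handle({ kind: "silence", ms: 6000 })); // reprompt
console.log(call.handle({ kind: "speech", text: "Um, what?" })); // reprompt
console.log(call.handle({ kind: "silence", ms: 6000 })); // hangup
```

Keeping this policy free of telephony I/O makes it easy to unit-test every branch before a single real call goes out.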

Why this matters to you as a student

- These metrics resonate with CTOs, PMs, and investors—your demo isn’t a toy.

- Each channel teaches a different “gotcha”: OS sandbox, browser jitter, telephony regs. Master once, reuse forever.

- Portfolio proof: three repos that shout “I can ship voice products anywhere users speak.”

After that, the next interface is just more plumbing.

The tools you’ll learn

Vapi – voice routing without IVR hell

Pipecat – low-latency audio transforms

LiveKit – WebRTC that survives bad networks

Whisper/ElevenLabs/AssemblyAI – fast, accurate transcription

OpenAI & Gemini Realtime – low-latency speech-to-speech reasoning

Hands-on workshops with the engineers who wrote these libraries

Exclusive Access: Connect directly with the engineers building these tools through dedicated workshops and Q&A sessions. Learn from those who know these technologies best.

- LiveKit founder workshop: learn about WebRTC and how to make networking a breeze.

- OpenAI Realtime API creators: learn how best to prompt realtime models.

- More workshops with speakers from ElevenLabs, AssemblyAI, Vapi, and Pipecat will be added soon.

Prerequisites (read this)

- Comfortable in TypeScript/JavaScript (async/await, streams, React or similar)

- Basic REST & WebSocket chops

- Familiarity with Git and command-line tooling

If you’ve never shipped production code, this bootcamp will overwhelm you.

Who is this course for

The “Build-It-Now” CTO racing to add voice; needs production-ready blueprints, cost controls, and multi-channel code now.

Software & AI Engineer – General software & AI engineers exploring voice; seek hands-on repos to learn streaming audio, LLM prompts,...

What you’ll get out of this course

Ship 3 real voice products in 4 weeks

By Demo Day you’ll have a macOS meeting assistant, a WebRTC sales-coach webapp, and a Twilio/Vapi booking bot running on your own account—ready to show a boss, investor, or client.

Save 24+ engineering hours on “figuring it out”

We hand you working repos, infra scripts, and latency / cost benchmarks. Ship voice features 2–3× faster than starting cold.

Quantifiable business impact you can brag about

Meeting assistant users cut note-cleanup by 6 h/week and miss 0 action items.
Sales coach slices rep ramp-time from 6 → 3 weeks.
Booking bot drops no-shows by 40 %, freeing staff for upsell calls.

Hands-on with the maintainers

Live coding + AMA sessions with:

Russel D'Sa (LiveKit) – WebRTC, LiveKit, Voice Agents
Ivan Leo & Nicolay Gerold (Aisbach) – prompt stitching, production guardrails

Plug-and-play test & guardrail suite

Automated latency alerts, hallucination detectors, and ASR-accuracy checks you can drop straight into any future voice project—so bugs surface in CI, not in prod.

Voice-AI Tool Selection Playbook

Download-ready spreadsheet + benchmark scripts that score every major ASR (Whisper, AssemblyAI, Deepgram), TTS (ElevenLabs, Polly), routing layer (Vapi, Twilio), and realtime LLM (OpenAI, Gemini) on latency, cost, language coverage, and hallucination rate. Run them with npm run bench.
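Scoring providers on several metrics at once can be approximated as a weighted sum of normalized values. This TypeScript sketch assumes min-max normalization, a hypothetical `Candidate` shape, and weights you choose for your own use case. The provider names and numbers are placeholders, not benchmark results, and the course's own spreadsheet may weight things differently.

```typescript
// Hypothetical metrics for one provider. Lower is better for every field
// except `languages`.
interface Candidate {
  name: string;
  p50LatencyMs: number;
  costPerHourUsd: number;
  languages: number;
  hallucinationRate: number; // assumed: fraction of test clips with fabricated words (0..1)
}

type Weights = { latency: number; cost: number; languages: number; hallucination: number };

// Min-max normalize to 0..1, where 1 is always the best value.
function normalize(values: number[], higherIsBetter: boolean): number[] {
  const min = Math.min(...values);
  const max = Math.max(...values);
  if (max === min) return values.map(() => 1); // all equal: nobody is worse
  return values.map((v) => (higherIsBetter ? (v - min) / (max - min) : (max - v) / (max - min)));
}

// Weighted average of normalized metrics, sorted best first.
function rank(candidates: Candidate[], w: Weights): Array<{ name: string; score: number }> {
  const lat = normalize(candidates.map((c) => c.p50LatencyMs), false);
  const cost = normalize(candidates.map((c) => c.costPerHourUsd), false);
  const langs = normalize(candidates.map((c) => c.languages), true);
  const hall = normalize(candidates.map((c) => c.hallucinationRate), false);
  const total = w.latency + w.cost + w.languages + w.hallucination;
  return candidates
    .map((c, i) => ({
      name: c.name,
      score: (w.latency * lat[i] + w.cost * cost[i] + w.languages * langs[i] + w.hallucination * hall[i]) / total,
    }))
    .sort((a, b) => b.score - a.score);
}

// Placeholder data for three made-up ASR providers.
const ranked = rank(
  [
    { name: "asr-a", p50LatencyMs: 300, costPerHourUsd: 0.5, languages: 50, hallucinationRate: 0.02 },
    { name: "asr-b", p50LatencyMs: 600, costPerHourUsd: 0.2, languages: 99, hallucinationRate: 0.05 },
    { name: "asr-c", p50LatencyMs: 450, costPerHourUsd: 0.35, languages: 30, hallucinationRate: 0.01 },
  ],
  { latency: 4, cost: 2, languages: 1, hallucination: 3 },
);
console.log(ranked.map((r) => r.name)); // [ 'asr-a', 'asr-c', 'asr-b' ]
```

Changing the weights is the point: a telephony bot might weight accent coverage and hallucination rate far above cost, while a desktop note-taker might care most about latency.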

Private Discord “War Room” for Real-Time Help

Join a members-only Slack with the maintainers (Ivan, Nicolay) and other builders. Dedicated channels (#latency-bugs, #prompt-design, #show-your-metrics) let you paste logs, share PRs, book 15-min pairing slots, and get answers during the course.

What’s included

Live sessions

Learn directly from Nicolay Gerold & Ivan Leo in a real-time, interactive format.

Lifetime access

Go back to course content and recordings whenever you need to.

Community of peers

Stay accountable and share insights with like-minded professionals.

Certificate of completion

Share your new skills with your employer or on LinkedIn.

Maven Guarantee

This course is backed by the Maven Guarantee. Students are eligible for a full refund up until the halfway point of the course.

Course syllabus

14 live sessions • 27 lessons • 5 projects

Week 1

Sep 1—Sep 7

Fundamentals of Voice AI: Background & Kickoff

6 items

Fundamentals of Voice AI: Live Events

Kick-off Code Walkthrough (60 min)
Tue, Sep 2, 10:00–11:00 AM (UTC)

Expert Talk – Streaming with OpenAI/Gemini APIs (60 min)
Wed, Sep 3, 10:00–11:00 AM (UTC)

Help Line (60 min)
Fri, Sep 5, 10:00–11:00 AM (UTC)

Fundamentals of Voice AI: Weekly Challenge

2 items

Week 2

Sep 8—Sep 14

MacOS Meeting Assistant: Background & Kickoff

4 items

MacOS Meeting Assistant: Live Events

Kick-off Code Walkthrough (60 min)
Mon, Sep 8, 10:00–11:00 AM (UTC)

Expert Talk: Prompt Engineering for Voice AI
Tue, Sep 9, 10:00–11:00 AM (UTC)

Help Line (60 min)
Fri, Sep 12, 10:00–11:00 AM (UTC)

MacOS Meeting Assistant: Weekly Challenge

2 items

Week 3

Sep 15—Sep 21

AI Sales Coach: Background & Kickoff

6 items

AI Sales Coach: Live Events

Kick-off Code Walkthrough (60 min)
Mon, Sep 15, 10:00–11:00 AM (UTC)

Expert Talk: Evaluating Voice Assistants
Tue, Sep 16, 10:00–11:00 AM (UTC)

Extra Panel: Pipecat vs. Daily vs. LiveKit
Thu, Sep 18, 10:00–11:00 AM (UTC)
Optional

Help Line
Fri, Sep 19, 10:00–11:00 AM (UTC)

AI Sales Coach: Weekly Challenge

2 items

Week 4

Sep 22—Sep 28

Telephony Booking Bot: Background & Kickoff

6 items

Telephony Booking Bot: Live Events

Kick-off Code Walkthrough (60 min)
Mon, Sep 22, 10:00–11:00 AM (UTC)

Expert Talk: How to do telephony with voice AI at scale
Tue, Sep 23, 10:00–11:00 AM (UTC)

Workshop: Telephony Flow & SIP Quirks
Thu, Sep 25, 10:00–11:00 AM (UTC)

Help Line
Fri, Sep 26, 10:00–11:00 AM (UTC)

Telephony Booking Bot: Weekly Challenge

2 items

Week 5

Sep 29—Sep 30

Nothing scheduled for this week

Post-course

Post-Course: Wrap-up

1 item

Meet your instructors

Nicolay Gerold

CTO, Managing Partner

Nicolay has been working on LLMs since 2019 and is the founder of Aisbach, where he specializes in generative AI systems.

Ivan Leo

Research Engineer

Ivan is a full-stack engineer turned research engineer who brings academic breakthroughs into industry. He maintains open-source libraries such as Instructor, indomee, and Kura.
