Build Voice AI Applications That Listen and Act in Real-Time

New · 5 Weeks · Cohort-based Course

Voice AI is about to have its own ChatGPT moment. Learn how to build applications that listen and act.

Course overview

Voice AI is here, but it is freaking hard to build

Latency budgets, desktop permissions, SIP rules, plus the AI minefield—hallucinations, accent-biased transcriptions, token-burst rate limits, and model drift—turn “just add voice” into months of painful debugging even for senior engineers. WebRTC jitter ruins timing, loose prompts spark off-brand replies, and a single bad transcript cascades into wrong actions and angry users.


Over four weeks you’ll build three voice products—desktop, browser, and telephony—side-by-side with the maintainers of Vapi, Pipecat, LiveKit, and Whisper, plus Ivan and Nicolay, who’ve shipped these systems in production. You’ll master the shared scaffolding (stream segmentation, prompt stitching, cost/latency meters) and the AI guardrails (real-time validation, confidence scoring, speculative decoding) that keep voice assistants responsive, factual, and customer-safe across every channel and model.


But don't just take our word for it that Voice AI is here to stay and having its "ChatGPT" moment.


"Humans interact with businesses in many ways, but one way hasn't changed much in almost 100 years—and that's phone calls. Today, over a trillion calls exist between a business and a customer....new voice models and conversational LLMs are now incredibly good ... startups are ... making voice AI bots that are indistinguishable from humans." - Gustaf Alströmer, YC -- in a call for Voice AI startups


"For enterprises, AI directly replaces human labor with technology. It’s cheaper, faster, more reliable — and often outperforms humans. Voice agents also allow businesses to be available to their customers 24/7 to answer questions, schedule appointments, or complete purchases...For consumers, we believe voice will be the first — and perhaps the primary — way people interact with AI." - Olivia Moore, a16z -- AI voice in consumer


There are already meeting bots that talk to you over Zoom, language coaches that help you learn Spanish in a webapp, ambient assistants that sit on your laptop, listen, and help you out when you need some input. "Voice UX" will keep growing in surface: We will move into cars, AR glasses, smart speakers, and other areas we haven't even considered yet.


Each channel breaks in its own way—latency on the web, permissions on desktop, SIP rules on telephony. One demo can’t teach you all of that.


That's why we build three separate voice AI applications in this course: a web app, a native macOS app, and a telephony app.


1. Native macOS Meeting Assistant – records your mic locally, takes live notes, pushes tasks to Notion, and pings you in Slack before deadlines. Learn how the most successful app to date (Granola) does it. The problem it solves: manual note-taking in back-to-back calls burns 6 h/week, and important information still slips.

2. Web-based AI Sales Coach – simulates tough customers, scores every response, and shows real-time coaching tips without breaking flow. Learn how to update the UI live as the conversation unfolds. The problem it solves: new reps take 6 months to hit quota, and live coaching is expensive.

3. Telephony Booking Bot – calls clients, confirms appointments, handles DTMF/silence, and writes results straight into your CRM. Learn how to place calls reliably and handle diverse accents. The problem it solves: staff spend hours calling clients, and the no-show rate is ~45 %.


Why you care as a student

- These metrics resonate with CTOs, PMs, and investors—your demo isn’t a toy.

- Each channel teaches a different “gotcha”: OS sandbox, browser jitter, telephony regs. Master once, reuse forever.

- Portfolio proof: three repos that shout “I can ship voice products anywhere users speak.”


After that, the next interface is just more plumbing.


The tools you’ll learn

Vapi – voice routing without IVR hell

Pipecat – low-latency audio transforms

LiveKit – WebRTC that survives bad networks

Whisper/ElevenLabs/AssemblyAI – fast, accurate transcription

OpenAI & Gemini Realtime – millisecond-level reasoning


Hands-on workshops with the engineers who wrote these libraries


Exclusive Access: Connect directly with the engineers building these tools through dedicated workshops and Q&A sessions. Learn from those who know these technologies best.

- LiveKit founder workshop: learn about WebRTC and how to make networking a breeze.

- OpenAI real-time API creators: learn how best to prompt real-time models.

- More workshops with speakers from ElevenLabs, AssemblyAI, Vapi, and Pipecat will be added soon.


Prerequisites (read this)

- Comfortable in TypeScript/JavaScript (async/await, streams, React or similar)

- Basic REST & WebSocket chops

- Familiarity with Git and command-line tooling


If you’ve never shipped production code, this bootcamp will overwhelm you.


Who is this course for

01. The “Build-It-Now” CTO – racing to add voice; needs production-ready blueprints, cost controls, and multi-channel code now.

02. Software & AI Engineer – General software & AI engineers exploring voice; seek hands-on repos to learn streaming audio, LLM prompts,...

What you’ll get out of this course

Ship 3 real voice products in 4 weeks

By Demo Day you’ll have a macOS meeting assistant, a WebRTC sales-coach webapp, and a Twilio/Vapi booking bot running on your own account—ready to show a boss, investor, or client.

Save 24+ engineering hours on “figuring it out”

We hand you working repos, infra scripts, and latency / cost benchmarks. Ship voice features 2–3 × faster than starting cold.

Quantifiable business impact you can brag about

  • Meeting assistant users cut note-cleanup by 6 h/week and miss 0 action items.
  • Sales coach slices rep ramp-time from 6 → 3 weeks.
  • Booking bot drops no-shows by 40 %, freeing staff for upsell calls.

Hands-on with the maintainers

Live coding + AMA sessions with:

  • Russ d'Sa (LiveKit) – WebRTC, LiveKit, Voice Agents
  • Ivan Leo & Nicolay Gerold (Aisbach) – prompt stitching, production guardrails


Plug-and-play test & guardrail suite

Automated latency alerts, hallucination detectors, and ASR-accuracy checks you can drop straight into any future voice project—so bugs surface in CI, not in prod.
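
To make the "ASR-accuracy check" idea concrete: transcription accuracy is commonly measured as word error rate (WER), the word-level edit distance between a reference transcript and the ASR output, divided by the number of reference words. The sketch below is only an illustration of that metric, not the course's actual suite. The function names, the ASCII-only tokenizer, and the 0.15 threshold are assumptions made for this example.

```typescript
// Minimal WER sketch. All names and the threshold are hypothetical.

// Lowercase, strip punctuation, and split on whitespace.
// Assumption: English text; non-ASCII letters would need a wider pattern.
function tokenize(text: string): string[] {
  return text
    .toLowerCase()
    .replace(/[^a-z0-9'\s]/g, " ")
    .split(/\s+/)
    .filter((word) => word.length > 0);
}

// WER = (substitutions + deletions + insertions) / reference word count.
function wordErrorRate(reference: string, hypothesis: string): number {
  const ref = tokenize(reference);
  const hyp = tokenize(hypothesis);
  if (ref.length === 0) return hyp.length === 0 ? 0 : 1;

  // prev[j] holds the edit distance between the reference words seen so far
  // and the first j hypothesis words (rolling-row Levenshtein).
  let prev: number[] = Array.from({ length: hyp.length + 1 }, (_, j) => j);
  for (let i = 1; i <= ref.length; i++) {
    const curr: number[] = [i];
    for (let j = 1; j <= hyp.length; j++) {
      const substitution = prev[j - 1] + (ref[i - 1] === hyp[j - 1] ? 0 : 1);
      const deletion = prev[j] + 1; // reference word missing from hypothesis
      const insertion = curr[j - 1] + 1; // extra word in hypothesis
      curr.push(Math.min(substitution, deletion, insertion));
    }
    prev = curr;
  }
  return prev[hyp.length] / ref.length;
}

// Hypothetical CI gate: throw when a transcript is worse than the threshold.
function assertTranscriptQuality(
  reference: string,
  hypothesis: string,
  maxWer: number = 0.15 // assumed threshold, tune per project
): void {
  const wer = wordErrorRate(reference, hypothesis);
  if (wer > maxWer) {
    throw new Error(`WER ${wer.toFixed(3)} exceeds limit ${maxWer}`);
  }
}
```

For example, comparing "confirm my appointment for tuesday" against "confirm my appointment for thursday" gives one substitution over five reference words, so a WER of 0.2, which would fail the assumed 0.15 gate. Running such a check against a small set of recorded calls in CI is one way a regression in transcription quality could surface before release.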

Voice-AI Tool Selection Playbook

Download-ready spreadsheet + benchmark scripts that score every major ASR (Whisper, AssemblyAI, Deepgram), TTS (ElevenLabs, Polly), routing layer (Vapi, Twilio), and realtime LLM (OpenAI, Gemini) on latency, cost, language coverage, and hallucination rate. Run npm run bench.

Private Discord “War Room” for Real-Time Help

Get into a members-only space with maintainers (Ivan, Nicolay) and other builders. Dedicated channels for #latency-bugs, #prompt-design, and #show-your-metrics mean you can paste logs, share PRs, book 15-min pairing slots, and get answers during the course.

This course includes

15 interactive live sessions

Lifetime access to course materials

27 in-depth lessons

Direct access to instructor

5 projects to apply learnings

Guided feedback & reflection

Private community of peers

Course certificate upon completion

Maven Satisfaction Guarantee

This course is backed by Maven’s guarantee. You can receive a full refund within 14 days after the course ends, provided you meet the completion criteria in our refund policy.

Course syllabus

Week 1

Sep 1—Sep 7

    Fundamentals of Voice AI: Background & Kickoff

    6 items

    Fundamentals of Voice AI: Live Events

    • Tue, Sep 2, 10:00-11:00 AM (UTC): Kick-off Code Walkthrough (60 min)
    • Wed, Sep 3, 10:00-11:00 AM (UTC): Expert Talk – Streaming with OpenAI/Gemini APIs (60 min)
    • Fri, Sep 5, 10:00-11:00 AM (UTC): Help Line (60 min)

    Fundamentals of Voice AI: Weekly Challenge

    2 items

Week 2

Sep 8—Sep 14

    macOS Meeting Assistant: Background & Kickoff

    4 items

    macOS Meeting Assistant: Live Events

    • Mon, Sep 8, 10:00-11:00 AM (UTC): Kick-off Code Walkthrough (60 min)
    • Tue, Sep 9, 10:00-11:00 AM (UTC): Expert Talk: Prompt Engineering for Voice AI
    • Fri, Sep 12, 10:00-11:00 AM (UTC): Help Line (60 min)

    macOS Meeting Assistant: Weekly Challenge

    2 items

Week 3

Sep 15—Sep 21

    AI Sales Coach: Background & Kickoff

    6 items

    AI Sales Coach: Live Events

    • Mon, Sep 15, 10:00-11:00 AM (UTC): Kick-off Code Walkthrough (60 min)
    • Tue, Sep 16, 10:00-11:00 AM (UTC): Expert Talk: Evaluating Voice Assistants
    • Thu, Sep 18, 10:00-11:00 AM (UTC): Extra Panel: Pipecat vs. Daily vs. LiveKit (Optional)
    • Fri, Sep 19, 10:00-11:00 AM (UTC): Help Line

    AI Sales Coach: Weekly Challenge

    2 items

Week 4

Sep 22—Sep 28

    Telephony Booking Bot: Background & Kickoff

    6 items

    Telephony Booking Bot: Live Events

    • Mon, Sep 22, 10:00-11:00 AM (UTC): Kick-off Code Walkthrough (60 min)
    • Tue, Sep 23, 10:00-11:00 AM (UTC): Expert Talk: How to do telephony with voice AI at scale
    • Thu, Sep 25, 10:00-11:00 AM (UTC): Workshop: Telephony Flow & SIP Quirks
    • Fri, Sep 26, 10:00-11:00 AM (UTC): Help Line
    1 more item

    Telephony Booking Bot: Weekly Challenge

    2 items

Week 5

Sep 29—Sep 30
    Nothing scheduled for this week

Post-course

    Post-Course: Wrap-up

    1 item

Meet your instructors

Nicolay Gerold

CTO, Managing Partner

Nicolay has been working on LLMs since 2019 and is the founder of Aisbach, where he specializes in generative AI systems.

Ivan Leo

Research Engineer

Ivan is a full-stack engineer turned research engineer. He brings academic breakthroughs into industry. Ivan maintains open source libraries like Instructor, indomee and Kura.

Join an upcoming cohort

Build Voice AI Applications That Listen and Act in Real-Time

Cohort 1

$1,449

Dates

Aug 31—Sep 29, 2025

Payment Deadline

Aug 30, 2025

Course schedule

8-10 hours per week

  • Live Sessions

    Tuesdays 1pm/5pm CET

    Interactive workshops and implementation guidance


    We will determine the best time based on the students and their time zones.

  • Office Hours

    Fridays 1pm/5pm CET

    Get 1:1 help with your prototypes and code before you dive into the weekend.


    We expect you to dig into the code after the live session and to have questions ready before the weekend.

  • Weekly projects

    5 hours per week

    Your weekly implementations. These can be adjustments to existing code or complete implementations of real-time voice applications.


    You can bring your own idea or build upon our existing applications. We are happy to help you with both, but recommend the latter.

