LLMOps Mastery

Aurimas Griciūnas

Founder @ SwirlAI • Ex-CPO @ neptune.ai

⚙️ LLMOps: From Eval Pipelines to Optimized Inference

LLMOps Mastery is a 6-week, cohort-based deep dive for engineers who have built LLM applications and need to make them production-grade. You will optimize, fine-tune, deploy, and operate real LLM systems.

🛠️ What You'll Optimize

You start with three pre-built AI systems and spend 6 weeks transforming them: adding eval gates, observability, fine-tuned models, optimized serving, and cost controls.

🧑‍💻 Technologies include:

  • Evaluation frameworks and LLM-as-judge pipelines

  • Observability and tracing

  • Automated prompt optimization

  • LLM gateways and model routing

  • Fine-tuning

  • Inference serving and quantization

  • GPU profiling and multi-model serving

  • CI/CD with eval gates

🧠 How It Works

Each week covers 5 lessons with a heavy hands-on component:

  • Live Sessions (2x/week, 90 min): Concepts, real-world trade-offs, Q&A.

  • Written Content: Detailed technical writeups for reference.

  • Hands-On Labs (4-7 hrs/week): Deploy serving engines, fine-tune models, stress test under load, and profile GPUs.

What you’ll learn

Master the operational layer most AI courses skip: evals, fine-tuning, serving, cost control, and production reliability. A few illustrative code sketches of this kind of work follow the list below.

  • Implement LLM-as-judge, reference-based, and trajectory-based eval metrics across RAG, extraction, and agent systems.

  • Create golden datasets from production data, synthetic generation, and human annotations.

  • Wire eval gates into CI/CD so bad prompt or model changes cannot ship.

  • Trace every LLM call, tool invocation, and retrieval step using OpenTelemetry across multiple backends.

  • Build dashboards for latency, cost, token usage, and eval scores. Set up drift detection and alerting.

  • Diagnose production failures by tracing a bad output back to the exact step that caused it.

  • Use automated prompt optimization frameworks to find better prompts, measured by your eval pipeline.

  • Deploy an LLM gateway with cost-based routing: cheap models for simple tasks, powerful models for hard ones.

  • Implement multi-layer caching (exact match, semantic, prompt caching) and measure real cost savings.

  • Fine-tune open-source models with LoRA/QLoRA on your own data using cloud GPUs.

  • Run DPO alignment to shape model behavior using preference data from production logs.

  • A/B test fine-tuned models against baselines using your eval pipeline, with automated rollback on regression.

  • Quantize models and benchmark quality vs latency vs memory trade-offs on real workloads.

  • Deploy and stress test serving engines under concurrent load to find real throughput limits.

  • Serve multiple models (LLM + embedding + task models) on a single GPU without latency spikes.

  • Design full LLMOps stacks: gateway, observability, evals, serving, and fine-tuning as connected layers.

  • Run real cost analysis: self-hosted inference vs API pricing with actual numbers from your systems.

  • Make informed build-vs-buy decisions for every layer of your LLM stack with real production data.
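
To make the eval work concrete, here is a minimal LLM-as-judge sketch of the kind you'll build in the labs. It assumes the official OpenAI Python client; the judge model, rubric, and function name are illustrative, not the course's reference implementation.

    # Minimal LLM-as-judge faithfulness grader (illustrative sketch).
    from openai import OpenAI

    client = OpenAI()  # reads OPENAI_API_KEY from the environment

    RUBRIC = (
        "You are grading a RAG answer for faithfulness.\n"
        "Question: {question}\nRetrieved context: {context}\nAnswer: {answer}\n"
        "Reply with one integer from 1 (unfaithful) to 5 (fully grounded)."
    )

    def judge_faithfulness(question: str, context: str, answer: str) -> int:
        response = client.chat.completions.create(
            model="gpt-4o-mini",  # illustrative judge model
            messages=[{"role": "user", "content": RUBRIC.format(
                question=question, context=context, answer=answer)}],
            temperature=0,
        )
        return int(response.choices[0].message.content.strip())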
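
An eval gate in CI can be as simple as a test that reads the scores an earlier pipeline step produced and fails the job on regression. The eval_results.json file name and the 0.85 threshold are assumptions for illustration.

    # Eval gate: fail the CI job if the eval suite regresses (pytest-style).
    import json
    import pathlib

    BASELINE = 0.85  # minimum acceptable mean eval score (illustrative)

    def test_eval_gate():
        scores = json.loads(pathlib.Path("eval_results.json").read_text())
        mean_score = sum(scores) / len(scores)
        assert mean_score >= BASELINE, (
            f"Eval gate failed: {mean_score:.3f} < {BASELINE}"
        )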
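
For the tracing bullet, a minimal OpenTelemetry sketch: one span per LLM call, with request and response sizes recorded as attributes. It assumes the opentelemetry-sdk package; the attribute names and the stubbed call_model function are illustrative.

    # One traced LLM call with OpenTelemetry (console exporter for demo).
    from opentelemetry import trace
    from opentelemetry.sdk.trace import TracerProvider
    from opentelemetry.sdk.trace.export import ConsoleSpanExporter, SimpleSpanProcessor

    provider = TracerProvider()
    provider.add_span_processor(SimpleSpanProcessor(ConsoleSpanExporter()))
    trace.set_tracer_provider(provider)
    tracer = trace.get_tracer("llm-app")

    def call_model(prompt: str) -> str:
        return "stubbed completion"  # stand-in for your real model client

    def traced_llm_call(prompt: str) -> str:
        with tracer.start_as_current_span("llm.generate") as span:
            span.set_attribute("llm.prompt_chars", len(prompt))
            completion = call_model(prompt)
            span.set_attribute("llm.completion_chars", len(completion))
            return completion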
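
For the gateway bullet, cost-based routing can start as a simple difficulty heuristic; production gateways usually replace this with a trained classifier. The model identifiers and keyword list below are illustrative.

    # Route simple prompts to a cheap model, hard ones to a strong model.
    CHEAP_MODEL = "cheap-model-id"    # illustrative identifiers
    STRONG_MODEL = "strong-model-id"

    HARD_HINTS = ("prove", "refactor", "step by step", "analyze")

    def route(prompt: str) -> str:
        looks_hard = len(prompt) > 2000 or any(
            hint in prompt.lower() for hint in HARD_HINTS
        )
        return STRONG_MODEL if looks_hard else CHEAP_MODEL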
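
For the caching bullet, here is an in-memory sketch of the first two layers: exact match on a prompt hash, then a semantic lookup by cosine similarity over numpy vectors. The embed callable and the 0.95 threshold are assumptions you would tune against your own traffic.

    # Exact-match plus semantic cache layers (in-memory, illustrative).
    import hashlib
    import numpy as np

    exact_cache: dict[str, str] = {}
    semantic_cache: list[tuple[np.ndarray, str]] = []

    def _key(prompt: str) -> str:
        return hashlib.sha256(prompt.encode()).hexdigest()

    def lookup(prompt: str, embed, threshold: float = 0.95) -> str | None:
        if (hit := exact_cache.get(_key(prompt))) is not None:  # layer 1
            return hit
        query = embed(prompt)                                   # layer 2
        for vector, answer in semantic_cache:
            sim = float(np.dot(query, vector) /
                        (np.linalg.norm(query) * np.linalg.norm(vector)))
            if sim >= threshold:
                return answer
        return None

    def store(prompt: str, answer: str, embed) -> None:
        exact_cache[_key(prompt)] = answer
        semantic_cache.append((embed(prompt), answer))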
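
For the fine-tuning bullet, the LoRA setup itself is only a few lines with Hugging Face peft; the base model id and hyperparameters below are illustrative defaults, not the course's recipe.

    # Wrap a base model with LoRA adapters via peft.
    from transformers import AutoModelForCausalLM
    from peft import LoraConfig, get_peft_model

    model = AutoModelForCausalLM.from_pretrained("base-model-id")  # illustrative
    lora_cfg = LoraConfig(
        r=16,                                  # adapter rank
        lora_alpha=32,                         # scaling factor
        target_modules=["q_proj", "v_proj"],   # attention projections to adapt
        lora_dropout=0.05,
        task_type="CAUSAL_LM",
    )
    model = get_peft_model(model, lora_cfg)
    model.print_trainable_parameters()  # typically well under 1% trainable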
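
And for the quantization bullet, a 4-bit load via transformers and bitsandbytes is one common starting point before benchmarking quality vs latency vs memory; the model id and NF4 settings are illustrative.

    # Load a model 4-bit quantized (NF4) for memory-constrained serving.
    import torch
    from transformers import AutoModelForCausalLM, BitsAndBytesConfig

    quant_cfg = BitsAndBytesConfig(
        load_in_4bit=True,
        bnb_4bit_quant_type="nf4",
        bnb_4bit_compute_dtype=torch.bfloat16,
    )
    model = AutoModelForCausalLM.from_pretrained(
        "base-model-id", quantization_config=quant_cfg  # illustrative id
    )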

Learn directly from Aurimas

Aurimas Griciūnas

LinkedIn Top Voice in AI • Founder & CEO @ SwirlAI

Former CPO @ Neptune.ai (acquired by OpenAI)

Who this course is for

  • Platform Engineers

    Who run LLM systems in production and need to cut cost, improve reliability, and scale serving.

  • ML & AI Engineers

    Who have built LLM apps and want to master evals, fine-tuning, inference serving, and cost optimization.

  • Engineering Managers & Tech Leads

    Who make AI infrastructure decisions and need real data on self-hosting vs API and GPU spend.

What's included

Live sessions

Learn directly from Aurimas Griciūnas in a real-time, interactive format.

Lifetime access

Go back to course content and recordings whenever you need to.

Community of peers

Stay accountable and share insights with like-minded professionals.

Certificate of completion

Share your new skills with your employer or on LinkedIn.

Code-along Recordings

20+ hours of pre-recorded coding videos that you can refer to when digging into specific topics.

Compute Credits

$500 in Modal Compute Credits.

Maven Guarantee

Your purchase is backed by the Maven Guarantee.

Course syllabus

12 live sessions • 66 lessons

Week 1

May 11—May 17

    Observability and Evaluation Foundations

    5 items

    Hands-on Section

    6 items

    Live Session: Mon 5/11, 2:00 PM—3:30 PM (UTC)

    Live Session: Wed 5/13, 2:00 PM—3:30 PM (UTC)

Week 2

May 18—May 24

    Prompt Management, Optimization, and Production Monitoring

    5 items

    Hands-on Section

    6 items

    Live Session: Mon 5/18, 2:00 PM—3:30 PM (UTC)

    Live Session: Wed 5/20, 2:00 PM—3:30 PM (UTC)

Free resource

Deploy Reliable AI Systems with LLMOps

What Is LLMOps

Learn what LLMOps is and why it’s essential for production-ready LLM applications.

Build Observability into AI Systems

Learn how to evaluate and monitor LLM-based systems to detect failures before they reach users.

Build Your Roadmap

Create a clear step-by-step LLMOps plan that fits your team’s tools, workflows, and stage of AI adoption.

Schedule

Live sessions

4 hrs / week

    • Mon, May 11

      2:00 PM—3:30 PM (UTC)

    • Wed, May 13

      2:00 PM—3:30 PM (UTC)

    • Mon, May 18

      2:00 PM—3:30 PM (UTC)

Projects

5 hrs / week

Async content

3 hrs / week

$2,200

USD

May 11—Jun 17