Setting up your first AI evaluation

Hosted by Madalina Turlea and Catalina Turlea

In this video

What you'll learn

Identify your AI feature's specific evaluation metrics

Move beyond generic metrics to ones that measure how your AI feature fails in your specific use case

Choose between deterministic and LLM-based evaluation

Understand when to use zero-cost deterministic checks vs. AI-as-judge, and why you should start with deterministic first

Build your first automated evaluation

Set up practical deterministic checks: format validation, schema checks, length limits, and value ranges
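The four kinds of deterministic check named above can be sketched in a few lines of Python. This is only an illustration, not the course's own implementation: the output contract (a JSON object with `product_id`, `reason`, and `confidence`), the `SKU-1234` ID format, the 200-character limit, and the function name `run_checks` are all assumptions made up for a hypothetical product-recommendation feature.

```python
import json
import re

# Hypothetical output contract for a product-recommendation feature.
REQUIRED_FIELDS = {"product_id": str, "reason": str, "confidence": (int, float)}
PRODUCT_ID_PATTERN = re.compile(r"SKU-\d{4}")  # assumed ID format, e.g. SKU-1234
MAX_REASON_CHARS = 200  # assumed length limit


def run_checks(raw_output: str) -> dict[str, bool]:
    """Run deterministic checks on one model output; True means the check passed."""
    results = {
        "format": False,            # output parses as a JSON object
        "schema": False,            # required fields exist with expected types
        "id_format": False,         # product_id matches the assumed pattern
        "reason_length": False,     # reason is non-empty and within the limit
        "confidence_range": False,  # confidence lies in [0, 1]
    }

    # Format validation: anything that is not a JSON object fails every check.
    try:
        data = json.loads(raw_output)
    except json.JSONDecodeError:
        return results
    if not isinstance(data, dict):
        return results
    results["format"] = True

    # Schema check: bool is excluded explicitly because it is a subclass of int.
    results["schema"] = all(
        field in data
        and isinstance(data[field], expected)
        and not isinstance(data[field], bool)
        for field, expected in REQUIRED_FIELDS.items()
    )
    if not results["schema"]:
        return results  # the remaining checks need well-typed fields

    results["id_format"] = bool(PRODUCT_ID_PATTERN.fullmatch(data["product_id"]))
    results["reason_length"] = 0 < len(data["reason"]) <= MAX_REASON_CHARS
    results["confidence_range"] = 0.0 <= data["confidence"] <= 1.0
    return results
```

Running `run_checks` over a batch of saved model outputs and counting the failures per check gives a per-check pass rate at no model cost, which is one way to see where a feature fails before adding an AI-as-judge step.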

Why this topic matters

Generic metrics like "helpfulness" or "hallucination" won't tell you if your AI feature solves real user problems. A product recommendation AI that's "helpful" but recommends the wrong products is useless. Having strong evaluations in place is key to building impactful AI features that work with a certain level of accuracy. We will start from the simplest evaluations you can do, building up to

You'll learn from

Madalina Turlea

Co-founder @Lovelaice, 10+ years in Product

I'm co-founder of Lovelaice and a product leader with 10+ years building products across fintech, payments, and compliance. I hold a CFA charter and have led AI product development in highly regulated environments — where AI failures aren't just embarrassing, they're liabilities.

I've watched smart teams make the same mistakes: choosing models based on benchmarks that don't reflect their use case, writing prompts that work in testing but fail in production, and leaving domain experts out of the loop. These aren't edge cases — they're why 80% of AI projects underperform.

Through these failures (my own included), I developed a systematic approach to AI experimentation that puts domain expertise at the center. I teach what I've learned building Lovelaice: how to test, evaluate, and iterate on AI — before it reaches your users.

Catalina Turlea

Founder @Lovelaice

I bring over 14 years of software development expertise and a decade of startup experience to help teams build AI products that actually work. After founding my first company six years ago, I now run a consultancy that helps startups build MVPs, solve complex technical challenges, and integrate AI effectively.

I've seen firsthand how AI projects fail for lack of systematic experimentation: teams treat AI like traditional software and struggle with inconsistent results. That's why I co-created Lovelaice, a platform designed for non-technical professionals to experiment with AI agents systematically.

Go deeper with a course

Build and evaluate your first AI feature
Madalina Turlea and Catalina Turlea