Skill

Guide to Running a Local Language Model

Madhav Malhotra

Co-Founder, QurioSkill.

What does this contain?

A reference guide that explains how to run language models on your own laptop, written as a companion to the AI Saturdays session.

The core idea: a model is a file, an inference engine runs it
GGUF, the file format that works on Windows, macOS, and Linux
Quantization explained, and why Q4_K_M is the common default
How RAM sets the limit on which models fit on a given laptop
A short framework for picking a model, with a recommended starting point at each RAM tier
The recommended model used throughout: Llama 3.2 3B Instruct (GGUF, Q4_K_M), a download of about 2 GB that runs on any laptop with 8 GB of RAM
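
The link between quantization, file size, and RAM can be sketched with rough arithmetic. The snippet below is an illustrative estimate, not something taken from the guide: the parameter count (about 3.2 billion), the average of roughly 4.85 bits per weight for Q4_K_M, and the 4 GB of headroom reserved for the operating system and other apps are all assumptions, and the function names are hypothetical.

```python
def estimate_model_size_gb(n_params: float, bits_per_weight: float) -> float:
    """Approximate size of a quantized model in GB (10^9 bytes)."""
    # Each weight takes bits_per_weight bits; divide by 8 to get bytes.
    return n_params * bits_per_weight / 8 / 1e9


def fits_in_ram(model_gb: float, ram_gb: float, headroom_gb: float = 4.0) -> bool:
    """Rough check: the model plus reserved headroom must fit in total RAM."""
    return model_gb + headroom_gb <= ram_gb


# Assumed figures for Llama 3.2 3B Instruct at Q4_K_M (approximate).
N_PARAMS = 3.2e9
BITS_PER_WEIGHT = 4.85

size_gb = estimate_model_size_gb(N_PARAMS, BITS_PER_WEIGHT)
print(f"Estimated size: {size_gb:.1f} GB")
print("Fits on an 8 GB laptop:", fits_in_ram(size_gb, ram_gb=8))
```

Under these assumptions the estimate lands near the 2 GB figure quoted above. Real files differ somewhat because some tensors are stored at higher precision, and actual memory use grows with context length.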

Free

A short reference guide to running language models on your laptop. Covers LM Studio, Ollama, Jan, and WebLLM.