Getting Started with Local LLMs
Everything you need to start running LLMs on your own hardware, with no cloud required.
Step 1: Choose your hardware
Local LLMs run best with a dedicated GPU, but they work on CPU too.
| Hardware | What you can run | Speed |
|---|---|---|
| 8 GB RAM, no GPU | 3B models (Q4) | Slow (2-5 tok/s) |
| 16 GB RAM, integrated GPU | 7-8B models (Q4) | Decent (15-30 tok/s) |
| 8 GB VRAM GPU | 7-8B models (Q4) | Fast (40-80 tok/s) |
| 16 GB VRAM GPU | 13-14B models (Q4) | Fast (30-60 tok/s) |
| 24 GB VRAM GPU | 34B models (Q4) | Fast (20-40 tok/s) |
| Apple M1/M2/M3, 16 GB+ | 7-14B models | Fast (30-55 tok/s) |
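The sizes in the table roughly follow a rule of thumb: the weights of a quantized model take about (parameter count × bits per weight ÷ 8) bytes, plus some headroom for the context cache and the runtime. The sketch below applies that rule. The 4.5 bits per weight for Q4 and the 1.5 GB of overhead are illustrative assumptions, not measured values, and the function names are hypothetical.

```python
def estimate_model_gb(params_billion: float, bits_per_weight: float = 4.5,
                      overhead_gb: float = 1.5) -> float:
    """Rough memory estimate in GB for a quantized model.

    bits_per_weight and overhead_gb are assumed defaults for a Q4-style
    quantization; real usage varies with context length and runner.
    """
    # billions of params * bits per weight / 8 bits per byte = gigabytes of weights
    weights_gb = params_billion * bits_per_weight / 8
    return weights_gb + overhead_gb


def fits(params_billion: float, memory_gb: float) -> bool:
    """True if the estimate fits in the given amount of RAM or VRAM."""
    return estimate_model_gb(params_billion) <= memory_gb


for size in (3, 8, 14, 34):
    print(f"{size}B at Q4: ~{estimate_model_gb(size):.1f} GB")
```

Under these assumptions, `fits(34, 24)` is true and `fits(34, 16)` is false, which lines up with the 24 GB row above. Treat the numbers as a first guess and leave extra room for long contexts.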
Step 2: Pick a runner
Ollama
# Install
curl -fsSL https://ollama.com/install.sh | sh
# Pull a model
ollama pull llama3.2:3b
# Chat
ollama run llama3.2:3b
Best for: Beginners, quick setup, desktop use.
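Once a model is pulled, Ollama also answers HTTP requests on its local port, so scripts can use it without the interactive chat. Here is a minimal sketch using only the standard library against the `/api/generate` endpoint. It assumes Ollama is running at its default address and that `llama3.2:3b` has been pulled. The helper names `build_request` and `generate` are hypothetical.

```python
import json
import urllib.request

# Assumed default address of a locally running Ollama instance.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_request(prompt: str, model: str = "llama3.2:3b") -> urllib.request.Request:
    """Build a non-streaming POST request for Ollama's /api/generate endpoint."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(prompt: str, model: str = "llama3.2:3b") -> str:
    """Send the prompt and return the generated text (needs a running Ollama)."""
    with urllib.request.urlopen(build_request(prompt, model), timeout=120) as resp:
        return json.loads(resp.read())["response"]
```

Calling `generate("Why is the sky blue?")` returns the full reply as one string. With `"stream": False` the server waits until generation finishes, which is simpler for scripts but slower to show output.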
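llama.cpp
# Clone and build with CMake (recent llama.cpp versions no longer support the old `make` build)
git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp
cmake -B build
cmake --build build --config Release -j
# Download a GGUF model
wget https://huggingface.co/bartowski/Llama-3.2-3B-Instruct-GGUF/resolve/main/Llama-3.2-3B-Instruct-Q4_K_M.gguf
# Run (CMake places the binaries in build/bin)
./build/bin/llama-cli -m Llama-3.2-3B-Instruct-Q4_K_M.gguf -p "Hello!"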
Best for: Developers, custom pipelines, server deployments.
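A failed download can leave an HTML error page saved under the model's filename, and the runner then reports a confusing load error. A quick sanity check is to read the file header: GGUF files start with the four bytes `GGUF`, followed by a 32-bit format version. The sketch below assumes the common little-endian layout. The helper names are hypothetical and not part of llama.cpp.

```python
import struct

GGUF_MAGIC = b"GGUF"


def gguf_version(header: bytes) -> int:
    """Return the GGUF format version from the first 8 bytes of a file.

    Raises ValueError if the magic bytes are missing. Assumes a
    little-endian file, which is the usual case.
    """
    if len(header) < 8 or header[:4] != GGUF_MAGIC:
        raise ValueError("not a GGUF file (missing GGUF magic bytes)")
    return struct.unpack("<I", header[4:8])[0]


def check_model_file(path: str) -> int:
    """Read the header of a downloaded model and return its GGUF version."""
    with open(path, "rb") as f:
        return gguf_version(f.read(8))
```

Running `check_model_file("Llama-3.2-3B-Instruct-Q4_K_M.gguf")` before starting the model catches a bad download early. It only checks the header and does not verify that the whole file arrived.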
LM Studio
- Download from lmstudio.ai
- Search and download models in-app
- Click "Start Server" for an OpenAI-compatible API
Best for: GUI users, quick experimentation.
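After the server is started, a quick way to confirm it is reachable is to ask which models it exposes. OpenAI-compatible servers list them at `/v1/models`. This sketch assumes LM Studio's default port of 1234, so adjust `base_url` if the port was changed in the app. The function names are hypothetical.

```python
import json
import urllib.request


def parse_model_ids(payload: dict) -> list[str]:
    """Extract model identifiers from an OpenAI-style /v1/models response."""
    return [entry["id"] for entry in payload.get("data", [])]


def list_models(base_url: str = "http://localhost:1234/v1") -> list[str]:
    """Query a local OpenAI-compatible server for its available models.

    The default base_url is an assumption (LM Studio's default port).
    """
    with urllib.request.urlopen(f"{base_url}/models", timeout=10) as resp:
        return parse_model_ids(json.load(resp))
```

The identifiers returned by `list_models()` are the values to pass as `model` in chat requests. The same call should work against other OpenAI-compatible runners when `base_url` points at them.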
Step 3: Choose a model
For beginners, start small:
| Model | Why start here |
|---|---|
| Llama 3.2 3B | Small, fast, surprisingly capable |
| Phi-3 Mini 3.8B | Great reasoning for its size |
| Gemma 2 2B | Google's tiny but mighty model |
| Qwen 2.5 3B | Strong multilingual support |
Step 4: Build something
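# OpenAI-compatible API (works with Ollama, LM Studio, llama.cpp server)
from openai import OpenAI

# Ollama's default endpoint; LM Studio defaults to port 1234 and llama.cpp's server to 8080.
# Local servers generally ignore the API key, but the client requires a non-empty value.
client = OpenAI(base_url="http://localhost:11434/v1", api_key="ollama")
response = client.chat.completions.create(
    model="llama3.2:3b",
    messages=[{"role": "user", "content": "Write a Python function to sort a list"}],
    stream=True,
)
# Print tokens as they arrive; delta.content can be None on some chunks.
for chunk in response:
    print(chunk.choices[0].delta.content or "", end="", flush=True)
print()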
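Printing tokens as they arrive suits a terminal, but most applications also need the full reply as a string, for example to store it in the conversation history. A minimal sketch of a collector for a streamed response follows. `collect_stream` and `fake_chunk` are hypothetical helpers, and the fake chunks only mimic the attribute shape of the client's streamed objects so the logic can be tried without a running server.

```python
from types import SimpleNamespace


def collect_stream(chunks) -> str:
    """Join the text pieces of a streamed chat completion into one string."""
    parts = []
    for chunk in chunks:
        # Some servers may send chunks with no choices; skip them defensively.
        if not chunk.choices:
            continue
        piece = chunk.choices[0].delta.content
        if piece:  # content can be None or empty on some chunks
            parts.append(piece)
    return "".join(parts)


def fake_chunk(text):
    """Build an object shaped like a streamed chunk, for offline testing only."""
    delta = SimpleNamespace(content=text)
    return SimpleNamespace(choices=[SimpleNamespace(delta=delta)])


demo = [fake_chunk("Hel"), fake_chunk(None), fake_chunk("lo")]
print(collect_stream(demo))
```

With a real server, pass the object returned by `client.chat.completions.create(..., stream=True)` straight to `collect_stream`.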