llmfit: one command to find what model runs on your hardware

AlexsJones shipped llmfit — a terminal tool that detects your hardware and tells you exactly which models will run well. 26,000 stars and counting.

What it does

# That's it. llmfit detects your GPU, RAM, CPU and scores every model.
llmfit

The interactive TUI scores each model across four dimensions:

- Quality: benchmark performance (MMLU, HumanEval, etc.)
- Speed: estimated tokens/sec on your specific hardware
- Fit: whether it fits in your VRAM/RAM at the right quantization
- Context: max context window you can realistically use
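To make the four-dimension idea concrete, here is a minimal sketch of how per-dimension scores could be folded into one ranking. The `Scores` class, the 0 to 1 scale, and the weights are all assumptions made for illustration; they are not taken from llmfit's source, and the tool's real formula may differ.

```python
from dataclasses import dataclass


@dataclass
class Scores:
    """Per-model scores, each assumed to be normalised to 0..1 (hypothetical scale)."""
    quality: float
    speed: float
    fit: float
    context: float


# Hypothetical weights for illustration only; not llmfit's actual defaults.
DEFAULT_WEIGHTS = {"quality": 0.4, "speed": 0.3, "fit": 0.2, "context": 0.1}


def composite(scores: Scores, weights: dict = DEFAULT_WEIGHTS) -> float:
    """Weighted average of the four dimensions."""
    total = sum(weights.values())
    return sum(getattr(scores, name) * w for name, w in weights.items()) / total


def rank(models: dict, weights: dict = DEFAULT_WEIGHTS) -> list:
    """Return model names ordered from best to worst composite score."""
    return sorted(models, key=lambda name: composite(models[name], weights), reverse=True)


if __name__ == "__main__":
    candidates = {
        "small-model": Scores(quality=0.5, speed=0.9, fit=1.0, context=0.8),
        "large-model": Scores(quality=0.9, speed=0.3, fit=0.6, context=0.5),
    }
    print(rank(candidates))
```

Changing the weights changes the ranking: a quality-only weighting would put the larger model first, which is the kind of trade-off a tunable scoring config exposes.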

Why it matters

Choosing a local model is overwhelming. Hundreds of GGUF files, quant levels, model sizes — and no easy way to know what works on your machine. llmfit solves this:

| Feature | Detail |
| --- | --- |
| Hardware detection | Auto-detects GPU, VRAM, RAM, CPU cores |
| Provider support | Ollama, llama.cpp, MLX, Docker Model Runner, LM Studio |
| Multi-GPU | Splits models across GPUs automatically |
| MoE aware | Understands active vs total parameters for MoE models |
| Dynamic quantization | Picks the optimal quant level for your hardware |
| Community leaderboard | Real tok/s, TTFT, VRAM data from actual users (powered by localmaxxing.com) |
| 27+ hardware presets | Simulate RTX 5090 down to Apple M1 with `S` |
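The fit, dynamic quantization, and MoE rows can be illustrated with some back-of-the-envelope arithmetic. The sketch below is not llmfit's algorithm. It assumes weight memory is roughly parameters times bits per weight, that a fixed overhead covers the KV cache and runtime, and that decode speed is limited by memory bandwidth. The bits-per-weight figures are rounded approximations, and `overhead_gb` and `efficiency` are made-up defaults.

```python
from dataclasses import dataclass
from typing import Optional

# Rough bits per weight for some GGUF quant levels, highest precision first.
# Illustrative approximations; real files vary by architecture.
BITS_PER_WEIGHT = {
    "Q8_0": 8.5,
    "Q6_K": 6.6,
    "Q5_K_M": 5.7,
    "Q4_K_M": 4.8,
    "Q3_K_M": 3.9,
}


@dataclass
class Model:
    name: str
    total_params_b: float   # billions of parameters that must sit in memory
    active_params_b: float  # billions read per token (equals total for dense models)


def weights_gb(params_b: float, bits: float) -> float:
    """Weight size in GB: 1e9 parameters * bits / 8 bytes each."""
    return params_b * bits / 8


def pick_quant(model: Model, mem_gb: float, overhead_gb: float = 2.0) -> Optional[str]:
    """Return the highest-precision quant whose weights plus overhead fit, else None."""
    for quant, bits in BITS_PER_WEIGHT.items():
        if weights_gb(model.total_params_b, bits) + overhead_gb <= mem_gb:
            return quant
    return None


def est_tokens_per_sec(model: Model, quant: str, bandwidth_gbs: float,
                       efficiency: float = 0.6) -> float:
    """Bandwidth-bound estimate: each token reads the active weights once."""
    gb_per_token = weights_gb(model.active_params_b, BITS_PER_WEIGHT[quant])
    return efficiency * bandwidth_gbs / gb_per_token


if __name__ == "__main__":
    dense = Model("dense-30b", total_params_b=30, active_params_b=30)
    moe = Model("moe-30b-a3b", total_params_b=30, active_params_b=3)
    for m in (dense, moe):
        quant = pick_quant(m, mem_gb=24)
        print(m.name, quant, round(est_tokens_per_sec(m, quant, bandwidth_gbs=1000), 1))
```

Under these assumptions the dense and MoE models need the same memory, because total parameters decide fit, but the MoE model is estimated to decode much faster, because only the active parameters are read per token.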

Key bindings

| Key | Feature |
| --- | --- |
| `b` | Community leaderboard: real-world perf from other users |
| `D` | Download manager: queue, history, delete models |
| `A` | Advanced config: tune scoring weights, TPS efficiency |
| `S` | Simulate different hardware |
| `H` | Compare hardware presets before buying |

Sister projects

AlexsJones also maintains:

- sympozium: managing AI agents in Kubernetes
- llmserve: TUI for serving local LLM models
- llama-panel: native macOS app for llama-server

Get it

cargo install llmfit
# or
brew install llmfit
