Skip to content

llmfit: one command to find what model runs on your hardware

AlexsJones shipped llmfit — a terminal tool that detects your hardware and tells you exactly which models will run well. 26,000 stars and counting.

What it does

# That's it. llmfit detects your GPU, RAM, CPU and scores every model.
llmfit

The interactive TUI scores each model across four dimensions:

  • Quality — benchmark performance (MMLU, HumanEval, etc.)
  • Speed — estimated tokens/sec on your specific hardware
  • Fit — whether it fits in your VRAM/RAM at the right quantization
  • Context — max context window you can realistically use

Why it matters

Choosing a local model is overwhelming. Hundreds of GGUF files, quant levels, model sizes — and no easy way to know what works on your machine. llmfit solves this:

Feature Detail
Hardware detection Auto-detects GPU, VRAM, RAM, CPU cores
Provider support Ollama, llama.cpp, MLX, Docker Model Runner, LM Studio
Multi-GPU Splits models across GPUs automatically
MoE aware Understands active vs total parameters for MoE models
Dynamic quantization Picks the optimal quant level for your hardware
Community leaderboard Real tok/s, TTFT, VRAM data from actual users (powered by localmaxxing.com)
27+ hardware presets Simulate RTX 5090 down to Apple M1 with S

Key bindings

Key Feature
b Community leaderboard — real-world perf from other users
D Download manager — queue, history, delete models
A Advanced config — tune scoring weights, TPS efficiency
S Simulate different hardware
H Compare hardware presets before buying

Sister projects

AlexsJones also maintains:

  • sympozium — managing AI agents in Kubernetes
  • llmserve — TUI for serving local LLM models
  • llama-panel — native macOS app for llama-server

Get it

cargo install llmfit
# or
brew install llmfit

View on GitHub