Skip to content

Tools & Ecosystem

The essential tools for local LLM development — categorized and compared.

Model Discovery

Tool Platform Description GPU support
llmfit All (Rust) Hardware-aware model finder, TUI Any

Model Runners

Tool Platform Best for GPU support
Ollama macOS, Linux, Windows Beginners, quick setup CUDA, Metal, ROCm
llama.cpp All (C/C++) Servers, custom pipelines CUDA, Metal, Vulkan, ROCm, SYCL
DwarfStar 4 macOS, Linux DeepSeek V4 Flash optimized Metal, CUDA
LM Studio macOS, Windows, Linux GUI users CUDA, Metal
vLLM Linux Production serving CUDA, ROCm
MLX macOS (Apple Silicon) Mac-native dev Apple Silicon GPU
llama.rn iOS, Android Mobile inference Metal, Vulkan

Frontends & UIs

Tool Description Key feature
Open WebUI Self-hosted ChatGPT clone RAG, tools, multi-user
SillyTavern Character chat frontend Roleplay, character cards
Anything LLM All-in-one desktop app RAG, agents, multi-model
Jan Open-source ChatGPT alternative Offline-first, extensions
GPT4All Desktop local AI No GPU required

Model Sources

Source Description
Hugging Face The largest model repository — GGUF, safetensors, everything
Ollama Library Curated, ready-to-run models
LM Studio Search In-app model discovery

Development Tools

Tool Description
LangChain Framework for LLM applications
LlamaIndex Data framework for LLM apps
Ollama Python/JS SDK Programmatic model access
outlines Structured generation (JSON, regex)
Guidance Controlled generation from Microsoft
llama-cpp-python Python bindings for llama.cpp

Quantization Tools

Tool Description
llama.cpp quantize Built-in quantization (GGUF formats)
AutoGPTQ GPTQ quantization for GPU inference
bitsandbytes 4-bit and 8-bit quantization
AWQ Activation-aware weight quantization

Monitoring & Observability

Tool Description
Weights & Biases Experiment tracking
Langfuse LLM observability (open source)
Phoenix AI observability & evaluation