Tools & Ecosystem¶

The essential tools for local LLM development — categorized and compared.

Model Discovery¶

Tool	Platform	Description	GPU support
llmfit	All (Rust)	Hardware-aware model finder, TUI	Any

Tool	Platform	Best for	GPU support
Ollama	macOS, Linux, Windows	Beginners, quick setup	CUDA, Metal, ROCm
llama.cpp	All (C/C++)	Servers, custom pipelines	CUDA, Metal, Vulkan, ROCm, SYCL
DwarfStar 4	macOS, Linux	DeepSeek V4 Flash optimized	Metal, CUDA
LM Studio	macOS, Windows, Linux	GUI users	CUDA, Metal
vLLM	Linux	Production serving	CUDA, ROCm
MLX	macOS (Apple Silicon)	Mac-native dev	Apple Silicon GPU
llama.rn	iOS, Android	Mobile inference	Metal, Vulkan

Tool	Description	Key feature
Open WebUI	Self-hosted ChatGPT clone	RAG, tools, multi-user
SillyTavern	Character chat frontend	Roleplay, character cards
Anything LLM	All-in-one desktop app	RAG, agents, multi-model
Jan	Open-source ChatGPT alternative	Offline-first, extensions
GPT4All	Desktop local AI	No GPU required

Source	Description
Hugging Face	The largest model repository — GGUF, safetensors, everything
Ollama Library	Curated, ready-to-run models
LM Studio Search	In-app model discovery

Tool	Description
LangChain	Framework for LLM applications
LlamaIndex	Data framework for LLM apps
Ollama Python/JS SDK	Programmatic model access
outlines	Structured generation (JSON, regex)
Guidance	Controlled generation from Microsoft
llama-cpp-python	Python bindings for llama.cpp

Tool	Description
llama.cpp quantize	Built-in quantization (GGUF formats)
AutoGPTQ	GPTQ quantization for GPU inference
bitsandbytes	4-bit and 8-bit quantization
AWQ	Activation-aware weight quantization

Tool	Description
Weights & Biases	Experiment tracking
Langfuse	LLM observability (open source)
Phoenix	AI observability & evaluation