lllm.dev
News on local LLMs. Covering models, tools, and the open-source AI ecosystem.
Latest News
- may 20 Llama 4 just dropped, and it runs on a laptop
  Meta's Llama 4 8B runs smoothly on consumer laptops. 128K context, multimodal, Apache 2.0. New default for local LLM dev.
- may 15 Ollama 1.0 is here
  Multi-GPU support, model caching, streaming SSE with per-token metadata. Two years of iteration, now stable.
- may 10 MLX for text: Apple's secret weapon for local AI
  Apple's native MLX framework now has first-class text generation. 20–30% faster than llama.cpp on Apple Silicon.
- may 6 DwarfStar 4: antirez ships a dedicated DeepSeek V4 Flash engine
  284B MoE on a MacBook. KV cache on disk, 1M context, 11,000 stars in 16 days. Built on llama.cpp.
- may 5 llama.cpp gets Vulkan backend
  One backend, every GPU: NVIDIA, AMD, Intel, mobile. No more CUDA-only or Metal-only. A game-changer for AMD/Intel users.
- feb 15 llmfit: one command to find what model runs on your hardware
  Auto-detects GPU, RAM, CPU. Scores every model for your machine. 26K stars. Rust TUI.