lllm.dev

News on local LLMs. Covering models, tools, and the open-source AI ecosystem.

Latest News

may 20 Llama 4 just dropped — and it runs on a laptop

Meta's Llama 4 8B runs smoothly on consumer laptops. 128K context, multimodal, Apache 2.0. New default for local LLM dev.

Read about Llama 4

may 15 Ollama 1.0 is here

Multi-GPU support, model caching, streaming SSE with per-token metadata. Two years of iteration, now stable.

Read about Ollama 1.0

may 10 MLX for text: Apple's secret weapon for local AI

Apple's native MLX framework now has first-class text generation. 20–30% faster than llama.cpp on Apple Silicon.

Read about MLX for text

may 6 DwarfStar 4: antirez ships a dedicated DeepSeek V4 Flash engine

284B MoE on a MacBook. KV cache on disk, 1M context, 11,000 stars in 16 days. Built on llama.cpp.

Read about DwarfStar 4

may 5 llama.cpp gets Vulkan backend

One backend, every GPU — NVIDIA, AMD, Intel, mobile. No more CUDA-only or Metal-only. A game-changer for AMD/Intel users.

Read about Vulkan backend

feb 15 llmfit: one command to find what model runs on your hardware

Auto-detects GPU, RAM, CPU. Scores every model for your machine. 26K stars. Rust TUI.

Read about llmfit