lllm.dev

News on local LLMs. Covering models, tools, and the open-source AI ecosystem.


Latest News

  • may 20 Llama 4 just dropped — and it runs on a laptop

    Meta's Llama 4 8B runs smoothly on consumer laptops. 128K context, multimodal, Apache 2.0. New default for local LLM dev.

    Read about Llama 4

  • may 15 Ollama 1.0 is here

    Multi-GPU support, model caching, streaming SSE with per-token metadata. Two years of iteration, now stable.

    Read about Ollama 1.0

  • may 10 MLX for text: Apple's secret weapon for local AI

    Apple's native MLX framework now has first-class text generation. 20–30% faster than llama.cpp on Apple Silicon.

    Read about MLX for text

  • may 6 DwarfStar 4: antirez ships a dedicated DeepSeek V4 Flash engine

    284B MoE on a MacBook. KV cache on disk, 1M context, 11,000 stars in 16 days. Built on llama.cpp.

    Read about DwarfStar 4

  • may 5 llama.cpp gets Vulkan backend

    One backend, every GPU — NVIDIA, AMD, Intel, mobile. No more CUDA-only or Metal-only. A game-changer for AMD/Intel users.

    Read about Vulkan backend

  • feb 15 llmfit: one command to find what model runs on your hardware

    Auto-detects GPU, RAM, CPU. Scores every model for your machine. 26K stars. Rust TUI.

    Read about llmfit