Skip to content

Hardware for Local LLMs

Community hardware data tracked from Hugging Face Hardware.


GPU Landscape

  • :material-nvidia: NVIDIA · 45%

    RTX 30xx 27% · RTX 40xx 27% · RTX 50xx 19% · Datacenter 13%

  • Apple Silicon · 35%

    M1–M4 series with unified memory — the local LLM sweet spot.

  • :material-gpu: AMD · 5%

    RX 7000 33% · RX 6000/5000 30% · RX 9000 19% · Instinct 5%


NVIDIA

GPU VRAM Users
RTX 3060 12 GB 17k
RTX 3090 24 GB 14k
RTX 4090 24 GB 14k
RTX 5090 32 GB 12k
RTX 5070 Ti 16 GB 9k
RTX 5060 Ti 16 GB 8k
RTX 4060 Ti 16 GB 6k
RTX 5080 16 GB 6k
RTX 3080 12 GB 6k
RTX 4070 Ti Super 16 GB 4k
RTX 4080 SUPER 16 GB 4k
H100 80 GB 4k
A100 80 GB 3k
GB10 (DGX Spark) 128 GB 3k
T4 16 GB 3k

AMD

GPU VRAM Users
RX 7900 XTX 24 GB 3k
RX 7900 XT 20 GB 2k
RX 7800 XT 16 GB 2k
RX 6800 XT 16 GB 1k
MI300X 192 GB 1k

Apple Silicon

Chip Max RAM Users
M1 16 GB 18k
M2 24 GB 15k
M3 36 GB 14k
M4 48 GB 12k
M3 Pro 36 GB 10k
M4 Pro 48 GB 9k
M1 Max 64 GB 8k
M2 Max 96 GB 7k
M1 Pro 32 GB 6k
M3 Max 128 GB 5k
M2 Ultra 192 GB 4k
M4 Max 128 GB 3k

Model VRAM Guide

Model Size Q4_K_M Q8_0 FP16
7B–8B 5 GB 8 GB 16 GB
13B–14B 8 GB 14 GB 26 GB
34B 20 GB 34 GB 68 GB
70B–72B 40 GB 70 GB 140 GB
405B MoE 230 GB

Recommendations

  • 8 GB VRAM

    7B–8B models (Llama 4 8B, Mistral 7B, Gemma 3)

  • 12–16 GB VRAM

    7B–14B comfortably, 34B with CPU offloading

  • 24 GB VRAM

    34B easily, 70B at Q4. The enthusiast sweet spot.

  • 32 GB+ VRAM

    70B comfortably, small MoE models. Near-datacenter perf.

  • Apple Silicon 16 GB+

    Excellent for 7B–14B via MLX. 20–30% faster than llama.cpp.

  • Apple Silicon 64 GB+

    Runs 70B models using unified memory. No GPU offloading needed.