Model Gallery

5 models from 1 repository

deepseek-v4-flash-q2
DeepSeek V4 Flash (IQ2XXS GGUF, ~81 GB) - only loadable via the ds4 backend. Requires >=128 GB RAM. Metal (Darwin) or CUDA (Linux). See https://github.com/antirez/ds4 for details.

Repository: localai

deepseek-v4-flash-q2-q4
DeepSeek V4 Flash (mixed q2/q4 GGUF, ~91 GB) - only loadable via the ds4 backend. The last 6 expert layers are kept at Q4_K and the rest at IQ2XXS, trading about 10 GB of extra memory for higher quality than the pure-q2 build while still fitting in RAM on a 128 GB machine. imatrix-tuned. Metal (Darwin) or CUDA (Linux). See https://github.com/antirez/ds4 for details.

Repository: localai

deepseek-v4-flash-q4-ssd
DeepSeek V4 Flash (full 4-bit experts GGUF, ~153 GB) - only loadable via the ds4 backend, with SSD streaming enabled so it runs on a 128 GB machine even though the weights do not fit in RAM: routed MoE experts stream from the GGUF on SSD while the non-routed weights stay resident. SSD streaming is Metal (Darwin) only; generation speed depends on SSD speed and the expert cache. Tune the routed-expert cache with the 'ssd_streaming_cache_experts:NGB' option (default: automatic budget). See https://github.com/antirez/ds4.

Repository: localai
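The routed-expert cache trades RAM for SSD reads: a larger budget keeps more recently used experts resident, so fewer tokens wait on the SSD. The sketch below illustrates that trade-off as an LRU cache with a byte budget. It is a hypothetical illustration, not ds4's implementation: `ExpertCache`, `load_fn`, and the LRU eviction policy are assumptions, and `load_fn` stands in for reading an expert's weights from the GGUF on SSD.

```python
from collections import OrderedDict
from typing import Callable, Hashable


class ExpertCache:
    """Hypothetical LRU cache for routed MoE expert weights under a byte budget."""

    def __init__(self, budget_bytes: int, load_fn: Callable[[Hashable], bytes]):
        self.budget_bytes = budget_bytes
        self.load_fn = load_fn  # stand-in for a read from the GGUF on SSD
        self.used_bytes = 0
        self.entries: "OrderedDict[Hashable, bytes]" = OrderedDict()
        self.hits = 0
        self.misses = 0

    def get(self, expert_id: Hashable) -> bytes:
        if expert_id in self.entries:
            # Hit: mark the expert as most recently used, no SSD read needed.
            self.entries.move_to_end(expert_id)
            self.hits += 1
            return self.entries[expert_id]
        # Miss: "stream" the expert's weights from SSD.
        self.misses += 1
        weights = self.load_fn(expert_id)
        size = len(weights)
        if size > self.budget_bytes:
            return weights  # larger than the whole budget: use once, never cache
        # Evict least recently used experts until the new one fits.
        while self.used_bytes + size > self.budget_bytes:
            _, evicted = self.entries.popitem(last=False)
            self.used_bytes -= len(evicted)
        self.entries[expert_id] = weights
        self.used_bytes += size
        return weights


# Toy run: room for 3 experts of 1 KiB each, keyed by (layer, expert).
cache = ExpertCache(budget_bytes=3 * 1024, load_fn=lambda eid: bytes(1024))
for eid in [(0, 1), (0, 2), (0, 1), (0, 3), (0, 4), (0, 1)]:
    cache.get(eid)
print(cache.hits, cache.misses)  # 2 4
```

Raising the budget in the toy run turns more misses into hits, which mirrors why generation speed on this build depends on both SSD speed and the expert cache size.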

deepseek-v4-flash-q2-mtp
DeepSeek V4 Flash (IQ2XXS GGUF, ~81 GB) paired with the optional MTP speculative-decoding weights (~3.5 GB) for a slight speedup. Only loadable via the ds4 backend; requires >=128 GB RAM. MTP helps only with greedy decoding (temperature 0), so the override pins temperature to 0. Metal (Darwin) or CUDA (Linux). See https://github.com/antirez/ds4 for details.

Repository: localai
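With MTP, extra weights propose draft tokens that the main model then checks, and each accepted draft token saves part of a decoding step. Under greedy decoding that check reduces to an exact match against the main model's argmax, which is presumably why the override pins temperature to 0. The sketch below shows only that greedy acceptance rule; `verify_greedy` and `target_argmax` are hypothetical names, not ds4's API, and a real implementation would score all draft positions in one batched forward pass rather than the sequential loop used here for clarity.

```python
from typing import Callable, List, Sequence


def verify_greedy(prefix: Sequence[int], draft: Sequence[int],
                  target_argmax: Callable[[Sequence[int]], int]) -> List[int]:
    """Accept draft tokens while they match the target model's greedy choice.

    Returns the tokens to append: the accepted draft prefix plus one token from
    the target (its correction at the first mismatch, or a bonus token when all
    draft tokens are accepted). The result equals plain greedy decoding.
    """
    accepted: List[int] = []
    context = list(prefix)
    for token in draft:
        expected = target_argmax(context)  # hypothetical: main model's argmax
        if token != expected:
            accepted.append(expected)  # replace the first wrong guess, stop
            return accepted
        accepted.append(token)
        context.append(token)
    accepted.append(target_argmax(context))  # every draft accepted: one extra
    return accepted


# Toy "model" whose greedy next token is always the previous token + 1.
def toy_target(ctx: Sequence[int]) -> int:
    return ctx[-1] + 1


print(verify_greedy([1, 2], [3, 4, 9], toy_target))  # [3, 4, 5]
print(verify_greedy([1, 2], [3, 4], toy_target))     # [3, 4, 5]
```

Both calls produce the same tokens plain greedy decoding would; the speedup comes only from how many draft tokens are accepted per verification pass.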

deepseek-v4-pro-q2-ssd
DeepSeek V4 Pro (IQ2XXS GGUF, ~433 GB, imatrix-tuned) - only loadable via the ds4 backend, with SSD streaming so the Pro-class model can be run on a 128 GB machine. This is experimental and slow: it needs ~433 GB of free SSD plus enough RAM for the resident weights, KV cache, and routed-expert cache, and is best used with thinking off for inspection or occasional work. SSD streaming is Metal (Darwin) only. See https://github.com/antirez/ds4.

Repository: localai
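The hardware requirements above can be condensed into a rough pre-download check. The sketch below is hypothetical: `GalleryEntry`, `candidates`, and `MIN_RAM_GB` are illustrative names, the sizes and platform rules are copied from the descriptions on this page, and it assumes that every entry targets a machine with at least 128 GB of RAM and that the GGUF must fit on local disk. It does not query LocalAI or ds4, and "linux" here stands in for a CUDA setup.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class GalleryEntry:
    name: str
    size_gb: float       # approximate on-disk size quoted in the gallery text
    ssd_streaming: bool  # True if experts stream from SSD (Metal/Darwin only)


# Sizes copied from the descriptions on this page (hypothetical helper data).
ENTRIES = [
    GalleryEntry("deepseek-v4-flash-q2", 81, False),
    GalleryEntry("deepseek-v4-flash-q2-q4", 91, False),
    GalleryEntry("deepseek-v4-flash-q4-ssd", 153, True),
    GalleryEntry("deepseek-v4-flash-q2-mtp", 81 + 3.5, False),  # GGUF + MTP weights
    GalleryEntry("deepseek-v4-pro-q2-ssd", 433, True),
]

MIN_RAM_GB = 128  # assumption: every entry above targets a 128 GB machine


def candidates(ram_gb: float, platform: str, free_disk_gb: float) -> List[str]:
    """Return entry names that plausibly run on this machine (rough check)."""
    platform = platform.lower()
    if ram_gb < MIN_RAM_GB:
        return []
    names = []
    for entry in ENTRIES:
        if entry.ssd_streaming:
            platform_ok = platform == "darwin"  # SSD streaming: Metal only
        else:
            platform_ok = platform in ("darwin", "linux")  # Metal or CUDA
        if platform_ok and free_disk_gb >= entry.size_gb:
            names.append(entry.name)
    return names


print(candidates(128, "linux", 500))
# ['deepseek-v4-flash-q2', 'deepseek-v4-flash-q2-q4', 'deepseek-v4-flash-q2-mtp']
print(candidates(128, "Darwin", 200))
# ['deepseek-v4-flash-q2', 'deepseek-v4-flash-q2-q4', 'deepseek-v4-flash-q4-ssd', 'deepseek-v4-flash-q2-mtp']
```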