LocalAI - Models

llm-compiler-13b-imat

LLM Compiler is a state-of-the-art LLM that builds upon Code Llama with improved performance for code optimization and compiler reasoning. LLM Compiler is free for both research and commercial use. LLM Compiler is available in two flavors: LLM Compiler, the foundational models, pretrained on over 500B tokens of LLVM-IR, x86_84, ARM, and CUDA assembly codes and trained to predict the effect of LLVM optimizations; and LLM Compiler FTD, which is further fine-tuned to predict the best optimizations for code in LLVM assembly to reduce code size, and to disassemble assembly code to LLVM-IR.

Links

Tags

llm-compiler-13b-ftd

LLM Compiler is a state-of-the-art LLM that builds upon Code Llama with improved performance for code optimization and compiler reasoning. LLM Compiler is free for both research and commercial use. LLM Compiler is available in two flavors: LLM Compiler, the foundational models, pretrained on over 500B tokens of LLVM-IR, x86_84, ARM, and CUDA assembly codes and trained to predict the effect of LLVM optimizations; and LLM Compiler FTD, which is further fine-tuned to predict the best optimizations for code in LLVM assembly to reduce code size, and to disassemble assembly code to LLVM-IR.

Links

Tags

llm-compiler-7b-imat-GGUF

LLM Compiler is a state-of-the-art LLM that builds upon Code Llama with improved performance for code optimization and compiler reasoning. LLM Compiler is free for both research and commercial use. LLM Compiler is available in two flavors: LLM Compiler, the foundational models, pretrained on over 500B tokens of LLVM-IR, x86_84, ARM, and CUDA assembly codes and trained to predict the effect of LLVM optimizations; and LLM Compiler FTD, which is further fine-tuned to predict the best optimizations for code in LLVM assembly to reduce code size, and to disassemble assembly code to LLVM-IR.

Links

Tags

llm-compiler-7b-ftd-imat

LLM Compiler is a state-of-the-art LLM that builds upon Code Llama with improved performance for code optimization and compiler reasoning. LLM Compiler is free for both research and commercial use. LLM Compiler is available in two flavors: LLM Compiler, the foundational models, pretrained on over 500B tokens of LLVM-IR, x86_84, ARM, and CUDA assembly codes and trained to predict the effect of LLVM optimizations; and LLM Compiler FTD, which is further fine-tuned to predict the best optimizations for code in LLVM assembly to reduce code size, and to disassemble assembly code to LLVM-IR.

Links

Tags

deepseek-v4-flash-q2

DeepSeek V4 Flash (IQ2XXS GGUF, ~81 GB) - only loadable via the ds4 backend. Requires >=128 GB RAM. Metal (Darwin) or CUDA (Linux). See https://github.com/antirez/ds4 for details.

Links

https://huggingface.co/antirez/deepseek-v4-gguf

Tags

deepseek-v4-flash-q2-q4

DeepSeek V4 Flash (mixed q2/q4 GGUF, ~91 GB) - only loadable via the ds4 backend. The last 6 expert layers are kept at Q4_K (the rest IQ2XXS), trading a little extra memory for higher quality than the pure-q2 build while still fitting in RAM on a 128 GB machine. imatrix-tuned. Metal (Darwin) or CUDA (Linux). See https://github.com/antirez/ds4 for details.

Links

https://huggingface.co/antirez/deepseek-v4-gguf

Tags

deepseek-v4-flash-q2-mtp

DeepSeek V4 Flash (IQ2XXS GGUF, ~81 GB) paired with the optional MTP speculative-decoding weights (~3.5 GB) for a slight speedup. Only loadable via the ds4 backend; requires >=128 GB RAM. MTP helps only with greedy decoding (temperature 0), so the override pins temperature to 0. Metal (Darwin) or CUDA (Linux). See https://github.com/antirez/ds4 for details.

Links

https://huggingface.co/antirez/deepseek-v4-gguf

Tags

Model Gallery

Filter by type:

Filter by tags:

llm-compiler-13b-imat

llm-compiler-13b-ftd

llm-compiler-7b-imat-GGUF

llm-compiler-7b-ftd-imat

deepseek-v4-flash-q2

deepseek-v4-flash-q2-q4

deepseek-v4-flash-q2-mtp