Model Gallery

14 models from 1 repositories

Filter by type:

Filter by tags:

gemmable-4-12b-mtp
## Gemmable 4 12B Gemmable 4 12B is a GGUF export of Gemma 4 12B fine-tuned on Fable-5 style reasoning and assistant traces. ## Highlights - Base model: `google/gemma-4-12B` - Format: GGUF - Training style: Fable-5 style reasoning and assistant traces - Distribution: fp16 GGUF plus matching assistant GGUFs for each quant - Intended use: local inference, coding, reasoning, and assistant workflows ## How to use ### llama.cpp Standard load: ```bash llama-server -m "gemmable-4-12b-fp16.gguf" ``` Speculative / draft-MTP load: ```bash llama-server -m "gemmable-4-12b-Q4_K_M.gguf" \ --spec-draft-model "gemmable-4-12b-Q4_K_M-mtp.gguf" \ --spec-type draft-mtp \ --spec-draft-n-max 4 ``` Use the matching fp16 or quantized main file with its `-mtp` companion. ### LM Studio 1. Search this repo, download target + mtp file. 2. Load target. 3. Load settings โ†’ Speculative Decoding โ†’ select mtp file file. (Requires LM Studio with am17an's PR merged or custom llama.cpp runtime. As of 2026-05, mainline LM Studio runtime doesn't yet haveย `draft-mtp`ย for Gemma-4 โ€” track upstream merge.) ## GGUF / local inference notes ...

Repository: localai

qwopus3.6-27b-coder-compat-mtp
๐Ÿช Qwopus-3.6-27B-Coder Coder SFT Release Agentic Coding & Tool-Use Reasoning Model Fine-Tuned on Qwopus3.6-27B-v2 ๐Ÿงฌ Trace Inversion & Negentropy ๐Ÿง  27B Dense Model โšก Agentic Coding ๐Ÿ› ๏ธ Tool Calling & Agent ๐Ÿ† SWE-bench Verified: 67.0% (off-thinking) ๐Ÿ’ก What is Qwopus-3.6-27B-Coder? ๐Ÿช Qwopus-3.6-27B-Coder is a reasoning-enhanced agentic coding model built on top of Qwopus3.6-27B-v2. It inherits the powerful reasoning foundation of the v2 base โ€” which achieved 87.43% MMLU-Pro and 75.25% SWE-bench Verified โ€” and further specializes it for agentic code generation, structured tool calling, debugging, and instruction-following in developer workflows. The model is designed to excel at repository-level coding tasks, multi-turn tool orchestration, and complex logical reasoning under realistic agent environments. ๐Ÿงฉ Agentic Coding Optimized for repository-level coding, debugging, patch generation, and structured multi-step development workflows. ๐Ÿ› ๏ธ Tool Calling Learns from real agent trajectories with tool definitions, tool calls, and environment feedback for robust multi-turn execution. ...

Repository: localaiLicense: apache-2.0

qwen3.6-35b-a3b-nvfp4-mtp
# Qwen3.6-35B-A3B [](https://chat.qwen.ai) > [!Note] > This repository contains model weights and configuration files for the post-trained model in the Hugging Face Transformers format. > > These artifacts are compatible with Hugging Face Transformers, vLLM, SGLang, KTransformers, etc. Following the February release of the Qwen3.5 series, we're pleased to share the first open-weight variant of Qwen3.6. Built on direct feedback from the community, Qwen3.6 prioritizes stability and real-world utility, offering developers a more intuitive, responsive, and genuinely productive coding experience. ## Qwen3.6 Highlights This release delivers substantial upgrades, particularly in - **Agentic Coding:** the model now handles frontend workflows and repository-level reasoning with greater fluency and precision. - **Thinking Preservation:** we've introduced a new option to retain reasoning context from historical messages, streamlining iterative development and reducing overhead. For more details, please refer to our blog post Qwen3.6-35B-A3B. ## Model Overview ...

Repository: localai

qwopus3.6-27b-v2-mtp-nvfp4
๐Ÿช Qwopus3.6-27B-v2-MTP MTP Release Multi-Token Prediction reasoning model fine-tuned from Qwen3.6-27B ๐Ÿงฌ Trace Inversion & Negentropy ๐Ÿง  27B Parameters โšก Speculative Decoding ๐Ÿ› ๏ธ Coding / DevOps / Math ๐Ÿ’ก What is Qwopus3.6-27B-v2-MTP? ๐Ÿช Qwopus3.6-27B-v2-MTP is a speed-oriented reasoning release built on top of Qwen3.6-27B. It keeps the Qwopus line's focus on reconstructed reasoning traces, coding discipline, DevOps procedures, and mathematical derivations, while adding Multi-Token Prediction for faster generation. The goal is simple: preserve the depth and structure of a 27B reasoning model while making real interactive use noticeably faster. โšก MTP DecodingAuxiliary future-token prediction improves throughput on long reasoning, code, math, and strict-format prompts. ๐Ÿงฉ Structured ReasoningInherits the Qwopus training recipe built around reconstructed step-by-step reasoning trajectories. ๐Ÿงช GB10 TestedValidated on a 30-question local benchmark across Logic, Coding, DevOps, Math, and Edge tasks. ๐Ÿš€ Practical SpeedDesigned for workflows where strong answers matter, but waiting several extra minutes per task does not. ...

Repository: localai

qwopus3.6-27b-coder-mtp-nvfp4
๐Ÿช Qwopus-3.6-27B-Coder Coder SFT Release Agentic Coding & Tool-Use Reasoning Model Fine-Tuned on Qwopus3.6-27B-v2 ๐Ÿงฌ Trace Inversion & Negentropy ๐Ÿง  27B Dense Model โšก Agentic Coding ๐Ÿ› ๏ธ Tool Calling & Agent ๐Ÿ† SWE-bench Verified: 67.0% (off-thinking) ๐Ÿ’ก What is Qwopus-3.6-27B-Coder? ๐Ÿช Qwopus-3.6-27B-Coder is a reasoning-enhanced agentic coding model built on top of Qwopus3.6-27B-v2. It inherits the powerful reasoning foundation of the v2 base โ€” which achieved 87.43% MMLU-Pro (300ex) and 75.25% SWE-bench Verified โ€” and further specializes it for agentic code generation, structured tool calling, debugging, and instruction-following in developer workflows. The model is designed to excel at repository-level coding tasks, multi-turn tool orchestration, and complex logical reasoning under realistic agent environments. ๐Ÿงฉ Agentic Coding Optimized for repository-level coding, debugging, patch generation, and structured multi-step development workflows. ๐Ÿ› ๏ธ Tool Calling Learns from real agent trajectories with tool definitions, tool calls, and environment feedback for robust multi-turn execution. ...

Repository: localai

qwen3.6-27b-nvfp4-mtp
# Qwen3.6-27B [](https://chat.qwen.ai) > [!Note] > This repository contains model weights and configuration files for the post-trained model in the Hugging Face Transformers format. > > These artifacts are compatible with Hugging Face Transformers, vLLM, SGLang, KTransformers, etc. Following the February release of the Qwen3.5 series, we're pleased to share the first open-weight variant of Qwen3.6. Built on direct feedback from the community, Qwen3.6 prioritizes stability and real-world utility, offering developers a more intuitive, responsive, and genuinely productive coding experience. ## Qwen3.6 Highlights This release delivers substantial upgrades, particularly in - **Agentic Coding:** the model now handles frontend workflows and repository-level reasoning with greater fluency and precision. - **Thinking Preservation:** we've introduced a new option to retain reasoning context from historical messages, streamlining iterative development and reducing overhead. For more details, please refer to our blog post Qwen3.6-27B. ## Model Overview ...

Repository: localai

qwen3.6-27b-mtp-pi-tune
# Qwen3.6-27B [](https://chat.qwen.ai) > [!Note] > This repository contains model weights and configuration files for the post-trained model in the Hugging Face Transformers format. > > These artifacts are compatible with Hugging Face Transformers, vLLM, SGLang, KTransformers, etc. Following the February release of the Qwen3.5 series, we're pleased to share the first open-weight variant of Qwen3.6. Built on direct feedback from the community, Qwen3.6 prioritizes stability and real-world utility, offering developers a more intuitive, responsive, and genuinely productive coding experience. ## Qwen3.6 Highlights This release delivers substantial upgrades, particularly in - **Agentic Coding:** the model now handles frontend workflows and repository-level reasoning with greater fluency and precision. - **Thinking Preservation:** we've introduced a new option to retain reasoning context from historical messages, streamlining iterative development and reducing overhead. For more details, please refer to our blog post Qwen3.6-27B. ## Model Overview ...

Repository: localaiLicense: apache-2.0

qwopus3.6-27b-coder-mtp
๐Ÿช Qwopus3.6-27B-v2 SFT Release Reasoning-Enhanced Dense Language Model Fine-Tuned on Qwen3.6-27B ๐Ÿงฌ Trace Inversion & Negentropy ๐Ÿง  27B Parameters ๐Ÿ”ฅ 3-Stage Curriculum SFT ๐Ÿ› ๏ธ Vision & Tool-use Support ๐Ÿ’ก What is Qwopus3.6-27B-v2? ๐Ÿช Qwopus3.6-27B-v2 is a reasoning-enhanced dense language model built on top of Qwen3.6-27B. By leveraging a multi-stage curriculum learning pipeline and augmented with Trace Inversion datasets (claude-opus-4.6/4.7-traceInversion), it reverse-engineers the compressed "Reasoning Bubbles" of commercial LLMs into structured, step-by-step synthetic reasoning traces, successfully eliminating logical shortcuts and knowledge fractures. ๐Ÿงฉ Structured Reasoning Injects reconstructed deep CoT chains to eliminate logical shortcuts via Trace Inversion. ๐Ÿชถ Style Consistency Enforces strict constraints on the format and convergence of <think> tags. ๐Ÿ” Distillation Alignment Ensures high-quality cross-source SFT data alignment to narrow the capacity gap. โšก RL Scalability Sets up a stable formatting pipeline optimized for downstream Reinforcement Learning (RL). ## ๐Ÿ’ก 1. Base Model, Training Library & Cooperation ...

Repository: localaiLicense: apache-2.0

qwopus3.5-9b-coder-mtp
# ๐ŸŒŸ Qwopus3.5-9B-v3.5 ## ๐Ÿ’ก Model Overview & v3.5 Design Qwopus3.5-9B-v3.5 is a **data-scaled continuation** of the Qwopus3.5-9B-v3 model. The training data in v3.5 is expanded to cover a broader range of domains, including mathematics, programming, puzzle-solving, multilingual dialogue, instruction-following, multi-turn interactions, and STEM-related tasks. Qwopus3.5-9B-v3.5 is a reasoning-enhanced model based on **Qwen3.5-9B**, designed for: - ๐Ÿงฉ Structured reasoning - ๐Ÿ”ง Tool-augmented workflows - ๐Ÿ” Multi-step agentic tasks - โšก Token-efficient inference Compared with Qwopus3.5-9B-v3, **3.5 version does not introduce a new architecture, RL stage, or template redesign**. This version is trained with approximately **2ร— more SFT data**. ## ๐ŸŽฏ Motivation & Generalization Insight The motivation behind v3.5 comes from a simple observation: > This work is motivated by the hypothesis that scaling high-quality SFT data may further enhance the generalization ability of large language models. In earlier Qwopus3.5 experiments, structured reasoning was observed to improve both **accuracy and efficiency**: ...

Repository: localaiLicense: apache-2.0

qwopus3.6-27b-v2-mtp
๐Ÿช Qwopus3.6-27B-v2-MTP MTP Release Multi-Token Prediction reasoning model fine-tuned from Qwen3.6-27B ๐Ÿงฌ Trace Inversion & Negentropy ๐Ÿง  27B Parameters โšก Speculative Decoding ๐Ÿ› ๏ธ Coding / DevOps / Math ๐Ÿ’ก What is Qwopus3.6-27B-v2-MTP? ๐Ÿช Qwopus3.6-27B-v2-MTP is a speed-oriented reasoning release built on top of Qwen3.6-27B. It keeps the Qwopus line's focus on reconstructed reasoning traces, coding discipline, DevOps procedures, and mathematical derivations, while adding Multi-Token Prediction for faster generation. The goal is simple: preserve the depth and structure of a 27B reasoning model while making real interactive use noticeably faster. โšก MTP DecodingAuxiliary future-token prediction improves throughput on long reasoning, code, math, and strict-format prompts. ๐Ÿงฉ Structured ReasoningInherits the Qwopus training recipe built around reconstructed step-by-step reasoning trajectories. ๐Ÿงช GB10 TestedValidated on a 30-question local benchmark across Logic, Coding, DevOps, Math, and Edge tasks. ๐Ÿš€ Practical SpeedDesigned for workflows where strong answers matter, but waiting several extra minutes per task does not. ...

Repository: localaiLicense: apache-2.0

gemma-4-e2b-it:sglang-mtp
Google Gemma 4 E2B-IT served by SGLang with Multi-Token Prediction (MTP) speculative decoding. The companion drafter google/gemma-4-E2B-it-assistant lets the target accept several tokens per step. Flags are a 1:1 transcription of the SGLang cookbook's MTP command (NEXTN algorithm, num_steps=5, num_draft_tokens=6, eagle_topk=1, mem_fraction_static=0.85). The E2B variant has 5B total / 2B effective parameters and targets the smaller end of consumer GPUs.

Repository: localaiLicense: gemma

gemma-4-e4b-it:sglang-mtp
Google Gemma 4 E4B-IT served by SGLang with Multi-Token Prediction (MTP) speculative decoding. The companion drafter google/gemma-4-E4B-it-assistant lets the target accept several tokens per step. Flags are a 1:1 transcription of the SGLang cookbook's MTP command (NEXTN algorithm, num_steps=5, num_draft_tokens=6, eagle_topk=1, mem_fraction_static=0.85). The E4B variant has 8B total / 4B effective parameters โ€” the natural pick for consumer GPUs in the 16โ€“24 GB range.

Repository: localaiLicense: gemma

mimo-7b-mtp:sglang
Xiaomi MiMo-7B-RL served by SGLang with built-in Multi-Token Prediction (MTP) heads (no separate drafter needed) plus online fp8 weight quantization to fit on a 16 GB consumer GPU. ~90% acceptance per the model card. Verified end-to-end at ~88 tok/s on an RTX 5070 Ti (16 GB). Note: mem_fraction_static is dropped to 0.7 (vs sglang's 0.85 default) because the MTP draft worker's vocab embedding is loaded unquantised (~1.2 GiB) and OOMs the static reservation otherwise.

Repository: localaiLicense: mit

deepseek-v4-flash-q2-mtp
DeepSeek V4 Flash (IQ2XXS GGUF, ~81 GB) paired with the optional MTP speculative-decoding weights (~3.5 GB) for a slight speedup. Only loadable via the ds4 backend; requires >=128 GB RAM. MTP helps only with greedy decoding (temperature 0), so the override pins temperature to 0. Metal (Darwin) or CUDA (Linux). See https://github.com/antirez/ds4 for details.

Repository: localai