LocalAI - Models

kimi-k2.6

🤗 huggingchat | 📰 Tech Blog ## 1. Model Introduction Kimi K2.6 is an open-source, native multimodal agentic model that advances practical capabilities in long-horizon coding, coding-driven design, proactive autonomous execution, and swarm-based task orchestration. ### Key Features - **Long-Horizon Coding**: K2.6 achieves significant improvements on complex, end-to-end coding tasks, generalizing robustly across programming languages (Rust, Go, Python) and domains spanning front-end, DevOps, and performance optimization. - **Coding-Driven Design**: K2.6 is capable of transforming simple prompts and visual inputs into production-ready interfaces and lightweight full-stack workflows, generating structured layouts, interactive elements, and rich animations with deliberate aesthetic precision. - **Elevated Agent Swarm**: Scaling horizontally to 300 sub-agents executing 4,000 coordinated steps, K2.6 can dynamically decompose tasks into parallel, domain-specialized subtasks, delivering end-to-end outputs from documents to websites to spreadsheets in a single autonomous run. - **Proactive & Open Orchestration**: For autonomous tasks, K2.6 demonstra ...

Links

https://huggingface.co/unsloth/Kimi-K2.6-GGUF

Tags

ced-base-f16

CED (Consistent Ensemble Distillation, Xiaomi) is a sound-event classifier that tags everyday sounds (baby cry, footsteps, glass breaking, alarms, dog bark, ...) into the 527-class AudioSet ontology. This is the f16 GGUF for the ced backend (a standalone C++/ggml port). Recommended default: fastest on CPU and near-lossless. Use POST /v1/audio/classification, or the realtime websocket API for live recognition.

Links

Tags

ced-base-q8

CED (Consistent Ensemble Distillation, Xiaomi) sound-event classifier over the 527-class AudioSet ontology (baby cry, footsteps, glass breaking, alarms, dog bark, ...). This is the q8_0 GGUF for the ced backend: smallest footprint (~88 MB, ~6.5x less memory than the PyTorch reference) and near-lossless (identical top-5 tags). Use POST /v1/audio/classification, or the realtime websocket API for live recognition.

Links

Tags

ced-tiny-f16

CED-tiny (5.5M params, Pi-class / edge) sound-event classifier over the 527-class AudioSet ontology (baby cry, footsteps, glass breaking, alarms, dog bark, ...). f16 GGUF for the ced backend (recommended (fastest on CPU)). Use POST /v1/audio/classification, or the realtime websocket API for live recognition.

Links

Tags

ced-tiny-q8

CED-tiny (5.5M params, Pi-class / edge) sound-event classifier over the 527-class AudioSet ontology (baby cry, footsteps, glass breaking, alarms, dog bark, ...). q8_0 GGUF for the ced backend (smallest footprint, near-lossless). Use POST /v1/audio/classification, or the realtime websocket API for live recognition.

Links

Tags

ced-mini-f16

CED-mini (9.6M params, low-power) sound-event classifier over the 527-class AudioSet ontology (baby cry, footsteps, glass breaking, alarms, dog bark, ...). f16 GGUF for the ced backend (recommended (fastest on CPU)). Use POST /v1/audio/classification, or the realtime websocket API for live recognition.

Links

Tags

ced-mini-q8

CED-mini (9.6M params, low-power) sound-event classifier over the 527-class AudioSet ontology (baby cry, footsteps, glass breaking, alarms, dog bark, ...). q8_0 GGUF for the ced backend (smallest footprint, near-lossless). Use POST /v1/audio/classification, or the realtime websocket API for live recognition.

Links

Tags

ced-small-f16

CED-small (22M params, balanced size/accuracy) sound-event classifier over the 527-class AudioSet ontology (baby cry, footsteps, glass breaking, alarms, dog bark, ...). f16 GGUF for the ced backend (recommended (fastest on CPU)). Use POST /v1/audio/classification, or the realtime websocket API for live recognition.

Links

Tags

ced-small-q8

CED-small (22M params, balanced size/accuracy) sound-event classifier over the 527-class AudioSet ontology (baby cry, footsteps, glass breaking, alarms, dog bark, ...). q8_0 GGUF for the ced backend (smallest footprint, near-lossless). Use POST /v1/audio/classification, or the realtime websocket API for live recognition.

Links

Tags

gpt-oss-20b

Welcome to the gpt-oss series, OpenAI’s open-weight models designed for powerful reasoning, agentic tasks, and versatile developer use cases. We’re releasing two flavors of the open models: gpt-oss-120b — for production, general purpose, high reasoning use cases that fits into a single H100 GPU (117B parameters with 5.1B active parameters) gpt-oss-20b — for lower latency, and local or specialized use cases (21B parameters with 3.6B active parameters) Both models were trained on our harmony response format and should only be used with the harmony format as it will not work correctly otherwise. This model card is dedicated to the smaller gpt-oss-20b model. Check out gpt-oss-120b for the larger model. Highlights Permissive Apache 2.0 license: Build freely without copyleft restrictions or patent risk—ideal for experimentation, customization, and commercial deployment. Configurable reasoning effort: Easily adjust the reasoning effort (low, medium, high) based on your specific use case and latency needs. Full chain-of-thought: Gain complete access to the model’s reasoning process, facilitating easier debugging and increased trust in outputs. It’s not intended to be shown to end users. Fine-tunable: Fully customize models to your specific use case through parameter fine-tuning. Agentic capabilities: Use the models’ native capabilities for function calling, web browsing, Python code execution, and Structured Outputs. Native MXFP4 quantization: The models are trained with native MXFP4 precision for the MoE layer, making gpt-oss-120b run on a single H100 GPU and the gpt-oss-20b model run within 16GB of memory.

Links

Tags

gpt-oss-120b

Welcome to the gpt-oss series, OpenAI’s open-weight models designed for powerful reasoning, agentic tasks, and versatile developer use cases. We’re releasing two flavors of the open models: gpt-oss-120b — for production, general purpose, high reasoning use cases that fits into a single H100 GPU (117B parameters with 5.1B active parameters) gpt-oss-20b — for lower latency, and local or specialized use cases (21B parameters with 3.6B active parameters) Both models were trained on our harmony response format and should only be used with the harmony format as it will not work correctly otherwise. This model card is dedicated to the smaller gpt-oss-20b model. Check out gpt-oss-120b for the larger model. Highlights Permissive Apache 2.0 license: Build freely without copyleft restrictions or patent risk—ideal for experimentation, customization, and commercial deployment. Configurable reasoning effort: Easily adjust the reasoning effort (low, medium, high) based on your specific use case and latency needs. Full chain-of-thought: Gain complete access to the model’s reasoning process, facilitating easier debugging and increased trust in outputs. It’s not intended to be shown to end users. Fine-tunable: Fully customize models to your specific use case through parameter fine-tuning. Agentic capabilities: Use the models’ native capabilities for function calling, web browsing, Python code execution, and Structured Outputs. Native MXFP4 quantization: The models are trained with native MXFP4 precision for the MoE layer, making gpt-oss-120b run on a single H100 GPU and the gpt-oss-20b model run within 16GB of memory.

Links

Tags

qwen3-8b-jailbroken

This jailbroken LLM is released strictly for academic research purposes in AI safety and model alignment studies. The author bears no responsibility for any misuse or harm resulting from the deployment of this model. Users must comply with all applicable laws and ethical guidelines when conducting research. A jailbroken Qwen3-8B model using weight orthogonalization[1]. Implementation script: https://gist.github.com/cooperleong00/14d9304ba0a4b8dba91b60a873752d25 [1]: Arditi, Andy, et al. "Refusal in language models is mediated by a single direction." arXiv preprint arXiv:2406.11717 (2024).

Links

Tags

dans-personalityengine-v1.0.0-8b

This model is intended to be multifarious in its capabilities and should be quite capable at both co-writing and roleplay as well as find itself quite at home performing sentiment analysis or summarization as part of a pipeline. It has been trained on a wide array of one shot instructions, multi turn instructions, role playing scenarios, text adventure games, co-writing, and much more. The full dataset is publicly available and can be found in the datasets section of the model page. There has not been any form of harmfulness alignment done on this model, please take the appropriate precautions when using it in a production environment.

Links

Tags

llama-3.1-8b-arliai-formax-v1.0-iq-arm-imatrix

Quants for ArliAI/Llama-3.1-8B-ArliAI-Formax-v1.0. "Formax is a model that specializes in following response format instructions. Tell it the format of it's response and it will follow it perfectly. Great for data processing and dataset creation tasks." "It is also a highly uncensored model that will follow your instructions very well."

Links

https://huggingface.co/Lewdiculous/Llama-3.1-8B-ArliAI-Formax-v1.0-GGUF-IQ-ARM-Imatrix

Tags

selene-1-mini-llama-3.1-8b

Atla Selene Mini is a state-of-the-art small language model-as-a-judge (SLMJ). Selene Mini achieves comparable performance to models 10x its size, outperforming GPT-4o on RewardBench, EvalBiasBench, and AutoJ. Post-trained from Llama-3.1-8B across a wide range of evaluation tasks and scoring criteria, Selene Mini outperforms prior small models overall across 11 benchmarks covering three different types of tasks: Absolute scoring, e.g. "Evaluate the harmlessness of this response on a scale of 1-5" Classification, e.g. "Does this response address the user query? Answer Yes or No." Pairwise preference. e.g. "Which of the following responses is more logically consistent - A or B?" It is also the #1 8B generative model on RewardBench.

Links

Tags

mn-backyardai-party-12b-v1-iq-arm-imatrix

This is a group-chat based roleplaying model, based off of 12B-Lyra-v4a2, a variant of Lyra-v4 that is currently private. It is trained on an entirely human-based dataset, based on forum / internet group roleplaying styles. The only augmentation done with LLMs is to the character sheets, to fit to the system prompt, to fit various character sheets within context. This model is still capable of 1 on 1 roleplay, though I recommend using ChatML when doing that instead.

Links

Tags

mn-12b-mag-mell-r1-iq-arm-imatrix

This is a merge of pre-trained language models created using mergekit. Mag Mell is a multi-stage merge, Inspired by hyper-merges like Tiefighter and Umbral Mind. Intended to be a general purpose "Best of Nemo" model for any fictional, creative use case. 6 models were chosen based on 3 categories; they were then paired up and merged via layer-weighted SLERP to create intermediate "specialists" which are then evaluated in their domain. The specialists were then merged into the base via DARE-TIES, with hyperparameters chosen to reduce interference caused by the overlap of the three domains. The idea with this approach is to extract the best qualities of each component part, and produce models whose task vectors represent more than the sum of their parts. The three specialists are as follows: Hero (RP, kink/trope coverage): Chronos Gold, Sunrose. Monk (Intelligence, groundedness): Bophades, Wissenschaft. Deity (Prose, flair): Gutenberg v4, Magnum 2.5 KTO. I've been dreaming about this merge since Nemo tunes started coming out in earnest. From our testing, Mag Mell demonstrates worldbuilding capabilities unlike any model in its class, comparable to old adventuring models like Tiefighter, and prose that exhibits minimal "slop" (not bad for no finetuning,) frequently devising electrifying metaphors that left us consistently astonished. I don't want to toot my own bugle though; I'm really proud of how this came out, but please leave your feedback, good or bad.Special thanks as usual to Toaster for his feedback and Fizz for helping fund compute, as well as the KoboldAI Discord for their resources. The following models were included in the merge: IntervitensInc/Mistral-Nemo-Base-2407-chatml nbeerbower/mistral-nemo-bophades-12B nbeerbower/mistral-nemo-wissenschaft-12B elinas/Chronos-Gold-12B-1.0 Fizzarolli/MN-12b-Sunrose nbeerbower/mistral-nemo-gutenberg-12B-v4 anthracite-org/magnum-12b-v2.5-kto

Links

Tags

captain-eris-diogenes_twilight-v0.420-12b-arm-imatrix

The following models were included in the merge: Nitral-AI/Captain-Eris_Twilight-V0.420-12B Nitral-AI/Diogenes-12B-ChatMLified

Links

Tags

pygmalionai_eleusis-12b

Alongside the release of Pygmalion-3, we present an additional roleplay model based on Mistral's Nemo Base named Eleusis, a unique model that has a distinct voice among its peers. Though it was meant to be a test run for further experiments, this model was received warmly to the point where we felt it was right to release it publicly. We release the weights of Eleusis under the Apache 2.0 license, ensuring a free and open ecosystem for it to flourish under.

Links

Tags

flux.1dev-abliteratedv2

The FLUX.1 [dev] Abliterated-v2 model is a modified version of FLUX.1 [dev] and a successor to FLUX.1 [dev] Abliterated. This version has undergone a process called unlearning, which removes the model's built-in refusal mechanism. This allows the model to respond to a wider range of prompts, including those that the original model might have deemed inappropriate or harmful. The abliteration process involves identifying and isolating the specific components of the model responsible for refusal behavior and then modifying or ablating those components. This results in a model that is more flexible and responsive, while still maintaining the core capabilities of the original FLUX.1 [dev] model.

Links

Tags

ostrich-32b-qwen3-251003-i1

**Model Name:** Ostrich 32B - Qwen 3 with Enhanced Human Alignment **Base Model:** Qwen/Qwen3-32B **Repository:** [etemiz/Ostrich-32B-Qwen3-251003](https://huggingface.co/etemiz/Ostrich-32B-Qwen3-251003) **License:** Apache 2.0 **Description:** A highly aligned, fine-tuned version of Qwen3-32B, trained to promote beneficial, human-centered knowledge and reasoning. Developed through 3 months of intensive fine-tuning using 4-bit quantization and LoRA techniques across 6 RTX A6000 GPUs, this model achieves an AHA (Alignment to Human Values) score of 57 — a significant improvement over the base model's score of 30. Ostrich 32B focuses on domains like health, nutrition, fasting, herbal medicine, faith, and decentralized technologies (e.g., Bitcoin, Nostr), aiming to empower users with independent, ethical, and high-quality information. Designed to resist harmful narratives and promote self-reliance, it embodies the philosophy that access to better knowledge is a fundamental human right. **Best For:** - Ethical AI interactions - Health and wellness guidance - Freedom-focused, privacy-conscious applications - Users seeking alternatives to mainstream AI outputs **Note:** This is the original, non-quantized model. The GGUF quantized versions (e.g., `mradermacher/Ostrich-32B-Qwen3-251003-i1-GGUF`) are derivatives for local inference and not the base model.

Links

https://huggingface.co/mradermacher/Ostrich-32B-Qwen3-251003-i1-GGUF

Tags

Model Gallery

Filter by type:

Filter by tags:

kimi-k2.6

ced-base-f16

ced-base-q8

ced-tiny-f16

ced-tiny-q8

ced-mini-f16

ced-mini-q8

ced-small-f16

ced-small-q8

gpt-oss-20b

gpt-oss-120b

qwen3-8b-jailbroken

dans-personalityengine-v1.0.0-8b

llama-3.1-8b-arliai-formax-v1.0-iq-arm-imatrix

selene-1-mini-llama-3.1-8b

mn-backyardai-party-12b-v1-iq-arm-imatrix

mn-12b-mag-mell-r1-iq-arm-imatrix

captain-eris-diogenes_twilight-v0.420-12b-arm-imatrix

pygmalionai_eleusis-12b

flux.1dev-abliteratedv2

ostrich-32b-qwen3-251003-i1