LocalAI - Models

privacy-filter-nemotron-q8

Q8_0 quant of privacy-filter-nemotron (~1.64 GB, vs ~2.8 GB for F16) for RAM-constrained / edge use (e.g. a 4 GB Raspberry Pi 5). The MoE expert weights are stored 8-bit; attention, embeddings and the classifier head stay F16. Same model, policy and runtime as the F16 entry - see privacy-filter-nemotron for the full description. Prefer the F16 entry when you can afford it: it is the reference artifact. On a mixed-PII document the publisher measured q8 matching F16 on 99.93% of token labels with an identical span set at threshold 0.5 - but one token flipped, and for PII a single dropped span is a leak. Treat q8 as a deliberate size/speed tradeoff and validate it on your own data.
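One way to run the recommended validation on your own data is to compare the per-token labels from the two variants and check whether any reference span is lost. The sketch below is a minimal illustration, not the publisher's evaluation script. It assumes each model's output has already been thresholded into one label per token, and the label names (`"O"`, `"EMAIL"`, `"NAME"`) and helper names are hypothetical.

```python
from typing import List, Set, Tuple

# (start_token, end_token_exclusive, label); a hypothetical span representation.
Span = Tuple[int, int, str]


def token_agreement(ref: List[str], cand: List[str]) -> float:
    """Fraction of tokens on which the candidate labels match the reference."""
    if len(ref) != len(cand):
        raise ValueError("label sequences must cover the same tokens")
    if not ref:
        return 1.0
    return sum(r == c for r, c in zip(ref, cand)) / len(ref)


def spans(labels: List[str], outside: str = "O") -> Set[Span]:
    """Group runs of identical non-outside labels into spans."""
    found: Set[Span] = set()
    start = None
    # The trailing sentinel closes a span that runs to the last token.
    for i, lab in enumerate(labels + [outside]):
        if start is not None and lab != labels[start]:
            found.add((start, i, labels[start]))
            start = None
        if start is None and lab != outside:
            start = i
    return found


def dropped_spans(ref: List[str], cand: List[str]) -> Set[Span]:
    """Reference spans that the candidate does not reproduce exactly."""
    return spans(ref) - spans(cand)
```

With `ref` taken from the F16 entry and `cand` from q8, a non-empty `dropped_spans(ref, cand)` is the signal to look at, even when `token_agreement` is very high.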

qwen3-vl-embedding-8b

**Model Name:** Qwen3-VL-Embedding-8B

**Base Model:** Qwen/Qwen3-VL-8B-Instruct

**Description:** The **Qwen3-VL-Embedding** and **Qwen3-VL-Reranker** model series are the latest additions to the Qwen family, built on the recently open-sourced Qwen3-VL foundation model. Designed for multimodal information retrieval and cross-modal understanding, the suite accepts text, images, screenshots, and videos, as well as inputs that mix these modalities.

**Key Features:**
- Model Type: Multimodal Embedding
- Supported Languages: 30+ languages
- Supported Input Modalities: Text, images, screenshots, videos, and arbitrary multimodal combinations (e.g., text + image, text + video)
- Number of Parameters: 8B
- Context Length: 32k
- Embedding Dimension: Up to 4096; supports user-defined output dimensions from 64 to 4096

**Downloads:**
- [GGUF Files](https://huggingface.co/Qwen/Qwen3-VL-Embedding-8B) (e.g., `Qwen3-VL-Embedding-8B-Q8_0.gguf`)

**Usage:**
- Requires `transformers`, `qwen-vl-utils`, and `torch`.
- Example: `from scripts.qwen3_vl_embedding import Qwen3VLEmbedder`, then `model = Qwen3VLEmbedder(...)`

**Citation:** `@article{qwen3vlembedding, ...}`
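A user-defined output dimension is commonly obtained by keeping the leading components of the full embedding and re-normalising, in the Matryoshka style. Whether this model or a given runtime applies exactly that procedure is an assumption here, and `truncate_embedding` is a hypothetical helper, not part of any Qwen or LocalAI API.

```python
import math
from typing import List, Sequence


def truncate_embedding(vec: Sequence[float], dim: int, min_dim: int = 64) -> List[float]:
    """Keep the first `dim` components and rescale the result to unit L2 norm.

    `min_dim=64` mirrors the lower bound listed on the card; the truncation
    scheme itself is an assumption, not confirmed by the source.
    """
    if not (min_dim <= dim <= len(vec)):
        raise ValueError(f"dim must be between {min_dim} and {len(vec)}")
    head = list(vec[:dim])
    norm = math.sqrt(sum(x * x for x in head))
    if norm == 0.0:
        return head  # an all-zero vector cannot be normalised
    return [x / norm for x in head]
```

Shorter vectors reduce index size and search cost, usually at some cost in retrieval quality, so the chosen dimension is worth checking against your own queries.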

qwen3-vl-embedding-2b

**Model Name:** Qwen3-VL-Embedding-2B

**Base Model:** Qwen/Qwen3-VL-2B-Instruct

**Description:** The **Qwen3-VL-Embedding** and **Qwen3-VL-Reranker** model series are the latest additions to the Qwen family, built on the recently open-sourced Qwen3-VL foundation model. Designed for multimodal information retrieval and cross-modal understanding, the suite accepts text, images, screenshots, and videos, as well as inputs that mix these modalities.

**Key Features:**
- Model Type: Multimodal Embedding
- Supported Languages: 30+ languages
- Supported Input Modalities: Text, images, screenshots, videos, and arbitrary multimodal combinations (e.g., text + image, text + video)
- Number of Parameters: 2B
- Context Length: 32k
- Embedding Dimension: Up to 2048; supports user-defined output dimensions from 64 to 2048

**Downloads:**
- [GGUF Files](https://huggingface.co/Qwen/Qwen3-VL-Embedding-2B) (e.g., `Qwen3-VL-Embedding-2B-Q8_0.gguf`)

**Usage:**
- Requires `transformers`, `qwen-vl-utils`, and `torch`.
- Example: `from scripts.qwen3_vl_embedding import Qwen3VLEmbedder`, then `model = Qwen3VLEmbedder(...)`

**Citation:** `@article{qwen3vlembedding, ...}`

qwen3-vl-reranker-8b

**Model Name:** Qwen3-VL-Reranker-8B

**Base Model:** Qwen/Qwen3-VL-Reranker-8B

**Description:** A multimodal reranking model for cross-modal search. It supports 30+ languages and handles text, images, screenshots, videos, and mixed modalities. With 8B parameters and a 32K context length, it refines retrieval results by re-scoring candidates found through embedding vectors with precise relevance scores. Quantized versions (e.g., Q8_0, Q4_K_M) are available, and the model suits applications that need accurate multimodal content matching.

**Key Features:**
- **Multimodal**: Text, images, videos, and mixed content.
- **Language Support**: 30+ languages.
- **Quantization**: Available in Q8_0 (best quality), Q4_K_M (fast, recommended), and lower-precision options.
- **Performance**: Outperforms base models in retrieval tasks (e.g., JinaVDR, ViDoRe v3).
- **Use Case**: Enhances search pipelines by refining embedding-based results with precise relevance scores.

**Downloads:**
- [GGUF Files](https://huggingface.co/mradermacher/Qwen3-VL-Reranker-8B-GGUF) (e.g., `Qwen3-VL-Reranker-8B.Q8_0.gguf`)

**Usage:**
- Requires `transformers`, `qwen-vl-utils`, and `torch`.
- Example: `from scripts.qwen3_vl_reranker import Qwen3VLReranker; model = Qwen3VLReranker(...)`

**Citation:** `@article{qwen3vlembedding, ...}`
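The use case described above, where embedding-based retrieval is refined with relevance scores, is usually built as a two-stage pipeline. The sketch below shows that shape under stated assumptions. `score_fn` is a hypothetical stand-in for whatever call returns the reranker's relevance score, and the function names are illustrative rather than the `Qwen3VLReranker` API.

```python
import math
from typing import Callable, List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    if na == 0.0 or nb == 0.0:
        return 0.0
    return sum(x * y for x, y in zip(a, b)) / (na * nb)


def retrieve_then_rerank(
    query: str,
    query_vec: Sequence[float],
    docs: List[str],
    doc_vecs: List[Sequence[float]],
    score_fn: Callable[[str, str], float],  # hypothetical reranker call
    top_k: int = 3,
) -> List[Tuple[str, float]]:
    """Stage 1: cheap embedding search. Stage 2: rerank only the top_k hits."""
    by_similarity = sorted(
        range(len(docs)),
        key=lambda i: cosine(query_vec, doc_vecs[i]),
        reverse=True,
    )
    candidates = by_similarity[:top_k]
    rescored = [(docs[i], score_fn(query, docs[i])) for i in candidates]
    return sorted(rescored, key=lambda pair: pair[1], reverse=True)
```

Keeping `top_k` small matters because the reranker scores each query and candidate pair separately, which costs far more than a vector comparison.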

Links

https://huggingface.co/mradermacher/Qwen3-VL-Reranker-8B-GGUF

qwen3-vl-reranker-2b-i1

**Model Name:** Qwen3-VL-Reranker-2B-i1

**Base Model:** Qwen/Qwen3-VL-Reranker-2B

**Description:** A multimodal reranking model for cross-modal search. It supports 30+ languages and handles text, images, screenshots, videos, and mixed modalities. With 2B parameters and a 32K context length, it refines retrieval results by re-scoring candidates found through embedding vectors with precise relevance scores. Quantized versions (e.g., Q8_0, Q4_K_M) are available, and the model suits applications that need accurate multimodal content matching.

**Key Features:**
- **Multimodal**: Text, images, videos, and mixed content.
- **Language Support**: 30+ languages.
- **Quantization**: Available in Q8_0 (best quality), Q4_K_M (fast, recommended), and lower-precision options.
- **Performance**: Outperforms base models in retrieval tasks (e.g., JinaVDR, ViDoRe v3).
- **Use Case**: Enhances search pipelines by refining embedding-based results with precise relevance scores.

**Downloads:**
- [GGUF Files](https://huggingface.co/mradermacher/Qwen3-VL-Reranker-2B-i1-GGUF) (e.g., `Qwen3-VL-Reranker-2B.i1-Q4_K_M.gguf`)

**Usage:**
- Requires `transformers`, `qwen-vl-utils`, and `torch`.
- Example: `from scripts.qwen3_vl_reranker import Qwen3VLReranker; model = Qwen3VLReranker(...)`

**Citation:** `@article{qwen3vlembedding, ...}`

Links

https://huggingface.co/mradermacher/Qwen3-VL-Reranker-2B-i1-GGUF

qwen3-vl-30b-a3b-instruct

Meet Qwen3-VL, the most powerful vision-language model in the Qwen series to date. This generation delivers comprehensive upgrades across the board: superior text understanding and generation, deeper visual perception and reasoning, extended context length, enhanced spatial and video dynamics comprehension, and stronger agent interaction capabilities. It is available in Dense and MoE architectures that scale from edge to cloud, with Instruct and reasoning-enhanced Thinking editions for flexible, on-demand deployment.

#### Key Enhancements:

* **Visual Agent**: Operates PC/mobile GUIs: recognizes elements, understands functions, invokes tools, completes tasks.
* **Visual Coding Boost**: Generates Draw.io/HTML/CSS/JS from images/videos.
* **Advanced Spatial Perception**: Judges object positions, viewpoints, and occlusions; provides stronger 2D grounding and enables 3D grounding for spatial reasoning and embodied AI.
* **Long Context & Video Understanding**: Native 256K context, expandable to 1M; handles books and hours-long video with full recall and second-level indexing.
* **Enhanced Multimodal Reasoning**: Excels in STEM/Math, with causal analysis and logical, evidence-based answers.
* **Upgraded Visual Recognition**: Broader, higher-quality pretraining lets the model "recognize everything": celebrities, anime, products, landmarks, flora/fauna, etc.
* **Expanded OCR**: Supports 32 languages (up from 19); robust in low light, blur, and tilt; better with rare/ancient characters and jargon; improved long-document structure parsing.
* **Text Understanding on par with pure LLMs**: Seamless text–vision fusion for lossless, unified comprehension.

#### Model Architecture Updates:

1. **Interleaved-MRoPE**: Full-frequency allocation over time, width, and height via robust positional embeddings, enhancing long-horizon video reasoning.
2. **DeepStack**: Fuses multi-level ViT features to capture fine-grained details and sharpen image–text alignment.
3. **Text–Timestamp Alignment**: Moves beyond T-RoPE to precise, timestamp-grounded event localization for stronger video temporal modeling.

This is the weight repository for Qwen3-VL-30B-A3B-Instruct.

Links

https://huggingface.co/unsloth/Qwen3-VL-30B-A3B-Instruct-GGUF

arcee-ai_afm-4.5b

AFM-4.5B is a 4.5-billion-parameter instruction-tuned model developed by Arcee.ai, designed for enterprise-grade performance across deployment environments from cloud to edge.

The base model was trained on 8 trillion tokens: 6.5 trillion tokens of general pretraining data followed by 1.5 trillion tokens of midtraining data with an enhanced focus on mathematical reasoning and code generation. Following pretraining, the model underwent supervised fine-tuning on high-quality instruction datasets. The instruction-tuned model was then refined through reinforcement learning, both on verifiable rewards and for human preference. We use a modified version of TorchTitan for pretraining, Axolotl for supervised fine-tuning, and a modified version of Verifiers for reinforcement learning.

The development of AFM-4.5B prioritized data quality as a fundamental requirement for robust model performance. We collaborated with DatologyAI, a company specializing in large-scale data curation. DatologyAI's curation pipeline integrates a suite of proprietary algorithms: model-based quality filtering, embedding-based curation, target distribution-matching, source mixing, and synthetic data. Their expertise enabled the creation of a curated dataset tailored to support strong real-world performance.

The model architecture follows a standard decoder-only transformer design based on Vaswani et al., with several key modifications for performance and efficiency. Notable features include grouped query attention for improved inference efficiency and ReLU^2 activation functions instead of SwiGLU, which enable sparsification while maintaining or exceeding performance benchmarks.

The model available in this repo is the instruct model following supervised fine-tuning and reinforcement learning.
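The link between ReLU^2 and sparsification can be seen in a few lines. ReLU^2 maps every non-positive pre-activation to exactly zero, whereas the SiLU gate used inside SwiGLU leaves small non-zero values. The sketch below is a scalar illustration with made-up inputs, not Arcee's implementation.

```python
import math
from typing import Iterable, List


def relu_squared(x: float) -> float:
    """ReLU^2: zero for x <= 0, x squared otherwise."""
    r = max(x, 0.0)
    return r * r


def silu(x: float) -> float:
    """SiLU (x * sigmoid(x)), the gate nonlinearity inside SwiGLU."""
    return x / (1.0 + math.exp(-x))


def zero_fraction(values: Iterable[float]) -> float:
    """Fraction of activations that are exactly zero."""
    vals: List[float] = list(values)
    if not vals:
        return 0.0
    return sum(v == 0.0 for v in vals) / len(vals)


# Made-up pre-activations for illustration only.
pre_activations = [-2.0, -0.5, 0.0, 1.5, 3.0]
relu2_out = [relu_squared(x) for x in pre_activations]
silu_out = [silu(x) for x in pre_activations]
```

Exact zeros are what a sparse kernel can skip, so the fraction returned by `zero_fraction` is the quantity of interest when weighing the two activations.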

insightface-buffalo-l

Face recognition using insightface's `buffalo_l` pack (SCRFD-10GF detector + ResNet50 ArcFace 512-d embedder + genderage head, ~326MB). Default choice, highest accuracy. Weights delivered via LocalAI's gallery mechanism (SHA-256 verified, cached in the models directory like any other managed model). NON-COMMERCIAL RESEARCH USE ONLY. For commercial use see `insightface-opencv`.

Links

https://github.com/deepinsight/insightface

insightface-buffalo-m

Mid-tier insightface pack (SCRFD-2.5GF detector + ResNet50 ArcFace + genderage, ~313MB). Same recognition accuracy as `buffalo_l` with a cheaper detector — good balance on mid-range hardware. NON-COMMERCIAL RESEARCH USE ONLY.

Links

https://github.com/deepinsight/insightface

insightface-buffalo-s

Small insightface pack (SCRFD-500MF detector + MBF 512-d embedder + genderage, ~159MB). Good fit for mid-range CPU deployments. NON-COMMERCIAL RESEARCH USE ONLY.

Links

https://github.com/deepinsight/insightface

insightface-buffalo-sc

Ultra-small insightface pack (SCRFD-500MF + MBF recognition only, ~16MB). NO landmarks, NO age/gender head — `/v1/face/analyze` returns empty attributes for this pack. Ideal for edge/embedded deployments where only verification and embedding are needed. NON-COMMERCIAL RESEARCH USE ONLY.

Links

https://github.com/deepinsight/insightface

insightface-antelopev2

Largest insightface pack (SCRFD-10GF + ResNet100@Glint360K recognizer + genderage, ~407MB). Higher recognition accuracy than `buffalo_l` on harder benchmarks; pays for it in GPU memory. NON-COMMERCIAL RESEARCH USE ONLY.

Links

https://github.com/deepinsight/insightface

insightface-opencv

Face recognition using OpenCV Zoo weights: YuNet detector + SFace 128-d recognizer (fp32). APACHE 2.0 — safe for commercial use. Lower accuracy than insightface packs, no demographic head (`/v1/face/analyze` returns detection regions only). Weights are downloaded on install via LocalAI's gallery mechanism (~40MB).

Links

https://github.com/opencv/opencv_zoo

insightface-opencv-int8

Int8-quantized OpenCV Zoo face pair (YuNet int8 + SFace int8, ~12MB). Roughly 3x smaller and noticeably faster on CPU than the fp32 variant at comparable accuracy for face tasks. APACHE 2.0 — commercial-safe. Weights are downloaded on install via LocalAI's gallery mechanism.
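For readers unfamiliar with int8 quantization, the sketch below shows the simplest symmetric per-tensor scheme. The source does not say which scheme the OpenCV Zoo int8 models actually use, so this is a generic illustration rather than a description of those files, and the function names are hypothetical.

```python
from typing import List, Tuple


def quantize_int8(weights: List[float]) -> Tuple[List[int], float]:
    """Map floats to integers in [-127, 127] using a single shared scale."""
    max_abs = max((abs(w) for w in weights), default=0.0)
    if max_abs == 0.0:
        return [0] * len(weights), 1.0
    scale = max_abs / 127.0
    quantized = [max(-127, min(127, round(w / scale))) for w in weights]
    return quantized, scale


def dequantize(quantized: List[int], scale: float) -> List[float]:
    """Recover approximate float weights from the int8 values."""
    return [q * scale for q in quantized]
```

Each weight shrinks from 4 bytes to 1, and the rounding error per weight is at most half the scale. Files shrink by somewhat less than 4x because not every byte in a model file is a quantized weight, which is consistent with the "roughly 3x" stated above.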

Links

https://github.com/opencv/opencv_zoo

face-detect-buffalo-l

Face recognition with insightface's `buffalo_l` pack (SCRFD-10GF detector + ResNet50 ArcFace 512-d embedder), ported to C++/ggml and shipped as a single GGUF for the `face-detect` backend. Highest accuracy of the buffalo line. No Python / onnxruntime / torch runtime: face-detect.cpp reads the detector and embedder architecture (`facedetect.arch`) directly from the GGUF metadata, so installing this entry is all that is needed to select buffalo_l. Drives the Embedding / Detect / FaceVerify / FaceAnalyze gRPC rpcs and the /v1/face/{verify,analyze,embed,detect} REST endpoints. This GGUF also embeds the MiniFASNet anti-spoof ensemble, available via the FaceVerify `anti_spoof` request flag. NON-COMMERCIAL RESEARCH USE ONLY: for commercial use see `face-detect-yunet-sface`.
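To illustrate what reading `facedetect.arch` from the GGUF metadata involves, here is a minimal Python sketch that walks a GGUF header and returns one string-valued key. It is not the face-detect.cpp code. It assumes the GGUF v2/v3 little-endian layout (magic, version, tensor count, key/value count, then key/value pairs) and assumes the key holds a string, which the source does not state.

```python
import io
import struct
from typing import BinaryIO, Optional

GGUF_MAGIC = b"GGUF"
GGUF_TYPE_STRING = 8
GGUF_TYPE_ARRAY = 9
# Byte widths of the fixed-size GGUF scalar value types, keyed by type id.
SCALAR_SIZES = {0: 1, 1: 1, 2: 2, 3: 2, 4: 4, 5: 4, 6: 4, 7: 1, 10: 8, 11: 8, 12: 8}


def _read_string(f: BinaryIO) -> str:
    """GGUF strings are a uint64 byte length followed by UTF-8 bytes."""
    (length,) = struct.unpack("<Q", f.read(8))
    return f.read(length).decode("utf-8")


def _skip_value(f: BinaryIO, vtype: int) -> None:
    """Advance past a value we are not interested in."""
    if vtype == GGUF_TYPE_STRING:
        (length,) = struct.unpack("<Q", f.read(8))
        f.seek(length, io.SEEK_CUR)
    elif vtype == GGUF_TYPE_ARRAY:
        (elem_type,) = struct.unpack("<I", f.read(4))
        (count,) = struct.unpack("<Q", f.read(8))
        for _ in range(count):
            _skip_value(f, elem_type)
    else:
        f.seek(SCALAR_SIZES[vtype], io.SEEK_CUR)


def read_string_metadata(f: BinaryIO, key: str) -> Optional[str]:
    """Return the string stored under `key`, or None if it is absent."""
    if f.read(4) != GGUF_MAGIC:
        raise ValueError("not a GGUF file")
    _version, _tensor_count, kv_count = struct.unpack("<IQQ", f.read(20))
    for _ in range(kv_count):
        name = _read_string(f)
        (vtype,) = struct.unpack("<I", f.read(4))
        if name == key and vtype == GGUF_TYPE_STRING:
            return _read_string(f)
        _skip_value(f, vtype)
    return None
```

Called as `read_string_metadata(open(path, "rb"), "facedetect.arch")` on a downloaded file, this would show which engine the entry selects, provided the assumptions above hold for that file.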

face-detect-buffalo-m

Face recognition with insightface's `buffalo_m` pack (SCRFD-2.5GF detector + ResNet50 ArcFace embedder), converted to a C++/ggml GGUF for the `face-detect` backend. Same recognition accuracy as `buffalo_l` with a cheaper detector: a good balance on mid-range hardware. The architecture (`facedetect.arch`) is read from the GGUF metadata, so this entry alone selects the buffalo_m engine. This GGUF also embeds the MiniFASNet anti-spoof ensemble, available via the FaceVerify `anti_spoof` request flag. NON-COMMERCIAL RESEARCH USE ONLY.

face-detect-buffalo-s

Face recognition with insightface's `buffalo_s` pack (SCRFD-500MF detector + MBF 512-d embedder), converted to a C++/ggml GGUF for the `face-detect` backend. Small and CPU-friendly: a good fit for mid-range and edge deployments. The architecture (`facedetect.arch`) is read from the GGUF metadata, so this entry alone selects the buffalo_s engine. This GGUF also embeds the MiniFASNet anti-spoof ensemble, available via the FaceVerify `anti_spoof` request flag. NON-COMMERCIAL RESEARCH USE ONLY.

face-detect-buffalo-sc

Face recognition with insightface's `buffalo_sc` pack (SCRFD-500MF detector + a small ArcFace embedder), converted to a C++/ggml GGUF for the `face-detect` backend. This is the smallest insightface pack: the lightest option for low-resource and edge deployments. The architecture (`facedetect.arch`) is read from the GGUF metadata, so this entry alone selects the buffalo_sc engine. If this GGUF embeds the MiniFASNet anti-spoof ensemble, it is available via the FaceVerify `anti_spoof` request flag. NON-COMMERCIAL RESEARCH USE ONLY.

face-detect-antelopev2

Face recognition with insightface's `antelopev2` pack (SCRFD-10GF detector + ArcFace glint360k R100, 512-d embedder), converted to a C++/ggml GGUF for the `face-detect` backend. The higher-accuracy insightface pack: heavier, but the best fit when recognition quality matters more than speed. The architecture (`facedetect.arch`) is read from the GGUF metadata, so this entry alone selects the antelopev2 engine. If this GGUF embeds the MiniFASNet anti-spoof ensemble, it is available via the FaceVerify `anti_spoof` request flag. NON-COMMERCIAL RESEARCH USE ONLY.

face-detect-yunet-sface

Face recognition with OpenCV Zoo weights: YuNet detector + SFace 128-d recognizer, converted to a C++/ggml GGUF for the `face-detect` backend. APACHE 2.0: safe for commercial use. Lower accuracy than the buffalo packs and no demographic head, but the commercial-friendly alternative to the insightface buffalo line. The architecture (`facedetect.arch`) is read from the GGUF metadata, so this entry alone selects the YuNet + SFace engine.

speechbrain-ecapa-tdnn

Speaker (voice) recognition with SpeechBrain's ECAPA-TDNN trained on VoxCeleb. 192-d L2-normalised embeddings, ~1.9% Equal Error Rate on VoxCeleb1-O. APACHE 2.0 — commercial-safe. The checkpoint is auto-downloaded from HuggingFace on first LoadModel (no separate weight file in gallery `files:`). Points at the upstream SpeechBrain HF repo directly — same bytes every deployment.
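Two details in this entry lend themselves to a short illustration. For L2-normalised embeddings, cosine similarity reduces to a plain dot product. The Equal Error Rate is the operating point where the false-accept and false-reject rates are equal. The sketch below computes both on caller-supplied scores. It is a generic illustration, not the procedure behind the ~1.9% figure, and the function names are hypothetical.

```python
from typing import Sequence, Tuple


def similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Dot product; equals cosine similarity when both inputs have unit norm."""
    return sum(x * y for x, y in zip(a, b))


def equal_error_rate(
    genuine: Sequence[float], impostor: Sequence[float]
) -> Tuple[float, float]:
    """Return (eer, threshold), trying every observed score as the threshold.

    A trial is accepted when its score is >= threshold. The EER estimate is
    the mean of the two error rates at the threshold where they are closest.
    """
    if not genuine or not impostor:
        raise ValueError("need at least one genuine and one impostor score")
    best = None
    for t in sorted(set(genuine) | set(impostor)):
        frr = sum(s < t for s in genuine) / len(genuine)     # same speaker rejected
        far = sum(s >= t for s in impostor) / len(impostor)  # other speaker accepted
        gap = abs(far - frr)
        if best is None or gap < best[0]:
            best = (gap, (far + frr) / 2.0, t)
    return best[1], best[2]
```

On real trial lists, the returned threshold gives a starting point for choosing an accept/reject cutoff on your own audio, which may differ from the VoxCeleb operating point.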

