Model Gallery

43 models from 1 repository

streaming-zipformer-en-sherpa

Streaming English ASR: sherpa-onnx zipformer transducer (int8, chunk-16 left-128). Low-latency real-time transcription with endpoint detection via sherpa-onnx's online recognizer. English-only; for multilingual offline ASR see omnilingual-0.3b-ctc-q8-sherpa.

Repository: localai
License: apache-2.0

lfm2.5-audio-1.5b-asr

LFM2.5-Audio-1.5B in ASR mode. System prompt `Perform ASR.` is prepended; output is capitalised and punctuated. Wire this entry as a transcription model on the /v1/audio/transcriptions endpoint.

Repository: localai
License: LFM-Open-License-v1.0
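
As a rough illustration of wiring a transcription model to /v1/audio/transcriptions, the sketch below builds (but does not send) a multipart POST request. The base URL `http://localhost:8080` and the form field names `model` and `file` are assumptions borrowed from the OpenAI-style transcription API, not details stated in this listing; `build_transcription_request` is a hypothetical helper.

```python
import urllib.request
import uuid


def build_transcription_request(base_url: str, model: str,
                                filename: str, audio: bytes) -> urllib.request.Request:
    """Build a multipart/form-data POST for /v1/audio/transcriptions.

    Field names ("model", "file") are assumed, following the OpenAI-style API.
    """
    boundary = uuid.uuid4().hex
    delimiter = b"--" + boundary.encode()
    body = b"\r\n".join([
        delimiter,
        b'Content-Disposition: form-data; name="model"',
        b"",
        model.encode(),
        delimiter,
        f'Content-Disposition: form-data; name="file"; filename="{filename}"'.encode(),
        b"Content-Type: application/octet-stream",
        b"",
        audio,
        delimiter + b"--",  # closing boundary
        b"",
    ])
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/v1/audio/transcriptions",
        data=body,
        headers={"Content-Type": f"multipart/form-data; boundary={boundary}"},
        method="POST",
    )


if __name__ == "__main__":
    # Placeholder bytes stand in for a real audio file.
    req = build_transcription_request(
        "http://localhost:8080", "lfm2.5-audio-1.5b-asr", "sample.wav", b"RIFF")
    print(req.get_method(), req.full_url)
```

Passing the returned object to `urllib.request.urlopen` would send it to a running server; the response shape is not covered here.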

face-detect-buffalo-l

Face recognition with insightface's `buffalo_l` pack (SCRFD-10GF detector + ResNet50 ArcFace 512-d embedder), ported to C++/ggml and shipped as a single GGUF for the `face-detect` backend. Highest accuracy of the buffalo line. No Python / onnxruntime / torch runtime: face-detect.cpp reads the detector and embedder architecture (`facedetect.arch`) directly from the GGUF metadata, so installing this entry is all that is needed to select buffalo_l. Drives the Embedding / Detect / FaceVerify / FaceAnalyze gRPC rpcs and the /v1/face/{verify,analyze,embed,detect} REST endpoints. This GGUF also embeds the MiniFASNet anti-spoof ensemble, available via the FaceVerify `anti_spoof` request flag. NON-COMMERCIAL RESEARCH USE ONLY: for commercial use see `face-detect-yunet-sface`.

Repository: localai
License: insightface-non-commercial

voice-detect-ecapa-tdnn

Speaker (voice) recognition with SpeechBrain's ECAPA-TDNN trained on VoxCeleb, ported to C++/ggml and shipped as a single GGUF for the `voice-detect` backend. 192-d L2-normalised embeddings, ~1.9% Equal Error Rate on VoxCeleb1-O. APACHE 2.0 - commercial-safe. No Python / torch runtime: voice-detect.cpp reads the embedding architecture (`voicedetect.arch`) directly from the GGUF metadata, so installing this entry is all that is needed to select ECAPA-TDNN. Drives the VoiceVerify / VoiceEmbed gRPC rpcs and the /v1/voice/{verify,embed,register,identify,forget} REST endpoints.

Repository: localai
License: apache-2.0
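
Speaker verification with embeddings like the 192-d vectors above usually comes down to a cosine-similarity comparison; for L2-normalised vectors this is simply the dot product. The sketch below shows that comparison in plain Python. The function names and the `0.25` threshold are hypothetical and purely illustrative; the listing does not say how the backend scores or thresholds a match.

```python
import math
from typing import Sequence


def l2_normalise(vec: Sequence[float]) -> list[float]:
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        raise ValueError("cannot normalise a zero vector")
    return [x / norm for x in vec]


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; equals the dot product once both inputs are unit length."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same dimension")
    return sum(x * y for x, y in zip(l2_normalise(a), l2_normalise(b)))


def same_speaker(a: Sequence[float], b: Sequence[float],
                 threshold: float = 0.25) -> bool:
    """Decide a match against an illustrative threshold (an assumption, tune on real data)."""
    return cosine_similarity(a, b) >= threshold


if __name__ == "__main__":
    # Toy 3-d vectors stand in for real 192-d embeddings.
    print(same_speaker([1.0, 0.0, 0.0], [0.9, 0.1, 0.0]))
```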

voice-detect-emotion-wav2vec2

Voice analysis (age / gender / emotion) with audEERING's wav2vec2 model, converted to a C++/ggml GGUF for the `voice-detect` backend. Drives the VoiceAnalyze gRPC rpc and the /v1/voice/analyze REST endpoint, returning a continuous age estimate plus gender and emotion class scores for a single utterance. CC-BY-NC-SA-4.0 - research / non-commercial use only. The analysis architecture (`voicedetect.arch`) is read from the GGUF metadata, so this entry alone selects the wav2vec2 analyze head.

Repository: localai
License: cc-by-nc-sa-4.0

voice-detect-age-gender-wav2vec2

wav2vec2-large-robust age + gender analysis head (audeering/wav2vec2-large-robust-24-ft-age-gender), converted to a C++/ggml GGUF for the `voice-detect` backend. Drives the VoiceAnalyze gRPC rpc and the /v1/voice/analyze REST endpoint, returning a continuous age estimate plus gender class scores for a single utterance. CC-BY-NC-SA-4.0 - research / non-commercial use only. The analysis architecture (`voicedetect.arch`) is read from the GGUF metadata, so this entry alone selects the wav2vec2 analyze head.

Repository: localai
License: cc-by-nc-sa-4.0

rfdetr-cpp-nano

RF-DETR Nano object detection model, served via the native rfdetr.cpp backend (C++/ggml + purego, no Python dependencies). Q8_0 quantization is the recommended default for CPU: same accuracy as F16/F32, ~20MB on disk, fastest CPU latency. Drop-in for the /v1/detection endpoint.

Repository: localai
License: apache-2.0

locate-anything-3b

NVIDIA LocateAnything-3B open-vocabulary object detection (visual grounding), served via the native locate-anything.cpp backend (C++/ggml + purego, no Python). Describe what to find in a text prompt and get labeled boxes back; separate multiple categories with . Q8_0 is the recommended default: box-identical to F16/F32, ~6.3GB, fastest CPU latency. Drop-in for the /v1/detection endpoint (pass the prompt).

Repository: localai
License: other

depth-anything-2-base

Depth Anything V2 (base / ViT-B) monocular depth, served via the native depth-anything.cpp backend (C++/ggml + purego, no Python at inference). Given an image it returns a dense monocular depth map only — no camera pose, no confidence. This is the relative variant (relative inverse depth). Use GenerateImage (src -> normalized depth PNG at dst) or the Depth endpoint. q4_k is the recommended CPU default.

Repository: localai
License: apache-2.0
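
The "normalized depth PNG" mentioned above implies mapping raw relative depth values into an image range. The listing does not say how the backend does this; one common approach, assumed here for illustration, is min-max scaling to 8-bit grayscale. `normalise_depth` is a hypothetical helper, not part of the backend.

```python
from typing import Sequence


def normalise_depth(depth: Sequence[Sequence[float]]) -> list[list[int]]:
    """Min-max scale a 2-D depth map to integers in 0..255.

    Assumption: min-max scaling; the actual backend may normalise differently.
    """
    flat = [value for row in depth for value in row]
    if not flat:
        return []
    lo, hi = min(flat), max(flat)
    span = hi - lo
    if span == 0:
        # A constant map carries no relative depth information.
        return [[0 for _ in row] for row in depth]
    return [[round(255 * (value - lo) / span) for value in row] for row in depth]


if __name__ == "__main__":
    # Toy 2x2 map of relative inverse depth values.
    print(normalise_depth([[0.2, 0.4], [0.6, 1.0]]))
```

Because the relative variant has no absolute scale, only the ordering and ratios within a single image are meaningful after this kind of scaling.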

depth-anything-2-base-q8_0

Depth Anything V2 (base / ViT-B), q8_0 — near-lossless 8-bit quant. Same relative monocular depth output as the q4_k default at higher fidelity. Use GenerateImage (src -> depth PNG) or the Depth endpoint.

Repository: localai
License: apache-2.0

depth-anything-2-base-f16

Depth Anything V2 (base / ViT-B), f16 — half precision, no measurable accuracy loss vs f32. Relative monocular depth only (no pose). Use GenerateImage (src -> depth PNG) or the Depth endpoint.

Repository: localai
License: apache-2.0

depth-anything-2-base-f32

Depth Anything V2 (base / ViT-B), f32 — maximum reference fidelity. Relative monocular depth only (no pose). Use GenerateImage (src -> depth PNG) or the Depth endpoint.

Repository: localai
License: apache-2.0

depth-anything-2-small

Depth Anything V2 (small / ViT-S), f32 — the smallest, fastest backbone for relative monocular depth on CPU. Depth only (no pose). Use GenerateImage (src -> depth PNG) or the Depth endpoint.

Repository: localai
License: apache-2.0

depth-anything-2-large

Depth Anything V2 (large / ViT-L), f32 — higher-quality relative monocular depth than base. Depth only (no pose). Use GenerateImage (src -> depth PNG) or the Depth endpoint.

Repository: localai
License: apache-2.0

depth-anything-2-metric-hypersim-small

Depth Anything V2 Metric (Hypersim, indoor / ViT-S), q4_k — metric monocular depth in METRES (indoor, max_depth 20). Depth only (no pose). Use GenerateImage (src -> depth PNG) or the Depth endpoint.

Repository: localai
License: apache-2.0

depth-anything-2-metric-hypersim-base

Depth Anything V2 Metric (Hypersim, indoor / ViT-B), q4_k — metric monocular depth in METRES (indoor, max_depth 20). Depth only (no pose). Use GenerateImage (src -> depth PNG) or the Depth endpoint.

Repository: localai
License: apache-2.0

depth-anything-2-metric-hypersim-large

Depth Anything V2 Metric (Hypersim, indoor / ViT-L), q4_k — highest-quality metric monocular depth in METRES (indoor, max_depth 20). Depth only (no pose). Use GenerateImage (src -> depth PNG) or the Depth endpoint.

Repository: localai
License: apache-2.0

depth-anything-2-metric-vkitti-small

Depth Anything V2 Metric (Virtual KITTI, outdoor / ViT-S), q4_k — metric monocular depth in METRES (outdoor, max_depth 80). Depth only (no pose). Use GenerateImage (src -> depth PNG) or the Depth endpoint.

Repository: localai
License: apache-2.0

depth-anything-2-metric-vkitti-base

Depth Anything V2 Metric (Virtual KITTI, outdoor / ViT-B), q4_k — metric monocular depth in METRES (outdoor, max_depth 80). Depth only (no pose). Use GenerateImage (src -> depth PNG) or the Depth endpoint.

Repository: localai
License: apache-2.0

depth-anything-2-metric-vkitti-large

Depth Anything V2 Metric (Virtual KITTI, outdoor / ViT-L), q4_k — highest-quality metric monocular depth in METRES (outdoor, max_depth 80). Depth only (no pose). Use GenerateImage (src -> depth PNG) or the Depth endpoint.

Repository: localai
License: apache-2.0

rfdetr-cpp-small

RF-DETR Small object detection model (DINOv2-small backbone, 512px input, 3 decoder layers), served via the native rfdetr.cpp backend (ggml + purego, no Python). A step up from Nano in accuracy while staying lightweight on CPU. F16 quantization is the recommended default: identical accuracy to F32 at roughly half the size. Drop-in for the /v1/detection endpoint.

Repository: localai
License: apache-2.0
