Model Gallery

8 models from 1 repository

speechbrain-ecapa-tdnn
Speaker (voice) recognition with SpeechBrain's ECAPA-TDNN trained on VoxCeleb. Produces 192-d L2-normalised embeddings with ~1.9% Equal Error Rate on VoxCeleb1-O. Apache 2.0, commercial-safe. The checkpoint is downloaded automatically from HuggingFace on the first LoadModel call, so the gallery entry's `files:` list contains no separate weight file. The entry points directly at the upstream SpeechBrain HF repo, so every deployment fetches the same bytes.

Repository: localai | License: apache-2.0
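
Speaker embeddings like the ones described above are usually compared with cosine similarity; for L2-normalised vectors this reduces to a dot product. The following is a minimal sketch of that comparison in plain Python, not the backend's actual code. The function names and the `threshold` default are hypothetical, and a real threshold has to be tuned on labelled trials for the chosen model.

```python
import math


def l2_normalise(vec):
    """Scale a vector to unit length (no-op for already normalised input)."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        raise ValueError("cannot normalise a zero vector")
    return [x / norm for x in vec]


def cosine_similarity(a, b):
    """Cosine similarity between two embeddings of equal dimension."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same dimension")
    a, b = l2_normalise(a), l2_normalise(b)
    # After normalisation the cosine is just the dot product.
    return sum(x * y for x, y in zip(a, b))


def same_speaker(a, b, threshold=0.5):
    """Accept the pair if similarity reaches the threshold.

    The 0.5 default is a placeholder assumption, not a recommended value.
    """
    return cosine_similarity(a, b) >= threshold
```

Embeddings from different models (for example 192-d and 256-d) live in different spaces and cannot be compared with each other, which is why the sketch rejects mismatched dimensions.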

wespeaker-resnet34
Speaker recognition with WeSpeaker's ResNet34 trained on VoxCeleb, exported to ONNX. 256-d embeddings, CPU-friendly: it avoids the PyTorch runtime entirely (onnxruntime only). CC-BY-4.0. Pair with the `speaker-recognition` backend's OnnxDirectEngine. Use when ECAPA-TDNN's torch dependency is undesirable (small images, edge deployments).

Repository: localai | License: cc-by-4.0

voice-detect-ecapa-tdnn
Speaker (voice) recognition with SpeechBrain's ECAPA-TDNN trained on VoxCeleb, ported to C++/ggml and shipped as a single GGUF for the `voice-detect` backend. Produces 192-d L2-normalised embeddings with ~1.9% Equal Error Rate on VoxCeleb1-O. Apache 2.0, commercial-safe. No Python or torch runtime is required: voice-detect.cpp reads the embedding architecture (`voicedetect.arch`) directly from the GGUF metadata, so installing this entry is all that is needed to select ECAPA-TDNN. Drives the VoiceVerify / VoiceEmbed gRPC RPCs and the /v1/voice/{verify,embed,register,identify,forget} REST endpoints.

Repository: localai | License: apache-2.0
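
To illustrate how an architecture key such as `voicedetect.arch` can be read from GGUF metadata, here is a rough standard-library sketch of a metadata reader. It assumes the little-endian GGUF v2/v3 header layout (magic, version, tensor count, key/value count, then typed key/value pairs) and is not the code voice-detect.cpp uses. The function names are hypothetical.

```python
import struct

GGUF_MAGIC = b"GGUF"
# GGUF value type id -> struct format for fixed-size scalars.
_SCALARS = {0: "<B", 1: "<b", 2: "<H", 3: "<h", 4: "<I", 5: "<i",
            6: "<f", 7: "<?", 10: "<Q", 11: "<q", 12: "<d"}
_STRING, _ARRAY = 8, 9


def _read(stream, fmt):
    """Read one fixed-size value described by a struct format."""
    size = struct.calcsize(fmt)
    data = stream.read(size)
    if len(data) != size:
        raise ValueError("unexpected end of GGUF data")
    return struct.unpack(fmt, data)[0]


def _read_string(stream):
    """Read a GGUF string: uint64 byte length followed by UTF-8 bytes."""
    length = _read(stream, "<Q")
    data = stream.read(length)
    if len(data) != length:
        raise ValueError("unexpected end of GGUF data")
    return data.decode("utf-8")


def _read_value(stream, vtype):
    """Read one metadata value of the given GGUF type id."""
    if vtype in _SCALARS:
        return _read(stream, _SCALARS[vtype])
    if vtype == _STRING:
        return _read_string(stream)
    if vtype == _ARRAY:
        elem_type = _read(stream, "<I")
        count = _read(stream, "<Q")
        return [_read_value(stream, elem_type) for _ in range(count)]
    raise ValueError(f"unknown GGUF value type {vtype}")


def read_gguf_metadata(stream):
    """Return the metadata key/value pairs of a GGUF file as a dict."""
    if stream.read(4) != GGUF_MAGIC:
        raise ValueError("not a GGUF file")
    version = _read(stream, "<I")
    if version < 2:
        raise ValueError("this sketch only handles GGUF v2 and later")
    _read(stream, "<Q")  # tensor count, not needed for metadata
    kv_count = _read(stream, "<Q")
    metadata = {}
    for _ in range(kv_count):
        key = _read_string(stream)
        vtype = _read(stream, "<I")
        metadata[key] = _read_value(stream, vtype)
    return metadata
```

With a downloaded model file opened in binary mode, `read_gguf_metadata(f).get("voicedetect.arch")` would return the stored architecture string, or `None` if the key is absent.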

voice-detect-wespeaker-resnet34
Speaker recognition with WeSpeaker's ResNet34 trained on VoxCeleb, converted to a C++/ggml GGUF for the `voice-detect` backend. 256-d embeddings, CPU-friendly and runtime-free (no onnxruntime or torch). CC-BY-4.0. Use when you want WeSpeaker's ResNet34 topology instead of ECAPA-TDNN. The embedding architecture (`voicedetect.arch`) is read from the GGUF metadata, so this entry alone selects the engine.

Repository: localai | License: cc-by-4.0

voice-detect-eres2net
Speaker recognition with 3D-Speaker's ERes2Net trained on VoxCeleb, converted to a C++/ggml GGUF for the `voice-detect` backend. 192-d embeddings with strong verification accuracy. APACHE 2.0. The embedding architecture (`voicedetect.arch`) is read from the GGUF metadata, so this entry alone selects the ERes2Net engine.

Repository: localai | License: apache-2.0
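
Verification accuracy for speaker models is commonly summarised as the Equal Error Rate (EER), the operating point where the false-accept rate equals the false-reject rate. The sketch below shows one simple way to estimate it from similarity scores of same-speaker ("genuine") and different-speaker ("impostor") trial pairs. It is an illustration with hypothetical names, not the evaluation script behind any figure quoted on this page.

```python
def equal_error_rate(genuine, impostor):
    """Estimate the EER from two lists of similarity scores.

    genuine:  scores for pairs from the same speaker
    impostor: scores for pairs from different speakers
    Higher scores are assumed to mean "more likely the same speaker".
    """
    if not genuine or not impostor:
        raise ValueError("both score lists must be non-empty")
    best_gap, eer = None, None
    # Try every observed score as the accept threshold.
    for t in sorted(set(genuine) | set(impostor)):
        far = sum(s >= t for s in impostor) / len(impostor)  # false accepts
        frr = sum(s < t for s in genuine) / len(genuine)     # false rejects
        gap = abs(far - frr)
        if best_gap is None or gap < best_gap:
            # Report the midpoint at the threshold where the rates are closest.
            best_gap, eer = gap, (far + frr) / 2
    return eer
```

With only a handful of trials the estimate is coarse; published figures are computed over full trial lists such as VoxCeleb1-O.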

voice-detect-campplus
Speaker recognition with 3D-Speaker's CAM++ trained on VoxCeleb, converted to a C++/ggml GGUF for the `voice-detect` backend. 192-d embeddings from a fast context-aware masking topology well suited to CPU and edge deployments. Apache 2.0. The embedding architecture (`voicedetect.arch`) is read from the GGUF metadata, so this entry alone selects the CAM++ engine.

Repository: localai | License: apache-2.0

voice-detect-emotion-wav2vec2
Voice analysis (age / gender / emotion) with audEERING's wav2vec2 model, converted to a C++/ggml GGUF for the `voice-detect` backend. Drives the VoiceAnalyze gRPC RPC and the /v1/voice/analyze REST endpoint, returning a continuous age estimate plus gender and emotion class scores for a single utterance. CC-BY-NC-SA-4.0: research / non-commercial use only. The analysis architecture (`voicedetect.arch`) is read from the GGUF metadata, so this entry alone selects the wav2vec2 analyze head.

Repository: localai | License: cc-by-nc-sa-4.0

voice-detect-age-gender-wav2vec2
Age and gender analysis with the wav2vec2-large-robust head (audeering/wav2vec2-large-robust-24-ft-age-gender), converted to a C++/ggml GGUF for the `voice-detect` backend. Drives the VoiceAnalyze gRPC RPC and the /v1/voice/analyze REST endpoint, returning a continuous age estimate plus gender class scores for a single utterance. CC-BY-NC-SA-4.0: research / non-commercial use only. The analysis architecture (`voicedetect.arch`) is read from the GGUF metadata, so this entry alone selects the wav2vec2 analyze head.

Repository: localai | License: cc-by-nc-sa-4.0