privacy-filter-multilingual
A multilingual PII token-classification model: OpenMed's fine-tune of
openai/privacy-filter. It tags every token with a BIOES label over 54 PII
categories (217 classes: B, I, E and S for each category, plus O) in 16
languages (ar, bn, de, en, es, fr, hi, it, ja, ko, nl, pt, te, tr, vi, zh),
covering identity, contact, address, financial, vehicle, digital, and crypto
entities.
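The class count follows from the BIOES scheme. Each category contributes four tags (Begin, Inside, End, Single), and one shared O tag marks non-PII tokens, so 54 × 4 + 1 = 217. The sketch below builds such a label space in Python. The category names are placeholders, and putting O first is an assumption: the model's real names and id2label order come from its own config.

```python
from typing import Dict, List

def bioes_label_space(categories: List[str]) -> List[str]:
    """Return ["O"] followed by B-/I-/E-/S- labels for each category.

    The ordering is an illustrative assumption; the model's own id2label
    mapping may order labels differently.
    """
    labels = ["O"]
    for cat in categories:
        labels.extend(f"{prefix}-{cat}" for prefix in ("B", "I", "E", "S"))
    return labels

# Hypothetical placeholder names standing in for the 54 real PII categories.
categories = [f"CATEGORY_{i:02d}" for i in range(54)]
labels = bioes_label_space(categories)
label_to_id: Dict[str, int] = {label: i for i, label in enumerate(labels)}

print(len(labels))  # → 217 (54 * 4 + 1)
```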
In LocalAI this model is a PII detector for the NER redactor tier. Set
known_usecases to [token_classify] (as below); any model can then opt into
redaction by listing this one under pii.detectors. The detection policy
(which categories to mask, which to block, and the score threshold) lives in
this model's own pii_detection block; see the overrides below. It runs
locally without a Python runtime, served by the standalone privacy-filter
backend's TokenClassify RPC, which performs a constrained BIOES Viterbi decode
and returns entity spans as UTF-8 byte offsets.
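The backend's own decoder is not shown here. What follows is a minimal pure-Python sketch of what a constrained BIOES Viterbi decode does. It assumes per-token log-scores and per-token UTF-8 byte offsets as inputs; both are hypothetical stand-ins for the backend's internals, and the NAME labels are illustrative only. Transitions that break BIOES (for example, B-x followed by O) get a score of minus infinity. The decode therefore returns the best *valid* tag path rather than a per-token argmax that can leave an entity unclosed. Because the spans are measured in bytes, non-ASCII text shifts the offsets. In the example, "ë" takes two bytes.

```python
from typing import List, Tuple

NEG_INF = float("-inf")

def allowed(prev: str, cur: str) -> bool:
    """BIOES rule: after B-x or I-x only I-x or E-x may follow; after O, E-*,
    S-* (or at sequence start) only O, B-* or S-* may follow."""
    if prev.startswith(("B-", "I-")):
        cat = prev[2:]
        return cur in (f"I-{cat}", f"E-{cat}")
    return cur == "O" or cur.startswith(("B-", "S-"))

def constrained_viterbi(scores: List[List[float]], labels: List[str]) -> List[str]:
    """Highest-scoring tag path that obeys BIOES; scores[t][j] is a log-score
    for labels[j] at token t. labels must include "O" (all-O is always valid)."""
    if not scores:
        return []
    k = len(labels)
    # The start of the sequence behaves like a preceding "O".
    best = [s if allowed("O", labels[j]) else NEG_INF for j, s in enumerate(scores[0])]
    back: List[List[int]] = []
    for row in scores[1:]:
        new_best, ptrs = [], []
        for j in range(k):
            cands = [(best[i], i) for i in range(k)
                     if best[i] > NEG_INF and allowed(labels[i], labels[j])]
            if cands:
                prev_score, i = max(cands)
                new_best.append(prev_score + row[j])
                ptrs.append(i)
            else:
                new_best.append(NEG_INF)
                ptrs.append(-1)
        best = new_best
        back.append(ptrs)
    # A sequence may not end inside an open entity (B-* or I-*).
    ends = [j for j, lab in enumerate(labels) if not lab.startswith(("B-", "I-"))]
    j = max(ends, key=lambda idx: best[idx])
    path = [j]
    for ptrs in reversed(back):  # follow back-pointers to the first token
        j = ptrs[j]
        path.append(j)
    return [labels[j] for j in reversed(path)]

def byte_spans(token_bytes: List[Tuple[int, int]], tags: List[str]) -> List[Tuple[str, int, int]]:
    """Collapse a valid BIOES path into (category, start_byte, end_byte) spans."""
    spans, start = [], 0
    for (b0, b1), tag in zip(token_bytes, tags):
        if tag.startswith("S-"):
            spans.append((tag[2:], b0, b1))
        elif tag.startswith("B-"):
            start = b0
        elif tag.startswith("E-"):
            spans.append((tag[2:], start, b1))
    return spans

labels = ["O", "B-NAME", "I-NAME", "E-NAME", "S-NAME"]
# Hypothetical log-scores: greedy argmax would give O, B-NAME, O (invalid BIOES).
scores = [
    [0.0, -5.0, -5.0, -5.0, -5.0],
    [-3.0, -0.5, -5.0, -5.0, -1.5],
    [-0.7, -5.0, -5.0, -0.8, -5.0],
]
pieces = ["Hi", " Zo\u00eb", " Ng"]  # hypothetical tokenization of "Hi Zoë Ng"
token_bytes, pos = [], 0
for p in pieces:
    n = len(p.encode("utf-8"))
    token_bytes.append((pos, pos + n))
    pos += n

tags = constrained_viterbi(scores, labels)
print(tags)                           # ['O', 'B-NAME', 'E-NAME']
print(byte_spans(token_bytes, tags))  # [('NAME', 2, 10)]
```

The two-byte "ë" is why the span ends at byte 10 even though the string has 9 characters. A consumer that slices Python `str` objects would need to convert the byte offsets back to character offsets first.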
Architecture: gpt-oss-style sparse mixture-of-experts (8 layers, 128 experts
with top-4 routing, ~50M active parameters per token), bidirectional banded
attention, and the o200k tokenizer; served via the openai-privacy-filter
architecture. Weights are F16, ~2.7 GB.
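Two of the architecture terms can be illustrated on toy scalars: top-4-of-128 expert routing and a bidirectional banded attention mask. Only the routed experts run for a given token, which is why the active parameter count is far below the total. The sketch below makes several assumptions that are not taken from the model. The router applies a softmax over just the selected top-k logits, a scheme used by gpt-oss-style routers but not confirmed here for this checkpoint. The experts are scalar functions rather than MLPs. The band width `window` is a hypothetical value, since the real one is not stated.

```python
import math
from typing import Callable, List, Tuple

def route_top_k(router_logits: List[float], k: int = 4) -> List[Tuple[int, float]]:
    """Pick the k highest-scoring experts and softmax-normalize over just those."""
    top = sorted(range(len(router_logits)), key=lambda e: router_logits[e], reverse=True)[:k]
    m = max(router_logits[e] for e in top)  # subtract the max for numerical stability
    exps = [math.exp(router_logits[e] - m) for e in top]
    z = sum(exps)
    return [(e, x / z) for e, x in zip(top, exps)]

def moe_forward(x: float, router_logits: List[float],
                experts: List[Callable[[float], float]], k: int = 4) -> float:
    """Only the k routed experts run; their outputs are mixed by routing weight."""
    return sum(w * experts[e](x) for e, w in route_top_k(router_logits, k))

def banded_mask(n: int, window: int) -> List[List[bool]]:
    """Bidirectional band: token i may attend to token j iff |i - j| <= window."""
    return [[abs(i - j) <= window for j in range(n)] for i in range(n)]

num_experts = 128
logits = [0.0] * num_experts  # hypothetical router logits for one token
logits[3], logits[17], logits[64], logits[100], logits[5] = 2.0, 1.5, 1.0, 0.5, 0.1
routes = route_top_k(logits, k=4)
print(sorted(e for e, _ in routes))  # [3, 17, 64, 100]

experts = [lambda v, s=e: v * s for e in range(num_experts)]  # toy scalar "experts"
y = moe_forward(1.0, logits, experts)
print(banded_mask(5, 1)[2])  # [False, True, True, True, False]
```

Unlike a causal mask, the band is symmetric: each token sees neighbours on both sides. That suits tagging, because a span's label often depends on the words that follow it.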