Detection Models
DFPN ships with four pre-configured detection models that cover the most common deepfake and synthetic-media threats. Each model is maintained as a standalone module under the models/ directory and can be run by any worker node.
Pre-configured Models
face-forensics -- Face Manipulation Detection
Detects face-swap and reenactment forgeries in still images using the Self-Blended Images (SBI) training strategy on an EfficientNet-B4 backbone.
| Benchmark | Accuracy |
|---|---|
| FaceForensics++ (c23) | 97.2% |
| Celeb-DF v2 | 91.4% |
Processing speed: ~50 ms on GPU / ~500 ms on CPU
Best for
Profile photos, identity documents, social-media headshots -- any image where a face is the primary subject.
universal-fake-detect -- AI-Generated Image Detection
A CLIP-based classifier (CLIP-ViT-L/14) fine-tuned to separate real photographs from outputs of modern generative models.
| Generator | Accuracy |
|---|---|
| ProGAN | 99.8% |
| Stable Diffusion | 89.4% |
| DALL-E | 85.7% |
Processing speed: ~100 ms on GPU / ~800 ms on CPU
Best for
Identifying fully synthetic images produced by diffusion models, GANs, or other generative pipelines.
video-ftcn -- Video Authenticity Detection
Combines an Xception frame-level feature extractor with a Temporal CNN to capture inter-frame inconsistencies that reveal video-level manipulation.
| Benchmark | Accuracy |
|---|---|
| FaceForensics++ | 96.4% |
| Celeb-DF v2 | 88.9% |
Processing speed: ~2 s on GPU / ~30 s on CPU
CPU note
Video analysis is extremely slow on CPU-only nodes. The default CPU configuration disables this modality. See CPU-only configuration for details.
ssl-antispoofing -- Voice Cloning Detection
Leverages wav2vec 2.0 / XLSR-53 self-supervised speech representations to detect synthetic and cloned voices.
| Benchmark | Accuracy |
|---|---|
| ASVspoof 2021 (LA) | 99.2% |
Processing speed: ~200 ms on GPU / ~2 s on CPU
Best for
Voice messages, phone-call recordings, podcast clips -- any audio where speaker authenticity matters.
Performance Comparison
Accuracy by model
| Model | Primary Benchmark | Accuracy | Secondary Benchmark | Accuracy |
|---|---|---|---|---|
| face-forensics | FF++ (c23) | 97.2% | Celeb-DF v2 | 91.4% |
| universal-fake-detect | ProGAN | 99.8% | Stable Diffusion | 89.4% |
| video-ftcn | FF++ | 96.4% | Celeb-DF v2 | 88.9% |
| ssl-antispoofing | ASVspoof 2021 | 99.2% | -- | -- |
Processing speed
| Model | GPU Latency | CPU Latency | GPU Required? |
|---|---|---|---|
| face-forensics | 50 ms | 500 ms | Recommended |
| universal-fake-detect | 100 ms | 800 ms | Recommended |
| video-ftcn | 2 s | 30 s | Strongly recommended |
| ssl-antispoofing | 200 ms | 2 s | Recommended |
Supported Modalities
Every model, worker, and analysis request declares which modalities it supports using a bitfield. This allows efficient on-chain filtering without string comparison.
| Modality | Description | Bit Value |
|---|---|---|
ImageAuthenticity |
Real vs. AI-generated image classification | 1 (bit 0) |
VideoAuthenticity |
Temporal forgery detection in video | 2 (bit 1) |
AudioAuthenticity |
General audio manipulation detection | 4 (bit 2) |
FaceManipulation |
Face-swap and reenactment detection | 8 (bit 3) |
VoiceCloning |
Synthetic / cloned voice detection | 16 (bit 4) |
GeneratedContent |
Fully AI-generated media detection | 32 (bit 5) |
Combine values with bitwise OR. For example, a worker that handles face manipulation and AI-generated images advertises modalities = 8 | 1 | 32 = 41.
Model-to-modality mapping
| Model | Modalities |
|---|---|
| face-forensics | FaceManipulation (8) |
| universal-fake-detect | ImageAuthenticity (1) + GeneratedContent (32) = 33 |
| video-ftcn | VideoAuthenticity (2) |
| ssl-antispoofing | VoiceCloning (16) |
Standardized Output Format
All models produce a JSON result with the same envelope so that workers and on-chain aggregation logic can treat them uniformly.
{
"verdict": "manipulated",
"confidence": 0.973,
"detections": [
{
"type": "face_swap",
"confidence": 0.973,
"region": {
"x": 120,
"y": 80,
"width": 256,
"height": 256
},
"metadata": {
"model": "face-forensics-sbi",
"version": "1.0.0"
}
}
]
}
| Field | Type | Description |
|---|---|---|
verdict |
string | One of authentic, manipulated, or inconclusive |
confidence |
float | Overall confidence score between 0.0 and 1.0 |
detections |
array | Individual findings, each with its own confidence and optional spatial region |
detections[].type |
string | Detection category (e.g. face_swap, generated_image, voice_clone) |
detections[].region |
object | Bounding box for spatial detections (images/video); omitted for audio |
detections[].metadata |
object | Model identifier and version that produced this detection |