Despite the proliferation of proactive defenses against deepfakes, the lack of a unified evaluation protocol precludes fair comparison and masks critical vulnerabilities. Existing studies rely on incompatible choices of metrics and generators, rendering cross-paper comparisons ambiguous, and current evaluations frequently overlook the gap between ideal settings and real-world deployment.
Our benchmark bridges this gap. We define a comprehensive taxonomy of evaluation metrics across four key dimensions — pixel-level fidelity, perceptual fidelity, identity disruption, and visual quality — and systematically evaluate seven representative defenses against eight deepfake generators spanning attribute manipulation and face swap.
Our analysis exposes critical blind spots: fidelity and identity metrics capture orthogonal performance axes, often leading to conflicting interpretations when relied upon individually. We further identify a fundamental trade-off where peak white-box performance signals overfitting rather than genuine protection, and we introduce a calibrated evaluation to correct generator-induced identity bias.
Effectiveness: how strongly the defense degrades the deepfake output across the pixel, perceptual, identity, and visual-quality dimensions.
Robustness (ROB): how well the defense survives practical transformations (JPEG compression, blur, Gaussian noise, and salt-and-pepper noise).
Transferability (TE): how well a defense crafted against one generator generalizes to unseen target generators with diverse architectures.
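The post-processing transformations named above can be sketched in a few lines. This is a minimal illustration rather than the benchmark's implementation: the function names, the grayscale list-of-lists image format with values in [0, 1], and all parameter values are assumptions made for the example. JPEG compression is omitted because it needs an image codec.

```python
import random


def add_gaussian_noise(img, sigma, rng):
    """Add zero-mean Gaussian noise, then clip every pixel back to [0, 1]."""
    return [[min(1.0, max(0.0, p + rng.gauss(0.0, sigma))) for p in row]
            for row in img]


def add_salt_and_pepper(img, prob, rng):
    """Replace each pixel by 0 (pepper) or 1 (salt) with total probability prob."""
    out = []
    for row in img:
        new_row = []
        for p in row:
            u = rng.random()
            if u < prob / 2:
                new_row.append(0.0)   # pepper
            elif u < prob:
                new_row.append(1.0)   # salt
            else:
                new_row.append(p)     # pixel left untouched
        out.append(new_row)
    return out


def box_blur(img, radius=1):
    """Mean filter over a (2*radius+1)^2 window, truncated at the image border."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            window = [img[j][i]
                      for j in range(max(0, y - radius), min(h, y + radius + 1))
                      for i in range(max(0, x - radius), min(w, x + radius + 1))]
            out[y][x] = sum(window) / len(window)
    return out


# Hypothetical 3x4 "protected image"; a real evaluation would use face images.
rng = random.Random(0)
protected = [[0.5] * 4 for _ in range(3)]
perturbed = box_blur(add_salt_and_pepper(protected, 0.1, rng))
```

In a robustness evaluation, each perturbed image would then be passed to the deepfake generator and scored with the same metrics as the unperturbed protected image.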
Leaderboards are reported per deepfake generator and dataset. AvgRank is lower-is-better; all other metrics are higher-is-better.
Attribute manipulation task on CelebA-HQ. Evaluation follows our unified protocol.
| Rank | Method | Setting | AvgRank ↓ | MSE ↑ | LPIPS ↑ | CIDR ↑ | BRISQUE ↑ | ROB ↑ | TE ↑ |
|---|---|---|---|---|---|---|---|---|---|
Ranks are computed per generator by AvgRank (lower is better). TE (transferability) is only defined for white-box defenses; "—" denotes values not reported in the paper. All numbers are from Table 2 of our paper.
Detection-based defenses act after a deepfake is generated, training a classifier to flag manipulated content. Proactive defenses act before: by preemptively modifying input images prior to release, they disrupt the deepfake generation process itself, producing visibly broken or identity-shifted outputs.
CIDR (Calibrated Identity Disruption Rate) is our proposed metric that corrects for generator-induced identity bias. Standard identity-distance metrics implicitly entangle two effects: the defense's perturbation, and the generator's intrinsic tendency to alter identity. CIDR isolates the former, giving a fairer measure of identity-level disruption attributable to the defense.
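The entanglement described above can be illustrated with a small sketch. This is not the paper's definition of CIDR, which is not given here. It shows one plausible calibration, assumed purely for illustration: count a disruption only on images where the generator, run on the unprotected input, preserved identity in the first place. The function names, the similarity scores, and the threshold `tau` are all hypothetical.

```python
def raw_disruption_rate(sim_protected, tau):
    """Fraction of outputs whose identity similarity to the source falls below tau.

    Uncalibrated: failures caused by the generator alone are counted as well.
    """
    return sum(s < tau for s in sim_protected) / len(sim_protected)


def baseline_conditioned_rate(sim_clean, sim_protected, tau):
    """Disruption rate restricted to images where the generator preserved
    identity on the unprotected input (sim_clean >= tau).

    Illustrative assumption only, NOT the paper's CIDR formula.
    """
    kept = [(c, p) for c, p in zip(sim_clean, sim_protected) if c >= tau]
    if not kept:
        return float("nan")  # the generator never preserved identity
    return sum(p < tau for _, p in kept) / len(kept)


# Hypothetical identity-similarity scores for four images.
sim_clean = [0.8, 0.3, 0.7, 0.2]        # source vs. deepfake of the clean image
sim_protected = [0.2, 0.25, 0.6, 0.1]   # source vs. deepfake of the protected image

raw = raw_disruption_rate(sim_protected, tau=0.5)
conditioned = baseline_conditioned_rate(sim_clean, sim_protected, tau=0.5)
```

In this toy data the raw rate is 0.75 but the conditioned rate is 0.5, because two of the three apparent disruptions occur on images where the generator had already altered identity without any defense.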
Pixel-level, perceptual, and identity metrics have incompatible scales and distributions, so direct averaging would let one metric dominate purely because of its scale. AvgRank instead ranks each defense per metric and per generator, then averages the ranks, which makes the aggregation scale-invariant. A lower AvgRank means consistently strong relative performance.
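The rank-then-average aggregation can be sketched as follows. This is a minimal version under stated assumptions: all metrics are higher-is-better, tied scores share the average of their positions, and the nested-dictionary layout, function names, and scores are hypothetical rather than taken from the benchmark code.

```python
from collections import defaultdict


def rank_descending(scores):
    """Map defense -> rank (1 = best) for higher-is-better scores.

    Tied scores share the average of the positions they occupy (an assumption).
    """
    ordered = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    ranks = {}
    i = 0
    while i < len(ordered):
        j = i
        while j + 1 < len(ordered) and ordered[j + 1][1] == ordered[i][1]:
            j += 1
        shared = (i + 1 + j + 1) / 2  # average of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[ordered[k][0]] = shared
        i = j + 1
    return ranks


def avg_rank(results):
    """results[generator][metric][defense] -> score; returns defense -> AvgRank."""
    collected = defaultdict(list)
    for metrics in results.values():
        for scores in metrics.values():
            for defense, r in rank_descending(scores).items():
                collected[defense].append(r)
    return {d: sum(rs) / len(rs) for d, rs in collected.items()}


# Hypothetical scores; note the very different scales of the two MSE columns.
results = {
    "gen_1": {"MSE": {"A": 0.30, "B": 0.10, "C": 0.20},
              "LPIPS": {"A": 0.50, "B": 0.60, "C": 0.40}},
    "gen_2": {"MSE": {"A": 900.0, "B": 100.0, "C": 500.0},
              "LPIPS": {"A": 0.2, "B": 0.1, "C": 0.3}},
}
print(avg_rank(results))  # {'A': 1.5, 'B': 2.5, 'C': 2.0}
```

Because only the ordering within each (generator, metric) pair matters, the much larger MSE values for `gen_2` carry no more weight than any other column. Passing a single generator's entry gives a per-generator AvgRank.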
We observe that defenses with the highest white-box disruption often transfer poorly to unseen generators and degrade sharply under post-processing. Peak white-box performance, in other words, can signal over-specialization to a single generator's gradient landscape rather than genuine, generalizable protection.
We evaluate seven representative defenses spanning multiple paradigms. White-box: PGD, Disrupting Deepfakes, DF-RAP, Anti-Forgery, and Latent Attack. Black-box: SCOL and NullSwap. See Appendix B (Table 5) of the paper for details on each method.
If you use our benchmark in your research, please cite our paper.
Coming soon — citation information will be added upon the camera-ready release.