ICML 2026

Proactive Defense Benchmark
against Deepfake Generation

Joonhyuk Baek*, Wonjune Seo*, Jae-yun Kim, Saerom Park, Hoki Kim

* Equal contribution  ·  Corresponding author

01

About the Benchmark

Despite the proliferation of proactive defenses against deepfakes, the lack of a unified evaluation protocol precludes fair comparison and masks critical vulnerabilities. Existing studies rely on incompatible choices of metrics and generators, rendering cross-paper comparisons ambiguous, and current evaluations frequently overlook the gap between ideal settings and real-world deployment.

Our benchmark bridges this gap. We define a comprehensive taxonomy of evaluation metrics across four key dimensions — pixel-level fidelity, perceptual fidelity, identity disruption, and visual quality — and systematically evaluate seven representative defenses against eight deepfake generators spanning attribute manipulation and face swap.

Our analysis exposes critical blind spots: fidelity and identity metrics capture orthogonal performance axes, often leading to conflicting interpretations when relied upon individually. We further identify a fundamental trade-off where peak white-box performance signals overfitting rather than genuine protection, and we introduce a calibrated evaluation to correct generator-induced identity bias.

Disruption

How effectively the defense degrades the deepfake output across pixel, perceptual, identity, and visual-quality dimensions.

Robustness

How well the defense survives practical transformations — JPEG compression, blur, Gaussian and salt-and-pepper noise.

Transferability

How well a defense crafted against one generator generalizes to unseen target generators with diverse architectures.

[Figure: radar charts comparing proactive defenses across generators and datasets]

Check out our paper for a detailed analysis.

02

Evaluation Protocol

Metrics

  • MSE ↑ pixel-wise fidelity
  • LPIPS ↑ perceptual fidelity
  • CIDR ↑ identity disruption
  • BRISQUE ↑ visual quality
  • AvgRank ↓ aggregate rank
  • ROB ↑ robustness score
  • TE ↑ transferability
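As a rough illustration of the pixel-level dimension, the sketch below computes MSE between a generator's output on a clean input and its output on a protected input. It assumes small grayscale images stored as lists of rows with values in [0, 1]. The arrays are made-up toy data, and the benchmark's exact preprocessing and reference image may differ.

```python
def mse(a, b):
    """Mean squared error between two equally sized images (lists of rows)."""
    if len(a) != len(b) or any(len(ra) != len(rb) for ra, rb in zip(a, b)):
        raise ValueError("images must have the same shape")
    n = sum(len(row) for row in a)
    total = sum((pa - pb) ** 2
                for ra, rb in zip(a, b)
                for pa, pb in zip(ra, rb))
    return total / n


# Toy data (assumption): generator output on the clean vs. the protected input.
clean_out = [[0.0, 0.5], [1.0, 0.5]]
protected_out = [[0.5, 0.5], [0.0, 0.5]]

# A larger value means the defense changed the deepfake output more.
disruption = mse(clean_out, protected_out)
```

Under this reading, a higher MSE indicates stronger disruption, which matches the ↑ direction in the list above.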

Generators

Attribute Manipulation
  • StarGAN
  • StyleCLIP
  • DiffAE
Face Swap
  • SimSwap
  • pSp-mix
  • BlendFace
  • DiffSwap
  • DiffFace

Datasets

  • CelebA-HQ
  • FFHQ
  • VGGFace2-HQ

Robustness Probes

  • JPEG
  • Blur
  • Gaussian Noise
  • Salt & Pepper
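The probes can be pictured as simple transformations applied to a protected image before it reaches the generator. The following is a minimal sketch using only the Python standard library, under several assumptions: images are grayscale lists of rows in [0, 1], a box blur stands in for whatever blur kernel the benchmark uses, and the parameter values are arbitrary. JPEG compression is omitted because it needs an image codec.

```python
import random


def clip01(v):
    """Clamp a pixel value to the valid range [0, 1]."""
    return min(1.0, max(0.0, v))


def gaussian_noise(img, sigma, rng):
    """Add zero-mean Gaussian noise with standard deviation sigma."""
    return [[clip01(p + rng.gauss(0.0, sigma)) for p in row] for row in img]


def salt_and_pepper(img, prob, rng):
    """Set each pixel to 0 or 1 with total probability prob (half each)."""
    out = []
    for row in img:
        new_row = []
        for p in row:
            r = rng.random()
            if r < prob / 2:
                new_row.append(0.0)   # pepper
            elif r < prob:
                new_row.append(1.0)   # salt
            else:
                new_row.append(p)
        out.append(new_row)
    return out


def box_blur(img, radius=1):
    """Average each pixel with its neighbours inside a square window."""
    h, w = len(img), len(img[0])
    out = []
    for y in range(h):
        row = []
        for x in range(w):
            vals = [img[j][i]
                    for j in range(max(0, y - radius), min(h, y + radius + 1))
                    for i in range(max(0, x - radius), min(w, x + radius + 1))]
            row.append(sum(vals) / len(vals))
        out.append(row)
    return out


rng = random.Random(0)  # fixed seed so the probes are reproducible
protected = [[0.2, 0.4, 0.6], [0.4, 0.6, 0.8], [0.6, 0.8, 1.0]]
probes = {
    "gaussian": gaussian_noise(protected, sigma=0.05, rng=rng),
    "salt_pepper": salt_and_pepper(protected, prob=0.1, rng=rng),
    "blur": box_blur(protected, radius=1),
}
```

Each probed image would then be passed through the generator and scored with the same disruption metrics as the unprobed one.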

03

Leaderboards

Select a deepfake generator and a dataset to view its leaderboard. Click any column header to sort — AvgRank is lower-is-better; all other metrics are higher-is-better.

Leaderboard: StarGAN · CelebA-HQ

Attribute manipulation task on CelebA-HQ. Evaluation follows our unified protocol.

Rank Method Setting AvgRank ↓ MSE ↑ LPIPS ↑ CIDR ↑ BRISQUE ↑ ROB ↑ TE ↑

Ranks are computed per generator by AvgRank (lower is better). TE (transferability) is only defined for white-box defenses; "—" denotes values not reported in the paper. All numbers are from Table 2 of our paper.

04

FAQ

What does "proactive defense" mean, and how does it differ from detection?

Detection-based defenses act after a deepfake is generated, training a classifier to flag manipulated content. Proactive defenses act before: by preemptively modifying input images prior to release, they disrupt the deepfake generation process itself, producing visibly broken or identity-shifted outputs.
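To make the idea of preemptive modification concrete, here is a minimal PGD-style sketch. It is a toy under stated assumptions, not any evaluated method's implementation: the "generator" is a hypothetical linear map `W`, so the gradient of the output distortion can be written by hand, whereas real generators are deep networks and would need automatic differentiation. The names `pgd_protect`, `eps`, `alpha`, and `steps` are illustrative.

```python
import random


def matvec(W, v):
    """Multiply matrix W (list of rows) by vector v."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def transpose(W):
    return [list(col) for col in zip(*W)]


def sign(v):
    return (v > 0) - (v < 0)


def pgd_protect(x, W, eps=0.05, alpha=0.01, steps=20, seed=0):
    """Perturb x within an L-infinity ball of radius eps so that the toy
    generator's output moves as far as possible from its clean output.

    Objective: ||W(x + delta) - W x||^2 = ||W delta||^2,
    whose gradient with respect to delta is 2 * W^T W delta.
    """
    rng = random.Random(seed)
    # Random start: the gradient is exactly zero at delta = 0.
    delta = [rng.uniform(-eps, eps) for _ in x]
    Wt = transpose(W)
    for _ in range(steps):
        grad = [2.0 * g for g in matvec(Wt, matvec(W, delta))]
        # Signed gradient ascent step, then projection onto the eps ball.
        delta = [min(eps, max(-eps, d + alpha * sign(g)))
                 for d, g in zip(delta, grad)]
        # Keep the protected image inside the valid pixel range [0, 1].
        delta = [min(1.0, max(0.0, xi + d)) - xi for xi, d in zip(x, delta)]
    return [xi + d for xi, d in zip(x, delta)]


# Toy example (assumption): a 3-pixel "image" and a 2x3 linear "generator".
x = [0.5, 0.5, 0.5]
W = [[1.0, 2.0, 0.0], [0.0, 1.0, -1.0]]
x_protected = pgd_protect(x, W)

clean_out = matvec(W, x)
protected_out = matvec(W, x_protected)
distortion = sum((a - b) ** 2 for a, b in zip(clean_out, protected_out))
```

The perturbation stays within `eps` of the original pixels, so the protected image looks nearly unchanged, while the generator's output is pushed away from what it would have produced on the clean input.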

What is CIDR, and why introduce a new identity metric?

CIDR (Calibrated Identity Disruption Rate) is our proposed metric that corrects for generator-induced identity bias. Standard identity-distance metrics implicitly entangle two effects: the defense's perturbation, and the generator's intrinsic tendency to alter identity. CIDR isolates the former, giving a fairer measure of identity-level disruption attributable to the defense.
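The exact CIDR formula is given in the paper and is not reproduced on this page, so the sketch below is not CIDR itself. It only illustrates the entanglement described above with one hypothetical calibration: count a disruption only on samples where the generator, run on the clean input, preserved identity in the first place. The similarity scores, the threshold `tau`, and both function names are assumptions made up for this example.

```python
def naive_disruption_rate(sim_protected, tau):
    """Fraction of protected outputs whose identity similarity to the
    reference falls below tau, ignoring what the generator does on its own."""
    return sum(s < tau for s in sim_protected) / len(sim_protected)


def calibrated_rate_sketch(sim_clean, sim_protected, tau):
    """Hypothetical calibration: restrict to samples where the clean
    deepfake output still matches the reference identity (sim >= tau),
    then measure how often the protected output does not."""
    kept = [(c, p) for c, p in zip(sim_clean, sim_protected) if c >= tau]
    if not kept:
        return float("nan")  # generator never preserved identity
    return sum(p < tau for _, p in kept) / len(kept)


# Made-up identity similarities for four images.
sim_clean = [0.9, 0.8, 0.3, 0.2]      # generator output on clean inputs
sim_protected = [0.1, 0.7, 0.2, 0.1]  # generator output on protected inputs
tau = 0.5

naive = naive_disruption_rate(sim_protected, tau)
calibrated = calibrated_rate_sketch(sim_clean, sim_protected, tau)
```

In this toy data the naive rate credits the defense for two samples whose identity the generator had already altered without any perturbation, and the calibrated variant excludes them.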

Why use AvgRank instead of averaging metrics directly?

Pixel-level, perceptual, and identity metrics have incompatible scales and distributions, so averaging them directly would let one metric dominate purely because of its scale. AvgRank instead ranks the defenses on each metric for each generator and then averages those ranks, which makes the aggregate scale-invariant. A lower AvgRank means consistently strong relative performance.
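The rank-then-average procedure can be sketched in a few lines. This is a minimal illustration with made-up scores for hypothetical methods A, B, and C. It assumes every metric is higher-is-better and that tied methods share the average of their ranks. The paper's tie handling is not stated on this page.

```python
def rank_descending(scores):
    """Rank methods by score (1 = highest); ties share the average rank."""
    ordered = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    ranks = {}
    i = 0
    while i < len(ordered):
        j = i
        # Extend j over the run of methods tied with position i.
        while j + 1 < len(ordered) and ordered[j + 1][1] == ordered[i][1]:
            j += 1
        shared = (i + 1 + j + 1) / 2  # average of 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[ordered[k][0]] = shared
        i = j + 1
    return ranks


def avg_rank(results):
    """results[generator][metric][method] -> score (higher is better).
    Returns the mean rank of each method over all generator/metric pairs."""
    totals, counts = {}, {}
    for metrics in results.values():
        for scores in metrics.values():
            for method, r in rank_descending(scores).items():
                totals[method] = totals.get(method, 0.0) + r
                counts[method] = counts.get(method, 0) + 1
    return {m: totals[m] / counts[m] for m in totals}


# Made-up numbers for illustration only; not results from the paper.
results = {
    "gen1": {
        "MSE": {"A": 0.30, "B": 0.10, "C": 0.20},
        "LPIPS": {"A": 0.50, "B": 0.60, "C": 0.40},
    },
    "gen2": {
        "MSE": {"A": 0.05, "B": 0.15, "C": 0.10},
        "LPIPS": {"A": 0.70, "B": 0.20, "C": 0.20},
    },
}
scores = avg_rank(results)
```

Passing a single generator's metrics to `avg_rank` gives a per-generator ranking like the one shown in a leaderboard. Rescaling any one metric leaves the result unchanged, since only the ordering within each metric matters.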

What does "overfitting" mean in this context?

We observe that defenses with the highest white-box disruption often transfer poorly to unseen generators and degrade sharply under post-processing. Peak white-box performance, in other words, can signal over-specialization to a single generator's gradient landscape rather than genuine, generalizable protection.

Which defenses are evaluated?

We evaluate seven representative defenses spanning multiple paradigms. White-box: PGD, Disrupting Deepfakes, DF-RAP, Anti-Forgery, Latent Attack. Black-box: SCOL, NullSwap. See Appendix B (Table 5) of the paper for details on each method.

05

Citation

If you use our benchmark in your research, please cite our paper.

BibTeX

Coming soon — citation information will be added upon the camera-ready release.