CosPGD: An efficient white-box adversarial attack for pixel-wise prediction tasks

Accepted at ICML 2024 (Poster)

Abstract

While neural networks allow highly accurate predictions in many tasks, their lack of robustness towards even slight input perturbations often hampers their deployment. Adversarial attacks such as the seminal projected gradient descent (PGD) offer an effective means to evaluate a model's robustness, and dedicated solutions have been proposed for attacks on semantic segmentation or optical flow estimation. While they attempt to increase the attack's efficiency, a further objective is to balance its effect, so that it acts on the entire image domain instead of isolated point-wise predictions. This often comes at the cost of optimization stability and thus efficiency. Here, we propose CosPGD, an attack that encourages more balanced errors over the entire image domain while increasing the attack's overall efficiency. To this end, CosPGD leverages a simple alignment score computed from any pixel-wise prediction and its target to scale the loss in a smooth and fully differentiable way. It leads to efficient evaluations of a model's robustness for semantic segmentation as well as regression models (such as optical flow, disparity estimation, or image restoration), allowing it to outperform the previous SotA attack on semantic segmentation. We provide code for the CosPGD algorithm and example usage at https://github.com/shashankskagnihotri/cospgd.

Algorithm

Prediction Alignment Scaling
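
CosPGD weights the per-pixel loss by an alignment score, the cosine similarity between the (softmaxed) prediction and the (one-hot encoded) target, before taking the gradient step, so that pixels whose predictions still agree with the label receive more of the attack budget. The following is a minimal PyTorch-style sketch of one untargeted ℓ∞ step for semantic segmentation; names like cospgd_step are illustrative, and the repository linked above contains the reference implementation.

import torch
import torch.nn.functional as F

def cospgd_step(model, x_adv, x_clean, y, epsilon, alpha):
    # One untargeted CosPGD step under an l_inf bound (illustrative sketch).
    x_adv = x_adv.clone().detach().requires_grad_(True)
    logits = model(x_adv)                                # (B, C, H, W)
    # Per-pixel cross-entropy, kept unreduced so it can be rescaled.
    loss = F.cross_entropy(logits, y, reduction="none")  # (B, H, W)
    # Alignment score: cosine similarity between the softmax prediction and
    # the one-hot target at every pixel; smooth and fully differentiable.
    probs = torch.softmax(logits, dim=1)
    one_hot = F.one_hot(y, logits.shape[1]).permute(0, 3, 1, 2).float()
    align = F.cosine_similarity(probs, one_hot, dim=1)   # (B, H, W)
    # Pixels that still agree with the label get a larger weight, so the
    # attack focuses on not-yet-broken predictions.
    scaled_loss = (align * loss).mean()
    grad = torch.autograd.grad(scaled_loss, x_adv)[0]
    # Ascent step, then projection back into the epsilon ball around x_clean.
    x_adv = x_adv.detach() + alpha * grad.sign()
    x_adv = x_clean + torch.clamp(x_adv - x_clean, -epsilon, epsilon)
    return torch.clamp(x_adv, 0.0, 1.0)

Iterating this step yields the attack; the smooth weighting is what distinguishes it from SegPGD's hard split between correctly and incorrectly classified pixels.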

Here, we report the change in pixel-wise image gradients over attack iterations for DeepLabV3 performing semantic segmentation on a PASCAL VOC 2012 validation subset.

We observe that the absolute difference between gradient values (top) is larger for PGD and increasing for SegPGD, while remaining stable for CosPGD.

Further, CosPGD has fewer changes in gradient direction over attack iterations (bottom) compared to PGD and SegPGD.

This shows that CosPGD's optimization is more stable than that of PGD and SegPGD.
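
These statistics can be computed by saving the input gradient at every attack iteration and comparing consecutive iterates; a small sketch (gradient_stability and the grads list are illustrative, assuming one gradient tensor stored per iteration):

def gradient_stability(grads):
    # grads: list of input-gradient tensors, one per attack iteration.
    # Returns, per pair of consecutive iterations, the mean absolute change
    # in gradient values (top plot) and the fraction of entries whose
    # gradient flips sign (bottom plot).
    abs_change, sign_flips = [], []
    for g_prev, g_next in zip(grads[:-1], grads[1:]):
        abs_change.append((g_next - g_prev).abs().mean().item())
        sign_flips.append((g_next.sign() != g_prev.sign()).float().mean().item())
    return abs_change, sign_flips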

Demo on Semantic Segmentation

Attacking SegFormer with an MiT-B0 backbone on ADE20K under different ℓ∞-bounded ε values, comparing the untargeted attacks SegPGD, PGD, and CosPGD.

CosPGD outperforms all the other attacks across ε values and attack iterations.
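
Such a sweep can be reproduced by iterating the single step sketched above over the desired budgets; a hypothetical driver loop (the ε grid, iteration counts, and the step size heuristic alpha = epsilon / 4 are assumptions, not the exact demo settings):

# Hypothetical sweep, reusing cospgd_step from the sketch above.
for epsilon in (2 / 255, 4 / 255, 8 / 255):
    x_adv = x_clean.clone()
    for _ in range(num_iterations):  # e.g. 5, 10, 20, or 40 iterations
        x_adv = cospgd_step(model, x_adv, x_clean, y, epsilon, alpha=epsilon / 4)
    # Evaluate mIoU of model(x_adv) against y here.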

Demo on Optical Flow Estimation

Comparing PGD and CosPGD as targeted, ℓ∞-norm-constrained 40-iteration attacks on RAFT using the Sintel (clean) validation dataset.
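
For targeted attacks on regression tasks, the alignment score can be computed directly between the predicted and the target output vectors. Below is one plausible reading for optical flow, where pixels whose predicted flow already matches the target are down-weighted; this is a hedged sketch rather than the paper's exact formulation, and it simplifies RAFT's two-frame input to a single perturbed tensor:

import torch
import torch.nn.functional as F

def cospgd_targeted_flow_step(model, x_adv, x_clean, flow_target, epsilon, alpha):
    # Targeted, l_inf-constrained step for optical flow (illustrative only).
    x_adv = x_adv.clone().detach().requires_grad_(True)
    flow_pred = model(x_adv)                                # (B, 2, H, W)
    # Per-pixel endpoint error towards the attack target.
    loss = torch.norm(flow_pred - flow_target, p=2, dim=1)  # (B, H, W)
    # Down-weight pixels whose flow already aligns with the target.
    align = F.cosine_similarity(flow_pred, flow_target, dim=1)
    scaled_loss = ((1.0 - align) * loss).mean()
    grad = torch.autograd.grad(scaled_loss, x_adv)[0]
    # Targeted attack: descend the loss towards the target flow.
    x_adv = x_adv.detach() - alpha * grad.sign()
    x_adv = x_clean + torch.clamp(x_adv - x_clean, -epsilon, epsilon)
    return torch.clamp(x_adv, 0.0, 1.0)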

Citations

Please use the following citation:

As Normal Text:

Agnihotri, S., Jung, S. & Keuper, M. (2024). CosPGD: an efficient white-box adversarial attack for pixel-wise prediction tasks. Proceedings of the 41st International Conference on Machine Learning, in Proceedings of Machine Learning Research 235:416–451. Available from https://proceedings.mlr.press/v235/agnihotri24b.html.

BibTeX:

@InProceedings{pmlr-v235-agnihotri24b,
  title     = {{C}os{PGD}: an efficient white-box adversarial attack for pixel-wise prediction tasks},
  author    = {Agnihotri, Shashank and Jung, Steffen and Keuper, Margret},
  booktitle = {Proceedings of the 41st International Conference on Machine Learning},
  pages     = {416--451},
  year      = {2024},
  editor    = {Salakhutdinov, Ruslan and Kolter, Zico and Heller, Katherine and Weller, Adrian and Oliver, Nuria and Scarlett, Jonathan and Berkenkamp, Felix},
  volume    = {235},
  series    = {Proceedings of Machine Learning Research},
  month     = {21--27 Jul},
  publisher = {PMLR},
  pdf       = {https://raw.githubusercontent.com/mlresearch/v235/main/assets/agnihotri24b/agnihotri24b.pdf},
  url       = {https://proceedings.mlr.press/v235/agnihotri24b.html},
}

Acknowledgements

Steffen Jung and Margret Keuper acknowledge funding by the DFG Research Unit 5336 – Learning to Sense.

The OMNI cluster of the University of Siegen was used for some of the initial computations.
