Minimally Distorted Explainable Adversarial Attacks


Sparse adversarial attack approaches can be unified in a framework that is often based on a convex relaxation of the domain of the generated perturbation. However, there is a priori no real need for convexity relaxation. We consider generating adversarial images with a non-convex loss in a non-convex l_p neighbourhood of an input image, thus stepping away from the l_0 combinatorial problem while remaining as continuous but sparser than the l_1 ball. We formalise the concept of finding adversarial examples as an explicit optimization problem for the special case of p=1/2, for which we have access to an efficient proximal operator. To enhance the interpretability of generated attacks, we adjust the regularization parameter of the perturbation for each pixel separately. This modification increases the likelihood of perturbing pixels close to the already perturbed pixels. Experiments show that our method computes highly sparse and interpretable adversarial examples for ImageNet models.

Aug 28, 2023 11:00 AM — 11:30 AM
Tokyo, Japan
Shpresim Sadiku
PhD @TUBerlin/BMS
Scientific Assistant @ZuseInstitute