Switching Gradient Directions for Query-Efficient Black-Box Adversarial Attacks

We propose a simple and highly query-efficient black-box adversarial attack named SWITCH, which achieves state-of-the-art performance under the $\ell_2$ and $\ell_\infty$ norms in the score-based setting. In the black-box attack setting, designing query-efficient attacks remains an open problem...

The high query efficiency of the proposed approach stems from combining transfer-based attacks with random-search-based ones. The surrogate model’s gradient $\hat{\mathbf{g}}$ is used for guidance. A query checks whether this direction points toward the adversarial region; if it does not, the direction is switched, thereby keeping the objective loss of the target model rising as much as possible. Two switch operations are available: SWITCH$_\text{neg}$ and SWITCH$_\text{rnd}$. SWITCH$_\text{neg}$ takes $-\hat{\mathbf{g}}$ as the new direction, which is reasonable under an approximate local linearity assumption. SWITCH$_\text{rnd}$ computes the gradient from another model, randomly selected from a large model set, to help bypass potential obstacles in optimization. Experimental results show that these strategies boost the optimization process, whereas following the original surrogate gradients does not work. In SWITCH, no query is spent on estimating the gradient; every query serves only to decide whether to switch directions, resulting in unprecedented query efficiency. We demonstrate that our approach outperforms 10 state-of-the-art attacks on the CIFAR-10, CIFAR-100 and TinyImageNet datasets. SWITCH can serve as a strong baseline for future black-box attacks. The PyTorch source code is released at https://github.com/machanic/SWITCH .
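To make the SWITCH$_\text{neg}$ idea concrete, here is a minimal sketch of one iteration in plain Python. It is not the released PyTorch implementation. The function names (`step_linf`, `switch_neg_step`), the parameters `alpha` and `eps`, the $\ell_\infty$ sign step, the $[0, 1]$ pixel range, and the rule of keeping whichever candidate has the higher loss when neither improves are all assumptions made for illustration. The local-linearity argument is that if $L(x + \alpha\hat{\mathbf{g}}) < L(x)$ and $L$ is roughly linear near $x$, then $L(x - \alpha\hat{\mathbf{g}}) > L(x)$, so stepping along $-\hat{\mathbf{g}}$ should raise the loss.

```python
from typing import Callable, List, Tuple

Vector = List[float]


def sign(v: float) -> int:
    """Return -1, 0, or 1 depending on the sign of v."""
    return (v > 0) - (v < 0)


def step_linf(x: Vector, direction: Vector, x_orig: Vector,
              alpha: float, eps: float) -> Vector:
    """Take a sign step of size alpha, then project into the eps-ball
    around x_orig and into the assumed valid input range [0, 1]."""
    out = []
    for xi, di, oi in zip(x, direction, x_orig):
        xi_new = xi + alpha * sign(di)
        xi_new = min(max(xi_new, oi - eps), oi + eps)  # l_inf projection
        out.append(min(max(xi_new, 0.0), 1.0))         # input-range clip
    return out


def switch_neg_step(x: Vector, loss_x: float, g_hat: Vector,
                    target_loss: Callable[[Vector], float],
                    x_orig: Vector, alpha: float, eps: float
                    ) -> Tuple[Vector, float, int]:
    """One hypothetical SWITCH_neg iteration.

    Returns (new_x, new_loss, queries_used). Each call to target_loss
    stands for one query to the black-box target model.
    """
    # Query 1: does the surrogate direction raise the target loss?
    cand = step_linf(x, g_hat, x_orig, alpha, eps)
    loss_cand = target_loss(cand)
    if loss_cand >= loss_x:
        return cand, loss_cand, 1

    # Query 2: switch to the negated surrogate gradient.
    neg_g = [-g for g in g_hat]
    switched = step_linf(x, neg_g, x_orig, alpha, eps)
    loss_switched = target_loss(switched)

    # Assumption: keep whichever candidate yields the higher loss.
    if loss_switched >= loss_cand:
        return switched, loss_switched, 2
    return cand, loss_cand, 2
```

In this sketch an iteration costs one query when the surrogate direction already helps and two when it has to be switched; no query goes toward estimating a gradient. A SWITCH$_\text{rnd}$ variant would replace `neg_g` with the gradient of a second, randomly chosen surrogate model.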

