Improving Query Efficiency of Black-box Adversarial Attack

Deep neural networks (DNNs) have demonstrated excellent performance on various tasks, yet they are vulnerable to adversarial examples, which can be generated easily when the target model is accessible to the attacker (the white-box setting). Since many machine learning models are deployed as online services that expose only query outputs of an otherwise inaccessible model (e.g. the Google Cloud Vision API), black-box adversarial attacks, in which the target model cannot be accessed directly, are of greater practical security concern than white-box ones… However, existing […]
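
To make the setting concrete, here is a minimal sketch of a score-based black-box attack loop (a toy random-sampling baseline, not any specific paper's method); the `query_scores` stub is hypothetical and stands in for the remote API, and every call to it counts against the query budget:

```python
import numpy as np

# Hypothetical query interface standing in for a remote service such as
# the Google Cloud Vision API: the attacker sees only the returned class
# scores, never the model's weights or gradients.
def query_scores(x: np.ndarray) -> np.ndarray:
    raise NotImplementedError("replace with calls to the target service")

def random_sampling_attack(x, true_label, eps=0.05, budget=1000):
    """Toy score-based attack: sample perturbations inside an eps-ball
    and keep whichever one most lowers the true-class score. Each
    iteration costs one query, which is why query efficiency is the
    central concern in this setting."""
    x_adv, best = x.copy(), query_scores(x)[true_label]
    for _ in range(budget):
        delta = np.random.uniform(-eps, eps, size=x.shape)
        candidate = np.clip(x + delta, 0.0, 1.0)
        score = query_scores(candidate)[true_label]
        if score < best:  # keep the perturbation only if it helps
            best, x_adv = score, candidate
    return x_adv
```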

Contextualized Perturbation for Textual Adversarial Attack

Adversarial examples expose the vulnerabilities of natural language processing (NLP) models and can be used to evaluate and improve their robustness. Existing techniques for generating such examples are typically driven by local heuristic rules that are agnostic to the context, often resulting in unnatural and ungrammatical outputs… This paper presents CLARE, a ContextuaLized AdversaRial Example generation model that produces fluent and grammatical outputs through a mask-then-infill procedure. CLARE builds on a pre-trained masked language model and modifies the inputs in […]
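
The mask-then-infill idea can be sketched in a few lines with an off-the-shelf masked language model (this illustrates the general procedure, not the authors' released implementation; the victim-model filtering step is elided):

```python
from transformers import pipeline

# Mask a position and let a pre-trained masked language model propose
# contextual infills, so replacements stay fluent and grammatical
# instead of coming from context-agnostic heuristic rules.
fill_mask = pipeline("fill-mask", model="roberta-base")

sentence = "The movie was <mask> and I would watch it again."
for candidate in fill_mask(sentence, top_k=5):
    # A full attack would keep only the infills that preserve meaning
    # while flipping the victim model's prediction.
    print(candidate["token_str"], round(candidate["score"], 3))
```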

Switching Gradient Directions for Query-Efficient Black-Box Adversarial Attacks

We propose a simple and highly query-efficient black-box adversarial attack named SWITCH, which achieves state-of-the-art performance under the $\ell_2$ and $\ell_\infty$ norms in the score-based setting. In the black-box attack setting, designing query-efficient attacks remains an open problem… The high query efficiency of the proposed approach stems from combining transfer-based attacks with random-search-based ones. The surrogate model's gradient $\hat{\mathbf{g}}$ is exploited for guidance, and is switched if our algorithm detects that it does not point […]
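
A rough sketch of that switching rule, under simplifying assumptions (the `target_loss` stub is hypothetical and each call to it represents one score-based query; the real algorithm's norm projections and further fallback directions are omitted):

```python
import torch

# Hypothetical query to the black-box target: returns an attack loss
# (e.g. a margin loss) computed from the target's output scores only.
def target_loss(x: torch.Tensor) -> torch.Tensor:
    raise NotImplementedError("replace with queries to the target model")

def switch_step(x_adv, surrogate, loss_fn, alpha=0.01):
    """One switching step: follow the surrogate gradient g_hat, and if
    the resulting point fails to increase the attack loss on the target
    (checked with one extra query), switch to -g_hat instead."""
    x = x_adv.clone().requires_grad_(True)
    loss_fn(surrogate(x)).backward()
    g_hat = x.grad.detach()                 # transfer-based guidance

    forward = (x_adv + alpha * g_hat.sign()).clamp(0.0, 1.0)
    if target_loss(forward) > target_loss(x_adv):
        return forward
    # Surrogate gradient did not point in a useful direction: switch.
    return (x_adv - alpha * g_hat.sign()).clamp(0.0, 1.0)
```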
