Large Norms of CNN Layers Do Not Hurt Adversarial Robustness

Since the Lipschitz properties of convolutional neural networks (CNNs) are widely considered to be related to adversarial robustness, we theoretically characterize the $\ell_1$ norm and $\ell_\infty$ norm of 2D multi-channel convolutional layers and provide efficient methods to compute the exact $\ell_1$ norm and $\ell_\infty$ norm. Based on our theorem, we propose a novel regularization method termed norm decay, which can effectively reduce the norms of CNN layers… Experiments show that norm-regularization methods, including norm decay, weight decay, and singular value […]
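As a rough illustration of what such norm computations can look like, here is a minimal PyTorch sketch, assuming a stride-1 convolution and inputs large enough that interior output positions attain the maximum absolute row/column sums of the induced linear operator. The function names `conv_linf_norm` and `conv_l1_norm` and this simplified setting are illustrative assumptions, not the paper's exact characterization.

```python
# Hedged sketch: proxies for the induced l_inf and l_1 operator norms of a
# 2D multi-channel convolution, computed from the kernel weights alone.
# Assumes stride 1 and large-enough inputs so that interior output positions
# attain the maximal row/column sums; the paper's exact results may differ.
import torch
import torch.nn as nn

def conv_linf_norm(conv: nn.Conv2d) -> torch.Tensor:
    # ||W||_inf = maximum absolute row sum of the operator: each output unit
    # of channel c_out weights its receptive field by |K[c_out, :, :, :]|,
    # so we take the maximum over output channels.
    k = conv.weight  # shape (c_out, c_in, kh, kw)
    return k.abs().sum(dim=(1, 2, 3)).max()

def conv_l1_norm(conv: nn.Conv2d) -> torch.Tensor:
    # ||W||_1 = maximum absolute column sum: an interior input unit of
    # channel c_in feeds outputs through |K[:, c_in, :, :]|.
    k = conv.weight
    return k.abs().sum(dim=(0, 2, 3)).max()

if __name__ == "__main__":
    layer = nn.Conv2d(3, 16, kernel_size=3, padding=1)
    print(conv_linf_norm(layer).item(), conv_l1_norm(layer).item())
```

A norm-decay-style regularizer could, for instance, add a multiple of such quantities to the training loss, though the paper's precise formulation is not reproduced here.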


Quantifying the Preferential Direction of the Model Gradient in Adversarial Training With Projected Gradient Descent

Adversarial training, especially projected gradient descent (PGD), has been the most successful approach for improving robustness against adversarial attacks. After adversarial training, gradients of models with respect to their inputs are meaningful and interpretable by humans… However, the concept of interpretability is not mathematically well established, making it difficult to evaluate quantitatively. We define interpretability as the alignment of the model gradient with the vector pointing toward the closest point of the support of the other class. We propose […]
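As a rough illustration of how such an alignment score might be computed, here is a hedged PyTorch sketch. Since the true support of the other class is not available, the nearest training example of another class stands in for "the closest point of the support"; the function name `gradient_alignment` and this substitution are assumptions, not the authors' exact metric.

```python
# Hedged sketch: cosine alignment between the input gradient of the loss and
# the direction toward the nearest example of another class, used here as a
# stand-in for the closest point of the other class's support.
import torch
import torch.nn.functional as F

def gradient_alignment(model, x, y, other_class_examples):
    """x: (1, C, H, W) input, y: (1,) label,
    other_class_examples: (n, C, H, W) samples from other classes."""
    x = x.clone().requires_grad_(True)
    loss = F.cross_entropy(model(x), y)
    grad = torch.autograd.grad(loss, x)[0]

    # Direction toward the nearest other-class point (support proxy).
    diffs = other_class_examples - x.detach()
    dists = diffs.flatten(1).norm(dim=1)
    target_dir = diffs[dists.argmin()]

    return F.cosine_similarity(grad.flatten(), target_dir.flatten(), dim=0)
```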
