“Robustness May Be at Odds With Accuracy”, Tsipras et al 2018-05-30 (arXiv:1805.12152):
We show that there may exist an inherent tension between the goal of adversarial robustness and that of standard generalization. Specifically, training robust models may not only be more resource-consuming, but also lead to a reduction of standard accuracy.
We demonstrate that this trade-off between the standard accuracy of a model and its robustness to adversarial perturbations provably exists in a fairly simple and natural setting. These findings also corroborate a similar phenomenon observed empirically in more complex settings.
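The “fairly simple and natural setting” is, roughly, a binary classification task mixing one moderately reliable feature with many features that are each only weakly correlated with the label. The NumPy sketch below illustrates that style of construction; the distributions and the constants `p`, `eta`, and `eps` are illustrative assumptions, not the paper’s exact model or values.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative constants (assumptions, not the paper's): one robust feature
# correct with probability p, plus d weak features ~ N(eta*y, 1). An l_inf
# budget eps >= 2*eta lets an adversary flip the weak features' signal.
n, d, p, eta, eps = 10_000, 100, 0.95, 0.2, 0.4

y = rng.choice([-1.0, 1.0], size=n)
x_robust = np.where(rng.random(n) < p, y, -y)            # strong but imperfect feature
x_weak = rng.normal(eta * y[:, None], 1.0, size=(n, d))  # many weakly correlated features

# "Standard" linear classifier: average the weak features. Their mean
# concentrates around eta*y, so accuracy approaches 100% as d grows.
standard_pred = np.sign(x_weak.mean(axis=1))
print("standard accuracy:", (standard_pred == y).mean())

# Under an l_inf adversary with eps >= 2*eta, each weak feature can be
# shifted from ~N(eta*y, 1) to ~N(-eta*y, 1), reversing the prediction.
adv_pred = np.sign((x_weak - eps * y[:, None]).mean(axis=1))
print("standard classifier under attack:", (adv_pred == y).mean())

# A robust classifier must rely on the single robust feature alone
# (perturbing a +/-1 feature by eps < 1 cannot flip its sign), which caps
# both its standard and robust accuracy at p.
robust_pred = np.sign(x_robust)
print("robust classifier accuracy:", (robust_pred == y).mean())
```

Running the sketch shows the tension directly: the standard classifier beats accuracy p on clean data but collapses under attack, while the robust classifier keeps accuracy p in both cases.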
Further, we argue that this phenomenon is a consequence of robust classifiers learning fundamentally different feature representations than standard classifiers. These differences, in particular, seem to result in unexpected benefits: the representations learned by robust models tend to align better with salient data characteristics and human perception.
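One common way to probe this alignment is to visualize the gradient of the loss with respect to the input pixels, which for robust models tends to highlight perceptually salient structure while for standard models it resembles noise. A minimal PyTorch sketch, assuming a generic classifier `model` (the function name and setup are illustrative, not the paper’s code):

```python
import torch
import torch.nn.functional as F

def input_gradient(model, x, y):
    """Gradient of the classification loss w.r.t. input pixels.

    Rendering this gradient as an image is one way to compare what
    robust vs standard models are sensitive to.
    """
    x = x.detach().clone().requires_grad_(True)
    loss = F.cross_entropy(model(x), y)
    loss.backward()
    return x.grad.detach()
```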
…Overall, when training on the entire dataset, we observe a decline in standard accuracy as the strength of the adversary increases (see Figure 7 of Appendix G for a plot of standard accuracy vs. ε). (Note that this still holds if we train on batches that contain natural examples as well, as recommended by Kurakin et al 2017. See Appendix B for details.) Similar effects were also observed in prior and concurrent work (Kurakin et al 2017; Madry et al 2018; Dvijotham et al 2018b; Wong et al 2018; Xiao et al 2019; Su et al 2018).
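The qualitative experiment described here can be reproduced with standard ℓ∞ PGD adversarial training. The PyTorch sketch below is a minimal illustration; the attack hyperparameters (`steps=7`, the step-size rule) and the training loop are placeholder assumptions, not the paper’s experimental setup.

```python
import torch
import torch.nn.functional as F

def pgd_attack(model, x, y, eps, steps=7, alpha=None):
    """Find an adversarial perturbation within the l_inf ball of radius eps.

    (Clamping x + delta to the valid input range is omitted for brevity.)
    """
    alpha = alpha or 2.5 * eps / steps
    delta = torch.empty_like(x).uniform_(-eps, eps).requires_grad_(True)
    for _ in range(steps):
        loss = F.cross_entropy(model(x + delta), y)
        grad, = torch.autograd.grad(loss, delta)
        delta = (delta + alpha * grad.sign()).clamp(-eps, eps)
        delta = delta.detach().requires_grad_(True)
    return delta.detach()

def adv_train_epoch(model, loader, opt, eps):
    """One epoch of adversarial training: fit the model on worst-case inputs."""
    model.train()
    for x, y in loader:
        delta = pgd_attack(model, x, y, eps)
        opt.zero_grad()
        F.cross_entropy(model(x + delta), y).backward()
        opt.step()

@torch.no_grad()
def standard_accuracy(model, loader):
    """Accuracy on unperturbed (natural) examples."""
    model.eval()
    correct = total = 0
    for x, y in loader:
        correct += (model(x).argmax(dim=1) == y).sum().item()
        total += y.numel()
    return correct / total
```

Training separate models over an increasing grid of `eps` values and recording `standard_accuracy` for each traces out the decline described above: the larger the adversary’s budget during training, the lower the accuracy on natural examples.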