“Gradient-Based Adversarial Attacks against Text Transformers”, Chuan Guo, Alexandre Sablayrolles, Hervé Jégou, Douwe Kiela, 2021-04-15:

We propose the first general-purpose gradient-based attack against transformer models.

Instead of searching for a single adversarial example, we search for a distribution of adversarial examples parameterized by a continuous-valued matrix, hence enabling gradient-based optimization.
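The abstract does not spell out how the distribution is made differentiable. One common choice for relaxing a categorical distribution is the Gumbel-softmax trick, and the sketch below assumes it. The names `theta`, `gumbel_softmax_row`, and `sample_soft_tokens` are hypothetical, and the 3-position, 4-token vocabulary is a toy setup rather than anything taken from the paper.

```python
import math
import random

def gumbel_softmax_row(logits, temperature, rng):
    """Draw one relaxed (soft) one-hot vector from a row of logits."""
    noisy = []
    for logit in logits:
        # Gumbel(0, 1) noise is -log(-log(U)) for U ~ Uniform(0, 1).
        u = min(max(rng.random(), 1e-12), 1.0 - 1e-12)
        noisy.append((logit - math.log(-math.log(u))) / temperature)
    # Numerically stable softmax over the perturbed logits.
    peak = max(noisy)
    exps = [math.exp(x - peak) for x in noisy]
    total = sum(exps)
    return [e / total for e in exps]

def sample_soft_tokens(theta, temperature=1.0, seed=0):
    """theta is a (sequence length x vocabulary size) matrix of logits.

    Each row defines a categorical distribution over the token at that
    position. The returned rows are continuous, so in an autodiff
    framework a loss computed on them could be differentiated with
    respect to theta (assumption: this plain-Python version only
    illustrates the sampling step, not the gradient computation).
    """
    rng = random.Random(seed)
    return [gumbel_softmax_row(row, temperature, rng) for row in theta]

# Toy example: 3 token positions, vocabulary of 4 tokens.
theta = [
    [2.0, 0.0, -1.0, 0.5],
    [0.0, 0.0, 3.0, 0.0],
    [-1.0, 1.0, 0.0, 0.0],
]
soft = sample_soft_tokens(theta, temperature=0.5)
```

Each row of `soft` is a probability vector over the vocabulary. Lower temperatures push the rows closer to one-hot vectors, at the cost of higher-variance gradients.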

We empirically demonstrate that our white-box attack attains state-of-the-art attack performance on a variety of natural language tasks.

Furthermore, we show that a powerful black-box transfer attack, enabled by sampling from the adversarial distribution, matches or exceeds existing methods, while only requiring hard-label outputs.
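As a rough illustration of the sampling-based transfer idea, the sketch below draws discrete token sequences from a toy distribution and queries a target that returns only a predicted label. Everything here is assumed for illustration: `theta`, `sample_tokens`, `transfer_attack`, and `toy_target` are hypothetical names, and the toy classifier stands in for a real black-box model.

```python
import math
import random

def sample_tokens(theta, rng):
    """Sample one discrete token id per position from softmax(theta[i])."""
    tokens = []
    for row in theta:
        peak = max(row)
        weights = [math.exp(x - peak) for x in row]
        tokens.append(rng.choices(range(len(row)), weights=weights, k=1)[0])
    return tokens

def transfer_attack(theta, target_predict, true_label, max_queries=100, seed=0):
    """Query the target with sampled sequences until its label changes.

    Only the hard label returned by target_predict is used; no scores or
    gradients from the target are needed.
    Returns (adversarial_tokens, queries_used), or (None, max_queries)
    if every sampled candidate kept the original label.
    """
    rng = random.Random(seed)
    for query in range(1, max_queries + 1):
        candidate = sample_tokens(theta, rng)
        if target_predict(candidate) != true_label:
            return candidate, query
    return None, max_queries

def toy_target(tokens):
    """Hypothetical black-box classifier: label 1 if token 2 appears."""
    return 1 if 2 in tokens else 0

# Assumed to have been optimized beforehand against a white-box model.
theta = [[0.0, 0.0, 4.0], [3.0, 0.0, 0.0]]
adversarial, queries_used = transfer_attack(theta, toy_target, true_label=0)
```

The number of queries needed depends on how much probability mass the distribution places on sequences that also fool the target model.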