ā€œManipulating SGD With Data Ordering Attacksā€, Ilia Shumailov, Zakhar Shumaylov, Dmitry Kazhdan, Yiren Zhao, Nicolas Papernot, Murat A. Erdogdu, Ross Anderson2021-04-19 ()⁠:

Machine learning is vulnerable to a wide variety of attacks. It is now well understood that by changing the underlying data distribution, an adversary can poison the model trained with it or introduce backdoors.

In this paper we present a novel class of training-time attacks that require no changes to the underlying dataset or model architecture, but instead only change the order in which data are supplied to the model. In particular, we find that the attacker can either prevent the model from learning, or poison it to learn behaviors specified by the attacker. Furthermore, we find that even a single adversarially-ordered epoch can be enough to slow down model learning, or even to reset all of the learning progress. Indeed, the attacks presented here are not specific to the model or dataset, but rather target the stochastic nature of modern learning procedures.

We extensively evaluate our attacks on computer vision and natural language benchmarks to find that the adversary can disrupt model training and even introduce backdoors.

…This attack is realistic and can be instantiated in several ways. The attack code can be infiltrated into: the operating system handing file system requests; the disk handling individual data accesses; the software that determines the way random data sampling is performed; the distributed storage manager; or the machine learning pipeline itself handling prefetch operations. That is a substantial attack surface, and for large models these components may be controlled by different principals. The attack is also very stealthy. The attacker does not add any noise or perturbation to the data.There are no triggers or backdoors introduced into the dataset. All of the data points are natural. In two of 4 variants the attacker uses the whole dataset and does not oversample any given point, i.e. the sampling is without replacement. This makes it difficult to deploy simple countermeasures.