“FastSiam: Resource-Efficient Self-Supervised Learning on a Single GPU”, Daniel Pototzky, Azhar Sultan, Lars Schmidt-Thieme (2022-09-20):

Self-supervised pretraining has shown impressive performance in recent years, matching or even outperforming ImageNet weights on a broad range of downstream tasks. Unfortunately, existing methods require massive amounts of computing power, with large batch sizes and batch norm statistics synchronized across multiple GPUs. This effectively excludes substantial parts of the computer vision community that do not have access to extensive computing resources from the benefits of self-supervised learning.

To address this, we develop FastSiam, with the aim of matching ImageNet weights using as little computing power as possible.

We find that a core weakness of previous methods like SimSiam is that they compute the training target based on a single augmented crop (or “view”), leading to target instability. We show that by using multiple views per image instead of one, the training target can be stabilized, allowing for faster convergence and substantially reduced runtime.
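To make the multi-view target concrete, here is a minimal PyTorch-style sketch (not the authors' code): the module names `backbone`, `projector`, and `predictor` and the negative-cosine loss with stop-gradient follow SimSiam's conventions, and averaging the target projections over several extra views is an assumed reading of the stabilization idea described above.

```python
import torch
import torch.nn.functional as F

def negative_cosine(p, z):
    # SimSiam-style loss: negative cosine similarity with stop-gradient on the target
    return -F.cosine_similarity(p, z.detach(), dim=-1).mean()

def fastsiam_step(backbone, projector, predictor, views):
    """One training step on a list of augmented views of the same batch of images.

    views[0] is the view we predict from; the remaining views are used only to
    build a stabilized (averaged) training target. Hypothetical sketch, not the
    authors' reference implementation; symmetrization is omitted for brevity.
    """
    # Project every view through the shared encoder
    zs = [projector(backbone(v)) for v in views]

    # Stabilized target: average the projections of all but the first view
    target = torch.stack(zs[1:], dim=0).mean(dim=0)

    # Predict the averaged target from the first view
    p = predictor(zs[0])
    return negative_cosine(p, target)
```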

We evaluate FastSiam on multiple challenging downstream tasks, including object detection, instance segmentation, and keypoint detection, and find that it matches ImageNet weights after 25 epochs of pretraining on a single GPU with a batch size of only 32.