“It’s About Time: Analog Clock Reading in the Wild”, Charig Yang, Weidi Xie, Andrew Zisserman (2021-11-17):

[homepage; code] In this paper, we present a framework for reading analog clocks in natural images or videos.

Specifically, we make the following contributions: First, we create a scalable pipeline for generating synthetic clocks, reducing the requirement for labour-intensive annotation. Second, we introduce a clock recognition architecture based on spatial transformer networks (STNs), trained end-to-end for clock alignment and recognition.
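The recognition step ultimately reduces to the geometry of a clock face: hand angles and times are interconvertible. The mapping below is standard clock arithmetic, not code from the paper; the function names are illustrative.

```python
import math

def time_to_angles(hour, minute):
    """Map a time to (hour-hand, minute-hand) angles in degrees,
    measured clockwise from the 12 o'clock position."""
    minute_angle = minute * 6.0                      # 360 deg / 60 min
    hour_angle = (hour % 12) * 30.0 + minute * 0.5   # 360 deg / 12 hr, plus drift
    return hour_angle, minute_angle

def angles_to_time(hour_angle, minute_angle):
    """Invert the mapping: recover (hour, minute) from hand angles,
    as a recognition head would after predicting the two angles."""
    minute = round(minute_angle / 6.0) % 60
    hour = int(hour_angle // 30) % 12
    return hour, minute
```

Round-tripping a time through both functions recovers it exactly, e.g. `angles_to_time(*time_to_angles(3, 30))` gives `(3, 30)`.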

We show that the model trained on the proposed synthetic dataset generalizes towards real clocks with good accuracy, advocating a Sim2Real training regime.

Third, to further reduce the gap between simulation and real data, we leverage a special property of “time”, i.e. its uniformity, to generate reliable pseudo-labels on real, unlabeled clock videos, and show that training on these videos offers further improvements while still requiring zero manual annotation.
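The uniformity idea can be sketched as a filter: on uniformly sampled video frames, predicted times should advance linearly, so predictions that deviate from a robustly fitted slope-1 line are discarded as unreliable pseudo-labels. This is a minimal illustration of the principle under assumed conventions (times in minutes, no midnight wrap-around), not the paper's exact rule.

```python
def uniformity_filter(pred_minutes, minutes_per_frame=1.0, tol=1.0):
    """Keep frame indices whose predicted times advance uniformly.

    pred_minutes: predicted time (minutes past 12:00) for uniformly
    spaced video frames. A slope-1 line in frame-time units is fitted
    by taking the median residual as the intercept; frames within
    `tol` minutes of the line are kept as pseudo-labels.
    """
    n = len(pred_minutes)
    expected = [i * minutes_per_frame for i in range(n)]
    residuals = sorted(p - e for p, e in zip(pred_minutes, expected))
    intercept = residuals[n // 2]  # median offset is robust to outliers
    keep = [i for i in range(n)
            if abs(pred_minutes[i] - (expected[i] + intercept)) <= tol]
    return keep, intercept
```

For example, predictions `[10, 11, 12.2, 30, 14]` at one minute per frame fit the line `t = i + 10`, so the outlier at frame 3 is rejected and frames `[0, 1, 2, 4]` survive as pseudo-labels.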

Lastly, we introduce 3 benchmark datasets based on COCO, Open Images, and The Clock movie, with full annotations for time, accurate to the minute.