“Test-Time Augmentation to Solve ARC”, 2024-04-16:
Interview with Jack Cole: ARC World Record Holder
With a dual career spanning clinical psychology and software development, Jack Cole brings a unique perspective to the ARC Challenge. Leveraging cognitive insights and advanced machine learning techniques, his team’s approach stands out for its novel use of test-time fine-tuning and synthetic data augmentation.
…Q: How would you summarize your ARC solution in a few sentences; what makes it stand out from other solutions?
A: Our ARC solution stands out due to several key elements. Firstly, we fine-tune models on synthetic and augmented data. Secondly, we employ test-time fine-tuning [dynamic evaluation]. Lastly, we have developed an approach called AIRV (augment, inference, reverse augmentation, and vote), which is analogous to test-time augmentation. These innovations are crucial, as transformer models perform relatively poorly on ARC without them.
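The AIRV procedure described above can be sketched in a few lines. This is a minimal illustration, not the team’s actual implementation: it assumes the model is an arbitrary callable mapping an input grid to an output grid, and uses the four 90° rotations as the invertible augmentation set (all names here are illustrative):

```python
from collections import Counter

def rot90(grid):
    """Rotate a grid (list of lists) 90 degrees clockwise."""
    return [list(row) for row in zip(*grid[::-1])]

def repeat(f, times):
    """Return a function applying f `times` times in a row."""
    def g(grid):
        for _ in range(times):
            grid = f(grid)
        return grid
    return g

# Each augmentation is paired with its inverse: k clockwise
# quarter-turns are undone by (4 - k) % 4 more quarter-turns.
AUGMENTATIONS = [(repeat(rot90, k), repeat(rot90, (4 - k) % 4)) for k in range(4)]

def airv(model, grid):
    """Augment, Inference, Reverse augmentation, Vote (a sketch)."""
    votes = Counter()
    for forward, inverse in AUGMENTATIONS:
        prediction = model(forward(grid))        # inference on augmented input
        restored = inverse(prediction)           # map back to the original frame
        votes[tuple(map(tuple, restored))] += 1  # hashable key for voting
    winner, _ = votes.most_common(1)[0]
    return [list(row) for row in winner]
```

For example, with a toy rotation-equivariant "model" that increments every cell, all four augmented views vote for the same restored output, and `airv` returns it.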
In recent months, our approach has been bolstered by the outstanding work of Michael Hodel on synthetic data, further enhancing our solution’s effectiveness. Our best single solution model has achieved a maximum score of 33% on Kaggle, besting all previous approaches combined (save for our own ensemble that scored 34% with Lab42).
Q: What are your plans on how to reach an even higher score? Are you thinking about developing new AI models or training techniques?
A: Even without additional innovations, my current approach is likely to keep advancing; it has been gaining about one additional hidden-test-set item every week or two. This is due to ongoing training on TPUs, supported by a grant from the Google TPU Research Cloud. I am currently training models of various sizes and architectures. If we are able to receive some financial support, I have a large roadmap of additional techniques to explore (largely around notions of self-improving loops, or bootstrapping).