“AstroPT: Scaling Large Observation Models for Astronomy”, Michael J. Smith, Ryan J. Roberts, Eirini Angeloudi, Marc Huertas-Company, 2024-05-23:

This work presents AstroPT, an autoregressive pretrained GPT-2-style transformer developed with astronomical use-cases in mind.

The AstroPT models presented here have been pretrained on 8.6 million 512 × 512 pixel grz-band galaxy postage stamp observations from the DESI Legacy Survey DR8. We train a selection of foundation models of increasing size from 1 million to 2.1 billion parameters, and find that:

AstroPT follows a similar saturating log-log scaling law to textual models [cf. Henighan et al 2020]. We also find that the models’ performance on downstream tasks, as measured by linear probing, improves with model size up to the model parameter saturation point.
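
As a rough illustration of what a "saturating log-log scaling law" can look like, the sketch below uses one common functional form, an irreducible loss floor plus a power-law term. The form and the constants `L_INF`, `N0`, and `ALPHA` are assumptions chosen for illustration; they are not fitted to, or taken from, the AstroPT results.

```python
import math

def saturating_power_law(n_params, l_inf, n0, alpha):
    """Loss = irreducible floor + a power-law term that shrinks with model size."""
    return l_inf + (n0 / n_params) ** alpha

# Hypothetical constants for illustration only (not AstroPT's fitted values).
L_INF, N0, ALPHA = 2.0, 1.0e6, 0.3

sizes = [1e6, 1e7, 1e8, 1e9]
losses = [saturating_power_law(n, L_INF, N0, ALPHA) for n in sizes]

def loglog_slope(x1, y1, x2, y2):
    """Slope of the straight line joining two points on log-log axes."""
    return (math.log(y2) - math.log(y1)) / (math.log(x2) - math.log(x1))

# The reducible loss (loss minus the floor) is a pure power law: slope = -ALPHA.
reducible_slope = loglog_slope(
    sizes[0], losses[0] - L_INF, sizes[-1], losses[-1] - L_INF
)

# The total loss curve flattens (saturates) as the floor starts to dominate.
first_slope = loglog_slope(sizes[0], losses[0], sizes[1], losses[1])
last_slope = loglog_slope(sizes[2], losses[2], sizes[3], losses[3])
```

Under these assumptions the total-loss curve is steep for small models and flattens for large ones, which is the qualitative "saturation" behaviour the abstract refers to.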

We believe that collaborative community development paves the best route towards realizing an open source ‘Large Observation Model’—a model trained on data taken from the observational sciences at the scale seen in natural language processing.

To this end, we release the source code, weights, and dataset for AstroPT under the MIT license, and invite potential collaborators to join us in collectively building and researching these models.

Figure 2: Validation set losses over our full training runs. The left plot shows the validation loss per training floating point operation (FLOP), and the right plot shows the validation loss per 16 × 16 image patch token seen. Each run is labeled with the total neural parameter count as cross-matched in Table 1.
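
To give a sense of scale for the "16 × 16 image patch token" unit, the arithmetic below counts tokens per image. It is a sketch that assumes the full 512 × 512 stamp is split into non-overlapping patches with no cropping, and that each token carries all three bands; the excerpt above does not state either detail.

```python
IMAGE_SIDE = 512        # pixels per side, from the abstract
PATCH_SIDE = 16         # pixels per patch side, from the Figure 2 caption
NUM_BANDS = 3           # g, r, z
NUM_IMAGES = 8_600_000  # pretraining set size, from the abstract

# Assumption: non-overlapping patches tile the whole image.
patches_per_side = IMAGE_SIDE // PATCH_SIDE
tokens_per_image = patches_per_side ** 2

# Assumption: each token holds every band of its patch.
values_per_token = PATCH_SIDE * PATCH_SIDE * NUM_BANDS

# Tokens seen in one full pass over the dataset.
tokens_per_pass = tokens_per_image * NUM_IMAGES
```
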
Figure 4: Here we show our relative linear probe performances per pretraining FLOP spent on a selection of scientifically-meaningful downstream tasks. The markers are colored according to the models’ parameter counts. We run a Spearman’s ρ fit and find in all cases a strong positive correlation between downstream task performance and model size, meaning that a larger model has more informative embeddings. In this plot ‘Mg’ and ‘Mz’ are the absolute magnitudes in the g and z bands, ‘mean sSFR’ is the mean specific star formation rate, and ‘M∗’ is the stellar mass. ‘smooth?’, ‘disc?’, ‘artefact?’, ‘edge on?’ and ‘tight spiral?’ are Galaxy Zoo survey responses for these morphological features. Our metadata sources are described further in §2.2.
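
For readers unfamiliar with the statistic used in Figure 4, here is a minimal standard-library sketch of Spearman's ρ: the Pearson correlation computed on ranks. The parameter counts and probe scores below are made-up placeholders, not values from the paper, and the helper names are hypothetical.

```python
import math

def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (norm_x * norm_y)

def spearman_rho(x, y):
    """Spearman's rho: Pearson correlation of the rank-transformed data."""
    return pearson(average_ranks(x), average_ranks(y))

# Placeholder data: model sizes and hypothetical linear-probe scores.
param_counts = [1e6, 5e6, 2e7, 9e7, 3e8, 2.1e9]
probe_scores = [0.41, 0.48, 0.47, 0.55, 0.60, 0.59]

rho = spearman_rho(param_counts, probe_scores)
```

Because ρ depends only on rank order, it measures whether larger models tend to score higher without assuming any particular functional form for the trend.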