This work presents AstroPT, an autoregressive pretrained GPT-2-style transformer developed with astronomical use-cases in mind.
The AstroPT models presented here have been pretrained on 8.6 million 512 × 512 pixel grz-band galaxy postage stamp observations from the DESI Legacy Survey DR8. We train a selection of foundation models of increasing size from 1 million to 2.1 billion parameters, and find that:
AstroPT follows a similar saturating log-log scaling law to textual models [cf. Henighan et al. 2020]. We also find that the models' performance on downstream tasks, as measured by linear probing, improves with model size up to the model parameter saturation point.
We believe that collaborative community development paves the best route towards realizing an open source "Large Observation Model", that is, a model trained on data taken from the observational sciences at the scale seen in natural language processing.
To this end, we release the source code, weights, and dataset for AstroPT under the MIT license, and invite potential collaborators to join us in collectively building and researching these models.
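To make the phrase "saturating log-log scaling law" from the abstract concrete, the following is a minimal sketch of one common way to write such a law, L(N) = L_floor + (N_scale / N)^alpha, where N is the parameter count. The functional form is an assumption made for illustration, and the constants `LOSS_FLOOR`, `N_SCALE`, and `ALPHA` are hypothetical placeholders, not values fitted to AstroPT. The sketch shows that the reducible part of the loss is a straight line in log-log space, while the total loss flattens as it approaches the floor.

```python
import math


def saturating_power_law(n_params, loss_floor, n_scale, alpha):
    """Assumed loss model: L(N) = loss_floor + (n_scale / N) ** alpha."""
    return loss_floor + (n_scale / n_params) ** alpha


# Hypothetical constants chosen only for illustration; NOT fitted to AstroPT.
LOSS_FLOOR, N_SCALE, ALPHA = 1.0, 1.0e6, 0.3


def loglog_slope(n_lo, n_hi, subtract_floor):
    """Slope of log(loss) against log(N) between two model sizes.

    With subtract_floor=True only the reducible loss is used, so the slope
    is -ALPHA everywhere. With subtract_floor=False the slope tends to zero
    as the total loss saturates at LOSS_FLOOR.
    """
    offset = LOSS_FLOOR if subtract_floor else 0.0
    y_lo = saturating_power_law(n_lo, LOSS_FLOOR, N_SCALE, ALPHA) - offset
    y_hi = saturating_power_law(n_hi, LOSS_FLOOR, N_SCALE, ALPHA) - offset
    return (math.log(y_hi) - math.log(y_lo)) / (math.log(n_hi) - math.log(n_lo))
```

Under these assumptions, comparing `loglog_slope(1e6, 1e7, False)` with `loglog_slope(1e8, 1e9, False)` shows the flattening of the total loss at larger model sizes.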
Figure 2: Validation set losses over our full training runs.
The left plot shows the validation loss per training floating point operation (FLOP), and the right plot shows the validation loss per 16 × 16 image patch token seen. Each run is labeled with the total neural parameter count as cross-matched in Table 1.
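As a rough illustration of what a "16 × 16 image patch token" amounts to, the sketch below counts the tokens obtained from a square image and splits a small image into flattened patches. It assumes the full 512 × 512 three-band stamp is tokenized without cropping and that patches are read in raster (row-major) order; neither assumption is confirmed by the text here, and the function names `patch_grid` and `patchify` are hypothetical.

```python
def patch_grid(image_size, patch_size, channels):
    """Return (patches per side, total tokens, values per token)."""
    if image_size % patch_size != 0:
        raise ValueError("image_size must be a multiple of patch_size")
    per_side = image_size // patch_size
    return per_side, per_side * per_side, patch_size * patch_size * channels


def patchify(image, patch_size):
    """Split an H x W x C nested-list image into flattened patches.

    Patches are emitted in raster order (an assumption for this sketch),
    and each patch is flattened row by row with channels innermost.
    """
    height, width = len(image), len(image[0])
    patches = []
    for top in range(0, height, patch_size):
        for left in range(0, width, patch_size):
            patch = [
                value
                for y in range(top, top + patch_size)
                for x in range(left, left + patch_size)
                for value in image[y][x]
            ]
            patches.append(patch)
    return patches
```

Under these assumptions, `patch_grid(512, 16, 3)` gives a 32 × 32 grid, i.e. 1024 tokens per galaxy, each holding 768 pixel values.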
Figure 4: Relative linear probe performance per pretraining FLOP spent on a selection of scientifically meaningful downstream tasks.
The markers are colored according to the models' parameter counts. We compute Spearman's ρ and find in all cases a strong positive correlation between downstream task performance and model size, meaning that a larger model has more informative embeddings.
In this plot "Mg" and "Mz" are the absolute magnitudes in the g and z bands, "mean sSFR" is the mean specific star formation rate, and "M*" is the stellar mass. "smooth?", "disc?", "artefact?", "edge on?" and "tight spiral?" are Galaxy Zoo survey responses for these morphological features.
Our metadata sources are described further in §2.2.
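The rank correlation quoted in the Figure 4 caption can be sketched as follows. This is a minimal pure-Python version of Spearman's ρ, computed as the Pearson correlation of average ranks; it is not the authors' analysis code, and the helper names are hypothetical. It would be applied to pairs of (parameter count, linear probe score), one pair per model.

```python
def average_ranks(values):
    """Return 1-based ranks, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next sorted value ties with this one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1  # mean of ranks start+1 .. end+1
        for k in range(start, end + 1):
            ranks[order[k]] = shared
        start = end + 1
    return ranks


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def spearman_rho(xs, ys):
    """Spearman's rho: Pearson correlation computed on the ranks."""
    return pearson(average_ranks(xs), average_ranks(ys))
```

Because only ranks enter the statistic, ρ is insensitive to whether model size is expressed linearly or logarithmically, which makes it a natural choice when parameter counts span several orders of magnitude.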