“The Neural Architecture of Language: Integrative Reverse-Engineering Converges on a Model for Predictive Processing”, 2020-10-09:
The neuroscience of perception has recently been revolutionized by an integrative reverse-engineering approach in which computation, brain function, and behavior are linked across many different datasets and many computational models. Here we present the first systematic study taking this approach into higher-level cognition: human language processing, our species’ signature cognitive skill.
We find that the most powerful ‘transformer’ networks predict nearly 100% of the explainable variance in neural responses and generalize across different datasets and data types (fMRI, ECoG). Across models, statistically significant correlations are observed among all 3 metrics of performance: neural fit, fit to behavioral responses, and accuracy on the next-word prediction task (but not other language tasks), consistent with the long-standing hypothesis that the brain’s language system is optimized for predictive processing.
Model architectures with only their initial (untrained) weights further perform surprisingly similarly to the final trained models, suggesting that inherent architectural structure, and not just experience with language, crucially contributes to a model’s match to the brain.
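As an illustration of the ‘neural fit’ metric above, here is a minimal sketch (not the authors’ actual pipeline) of one common operationalization: regress brain responses onto model activations and score the cross-validated prediction correlation. The data arrays, shapes, and regularization choice below are simulated placeholders, not values from the paper.

```python
# Sketch of a cross-validated "neural fit" score: ridge-regress simulated fMRI
# responses on simulated transformer activations, then correlate held-out
# predictions with observations per voxel. All data here are synthetic.
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import KFold

rng = np.random.default_rng(0)
n_sentences, n_features, n_voxels = 200, 768, 50
X = rng.standard_normal((n_sentences, n_features))        # model activations per sentence
W = rng.standard_normal((n_features, n_voxels))           # hypothetical true mapping
Y = X @ W * 0.1 + rng.standard_normal((n_sentences, n_voxels))  # noisy "fMRI" responses

def neural_fit(X, Y, n_splits=5):
    """Cross-validated Pearson r between predicted and observed responses,
    averaged over voxels and folds."""
    fold_scores = []
    for train, test in KFold(n_splits=n_splits, shuffle=True, random_state=0).split(X):
        model = Ridge(alpha=1.0).fit(X[train], Y[train])
        pred = model.predict(X[test])
        per_voxel_r = [np.corrcoef(pred[:, v], Y[test][:, v])[0, 1]
                       for v in range(Y.shape[1])]
        fold_scores.append(np.nanmean(per_voxel_r))
    return float(np.mean(fold_scores))

print(f"neural fit (cross-validated r): {neural_fit(X, Y):.3f}")
```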
[Published as “The neural architecture of language: Integrative modeling converges on predictive processing”, Schrimpf et al 2021.]