“Computed Structures of Core Eukaryotic Protein Complexes”, Ian R. Humphreys, Jimin Pei, Minkyung Baek, Aditya Krishnakumar, Ivan Anishchenko, Sergey Ovchinnikov, Jing Zhang, Travis J. Ness, Sudeep Banjade, Saket R. Bagde, Viktoriya G. Stancheva, Xiao-Han Li, Kaixian Liu, Zhi Zheng, Daniel J. Barrero, Upasana Roy, Jochen Kuper, Israel S. Fernández, Barnabas Szakal, Dana Branzei, Josep Rizo, Caroline Kisker, Eric C. Greene, Sue Biggins, Scott Keeney, Elizabeth A. Miller, J. Christopher Fromme, Tamara L. Hendrickson, Qian Cong, David Baker, 2021-11-11:

[Lowe commentary] Deep learning for protein interactions: The use of deep learning has revolutionized the field of protein modeling. Humphreys et al 2021 combined this approach with proteome-wide, coevolution-guided protein interaction identification to conduct a large-scale screen of protein-protein interactions in yeast (see the Perspective by Pereira and Schwede). The authors generated predicted interactions and accurate structures for complexes spanning key biological processes in Saccharomyces cerevisiae. The complexes include larger protein assemblies such as trimers, tetramers, and pentamers and provide insights into biological function.


Background: Protein-protein interactions play critical roles in biology, but the structures of many eukaryotic protein complexes are unknown, and there are likely many interactions not yet identified. High-throughput experimental methods such as yeast 2-hybrid and affinity-purification mass spectrometry have been used to identify interactions in multiple organisms, but there are inconsistencies between different datasets, and the methods do not provide high-resolution structural information. Here, we use deep learning methods to systematically identify and build structures for the protein complexes that mediate key processes in eukaryotes.

Rationale: Interacting proteins often co-evolve, and in prokaryotes, evolutionary information can be used to identify interactions on the proteome scale at an accuracy higher than that of experimental screens. Extending this method to eukaryotes is complicated because there are fewer genome sequences available, resulting in weaker coevolutionary signals. The deep learning methods RoseTTAFold and AlphaFold have a rich understanding of protein sequence-structure relationships, and so could help overcome this limitation.
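To make the idea of a coevolutionary signal concrete, here is a minimal sketch that scores one column from each of two paired alignments with mutual information. This is a textbook measure chosen purely for illustration and is not the statistic the authors used; the function name `mutual_information` and the toy columns are assumptions. Each position in a column stands for the residue found in one species, so fewer sequenced genomes means shorter columns and noisier estimates.

```python
import math
from collections import Counter


def mutual_information(col_a, col_b):
    """Mutual information (in bits) between two alignment columns.

    col_a and col_b hold one residue per species, in the same species order.
    Illustrative only: a simple stand-in for a coevolution score.
    """
    if len(col_a) != len(col_b):
        raise ValueError("columns must have one residue per paired sequence")
    n = len(col_a)
    count_a = Counter(col_a)
    count_b = Counter(col_b)
    count_ab = Counter(zip(col_a, col_b))
    mi = 0.0
    for (a, b), c in count_ab.items():
        p_ab = c / n
        # p_ab / (p_a * p_b), written with raw counts
        mi += p_ab * math.log2(p_ab * n * n / (count_a[a] * count_b[b]))
    return mi


# Toy columns (hypothetical data): 8 species, one residue each.
column_in_protein_a = "AAAALLLL"
covarying_column = "DDDDKKKK"      # changes whenever protein A's column changes
independent_column = "DKDKDKDK"    # varies without regard to protein A

print(mutual_information(column_in_protein_a, covarying_column))
print(mutual_information(column_in_protein_a, independent_column))
```

The covarying pair scores higher than the independent pair, which is the kind of contrast a coevolution-based screen relies on.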

Results: We developed a coevolution-guided protein interaction identification pipeline that incorporates a rapidly computable version of RoseTTAFold with the slower but more accurate AlphaFold to systematically evaluate interactions between 8.3 million pairs of yeast proteins. RoseTTAFold alone has comparable performance in identifying protein-protein interactions to that of large-scale experimental methods; combination with AlphaFold increases identification accuracy. In total, we constructed models for 106 previously unidentified assemblies and 806 that were structurally uncharacterized.
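The fast-then-accurate arrangement described above can be sketched as a two-stage screening cascade: a cheap scorer is run on every pair, and only the pairs that pass are rescored by an expensive one. The sketch below is a generic illustration under stated assumptions, not the authors' pipeline: `two_stage_screen`, the scoring callables, the cutoffs, and the toy scores are all hypothetical, and no real RoseTTAFold or AlphaFold call is made.

```python
from itertools import combinations


def two_stage_screen(proteins, fast_score, slow_score, fast_cutoff, slow_cutoff):
    """Score all unordered pairs cheaply, then rescore the survivors.

    fast_score and slow_score are caller-supplied functions (a, b) -> float.
    They are placeholders for a rapid and a slower, more accurate predictor.
    """
    # Stage 1: cheap scorer over every unordered pair.
    candidates = [
        (a, b)
        for a, b in combinations(proteins, 2)
        if fast_score(a, b) >= fast_cutoff
    ]
    # Stage 2: expensive scorer only on the pairs that passed stage 1.
    hits = []
    for a, b in candidates:
        score = slow_score(a, b)
        if score >= slow_cutoff:
            hits.append((a, b, score))
    hits.sort(key=lambda hit: hit[2], reverse=True)
    return candidates, hits


# Toy data (hypothetical scores, not values from the paper).
toy_proteins = ["P1", "P2", "P3", "P4"]
toy_fast = {("P1", "P2"): 0.9, ("P1", "P3"): 0.6, ("P3", "P4"): 0.7}
toy_slow = {("P1", "P2"): 0.95, ("P1", "P3"): 0.2, ("P3", "P4"): 0.8}


def fast(a, b):
    return toy_fast.get((a, b), 0.0)


def slow(a, b):
    return toy_slow.get((a, b), 0.0)


candidates, hits = two_stage_screen(toy_proteins, fast, slow, 0.5, 0.5)
print(candidates)
print(hits)
```

The point of the design is cost: the number of pairs grows quadratically with the number of proteins, so the expensive scorer is only affordable on the small fraction of pairs that the cheap scorer flags.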

These complexes provide rich insights into a range of biological processes from transcription, translation, and DNA repair to protein transport and modification. For example, Rad51 plays a pivotal role in DNA repair through homologous recombination, and mutations are associated with Fanconi anemia and cancer in humans. Rad55 and Rad57 are positive regulators of Rad51 assembly on single-stranded DNA. Our Rad55–Rad57–Rad51 complex model suggests that Rad55–Rad57 can bind at the 5′ end of the Rad51 single-stranded DNA filament and may stabilize the filament conformation of Rad51. Glycosylphosphatidylinositol transamidase (GPI-T) is a pentameric enzyme complex that catalyzes the attachment of GPI anchors to the C terminus of proteins. GPI-T is structurally uncharacterized, and mutations in subunits of the complex have been implicated in neurodevelopmental disorders and cancer in humans. Our model of the 5-protein assembly shows that the previously identified catalytic dyad is positioned adjacent to a channel formed by 3 other subunits that could function in C-terminal GPI-T signal peptide recognition.

Conclusion: Our approach extends the range of large-scale deep learning-based structure modeling from monomeric proteins to protein assemblies. Following up on the many new interactions and complex structures should advance the understanding of a wide range of eukaryotic cellular processes and provide new targets for therapeutic intervention. Our results herald a new era of structural biology in which computation plays a fundamental role in both interaction discovery and structure determination.