Protein identification using cryo-EM and artificial intelligence guides improved sample purification
Kenneth D Carr, Dane Evan D. Zambrano, Connor Weidle, Alex Goodson, Helen E Eisenach, Harley Pyles, Alexis Courbet, Neil P King, Andrew J Borst
bioRxiv - Biochemistry, publication date 2024-09-12. DOI: 10.1101/2024.09.11.612515 (https://doi.org/10.1101/2024.09.11.612515)
Abstract
Protein purification is essential in protein biochemistry, structural biology, and protein design. It enables the determination of protein structures, the study of biological mechanisms, and the biochemical and biophysical characterization of both natural and de novo designed proteins. Although purification protocols are widely used, standard strategies can still fail, for example through the unintended co-purification of unknown contaminants alongside the target protein. Co-purification is a particular problem for designed self-assembling protein nanomaterials, because it is difficult to determine whether an unexpected observed geometry represents a novel assembly state of the designed system, cross-contamination from other assemblies, or a native protein originating from the expression host. In this study, we assessed the ability of an automated structure-to-sequence pipeline to unambiguously identify an unknown co-purifying protein found across several purified designed protein samples. Using cryo-electron microscopy (cryo-EM), ModelAngelo's sequence-agnostic automated model-building feature, and the Basic Local Alignment Search Tool (BLAST), we identified the unknown protein as dihydrolipoamide succinyltransferase (DLST). We confirmed this identification by comparing the cryo-EM data with available DLST structures in the Protein Data Bank (PDB) and with AlphaFold 3 predictions for the top BLAST hits. The clear identification of DLST informed our subsequent literature search and led to a rational modification of our protein purification protocol, ultimately enabling the exclusion of the contaminant from preparations of our target nanoparticle. This study demonstrates the successful application of a structure-to-sequence workflow that integrates cryo-EM, ModelAngelo, protein BLAST, PDB structures, and AlphaFold 3 predictions to identify and remove an unknown protein contaminant from multiple purified samples. It also highlights the broader potential of integrating cryo-EM with AI-driven tools for accurate protein identification across various samples and contexts in protein science.
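
The structure-to-sequence step described above can be illustrated with a minimal sketch. This is not the authors' code. It assumes the residues of a sequence-agnostic model have already been parsed into (chain, residue number, three-letter name) tuples; the function name `chains_to_fasta`, the `min_length` cutoff, and the toy input are hypothetical. Parsing a real model file and submitting the query to protein BLAST are not shown.

```python
from collections import OrderedDict

# Standard three-letter to one-letter amino acid codes.
THREE_TO_ONE = {
    "ALA": "A", "ARG": "R", "ASN": "N", "ASP": "D", "CYS": "C",
    "GLN": "Q", "GLU": "E", "GLY": "G", "HIS": "H", "ILE": "I",
    "LEU": "L", "LYS": "K", "MET": "M", "PHE": "F", "PRO": "P",
    "SER": "S", "THR": "T", "TRP": "W", "TYR": "Y", "VAL": "V",
}


def chains_to_fasta(residues, min_length=3):
    """Turn (chain_id, residue_number, three_letter_name) tuples into FASTA text.

    Hypothetical helper: one FASTA record per chain, residues ordered by
    number, unknown residue names written as 'X'. Chains shorter than
    min_length are skipped because very short fragments make poor queries.
    """
    chains = OrderedDict()
    for chain_id, number, name in residues:
        one_letter = THREE_TO_ONE.get(name.upper(), "X")
        chains.setdefault(chain_id, {})[number] = one_letter

    records = []
    for chain_id, by_number in chains.items():
        seq = "".join(by_number[n] for n in sorted(by_number))
        if len(seq) >= min_length:
            records.append(f">chain_{chain_id} length={len(seq)}\n{seq}")
    return "\n".join(records)


# Toy input (made up for illustration, not data from the study).
toy_model = [
    ("A", 1, "MET"), ("A", 2, "LYS"), ("A", 3, "VAL"), ("A", 4, "GLY"),
    ("B", 1, "ALA"), ("B", 2, "UNK"),
]
fasta = chains_to_fasta(toy_model)
print(fasta)
```

The resulting FASTA text is the kind of query that could be pasted into a protein BLAST search. In practice, the longest and most confidently built chains would likely be the most informative queries, which is why the sketch filters by length.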