Protein identification using cryo-EM and artificial intelligence guides improved sample purification

bioRxiv - Biochemistry. Pub Date: 2024-09-12. DOI: 10.1101/2024.09.11.612515

Kenneth D Carr, Dane Evan D. Zambrano, Connor Weidle, Alex Goodson, Helen E Eisenach, Harley Pyles, Alexis Courbet, Neil P King, Andrew J Borst



Abstract

Protein purification is essential in protein biochemistry, structural biology, and protein design. It enables the determination of protein structures, the study of biological mechanisms, and the biochemical and biophysical characterization of both natural and de novo designed proteins. Despite the broad application of various protein purification protocols, standard strategies can still encounter challenges, such as the unintended co-purification of unknown contaminants alongside the target protein. In particular, co-purification issues pose significant challenges for designed self-assembling protein nanomaterials, as it is difficult to determine whether unexpected observed geometries represent novel assembly states of the designed system, cross-contamination from other assemblies, or native proteins originating from the expression host. In this study, we assessed the ability of an automated structure-to-sequence pipeline to unambiguously identify an unknown co-purifying protein found across several purified designed protein samples. Using cryo-electron microscopy (cryo-EM), ModelAngelo's sequence-agnostic automated model-building feature, and the Basic Local Alignment Search Tool (BLAST), we identified the unknown protein as dihydrolipoamide succinyltransferase (DLST). This identification was further confirmed by comparing the cryo-EM data with available DLST structures in the Protein Data Bank (PDB) and AlphaFold 3 predictions from the top BLAST hits. The clear identification of DLST informed our subsequent literature search and led to the rational modification of our protein purification protocol, ultimately enabling the exclusion of the contaminant from preparations of our target nanoparticle. This study demonstrates the successful application of a structure-to-sequence workflow, integrating cryo-EM, ModelAngelo, protein BLAST, PDB structures, and AlphaFold 3 predictions, to identify and remove an unknown protein contaminant from multiple purified samples. It also highlights the broader potential of integrating cryo-EM with AI-driven tools for accurate protein identification across various samples and contexts in protein science.
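The BLAST step of the structure-to-sequence workflow can be illustrated with a minimal sketch. The abstract does not say how hits from individual built chains were compared, so the following is only one plausible approach, assuming the sequences built from the map were searched with protein BLAST and the results saved in the standard 12-column tabular format (`-outfmt 6`). The function names, the chain and subject identifiers, and the scores are all hypothetical placeholders, not data from the study.

```python
from collections import Counter

# Column order of standard BLAST tabular output (-outfmt 6).
FIELDS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
          "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


def best_hit_per_query(tabular_text):
    """Return {query_id: (subject_id, bitscore)}, keeping the top-scoring hit."""
    best = {}
    for line in tabular_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        row = dict(zip(FIELDS, line.split("\t")))
        score = float(row["bitscore"])
        query = row["qseqid"]
        if query not in best or score > best[query][1]:
            best[query] = (row["sseqid"], score)
    return best


def consensus_subject(tabular_text):
    """Return (subject_id, votes, n_queries) for the most common best hit."""
    best = best_hit_per_query(tabular_text)
    counts = Counter(subject for subject, _ in best.values())
    if not counts:
        return None
    subject, votes = counts.most_common(1)[0]
    return subject, votes, len(best)


# Made-up example rows: three built chains searched against two subjects.
_rows = [
    ("chain_A", "subject_1", "62.0", "150", "57", "0", "1", "150", "10", "159", "1e-50", "180"),
    ("chain_A", "subject_2", "40.0", "90", "54", "2", "5", "94", "20", "109", "1e-10", "60"),
    ("chain_B", "subject_1", "60.5", "148", "58", "1", "1", "148", "12", "159", "1e-48", "175"),
    ("chain_C", "subject_2", "38.0", "80", "50", "1", "3", "82", "30", "109", "1e-08", "55"),
]
EXAMPLE = "\n".join("\t".join(r) for r in _rows)

if __name__ == "__main__":
    print(consensus_subject(EXAMPLE))
```

A consensus across independently built chains is only a first-pass indicator. As the abstract describes, the identification was then confirmed by comparing the cryo-EM data with PDB structures and AlphaFold 3 predictions.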


