
Cell Systems: Latest Articles

A proteogenomics data-driven knowledge base of human cancer.
IF 9 Zone 1, Biology Q1 BIOCHEMISTRY & MOLECULAR BIOLOGY Pub Date: 2023-09-20 Epub Date: 2023-08-23 DOI: 10.1016/j.cels.2023.07.007
Yuxing Liao, Sara R Savage, Yongchao Dou, Zhiao Shi, Xinpei Yi, Wen Jiang, Jonathan T Lei, Bing Zhang

By combining mass-spectrometry-based proteomics and phosphoproteomics with genomics, epi-genomics, and transcriptomics, proteogenomics provides comprehensive molecular characterization of cancer. Using this approach, the Clinical Proteomic Tumor Analysis Consortium (CPTAC) has characterized over 1,000 primary tumors spanning 10 cancer types, many with matched normal tissues. Here, we present LinkedOmicsKB, a proteogenomics data-driven knowledge base that makes consistently processed and systematically precomputed CPTAC pan-cancer proteogenomics data available to the public through ∼40,000 gene-, protein-, mutation-, and phenotype-centric web pages. Visualization techniques facilitate efficient exploration and reasoning of complex, interconnected data. Using three case studies, we illustrate the practical utility of LinkedOmicsKB in providing new insights into genes, phosphorylation sites, somatic mutations, and cancer phenotypes. With precomputed results of 19,701 coding genes, 125,969 phosphosites, and 256 genotypes and phenotypes, LinkedOmicsKB provides a comprehensive resource to accelerate proteogenomics data-driven discoveries to improve our understanding and treatment of human cancer. A record of this paper's transparent peer review process is included in the supplemental information.

Citations: 0
A bipartite function of ESRRB can integrate signaling over time to balance self-renewal and differentiation.
IF 9.3 Zone 1, Biology Q1 BIOCHEMISTRY & MOLECULAR BIOLOGY Pub Date: 2023-09-20 Epub Date: 2023-08-25 DOI: 10.1016/j.cels.2023.07.008
Teresa E Knudsen, William B Hamilton, Martin Proks, Maria Lykkegaard, Madeleine Linneberg-Agerholm, Alexander V Nielsen, Marta Perera, Luna Lynge Malzard, Ala Trusina, Joshua M Brickman

Cooperative DNA binding of transcription factors (TFs) integrates the cellular context to support cell specification during development. Naive mouse embryonic stem cells are derived from early development and can sustain their pluripotent identity indefinitely. Here, we ask whether TFs associated with pluripotency evolved to directly support this state or if the state emerges from their combinatorial action. NANOG and ESRRB are key pluripotency factors that co-bind DNA. We find that when both factors are expressed, ESRRB supports pluripotency. However, when NANOG is absent, ESRRB supports a bistable culture of cells with an embryo-like primitive endoderm identity ancillary to pluripotency. The stoichiometry between NANOG and ESRRB allows quantitative titration of this differentiation, and in silico modeling of bipartite ESRRB activity suggests it safeguards plasticity in differentiation. Thus, the concerted activity of cooperative TFs can transform their effect to sustain intermediate cell identities and allow ex vivo expansion of immortal stem cells. A record of this paper's transparent peer review process is included in the supplemental information.

Citations: 0
High-throughput functional characterization of combinations of transcriptional activators and repressors.
IF 9 Zone 1, Biology Q1 BIOCHEMISTRY & MOLECULAR BIOLOGY Pub Date: 2023-09-20 Epub Date: 2023-08-04 DOI: 10.1016/j.cels.2023.07.001
Adi X Mukund, Josh Tycko, Sage J Allen, Stephanie A Robinson, Cecelia Andrews, Joydeb Sinha, Connor H Ludwig, Kaitlyn Spees, Michael C Bassik, Lacramioara Bintu

Despite growing knowledge of the functions of individual human transcriptional effector domains, much less is understood about how multiple effector domains within the same protein combine to regulate gene expression. Here, we measure transcriptional activity for 8,400 effector domain combinations by recruiting them to reporter genes in human cells. In our assay, weak and moderate activation domains synergize to drive strong gene expression, whereas combining strong activators often results in weaker activation. In contrast, repressors combine linearly and produce full gene silencing, and repressor domains often overpower activation domains. We use this information to build a synthetic transcription factor whose function can be tuned between repression and activation independent of recruitment to target genes by using a small-molecule drug. Altogether, we outline the basic principles of how effector domains combine to regulate gene expression and demonstrate their value in building precise and flexible synthetic biology tools. A record of this paper's transparent peer review process is included in the supplemental information.

Citations: 0
Engineering allosteric transcription factors guided by the LacI topology.
IF 9.3 Zone 1, Biology Q1 BIOCHEMISTRY & MOLECULAR BIOLOGY Pub Date: 2023-08-16 DOI: 10.1016/j.cels.2023.04.008
Ashley N Hersey, Valerie E Kay, Sumin Lee, Matthew J Realff, Corey J Wilson

Allosteric transcription factors (aTFs) are used in a myriad of processes throughout biology and biotechnology. aTFs have served as the workhorses for developments in synthetic biology, fundamental research, and protein manufacturing. One of the most utilized TFs is the lactose repressor (LacI). In addition to being an exceptional tool for gene regulation, LacI has also served as an outstanding model system for understanding allosteric communication. In this perspective, we will use the LacI TF as the principal exemplar for engineering alternate functions related to allostery, i.e., alternate protein-DNA interactions, alternate protein-ligand interactions, and alternate phenotypic mechanisms. In addition, we will summarize the design rules and heuristics for each design goal and demonstrate how the resulting design rules and heuristics can be extrapolated to engineer other aTFs with a similar topology, i.e., from the broader LacI/GalR family of TFs.

Citations: 1
Uncovering the spatial landscape of molecular interactions within the tumor microenvironment through latent spaces.
IF 9.3 Zone 1, Biology Q1 BIOCHEMISTRY & MOLECULAR BIOLOGY Pub Date: 2023-08-16 DOI: 10.1016/j.cels.2023.07.006
Atul Deshpande, Melanie Loth, Dimitrios N Sidiropoulos, Shuming Zhang, Long Yuan, Alexander T F Bell, Qingfeng Zhu, Won Jin Ho, Cesar Santa-Maria, Daniele M Gilkes, Stephen R Williams, Cedric R Uytingco, Jennifer Chew, Andrej Hartnett, Zachary W Bent, Alexander V Favorov, Aleksander S Popel, Mark Yarchoan, Ashley Kiemen, Pei-Hsun Wu, Kohei Fujikura, Denis Wirtz, Laura D Wood, Lei Zheng, Elizabeth M Jaffee, Robert A Anders, Ludmila Danilova, Genevieve Stein-O'Brien, Luciane T Kagohara, Elana J Fertig
Citations: 0
Simplifying complex antibody engineering using machine learning.
IF 9 Zone 1, Biology Q1 BIOCHEMISTRY & MOLECULAR BIOLOGY Pub Date: 2023-08-16 DOI: 10.1016/j.cels.2023.04.009
Emily K Makowski, Hsin-Ting Chen, Peter M Tessier

Machine learning is transforming antibody engineering by enabling the generation of drug-like monoclonal antibodies with unprecedented efficiency. Unsupervised algorithms trained on massive and diverse protein sequence datasets facilitate the prediction of panels of antibody variants with native-like intrinsic properties (e.g., high stability), greatly reducing the amount of subsequent experimentation needed to identify specific candidates that also possess desired extrinsic properties (e.g., high affinity). Additionally, supervised algorithms, which are trained on deep sequencing datasets obtained after enrichment of in vitro antibody libraries for one or more specific extrinsic properties, enable the prediction of antibody variants with desired combinations of extrinsic properties without the need for additional screening. Here we review recent advances using both machine learning approaches and how they are impacting the field of antibody engineering as well as key outstanding challenges and opportunities for these paradigm-changing methods.

Citations: 0
Learning protein fitness landscapes with deep mutational scanning data from multiple sources.
IF 9.3 Zone 1, Biology Q1 BIOCHEMISTRY & MOLECULAR BIOLOGY Pub Date: 2023-08-16 DOI: 10.1016/j.cels.2023.07.003
Lin Chen, Zehong Zhang, Zhenghao Li, Rui Li, Ruifeng Huo, Lifan Chen, Dingyan Wang, Xiaomin Luo, Kaixian Chen, Cangsong Liao, Mingyue Zheng

One of the key points of machine learning-assisted directed evolution (MLDE) is the accurate learning of the fitness landscape, a conceptual mapping from sequence variants to the desired function. Here, we describe a multi-protein training scheme that leverages the existing deep mutational scanning data from diverse proteins to aid in understanding the fitness landscape of a new protein. Proof-of-concept trials are designed to validate this training scheme in three aspects: random and positional extrapolation for single-variant effects, zero-shot fitness predictions for new proteins, and extrapolation for higher-order variant effects from single-variant effects. Moreover, our study identified previously overlooked strong baselines, and their unexpectedly good performance brings our attention to the pitfalls of MLDE. Overall, these results may improve our understanding of the association between different protein fitness profiles and shed light on developing better machine learning-assisted approaches to the directed evolution of proteins. A record of this paper's transparent peer review process is included in the supplemental information.

Citations: 0
PocketAnchor: Learning structure-based pocket representations for protein-ligand interaction prediction.
IF 9.3 Zone 1, Biology Q1 BIOCHEMISTRY & MOLECULAR BIOLOGY Pub Date: 2023-08-16 DOI: 10.1016/j.cels.2023.05.005
Shuya Li, Tingzhong Tian, Ziting Zhang, Ziheng Zou, Dan Zhao, Jianyang Zeng

Protein-ligand interactions are essential for cellular activities and drug discovery processes. Appropriately and effectively representing protein features is of vital importance for developing computational approaches, especially data-driven methods, for predicting protein-ligand interactions. However, existing approaches may not fully investigate the features of the ligand-occupying regions in the protein pockets. Here, we design a structure-based protein representation method, named PocketAnchor, for capturing the local environmental and spatial features of protein pockets to facilitate protein-ligand interaction-related learning tasks. We define "anchors" as probe points reaching into the cavities and those located near the surface of proteins, and we design a specific message passing strategy for gathering local information from the atoms and surface neighboring these anchors. Comprehensive evaluation of our method demonstrated its successful applications in pocket detection and binding affinity prediction, which indicated that our anchor-based approach can provide effective protein feature representations for improving the prediction of protein-ligand interactions.

Citations: 0
Protein engineering via sequence-performance mapping.
IF 9 Zone 1, Biology Q1 BIOCHEMISTRY & MOLECULAR BIOLOGY Pub Date: 2023-08-16 Epub Date: 2023-07-25 DOI: 10.1016/j.cels.2023.06.009
Adam McConnell, Benjamin J Hackel

Discovery and evolution of new and improved proteins have empowered molecular therapeutics, diagnostics, and industrial biotechnology. Discovery and evolution both require efficient screens and effective libraries, although they differ in their challenges because of the absence or presence, respectively, of an initial protein variant with the desired function. A host of high-throughput technologies, both experimental and computational, enable efficient screens to identify performant protein variants. In partnership, an informed search of sequence space is needed to overcome the immensity, sparsity, and complexity of the sequence-performance landscape. Early in the historical trajectory of protein engineering, these elements aligned with distinct approaches to identify the most performant sequence: selection from large, randomized combinatorial libraries versus rational computational design. Substantial advances have now emerged from the synergy of these perspectives. Rational design of combinatorial libraries aids the experimental search of sequence space, and high-throughput, high-integrity experimental data inform computational design. At the core of the collaborative interface, efficient protein characterization (rather than mere selection of optimal variants) maps sequence-performance landscapes. Such quantitative maps elucidate the complex relationships between protein sequence and performance (e.g., binding, catalytic efficiency, biological activity, and developability), thereby advancing fundamental protein science and facilitating protein discovery and evolution.

Citations: 0
Protein engineering of pores for separation, sensing, and sequencing.
IF 9.3 Zone 1 Biology Q1 BIOCHEMISTRY & MOLECULAR BIOLOGY Pub Date: 2023-08-16 DOI: 10.1016/j.cels.2023.07.004
Laxmicharan Samineni, Bibek Acharya, Harekrushna Behera, Hyeonji Oh, Manish Kumar, Ratul Chowdhury

Proteins are critical to cellular function and survival. They are complex molecules with precise structures and chemistries, which allow them to serve diverse functions for maintaining overall cell homeostasis. Since the discovery of the first enzyme in 1833, a gamut of advanced experimental and computational tools has been developed and deployed for understanding protein structure and function. Recent studies have demonstrated the ability to redesign or alter natural proteins for applications in industrial processes of interest and to make customized, novel synthetic proteins in the laboratory through protein engineering. We comprehensively review the successes in engineering pore-forming proteins and correlate the amino acid-level biochemistry of different pore modification strategies with their intended applications, limited here to nucleotide/peptide sequencing, single-molecule sensing, and precise molecular separations.

Cell Systems, vol. 14, no. 8, pp. 676-691. Published 2023-08-16.
Citations: 0