Pub Date: 2024-09-18; Epub Date: 2024-09-06; DOI: 10.1016/j.cels.2024.08.005
Milton Pividori, Marylyn D Ritchie, Diego H Milone, Casey S Greene
Identifying meaningful patterns in data is crucial for understanding complex biological processes, particularly in transcriptomics, where genes with correlated expression often share functions or contribute to disease mechanisms. Traditional correlation coefficients, which primarily capture linear relationships, may overlook important nonlinear patterns. We introduce the clustermatch correlation coefficient (CCC), a not-only-linear coefficient that utilizes clustering to efficiently detect both linear and nonlinear associations. CCC outperforms standard methods by revealing biologically meaningful patterns that linear-only coefficients miss and is faster than state-of-the-art coefficients such as the maximal information coefficient. When applied to human gene expression data from genotype-tissue expression (GTEx), CCC identified robust linear relationships and nonlinear patterns, such as sex-specific differences, that are undetectable by standard methods. Highly ranked gene pairs were enriched for interactions in integrated networks built from protein-protein interactions, transcription factor regulation, and chemical and genetic perturbations, suggesting that CCC can detect functional relationships missed by linear-only approaches. CCC is a highly efficient, next-generation, not-only-linear correlation coefficient for genome-scale data. A record of this paper's transparent peer review process is included in the supplemental information.
Title: An efficient, not-only-linear correlation coefficient based on clustering. Cell Systems, pages 854-868.e3. Journal Article.
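The clustering idea behind a not-only-linear coefficient such as CCC can be illustrated with a small sketch. This is not the authors' implementation. It assumes a simplified reading of the approach: each variable is split into rank-based quantile bins for several bin counts, the partitions of the two variables are compared with the adjusted Rand index, and the largest agreement (floored at zero) is reported. The function names `quantile_labels`, `adjusted_rand_index`, and `ccc_sketch`, and the `max_k` parameter, are hypothetical.

```python
from collections import Counter
from math import comb


def quantile_labels(values, k):
    """Assign each value to one of k roughly equal-sized rank-based bins.

    Ties are broken by input order, which is a simplification.
    """
    order = sorted(range(len(values)), key=lambda i: values[i])
    labels = [0] * len(values)
    for rank, idx in enumerate(order):
        labels[idx] = rank * k // len(values)
    return labels


def adjusted_rand_index(a, b):
    """Adjusted Rand index between two labelings of the same objects."""
    total = comb(len(a), 2)
    if total == 0:
        return 0.0
    sum_ab = sum(comb(c, 2) for c in Counter(zip(a, b)).values())
    sum_a = sum(comb(c, 2) for c in Counter(a).values())
    sum_b = sum(comb(c, 2) for c in Counter(b).values())
    expected = sum_a * sum_b / total
    max_index = (sum_a + sum_b) / 2
    if max_index == expected:
        return 0.0  # degenerate partitions carry no information
    return (sum_ab - expected) / (max_index - expected)


def ccc_sketch(x, y, max_k=10):
    """Largest partition agreement over all pairs of bin counts (assumed scheme)."""
    ks = range(2, min(max_k, len(x) - 1) + 1)
    parts_x = [quantile_labels(x, k) for k in ks]
    parts_y = [quantile_labels(y, k) for k in ks]
    best = 0.0  # negative agreement is treated as no association
    for px in parts_x:
        for py in parts_y:
            best = max(best, adjusted_rand_index(px, py))
    return best


# Illustrative data: one linear and one quadratic (nonlinear) relationship.
x = list(range(-50, 50))
linear = [2 * v + 1 for v in x]
quadratic = [v * v for v in x]
print(ccc_sketch(x, linear), ccc_sketch(x, quadratic))
```

Because the comparison is made on partitions rather than on raw values, a symmetric relationship such as the quadratic one can still produce clear agreement when the two variables are binned with different numbers of clusters.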
Pub Date: 2024-09-18; Epub Date: 2024-09-04; DOI: 10.1016/j.cels.2024.08.004
Andre Neil Forbes, Duo Xu, Sandra Cohen, Priya Pancholi, Ekta Khurana
Most cancer types lack targeted therapeutic options, and when first-line targeted therapies are available, treatment resistance is a huge challenge. Recent technological advances enable the use of assay for transposase-accessible chromatin with sequencing (ATAC-seq) and RNA sequencing (RNA-seq) on patient tissue in a high-throughput manner. Here, we present a computational approach that leverages these datasets to identify drug targets based on tumor lineage. We constructed gene regulatory networks for 371 patients of 22 cancer types using machine learning approaches trained with three-dimensional genomic data for enhancer-to-promoter contacts. Next, we identified the key transcription factors (TFs) in these networks, which are used to find therapeutic vulnerabilities, by direct targeting of either TFs or the proteins that they interact with. We validated four candidates identified for neuroendocrine, liver, and renal cancers, which have a dismal prognosis with current therapeutic options.
Title: Discovery of therapeutic targets in cancer using chromatin accessibility and transcriptomic data. Cell Systems, pages 824-837.e6. Journal Article. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11415227/pdf/
The regulation of genes can be mathematically described by input-output functions that are typically assumed to be time invariant. This fundamental assumption underpins the design of synthetic gene circuits and the quantitative understanding of natural gene regulatory networks. Here, we found that this assumption is challenged in mammalian cells. We observed that a synthetic reporter gene can exhibit unexpected transcriptional memory, leading to a shift in the dose-response curve upon a second induction. Mechanistically, we investigated the cis-dependency of transcriptional memory, revealing the necessity of promoter DNA methylation in establishing memory. Furthermore, we showed that the synthetic transcription factor's effective DNA binding affinity underlies trans-dependency, which is associated with its capacity to undergo biomolecular condensation. These principles enabled modulating memory by perturbing either cis- or trans-regulation of genes. Together, our findings suggest the potential pervasiveness of transcriptional memory and implicate the need to model mammalian gene regulation with time-varying input-output functions. A record of this paper's transparent peer review process is included in the supplemental information.
Title: Promoter DNA methylation and transcription factor condensation are linked to transcriptional memory in mammalian cells. Authors: Shenqi Fan, Liang Ma, Chengzhi Song, Xu Han, Bijunyao Zhong, Yihan Lin. DOI: 10.1016/j.cels.2024.08.007. Pub Date: 2024-09-18. Cell Systems, pages 808-823.e6. Journal Article.
Pub Date: 2024-09-18; Epub Date: 2024-09-04; DOI: 10.1016/j.cels.2024.08.001
Elisa Gallo, Stefano De Renzis, James Sharpe, Roberto Mayor, Jonas Hartmann
The discovery of general principles underlying the complexity and diversity of cellular and developmental systems is a central and long-standing aim of biology. While new technologies collect data at an ever-accelerating rate, there is growing concern that conceptual progress is not keeping pace. We contend that this is due to a paucity of conceptual frameworks that support meaningful generalizations. This led us to develop the core and periphery (C&P) hypothesis, which posits that many biological systems can be decomposed into a highly versatile core with a large behavioral repertoire and a specific periphery that configures said core to perform one particular function. Versatile cores tend to be widely reused across biology, which confers generality to theories describing them. Here, we introduce this concept and describe examples at multiple scales, including Turing patterning, actomyosin dynamics, multi-cellular morphogenesis, and vertebrate gastrulation. We also sketch its evolutionary basis and discuss key implications and open questions. We propose that the C&P hypothesis could unlock new avenues of conceptual progress in mesoscale biology.
Title: Versatile system cores as a conceptual basis for generality in cell and developmental biology. Cell Systems, pages 790-807. Journal Article.
Pub Date: 2024-09-18; Epub Date: 2024-09-06; DOI: 10.1016/j.cels.2024.08.006
Ruoqiao Chen, Jiayu Zhou, Bin Chen
Cell surface proteins serve as primary drug targets and cell identity markers. Techniques such as CITE-seq (cellular indexing of transcriptomes and epitopes by sequencing) have enabled the simultaneous quantification of surface protein abundance and transcript expression within individual cells. The published data have been utilized to train machine learning models for predicting surface protein abundance solely from transcript expression. However, the small scale of proteins predicted and the poor generalization ability of these computational approaches across diverse contexts (e.g., different tissues/disease states) impede their widespread adoption. Here, we propose SPIDER (surface protein prediction using deep ensembles from single-cell RNA sequencing), a context-agnostic zero-shot deep ensemble model, which enables large-scale protein abundance prediction and generalizes better to various contexts. Comprehensive benchmarking shows that SPIDER outperforms other state-of-the-art methods. Using the predicted surface abundance of >2,500 proteins from single-cell transcriptomes, we demonstrate the broad applications of SPIDER, including cell type annotation, biomarker/target identification, and cell-cell interaction analysis in hepatocellular carcinoma and colorectal cancer. A record of this paper's transparent peer review process is included in the supplemental information.
Title: Imputing abundance of over 2,500 surface proteins from single-cell transcriptomes with context-agnostic zero-shot deep ensembles. Cell Systems, pages 869-884.e6. Journal Article. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11423933/pdf/
Pub Date: 2024-09-18; Epub Date: 2024-09-04; DOI: 10.1016/j.cels.2024.08.002
Chandana Gopalakrishnappa, Zeqian Li, Seppe Kuehn
Interactions between photosynthetic and heterotrophic microbes play a key role in global primary production. Understanding phototroph-heterotroph interactions remains challenging because these microbes reside in chemically complex environments. Here, we leverage a massively parallel droplet microfluidic platform that enables us to interrogate interactions between photosynthetic algae and heterotrophic bacteria in >100,000 communities across ∼525 environmental conditions with varying pH, carbon availability, and phosphorus availability. By developing a statistical framework to dissect interactions in this complex dataset, we reveal that the dependence of algae-bacteria interactions on nutrient availability is strongly modulated by pH and buffering capacity. Furthermore, we show that the chemical identity of the available organic carbon source controls how pH, buffering capacity, and nutrient availability modulate algae-bacteria interactions. Our study reveals the previously underappreciated role of pH in modulating phototroph-heterotroph interactions and provides a framework for thinking about interactions between phototrophs and heterotrophs in more natural contexts.
Title: Environmental modulators of algae-bacteria interactions at scale. Cell Systems, pages 838-853.e13. Journal Article. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11412779/pdf/
Pub Date: 2024-08-21; DOI: 10.1016/j.cels.2024.07.004
Zhixin Cyrillus Tan, Aaron S Meyer
Recent biological studies have been revolutionized in scale and granularity by multiplex and high-throughput assays. Profiling cell responses across several experimental parameters, such as perturbations, time, and genetic contexts, leads to richer and more generalizable findings. However, these multidimensional datasets necessitate a reevaluation of the conventional methods for their representation and analysis. Traditionally, experimental parameters are merged to flatten the data into a two-dimensional matrix, sacrificing crucial experiment context reflected by the structure. As Marshall McLuhan famously stated, "the medium is the message." In this work, we propose that the experiment structure is the medium in which subsequent analysis is performed, and the optimal choice of data representation must reflect the experiment structure. We review how tensor-structured analyses and decompositions can preserve this information. We contend that tensor methods are poised to become integral to the biomedical data sciences toolkit.
Title: The structure is the message: Preserving experimental context through tensor decomposition. Cell Systems 15(8), pages 679-693. Journal Article. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11366223/pdf/
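The contrast between flattening and keeping the experiment structure can be made concrete with a toy example. The sketch below is an illustration only, not a method taken from the review. It assumes a three-way dataset indexed by perturbation, time, and cell line (hypothetical axes and values), and fits a single rank-1 component by alternating updates, the simplest case of a CP-style decomposition. It expects a nonzero tensor with non-negative entries so that the all-ones starting vectors are never orthogonal to the data.

```python
def normalize(vector):
    """Return the unit-length vector and its original Euclidean norm."""
    norm = sum(v * v for v in vector) ** 0.5
    return [v / norm for v in vector], norm


def rank1_fit(tensor, iterations=25):
    """Fit tensor[i][j][k] ~ weight * a[i] * b[j] * c[k] by alternating updates."""
    n_i, n_j, n_k = len(tensor), len(tensor[0]), len(tensor[0][0])
    b = normalize([1.0] * n_j)[0]
    c = normalize([1.0] * n_k)[0]
    a, weight = [0.0] * n_i, 0.0
    for _step in range(iterations):
        # Update each factor with the other two held fixed.
        a = normalize([sum(tensor[i][j][k] * b[j] * c[k]
                           for j in range(n_j) for k in range(n_k))
                       for i in range(n_i)])[0]
        b = normalize([sum(tensor[i][j][k] * a[i] * c[k]
                           for i in range(n_i) for k in range(n_k))
                       for j in range(n_j)])[0]
        c, weight = normalize([sum(tensor[i][j][k] * a[i] * b[j]
                                   for i in range(n_i) for j in range(n_j))
                               for k in range(n_k)])
    return weight, a, b, c


def flatten(tensor):
    """Merge the first two axes into rows, as in a conventional 2D matrix."""
    return [row for slab in tensor for row in slab]


# Hypothetical effects along each experimental axis.
perturbation = [1.0, 2.0]
time = [1.0, 0.5, 2.0]
cell_line = [3.0, 1.0]
tensor = [[[p * t * c for c in cell_line] for t in time] for p in perturbation]

weight, a, b, c = rank1_fit(tensor)
matrix = flatten(tensor)  # 6 rows: perturbation and time are no longer separate
print(len(matrix), len(a), len(b), len(c))
```

The tensor fit returns one factor per experimental axis, so the perturbation, time, and cell-line effects stay separately interpretable. The flattened matrix has six undifferentiated rows, and the perturbation-by-time structure has to be reconstructed from row labels afterwards.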
Pub Date: 2024-08-21; DOI: 10.1016/j.cels.2024.07.008
Matthew Smart, David F Moreno, Murat Acar
How do variations in nutrient levels influence cellular lifespan? A dynamical systems model of a core circuit involved in yeast aging suggests principles underlying lifespan extension observed at static and alternating glucose levels that are reminiscent of intermittent fasting regimens.
Title: Rationally reprogramming single-cell aging trajectories and lifespan through dynamic modulation of environmental inputs. Cell Systems 15(8), pages 676-678. Journal Article.
Pub Date: 2024-08-21; Epub Date: 2024-08-05; DOI: 10.1016/j.cels.2024.07.005
Xinran Lian, Nikša Praljak, Subu K Subramanian, Sarah Wasinger, Rama Ranganathan, Andrew L Ferguson
Evolution-based deep generative models represent an exciting direction in understanding and designing proteins. An open question is whether such models can learn specialized functional constraints that control fitness in specific biological contexts. Here, we examine the ability of generative models to produce synthetic versions of Src-homology 3 (SH3) domains that mediate signaling in the Sho1 osmotic stress response pathway of yeast. We show that a variational autoencoder (VAE) model produces artificial sequences that experimentally recapitulate the function of natural SH3 domains. More generally, the model organizes all fungal SH3 domains such that locality in the model latent space (but not simply locality in sequence space) enriches the design of synthetic orthologs and exposes non-obvious amino acid constraints distributed near and far from the SH3 ligand-binding site. The ability of generative models to design ortholog-like functions in vivo opens new avenues for engineering protein function in specific cellular contexts and environments.
Title: Deep-learning-based design of synthetic orthologs of SH3 signaling domains. Cell Systems, pages 725-737.e7. Journal Article.
Pub Date: 2024-08-21; Epub Date: 2024-08-07; DOI: 10.1016/j.cels.2024.07.001
Alexander T F Bell, Jacob T Mitchell, Ashley L Kiemen, Melissa Lyman, Kohei Fujikura, Jae W Lee, Erin Coyne, Sarah M Shin, Sushma Nagaraj, Atul Deshpande, Pei-Hsun Wu, Dimitrios N Sidiropoulos, Rossin Erbe, Jacob Stern, Rena Chan, Stephen Williams, James M Chell, Lauren Ciotti, Jacquelyn W Zimmerman, Denis Wirtz, Won Jin Ho, Neeha Zaidi, Elizabeth Thompson, Elizabeth M Jaffee, Laura D Wood, Elana J Fertig, Luciane T Kagohara
This study introduces a new imaging, spatial transcriptomics (ST), and single-cell RNA-sequencing integration pipeline to characterize neoplastic cell state transitions during tumorigenesis. We applied a semi-supervised analysis pipeline to examine premalignant pancreatic intraepithelial neoplasias (PanINs) that can develop into pancreatic ductal adenocarcinoma (PDAC). Their strict diagnosis on formalin-fixed and paraffin-embedded (FFPE) samples limited the single-cell characterization of human PanINs within their microenvironment. We leverage whole transcriptome FFPE ST to enable the study of a rare cohort of matched low-grade (LG) and high-grade (HG) PanIN lesions to track progression and map cellular phenotypes relative to single-cell PDAC datasets. We demonstrate that cancer-associated fibroblasts (CAFs), including antigen-presenting CAFs, are located close to PanINs. We further observed a transition from CAF-related inflammatory signaling to cellular proliferation during PanIN progression. We validate these findings with single-cell high-dimensional imaging proteomics and transcriptomics technologies. Altogether, our semi-supervised learning framework for spatial multi-omics has broad applicability across cancer types to decipher the spatiotemporal dynamics of carcinogenesis.
Title: PanIN and CAF transitions in pancreatic carcinogenesis revealed with spatial data integration. Cell Systems, pages 753-769.e5. Journal Article. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11409191/pdf/