Spatial transcriptomics (ST) is advancing our understanding of complex tissues and organisms. However, building a robust clustering algorithm to define spatially coherent regions in a single tissue slice and aligning or integrating multiple tissue slices originating from diverse sources for essential downstream analyses remains challenging. Numerous clustering, alignment, and integration methods have been specifically designed for ST data by leveraging its spatial information. The absence of comprehensive benchmark studies complicates the selection of methods and future method development. In this study, we systematically benchmark a variety of state-of-the-art algorithms with a wide range of real and simulated datasets of varying sizes, technologies, species, and complexity. We analyze the strengths and weaknesses of each method using diverse quantitative and qualitative metrics and analyses, including eight metrics for spatial clustering accuracy and contiguity, uniform manifold approximation and projection visualization, layer-wise and spot-to-spot alignment accuracy, and 3D reconstruction, which are designed to assess method performance as well as data quality. The code used for evaluation is available on our GitHub. Additionally, we provide online notebook tutorials and documentation to facilitate the reproduction of all benchmarking results and to support the study of new methods and new datasets. Our analyses lead to comprehensive recommendations that cover multiple aspects, helping users to select optimal tools for their specific needs and guide future method development.
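The abstract does not name its eight clustering metrics. As an illustration only, the sketch below computes the adjusted Rand index (ARI), a widely used agreement score between a predicted clustering and reference annotations. Treating ARI as one of the paper's metrics is an assumption, and the function name and toy labels are hypothetical.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_true, labels_pred):
    """Adjusted Rand index between two labelings of the same spots."""
    if len(labels_true) != len(labels_pred):
        raise ValueError("labelings must have the same length")
    n = len(labels_true)
    if n < 2:
        return 1.0  # no pairs to compare

    # Contingency counts: spots sharing each (reference, predicted) label pair.
    pair_counts = Counter(zip(labels_true, labels_pred))
    true_counts = Counter(labels_true)
    pred_counts = Counter(labels_pred)

    sum_pairs = sum(comb(c, 2) for c in pair_counts.values())
    sum_true = sum(comb(c, 2) for c in true_counts.values())
    sum_pred = sum(comb(c, 2) for c in pred_counts.values())

    expected = sum_true * sum_pred / comb(n, 2)  # agreement expected by chance
    max_index = (sum_true + sum_pred) / 2
    if max_index == expected:
        return 1.0  # degenerate case: both labelings are trivial
    return (sum_pairs - expected) / (max_index - expected)


# Toy example: cluster names differ, but the partition is identical.
reference_layers = [0, 0, 1, 1]
predicted_domains = [1, 1, 0, 0]
score = adjusted_rand_index(reference_layers, predicted_domains)
```

ARI is invariant to how clusters are named, which is why it suits comparisons between unsupervised domains and annotated layers. It ignores spatial coordinates, so it would need to be paired with a separate contiguity measure.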
Title: Benchmarking clustering, alignment, and integration methods for spatial transcriptomics. Authors: Yunfei Hu, Manfei Xie, Yikang Li, Mingxing Rao, Wenjun Shen, Can Luo, Haoran Qin, Jihoon Baek, Xin Maizie Zhou. Journal: Genome Biology. Pub Date: 2024-08-09. DOI: 10.1186/s13059-024-03361-0
Base editing is a powerful tool for artificial evolution to create allelic diversity and improve agronomic traits. However, the great evolutionary potential of each sgRNA target has been overlooked, and there is currently no high-throughput method for generating and characterizing as many changes as possible in a single target from large mutant pools to permit rapid directed evolution of genes in plants. In this study, we establish an efficient germline-specific evolution system to screen beneficial alleles in Arabidopsis, which could be applied to crop improvement. This system is based on a strong egg cell-specific cytosine base editor and the large seed production of Arabidopsis, which enables each T1 plant with unedited wild-type alleles to produce thousands of independent T2 mutant lines. It can create a wide range of mutant lines, including those containing atypical base substitutions, and also provides a space- and labor-saving way to store and screen the resulting mutant libraries. Using this system, we efficiently generate herbicide-resistant EPSPS, ALS, and HPPD variants that could be used in crop breeding. Here, we demonstrate the significant potential of base editing-mediated artificial evolution for each sgRNA target and devise an efficient system for conducting deep evolution to harness this potential.
Title: Creating large-scale genetic diversity in Arabidopsis via base editing-mediated deep artificial evolution. Authors: Xiang Wang, Wenbo Pan, Chao Sun, Hong Yang, Zhentao Cheng, Fei Yan, Guojing Ma, Yun Shang, Rui Zhang, Caixia Gao, Lijing Liu, Huawei Zhang. Journal: Genome Biology. Pub Date: 2024-08-09. DOI: 10.1186/s13059-024-03358-9
Pub Date: 2024-08-09 DOI: 10.1186/s13059-024-03360-1
Ilke Demirci, Anton J. M. Larsson, Xinsong Chen, Johan Hartman, Rickard Sandberg, Jonas Frisén
Analysis of clonal dynamics in human tissues is enabled by somatic genetic variation. Here, we show that analysis of mitochondrial mutations in single cells is dramatically improved in females when using X chromosome inactivation to select informative clonal mutations. Applying this strategy to human peripheral blood mononuclear cells reveals clonal structures within T cells that are otherwise blurred by non-informative mutations, including the separation of gamma-delta T cells, suggesting that this approach can be used to decipher clonal dynamics of cells in human tissues.
Title: Inferring clonal somatic mutations directed by X chromosome inactivation status in single cells. Journal: Genome Biology. DOI URL: https://doi.org/10.1186/s13059-024-03360-1
Pub Date: 2024-08-08 DOI: 10.1186/s13059-024-03354-z
Andrea Cipriano, Alessio Colantoni, Alessandro Calicchio, Jonathan Fiorentino, Danielle Gomes, Mahdi Moqri, Alexander Parker, Sajede Rasouli, Matthew Caldwell, Francesca Briganti, Maria Grazia Roncarolo, Antonio Baldini, Katja G. Weinacht, Gian Gaetano Tartaglia, Vittorio Sebastiano
The pharyngeal endoderm (PE) is an extremely relevant developmental tissue, serving as the progenitor for the esophagus, parathyroids, thyroid, lungs, and thymus. While several studies have highlighted the importance of PE cells, a detailed transcriptional and epigenetic characterization of this important developmental stage is still missing, especially in humans, due to technical and ethical constraints pertaining to its early formation. Here we fill this knowledge gap by developing an in vitro protocol for the derivation of PE-like cells from human embryonic stem cells (hESCs) and by providing an integrated multi-omics characterization. Our PE-like cells robustly express PE markers, are transcriptionally homogeneous, and are similar to in vivo mouse PE cells. In addition, we define their epigenetic landscape and its dynamic changes in response to retinoic acid by combining ATAC-Seq and ChIP-Seq of histone modifications. The integration of multiple high-throughput datasets leads to the identification of new putative regulatory regions and to the inference of a retinoic acid-centered transcription factor network orchestrating the development of PE-like cells. By combining hESC differentiation with computational genomics, our work reveals the epigenetic dynamics that occur during human PE differentiation, providing a solid resource and foundation for research focused on the development of PE derivatives and the modeling of their developmental defects in genetic syndromes.
Title: Transcriptional and epigenetic characterization of a new in vitro platform to model the formation of human pharyngeal endoderm. Journal: Genome Biology. DOI URL: https://doi.org/10.1186/s13059-024-03354-z
Pub Date: 2024-08-06 DOI: 10.1186/s13059-024-03340-5
Vincent Jonchère, Hugo Montémont, Enora Le Scanf, Aurélie Siret, Quentin Letourneur, Emmanuel Tubacher, Christophe Battail, Assane Fall, Karim Labreche, Victor Renault, Toky Ratovomanana, Olivier Buhard, Ariane Jolly, Philippe Le Rouzic, Cody Feys, Emmanuelle Despras, Habib Zouali, Rémy Nicolle, Pascale Cervera, Magali Svrcek, Pierre Bourgoin, Hélène Blanché, Anne Boland, Jérémie Lefèvre, Yann Parc, Mehdi Touat, Franck Bielle, Danielle Arzur, Gwennina Cueff, Catherine Le Jossic-Corcos, Gaël Quéré, Gwendal Dujardin, Marc Blondel, Cédric Le Maréchal, Romain Cohen, Thierry André, Florence Coulet, Pierre de la Grange, Aurélien de Reyniès, Jean-François Fléjou, Florence Renaud, Agusti Alentorn, Laurent Corcos, Jean-François Deleuze, Ada Collura, Alex Duval
Microsatellite instability (MSI) due to mismatch repair deficiency (dMMR) is common in colorectal cancer (CRC). These cancers are associated with somatic coding events, but the noncoding pathophysiological impact of this genomic instability is still poorly understood. Here, we perform an analysis of coding and noncoding MSI events at the different steps of colorectal tumorigenesis using whole exome sequencing and search for associated splicing events via RNA sequencing at the bulk-tumor and single-cell levels. Our results demonstrate that MSI leads to hundreds of noncoding DNA mutations, notably at polypyrimidine U2AF RNA-binding sites, which are endowed with cis-activity in splicing, while a higher frequency of exon skipping events is observed in the mRNAs of MSI compared to non-MSI CRC. At the DNA level, these noncoding MSI mutations occur very early, prior to cell transformation in the dMMR colonic crypt, and account for only a fraction of the exon skipping in MSI CRC. At the RNA level, the aberrant exon skipping signature is likely to impair colonic cell differentiation in MSI CRC, affecting the expression of alternative exons encoding protein isoforms governing cell fate, while also targeting constitutive exons, making dMMR cells immunogenic at an early stage before the onset of coding mutations. This signature is characterized by its similarity to the oncogenic U2AF1-S34F splicing mutation observed in several other non-MSI cancers. Overall, these findings provide evidence that a very early RNA splicing signature partly driven by MSI impairs cell differentiation and promotes MSI CRC initiation, long before coding mutations, which accumulate later during MSI tumorigenesis.
Title: Microsatellite instability at U2AF-binding polypyrimidic tract sites perturbs alternative splicing during colorectal cancer initiation. Journal: Genome Biology. DOI URL: https://doi.org/10.1186/s13059-024-03340-5
Pub Date: 2024-08-06 DOI: 10.1186/s13059-024-03352-1
Simon C. Biddie, Giovanna Weykopf, Elizabeth F. Hird, Elias T. Friman, Wendy A. Bickmore
Genome-wide association studies (GWAS) have revealed a multitude of candidate genetic variants affecting the risk of developing complex traits and diseases. However, the highlighted regions are typically in the non-coding genome, and uncovering the functional causative single nucleotide variants (SNVs) is challenging. Prioritization of variants is commonly based on genomic annotation with markers of active regulatory elements, but current approaches still poorly predict functional variants. To address this, we systematically analyze six markers of active regulatory elements for their ability to identify functional variants. We benchmark against molecular quantitative trait loci (molQTL) from assays of regulatory element activity that identify allelic effects on DNA-binding factor occupancy, reporter assay expression, and chromatin accessibility. We identify the combination of DNase footprints and divergent enhancer RNA (eRNA) as markers for functional variants. This signature provides high precision at the cost of low recall, thus substantially reducing candidate variant sets to prioritize variants for functional validation. We present this as a framework called FINDER (Functional SNV IdeNtification using DNase footprints and eRNA). We demonstrate its utility for prioritizing variants using a leukocyte count trait, and we analyze variants in linkage disequilibrium with a lead variant to predict a functional variant in asthma. Our findings have implications for prioritizing variants from GWAS, for the development of predictive scoring algorithms, and for functionally informed fine-mapping approaches.
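The precision and recall trade-off described above can be made concrete with a small sketch. It assumes the two markers are combined by requiring a variant to overlap both, which is one plausible reading of "combination" and not necessarily how FINDER is implemented. All variant IDs and set memberships are invented for illustration and are not data from the paper.

```python
def precision_recall(predicted, functional):
    """Precision and recall of a predicted variant set against a reference set."""
    predicted, functional = set(predicted), set(functional)
    true_positives = len(predicted & functional)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(functional) if functional else 0.0
    return precision, recall


# Hypothetical variant IDs (toy data, not from the study).
in_dnase_footprint = {"v1", "v2", "v3", "v4", "v5"}
in_divergent_erna = {"v2", "v3", "v6", "v7"}
functional_molqtl = {"v2", "v3", "v4", "v6", "v8", "v9"}

# Assumed combination rule: a variant must carry both markers.
combined = in_dnase_footprint & in_divergent_erna

single_marker = precision_recall(in_dnase_footprint, functional_molqtl)
both_markers = precision_recall(combined, functional_molqtl)
```

In this toy case, requiring both markers shrinks the candidate set from five variants to two. Precision rises while recall falls, which is the behavior the abstract describes for the combined signature.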
Title: DNA-binding factor footprints and enhancer RNAs identify functional non-coding genetic variants. Journal: Genome Biology. DOI URL: https://doi.org/10.1186/s13059-024-03352-1
Pub Date: 2024-08-06 DOI: 10.1186/s13059-024-03362-z
Duc Quang Le, Tien Anh Nguyen, Son Hoang Nguyen, Tam Thi Nguyen, Canh Hao Nguyen, Huong Thanh Phung, Tho Huu Ho, Nam S. Vo, Trang Nguyen, Hoang Anh Nguyen, Minh Duc Cao
Pangenome inference is an indispensable step in bacterial genomics, yet its scalability poses a challenge due to the rapid growth of genomic collections. This paper presents PanTA, a software package designed for constructing pangenomes of large bacterial datasets, with unprecedented efficiency that is multiple times higher than that of existing tools. PanTA introduces a novel mechanism to construct the pangenome progressively without rebuilding the accumulated collection from scratch. The progressive mode is shown to consume orders of magnitude fewer computational resources than existing solutions when managing growing datasets. The software is open source and is publicly available at https://github.com/amromics/panta and at 10.6084/m9.figshare.23724705.
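The abstract does not describe how PanTA's progressive mode works internally. The sketch below only illustrates the general idea of incremental clustering: genes from a newly added genome are compared against one representative per existing cluster, so earlier genomes are never re-clustered. The class, the k-mer Jaccard similarity, and the 0.7 threshold are all hypothetical stand-ins, not PanTA's method.

```python
def kmer_jaccard(seq_a, seq_b, k=3):
    """Jaccard similarity of the k-mer sets of two sequences (toy similarity)."""
    kmers_a = {seq_a[i:i + k] for i in range(len(seq_a) - k + 1)}
    kmers_b = {seq_b[i:i + k] for i in range(len(seq_b) - k + 1)}
    if not kmers_a or not kmers_b:
        return 0.0
    return len(kmers_a & kmers_b) / len(kmers_a | kmers_b)


class ProgressivePangenome:
    """Toy incremental gene clustering; not PanTA's actual algorithm."""

    def __init__(self, threshold=0.7):
        self.threshold = threshold
        self.representatives = []  # one representative sequence per cluster
        self.members = []          # genome names present in each cluster
        self.genomes = set()

    def add_genome(self, name, genes):
        """Assign each gene to an existing cluster or open a new one."""
        self.genomes.add(name)
        for seq in genes:
            for idx, rep in enumerate(self.representatives):
                if kmer_jaccard(seq, rep) >= self.threshold:
                    self.members[idx].add(name)
                    break
            else:
                # No representative was similar enough: start a new cluster.
                self.representatives.append(seq)
                self.members.append({name})

    def core_cluster_count(self):
        """Number of clusters found in every genome added so far."""
        return sum(1 for m in self.members if m == self.genomes)


pan = ProgressivePangenome()
pan.add_genome("genome_A", ["ATGGCCAAGT", "ATGTTTCCCG"])
pan.add_genome("genome_B", ["ATGGCCAAGT", "ATGAAAGGGC"])  # added later, no rebuild
```

The cost of adding a genome here scales with the number of cluster representatives, not with the total number of genes already stored. That is the property that makes an incremental design attractive for growing collections.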
Title: Efficient inference of large prokaryotic pangenomes with PanTA. Journal: Genome Biology. DOI URL: https://doi.org/10.1186/s13059-024-03362-z
Pub Date: 2024-08-05 DOI: 10.1186/s13059-024-03353-0
Yawei Li, Yuan Luo
Spatially resolved transcriptomics integrates high-throughput transcriptome measurements with preserved spatial cellular organization information. However, many technologies cannot reach single-cell resolution. We present STdGCN, a graph model leveraging single-cell RNA sequencing (scRNA-seq) as reference for cell-type deconvolution in spatial transcriptomic (ST) data. STdGCN incorporates expression profiles from scRNA-seq and spatial localization from ST data for deconvolution. Extensive benchmarking on multiple datasets demonstrates that STdGCN outperforms 17 state-of-the-art models. In a human breast cancer Visium dataset, STdGCN delineates stroma, lymphocytes, and cancer cells, aiding tumor microenvironment analysis. In human heart ST data, STdGCN identifies changes in endothelial-cardiomyocyte communications during tissue development.
Title: STdGCN: spatial transcriptomic cell-type deconvolution using graph convolutional networks. Journal: Genome Biology. DOI URL: https://doi.org/10.1186/s13059-024-03353-0
Cell type identification is an indispensable analytical step in single-cell data analyses. To address the high noise stemming from gene expression data, existing computational methods often overlook the biologically meaningful relationships between genes, opting to reduce all genes to a unified data space. We assume that such relationships can aid in characterizing cell type features and improving cell type recognition accuracy. To this end, we introduce scPriorGraph, a dual-channel graph neural network that integrates multi-level gene biosemantics. Experimental results demonstrate that scPriorGraph effectively aggregates feature values of similar cells using high-quality graphs, achieving state-of-the-art performance in cell type identification.
Title: scPriorGraph: constructing biosemantic cell–cell graphs with prior gene set selection for cell type identification from scRNA-seq data. Authors: Xiyue Cao, Yu-An Huang, Zhu-Hong You, Xuequn Shang, Lun Hu, Peng-Wei Hu, Zhi-An Huang. Journal: Genome Biology. Pub Date: 2024-08-05. DOI: 10.1186/s13059-024-03357-w
Pub Date: 2024-08-01; DOI: 10.1186/s13059-024-03335-2
Pooja Kathail, Richard W. Shuai, Ryan Chung, Chun Jimmie Ye, Gabriel B. Loeb, Nilah M. Ioannidis
A number of deep learning models have been developed to predict epigenetic features such as chromatin accessibility from DNA sequence. Model evaluations commonly report performance genome-wide; however, cis-regulatory elements (CREs), which play critical roles in gene regulation, make up only a small fraction of the genome. Furthermore, cell type-specific CREs account for a large proportion of complex disease heritability. We evaluate genomic deep learning models in chromatin accessibility regions with varying degrees of cell type specificity. We assess two modeling directions in the field: general-purpose models trained across thousands of outputs (cell types and epigenetic marks) and models tailored to specific tissues and tasks. We find that the accuracy of genomic deep learning models, including two state-of-the-art general-purpose models (Enformer and Sei), varies across the genome and is reduced in cell type-specific accessible regions. Using accessibility models trained on cell types from specific tissues, we find that increasing model capacity to learn cell type-specific regulatory syntax, through single-task learning or high-capacity multi-task models, can improve performance in cell type-specific accessible regions. We also observe that improving reference sequence predictions does not consistently improve variant effect predictions, indicating that novel strategies are needed to improve performance on variants. Our results provide a new perspective on the performance of genomic deep learning models and identify strategies to maximize performance in cell type-specific accessible regions.
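The stratified evaluation described in this abstract can be pictured as computing an accuracy metric separately for groups of regions, rather than once genome-wide. The following is a minimal illustrative sketch, not the authors' pipeline: the function names, the grouping labels, and the choice of Pearson correlation as the metric are all assumptions made for the example.

```python
import numpy as np


def pearson_r(x, y):
    """Pearson correlation between two 1-D arrays (NaN if either is constant)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    xc = x - x.mean()
    yc = y - y.mean()
    denom = np.sqrt((xc ** 2).sum() * (yc ** 2).sum())
    return float((xc * yc).sum() / denom) if denom > 0 else float("nan")


def stratified_pearson(observed, predicted, group_labels):
    """Correlate observed and predicted accessibility within each region group.

    group_labels is a hypothetical per-region annotation, for example
    "shared" versus "specific", standing in for a degree of cell type
    specificity.
    """
    observed = np.asarray(observed, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    group_labels = np.asarray(group_labels)
    results = {}
    for label in np.unique(group_labels):
        mask = group_labels == label  # select regions belonging to this group
        results[str(label)] = pearson_r(observed[mask], predicted[mask])
    return results
```

A single genome-wide correlation can hide a weak group when that group is a small fraction of all regions; reporting one value per group makes such a gap visible.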
Citation: Pooja Kathail, Richard W. Shuai, Ryan Chung, Chun Jimmie Ye, Gabriel B. Loeb, Nilah M. Ioannidis. "Current genomic deep learning models display decreased performance in cell type-specific accessible regions." Genome Biology, published 2024-08-01. https://doi.org/10.1186/s13059-024-03335-2