
GigaScience: Latest Articles

An ecosystem for producing and sharing metadata within the web of FAIR Data.
IF 11.8, Zone 2 (Biology), Q1 MULTIDISCIPLINARY SCIENCES, Pub Date: 2025-01-06, DOI: 10.1093/gigascience/giae111
Daniel Jacob, François Ehrenmann, Romain David, Joseph Tran, Cathleen Mirande-Ney, Philippe Chaumeil

Background: Descriptive metadata are vital for reporting, discovering, leveraging, and mobilizing research datasets. However, resolving metadata issues as part of a data management plan can be complex for data producers. To organize and document data, various descriptive metadata must be created. Furthermore, when sharing data, it is important to ensure metadata interoperability in line with FAIR (Findable, Accessible, Interoperable, Reusable) principles. Given the practical nature of these challenges, there is a need for management tools that can assist data managers effectively. Additionally, these tools should meet the needs of data producers and be user-friendly, requiring minimal training.

Results: We developed Maggot (Metadata Aggregation on Data Storage), a web-based tool to locally manage a data catalog using high-level metadata. The main goal was to facilitate easy data dissemination and deposition in data repositories. With Maggot, users can easily generate and attach high-level metadata to datasets, allowing for seamless sharing in a collaborative environment. This approach aligns with many data management plans as it effectively addresses challenges related to data organization, documentation, storage, and the sharing of metadata based on FAIR principles within and beyond the collaborative group. Furthermore, Maggot enables metadata crosswalks (i.e., generated metadata can be converted to the schema used by a specific data repository or be exported using a format suitable for data collection by third-party applications).

Conclusion: The primary purpose of Maggot is to streamline the collection of high-level metadata using carefully chosen schemas and standards. Additionally, it simplifies data accessibility via metadata, typically a requirement for publicly funded projects. As a result, Maggot can be utilized to promote effective local management with the goal of facilitating data sharing while adhering to the FAIR principles. Furthermore, it can contribute to the preparation of the future EOSC FAIR Web of Data within the European Open Science Cloud framework.
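The metadata crosswalk described in the Results can be illustrated with a minimal sketch. It assumes a flat local metadata record and maps its keys to Dublin Core element names. The local field names, the mapping table, and the `crosswalk` function are all hypothetical; they are not Maggot's actual schema or API.

```python
from typing import Any

# Hypothetical mapping from assumed local field names to Dublin Core elements.
LOCAL_TO_DC: dict[str, str] = {
    "title": "dc:title",
    "authors": "dc:creator",
    "keywords": "dc:subject",
    "description": "dc:description",
    "license": "dc:rights",
}


def crosswalk(
    record: dict[str, Any], mapping: dict[str, str]
) -> tuple[dict[str, list[str]], list[str]]:
    """Convert a local metadata record into a target schema.

    Returns the converted record (every value stored as a list of strings)
    and the sorted names of local fields that have no mapping, so nothing
    is dropped silently.
    """
    converted: dict[str, list[str]] = {}
    unmapped: list[str] = []
    for key, value in record.items():
        target = mapping.get(key)
        if target is None:
            unmapped.append(key)
            continue
        # Wrap single values so multi-valued and single-valued fields look alike.
        values = value if isinstance(value, list) else [value]
        converted.setdefault(target, []).extend(str(v) for v in values)
    return converted, sorted(unmapped)


if __name__ == "__main__":
    # Hypothetical example record.
    record = {
        "title": "Leaf images, field trial",
        "authors": ["Doe, J.", "Roe, R."],
        "instrument": "RGB camera",
    }
    dc, missing = crosswalk(record, LOCAL_TO_DC)
    print(dc)       # {'dc:title': ['Leaf images, field trial'], 'dc:creator': ['Doe, J.', 'Roe, R.']}
    print(missing)  # ['instrument']
```

The function reports unmapped fields instead of discarding them. A crosswalk into a repository schema is usually lossy, and surfacing the gaps lets a curator decide how to handle them.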

Citations: 0
A high-quality assembly revealing the PMEL gene for the unique plumage phenotype in Liancheng ducks.
IF 11.8, Zone 2 (Biology), Q1 MULTIDISCIPLINARY SCIENCES, Pub Date: 2025-01-06, DOI: 10.1093/gigascience/giae114
Zhen Wang, Zhanbao Guo, Hongfei Liu, Tong Liu, Dapeng Liu, Simeng Yu, Hehe Tang, He Zhang, Qiming Mou, Bo Zhang, Junting Cao, Martine Schroyen, Shuisheng Hou, Zhengkui Zhou

Background: Plumage coloration is a distinctive trait in ducks. The Liancheng duck, characterized by white plumage, a black beak, and black webbed feet, is an excellent subject for studying this trait. However, the genetic mechanisms underlying duck plumage coloration remain poorly understood. To address this, the Liancheng duck genome (GCA_039998735.1) was assembled de novo from HiFi reads, and F2 segregating populations were generated from crosses between Liancheng and Pekin ducks. The aim was to identify the genetic mechanism of white plumage in Liancheng ducks.

Results: In this study, a 1.29-Gb Liancheng duck genome was assembled de novo, with a contig N50 of 12.17 Mb and a scaffold N50 of 83.98 Mb. Beyond the epistatic effect of the MITF gene, a genome-wide association study (GWAS) pinpointed a 0.8-Mb genomic region encompassing the PMEL gene. This gene encodes a pigment cell-specific protein that is essential for forming the fibrillar sheets within melanosomes, the organelles responsible for pigmentation. Additionally, linkage disequilibrium analysis revealed 2 candidate single-nucleotide polymorphisms (Chr33: 5,303,994A>G; 5,303,997A>G) that might alter PMEL transcription and thereby influence plumage coloration in Liancheng ducks.

Conclusions: Our study has assembled a high-quality genome for the Liancheng duck and has presented compelling evidence that the white plumage characteristic of this breed is attributable to the PMEL gene. Overall, these findings offer significant insights and direction for future studies and breeding programs aimed at understanding and manipulating avian plumage coloration.
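For readers less familiar with the contig and scaffold N50 values quoted in the Results, here is a minimal sketch of the standard N50 calculation. The input lengths are toy values and do not come from the Liancheng duck assembly. Running the same function on contig lengths and then on scaffold lengths would produce the two separate N50 figures.

```python
def n50(lengths: list[int]) -> int:
    """Return the N50 of a collection of sequence lengths.

    N50 is the largest length L such that sequences of length >= L together
    make up at least half of the total assembled length.
    """
    if not lengths or any(n < 0 for n in lengths):
        raise ValueError("need at least one non-negative length")
    total = sum(lengths)
    running = 0
    # Walk from the longest sequence down, accumulating length until half is covered.
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:  # integer comparison avoids rounding issues
            return length
    raise AssertionError("unreachable for non-negative lengths")


if __name__ == "__main__":
    # Toy lengths (total 30): 10 + 8 = 18 >= 15, so N50 is 8.
    print(n50([2, 10, 4, 8, 6]))  # 8
```

A higher scaffold N50 than contig N50, as reported above, is expected. Scaffolding joins contigs, usually with gaps, into longer sequences.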

Citations: 0
Mutation impact on mRNA versus protein expression across human cancers.
IF 11.8, Zone 2 (Biology), Q1 MULTIDISCIPLINARY SCIENCES, Pub Date: 2025-01-06, DOI: 10.1093/gigascience/giae113
Yuqi Liu, Abdulkadir Elmas, Kuan-Lin Huang

Background: Cancer mutations are often assumed to alter proteins, thus promoting tumorigenesis. However, how mutations affect protein expression, in addition to gene expression, has rarely been systematically investigated. This matters because mRNA and protein levels are frequently only moderately correlated, owing to factors such as translation efficiency and protein degradation. Proteogenomic datasets from large tumor cohorts provide an opportunity to systematically analyze the effects of somatic mutations on mRNA and protein abundance and to identify mutations with distinct impacts at these molecular levels.

Results: We conduct a comprehensive analysis of mutation impacts on mRNA- and protein-level expression in 953 cancer cases with paired genomic and global proteomic profiling across 6 cancer types. Protein-level impacts are validated for 47.2% of the somatic expression quantitative trait loci (seQTLs), including CDH1 and MSH3 truncations, as well as other mutations from likely "long-tail" driver genes. Using a statistical pipeline we devised for identifying somatic protein-specific QTLs (spsQTLs), we reveal several gene mutations, including NF1 and MAP2K4 truncations and TP53 missense mutations, that have a disproportionate influence on protein abundance not readily explained by transcriptomics. Cross-validation with data from massively parallel assays of variant effects (MAVE) shows that TP53 missense mutations associated with high tumor TP53 protein levels are more likely to be experimentally confirmed as functional.

Conclusion: This study reveals that somatic mutations can exhibit distinct impacts on mRNA and protein levels, underscoring the necessity of integrating proteogenomic data to comprehensively identify functionally significant cancer mutations. These insights provide a framework for prioritizing mutations for further functional validation and therapeutic targeting.
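The idea of a protein-specific mutation effect can be illustrated with a deliberately simplified sketch. It regresses protein abundance on mRNA abundance for one gene, then compares the residuals of mutated and wild-type tumors. This is an assumed, minimal stand-in for illustration only and is not the authors' spsQTL pipeline. A real analysis would need significance testing, covariates, and multiple-testing correction. The function names are hypothetical.

```python
from statistics import fmean


def ols_residuals(x: list[float], y: list[float]) -> list[float]:
    """Residuals of y after simple least-squares regression y ~ a + b*x."""
    mx, my = fmean(x), fmean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x has no variance")
    b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    a = my - b * mx
    return [yi - (a + b * xi) for xi, yi in zip(x, y)]


def protein_specific_shift(
    mrna: list[float], protein: list[float], mutated: list[bool]
) -> float:
    """Mean protein residual in mutated samples minus that in wild-type samples.

    The residual is the part of protein abundance not explained by mRNA
    abundance, so a non-zero shift hints at a protein-level effect.
    """
    resid = ols_residuals(mrna, protein)
    mut = [r for r, m in zip(resid, mutated) if m]
    wt = [r for r, m in zip(resid, mutated) if not m]
    if not mut or not wt:
        raise ValueError("need both mutated and wild-type samples")
    return fmean(mut) - fmean(wt)


if __name__ == "__main__":
    # Toy data: at equal mRNA, mutated samples carry one unit less protein.
    mrna = [1.0, 2.0, 3.0, 4.0] * 2
    protein = [1.0, 2.0, 3.0, 4.0, 0.0, 1.0, 2.0, 3.0]
    mutated = [False] * 4 + [True] * 4
    print(protein_specific_shift(mrna, protein, mutated))  # -1.0
```

In the toy data, mRNA levels are identical between groups, so a transcript-level comparison would see nothing. The negative residual shift isolates the lower protein abundance in mutated samples.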

Citations: 0
Galaxy as a gateway to bioinformatics: Multi-Interface Galaxy Hands-on Training Suite (MIGHTS) for scRNA-seq.
IF 11.8, Zone 2 (Biology), Q1 MULTIDISCIPLINARY SCIENCES, Pub Date: 2025-01-06, DOI: 10.1093/gigascience/giae107
Camila L Goclowski, Julia Jakiela, Tyler Collins, Saskia Hiltemann, Morgan Howells, Marisa Loach, Jonathan Manning, Pablo Moreno, Alex Ostrovsky, Helena Rasche, Mehmet Tekman, Graeme Tyson, Pavankumar Videm, Wendi Bacon

Background: Bioinformatics is fundamental to biomedical sciences, but its mastery presents a steep learning curve for bench biologists and clinicians. Learning to code while analyzing data is difficult. The curve may be flattened by separating these two aspects and providing intermediate steps for budding bioinformaticians. Single-cell analysis is in great demand from biologists and biomedical scientists, as evidenced by the proliferation of training events, materials, and collaborative global efforts like the Human Cell Atlas. However, iterative analyses lacking reinstantiation, coupled with unstandardized pipelines, have made effective single-cell training a moving target.

Findings: To address these challenges, we present a Multi-Interface Galaxy Hands-on Training Suite (MIGHTS) for single-cell RNA sequencing (scRNA-seq) analysis, which offers parallel analytical methods using a graphical interface (buttons) or code. With clear, interoperable materials, MIGHTS facilitates smooth transitions between environments. Bridging the biologist-programmer gap, MIGHTS emphasizes interdisciplinary communication for effective learning at all levels. Real-world data analysis in MIGHTS promotes critical thinking and best practices, while FAIR data principles ensure validation of results. MIGHTS is freely available, hosted on the Galaxy Training Network, and leverages Galaxy interfaces for analyses in both settings. Given the ongoing popularity of Python-based (Scanpy) and R-based (Seurat & Monocle) scRNA-seq analyses, MIGHTS enables analyses using both.

Conclusions: MIGHTS consists of 11 tutorials, including recordings, slide decks, and interactive visualizations, and has a demonstrated track record of sustainability through regular updates and community collaborations. Parallel pathways in MIGHTS enable concurrent training of scientists at any programming level, addressing the heterogeneous needs of novice bioinformaticians.

Citations: 0
Knowledge graph-based thought: a knowledge graph-enhanced LLM framework for pan-cancer question answering.
IF 11.8, Zone 2 (Biology), Q1 MULTIDISCIPLINARY SCIENCES, Pub Date: 2025-01-06, DOI: 10.1093/gigascience/giae082
Yichun Feng, Lu Zhou, Chao Ma, Yikai Zheng, Ruikun He, Yixue Li

Background: In recent years, large language models (LLMs) have shown promise in various domains, notably in biomedical sciences. However, their real-world application is often limited by issues like erroneous outputs and hallucinatory responses.

Results: We developed the knowledge graph-based thought (KGT) framework, an innovative solution that integrates LLMs with knowledge graphs (KGs) to improve their initial responses using verifiable information from KGs, thus significantly reducing factual errors in reasoning. The KGT framework demonstrates strong adaptability and performs well across various open-source LLMs. Notably, KGT can facilitate the discovery of new uses for existing drugs through potential drug-cancer associations and can assist in predicting drug resistance by analyzing relevant biomarkers and genetic mechanisms. To evaluate knowledge graph question answering in biomedicine, we used a pan-cancer knowledge graph to develop a benchmark named pan-cancer question answering.

Conclusions: The KGT framework substantially improves the accuracy and utility of LLMs in the biomedical field. This study serves as a proof of concept, demonstrating the framework's exceptional performance in biomedical question answering.

Citations: 0
A chromosome-scale genome assembly of the pioneer plant Stylosanthes angustifolia: insights into genome evolution and drought adaptation.
IF 11.8, Zone 2 (Biology), Q1 MULTIDISCIPLINARY SCIENCES, Pub Date: 2025-01-06, DOI: 10.1093/gigascience/giae118
Chun Liu, Jianyu Zhang, Ranran Xu, Jinhui Lv, Zhu Qiao, Mingzhou Bai, Shancen Zhao, Lijuan Luo, Guodao Liu, Pandao Liu

Background: Drought is a major limiting factor for plant survival and crop productivity. Stylosanthes angustifolia, a pioneer plant, exhibits remarkable drought tolerance, yet the molecular mechanisms driving its drought resistance remain largely unexplored.

Results: We present a chromosome-scale reference genome of S. angustifolia, which provides insights into its genome evolution and drought tolerance mechanisms. The assembled genome is 645.88 Mb in size, containing 319.98 Mb of repetitive sequences and 36,857 protein-coding genes. The high quality of this assembly is supported by a BUSCO completeness of 99.26% and a long terminal repeat (LTR) assembly index of 19.49. Evolutionary analyses revealed that S. angustifolia shares a whole-genome duplication (WGD) event with other legumes but lacks a recent WGD. Additionally, S. angustifolia underwent gene expansion through tandem duplication approximately 12.31 million years ago. Through integrative multiomics analyses, we identified 4 gene families (xanthoxin dehydrogenase, 2-hydroxyisoflavanone dehydratase, patatin-related phospholipase A, and stachyose synthetase) that underwent tandem duplication and were significantly upregulated under drought stress. These gene families contribute to the biosynthesis of abscisic acid, genistein, daidzein, jasmonic acid, and stachyose, thereby enhancing drought tolerance.

Conclusions: The genome assembly of S. angustifolia represents a significant advancement in understanding the genetic mechanisms underlying drought tolerance in this pioneer plant species. This genomic resource provides critical insights into the evolution of drought resistance and offers valuable genetic information for breeding programs aimed at improving drought resistance in crops.

Citations: 0
Telomere-to-telomere genome and resequencing of 254 individuals reveal evolution, genomic footprints in Asian icefish, Protosalanx chinensis.
IF 11.8 Zone 2, Biology Q1 MULTIDISCIPLINARY SCIENCES Pub Date: 2025-01-06 DOI: 10.1093/gigascience/giae115
Yanfeng Zhou, Chenhe Wang, Binhu Wang, Dongpo Xu, Xizhao Zhang, You Ge, Shulun Jiang, Fujiang Tang, Chunhai Chen, Xuemei Li, Jianbo Jian, Yang You

The Asian icefish, Protosalanx chinensis, has colonized many waters across China over the past decades, owing to its ecological and physiological significance and its economic importance as a fishery resource. Here, we assembled a telomere-to-telomere (T2T) genome for P. chinensis by combining PacBio HiFi long reads, ultra-long Oxford Nanopore (ONT) reads, and Hi-C data. Telomeres were identified at both ends of the contigs/chromosomes. Expansion of genes associated with circadian entrainment suggests that P. chinensis may be highly sensitive to photoperiod. Contracted immune-related gene families and positively selected DNA repair genes suggest selection pressure during adaptive evolution. Population genetic analysis characterized the genetic diversity and genomic footprints of 254 individuals from 8 locations. Samples from natural seawater showed the highest diversity and were distinct from freshwater and introduced populations. Genes in the divergent genomic regions were related to osmoregulation, suggesting adaptation to alkalinity and salinity. Thus, the T2T genome and the genetic variation data are valuable resources for studying genomic footprints in P. chinensis, shedding light on its evolution, comparative genomics, and the genetic differences between natural and introduced populations.

Citations: 0
Data reuse in agricultural genomics research: challenges and recommendations.
IF 11.8 Zone 2, Biology Q1 MULTIDISCIPLINARY SCIENCES Pub Date: 2025-01-06 DOI: 10.1093/gigascience/giae106
Alenka Hafner, Victoria DeLeo, Cecilia H Deng, Christine G Elsik, Damarius S Fleming, Peter W Harrison, Theodore S Kalbfleisch, Bruna Petry, Boas Pucker, Elsa H Quezada-Rodríguez, Christopher K Tuggle, James E Koltes

The scientific community has long benefited from the opportunities provided by data reuse. To identify the challenges and bottlenecks to data reuse in the agricultural research community and to propose solutions, a data reuse working group was established within the AgBioData consortium framework. Here, with a specific focus on agricultural genomics research, we identify challenges arising from limited data standards, metadata deficiencies, data interoperability, data ownership, data availability, user skill level, resource availability, and equity issues. We propose solutions that stakeholders could implement to mitigate and overcome these challenges and offer an optimistic perspective on the future of genomics and transcriptomics data reuse.

Citations: 0
Chromosome-level echidna genome illuminates evolution of multiple sex chromosome system in monotremes.
IF 11.8 Zone 2, Biology Q1 MULTIDISCIPLINARY SCIENCES Pub Date: 2025-01-06 DOI: 10.1093/gigascience/giae112
Yang Zhou, Jiazheng Jin, Xuemei Li, Gregory Gedman, Sarah Pelan, Arang Rhie, Chuan Jiang, Olivier Fedrigo, Kerstin Howe, Adam M Phillippy, Erich D Jarvis, Frank Grutzner, Qi Zhou, Guojie Zhang

Background: A thorough analysis of genome evolution is fundamental to understanding biodiversity. The iconic monotremes (platypus and echidna) have extraordinary biology. However, they also exhibit rearrangements in several chromosomes, especially in the sex chromosome chain. The lack of a chromosome-level echidna genome has therefore limited insight into genome evolution in monotremes, in particular into their complex of multiple sex chromosomes.

Results: Here, we present a new long-read-based, chromosome-level genome of the short-beaked echidna (Tachyglossus aculeatus), which enabled inference of chromosomal rearrangements in the monotreme ancestor (2n = 64) and in each extant species. Analysis of the more complete sex chromosomes uncovered homology between 1 Y chromosome and multiple X chromosomes, suggesting that the ancestral X underwent reciprocal translocation with ancestral autosomes to form the complex. We also identified dozens of ampliconic genes on the sex chromosomes, including several ancestral genes expressed during male meiosis, suggesting selective constraints on pairing of the multiple sex chromosomes.

Conclusion: The new echidna genome provides an important basis for further study of the unique biology and conservation of this species.

Citations: 0
Unveiling patterns in spatial transcriptomics data: a novel approach utilizing graph attention autoencoder and multiscale deep subspace clustering network.
IF 11.8 Zone 2, Biology Q1 MULTIDISCIPLINARY SCIENCES Pub Date: 2025-01-06 DOI: 10.1093/gigascience/giae103
Liqian Zhou, Xinhuai Peng, Min Chen, Xianzhi He, Geng Tian, Jialiang Yang, Lihong Peng

Background: Accurately deciphering spatial domains, identifying differentially expressed genes, and inferring cellular trajectories from spatial transcriptomic (ST) data hold significant potential for improving our understanding of tissue organization and biological function. However, most spatial clustering methods can neither decipher complex structures in ST data nor fully exploit the features embedded in different layers.

Results: This article introduces STMSGAL, a novel framework for analyzing ST data that incorporates a graph attention autoencoder and multiscale deep subspace clustering. First, STMSGAL constructs ctaSNN, a cell type-aware shared nearest neighbor graph, using Louvain clustering based solely on gene expression profiles. It then integrates the expression profiles and ctaSNN to generate latent representations of spots using the graph attention autoencoder and multiscale deep subspace clustering. Finally, STMSGAL performs spatial clustering, differential expression analysis, and trajectory inference, supporting thorough data exploration and interpretation. STMSGAL was compared with 7 methods (SCANPY, SEDR, CCST, DeepST, GraphST, STAGATE, and SiGra) on four 10x Genomics Visium datasets, one mouse visual cortex STARmap dataset, and two Stereo-seq mouse embryo datasets. In this comparison, STMSGAL performed strongly on the Davies-Bouldin, Calinski-Harabasz, S_Dbw, and adjusted Rand index (ARI) metrics. STMSGAL substantially improved the identification of layer structures in ST data of different spatial resolutions and accurately delineated spatial domains in 2 breast cancer tissues, adult mouse brain (FFPE), and mouse embryos.

Conclusions: STMSGAL can serve as an essential tool for bridging the analysis of cellular spatial organization and disease pathology, offering valuable insights for researchers in the field.
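The ctaSNN step in the STMSGAL abstract can be illustrated with a minimal sketch of a generic shared nearest neighbor (SNN) graph. This is not STMSGAL's published implementation. Assumptions: spots are plain lists of expression values, neighbors are found by Euclidean distance, an edge weight is the count of shared neighbors (Jaccard weighting is a common alternative), and cluster labels are supplied from outside (for example, from Louvain clustering, which is not reimplemented here). Dropping edges between differently labeled spots is a hypothetical reading of "cell type-aware", and `knn_sets` and `shared_nn_graph` are illustrative names.

```python
import math
from itertools import combinations


def knn_sets(profiles: list[list[float]], k: int) -> list[set[int]]:
    """For each spot, return the indices of its k nearest spots (Euclidean, self excluded)."""
    neighbors = []
    for i, x in enumerate(profiles):
        # Sort by (distance, index) so ties are broken deterministically.
        dists = sorted((math.dist(x, y), j) for j, y in enumerate(profiles) if j != i)
        neighbors.append({j for _, j in dists[:k]})
    return neighbors


def shared_nn_graph(
    profiles: list[list[float]], labels: list[int], k: int = 3
) -> dict[tuple[int, int], int]:
    """Build a weighted SNN graph as {(i, j): number of shared neighbors}.

    HYPOTHETICAL 'cell type-aware' rule: pairs of spots whose cluster labels
    differ are never connected, even if they share neighbors.
    """
    nn = knn_sets(profiles, k)
    edges = {}
    for i, j in combinations(range(len(profiles)), 2):
        if labels[i] != labels[j]:
            continue  # assumed pruning of cross-cluster edges
        shared = len(nn[i] & nn[j])
        if shared > 0:
            edges[(i, j)] = shared
    return edges


if __name__ == "__main__":
    # Two well-separated toy groups of three spots each.
    spots = [[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]]
    print(shared_nn_graph(spots, [0, 0, 0, 1, 1, 1], k=2))
```

With the toy spots and labels `[0, 0, 0, 1, 1, 1]`, each group forms a triangle whose edges have weight 1. If spot 2 is relabeled into the second group, its edges to spots 0 and 1 are pruned even though it is their nearest neighbor in expression space, which is the behavior the assumed rule is meant to show.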

Citations: 0