Pub Date: 2025-01-06 | DOI: 10.1093/gigascience/giae111
Daniel Jacob, François Ehrenmann, Romain David, Joseph Tran, Cathleen Mirande-Ney, Philippe Chaumeil
Background: Descriptive metadata are vital for reporting, discovering, leveraging, and mobilizing research datasets. However, resolving metadata issues as part of a data management plan can be complex for data producers. To organize and document data, various descriptive metadata must be created. Furthermore, when sharing data, it is important to ensure metadata interoperability in line with FAIR (Findable, Accessible, Interoperable, Reusable) principles. Given the practical nature of these challenges, there is a need for management tools that can assist data managers effectively. Additionally, these tools should meet the needs of data producers and be user-friendly, requiring minimal training.
Results: We developed Maggot (Metadata Aggregation on Data Storage), a web-based tool to locally manage a data catalog using high-level metadata. The main goal was to facilitate easy data dissemination and deposition in data repositories. With Maggot, users can easily generate and attach high-level metadata to datasets, allowing for seamless sharing in a collaborative environment. This approach aligns with many data management plans as it effectively addresses challenges related to data organization, documentation, storage, and the sharing of metadata based on FAIR principles within and beyond the collaborative group. Furthermore, Maggot enables metadata crosswalks (i.e., generated metadata can be converted to the schema used by a specific data repository or be exported using a format suitable for data collection by third-party applications).
Conclusion: The primary purpose of Maggot is to streamline the collection of high-level metadata using carefully chosen schemas and standards. Additionally, it simplifies making data accessible through metadata, which is typically required for publicly funded projects. As a result, Maggot can be used to promote effective local data management that facilitates data sharing in line with the FAIR principles. Furthermore, it can contribute to preparing the future FAIR Web of Data within the European Open Science Cloud (EOSC) framework.
Title: An ecosystem for producing and sharing metadata within the web of FAIR Data. Journal: GigaScience, vol. 14 (2025). PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11707607/pdf/
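Maggot's crosswalk converts locally generated metadata into the schema a repository expects. The Python sketch below shows the general idea as a plain field-renaming table; the field names in `CROSSWALK` and the `crosswalk` function are hypothetical and do not reproduce Maggot's actual configuration or any repository's real schema.

```python
from __future__ import annotations

from typing import Any

# Hypothetical mapping: local field name -> target repository field name.
# Both sides are illustrative, not Maggot's or any repository's real schema.
CROSSWALK: dict[str, str] = {
    "title": "title",
    "description": "abstract",
    "authors": "creators",
    "keywords": "subjects",
    "license": "rights",
}


def crosswalk(record: dict[str, Any], mapping: dict[str, str]) -> tuple[dict[str, Any], list[str]]:
    """Rename the fields of `record` according to `mapping`.

    Returns the converted record plus the local fields that have no target,
    so that a data manager can review what the conversion would drop.
    """
    converted: dict[str, Any] = {}
    unmapped: list[str] = []
    for key, value in record.items():
        target = mapping.get(key)
        if target is None:
            unmapped.append(key)  # no target field: report instead of silently dropping
        else:
            converted[target] = value
    return converted, unmapped
```

For example, `crosswalk({"title": "Leaf metabolome", "authors": ["A. Author"], "instrument": "NMR"}, CROSSWALK)` returns `({"title": "Leaf metabolome", "creators": ["A. Author"]}, ["instrument"])`. Reporting unmapped fields is a design choice meant to let a data manager extend the mapping rather than lose information unnoticed.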
Background: Plumage coloration is a distinctive trait in ducks, and the Liancheng duck, with its white plumage combined with a black beak and black webbed feet, is an excellent subject for studying it. However, the genetic mechanisms underlying duck plumage coloration remain poorly understood. Here, we assembled the Liancheng duck genome (GCA_039998735.1) de novo from HiFi reads and generated F2 segregating populations from Liancheng and Pekin ducks, aiming to identify the genetic mechanism of white plumage in Liancheng ducks.
Results: We assembled a 1.29-Gb Liancheng duck genome de novo, with a contig N50 of 12.17 Mb and a scaffold N50 of 83.98 Mb. Beyond the epistatic effect of the MITF gene, a genome-wide association study (GWAS) pinpointed a 0.8-Mb genomic region encompassing the PMEL gene. PMEL encodes a pigment cell-specific protein that is essential for forming the fibrillar sheets within melanosomes, the organelles responsible for pigmentation. Additionally, linkage disequilibrium analysis revealed 2 candidate single-nucleotide polymorphisms (Chr33: 5,303,994A>G; 5,303,997A>G) that might alter PMEL transcription and thereby influence plumage coloration in Liancheng ducks.
Conclusions: Our study has assembled a high-quality genome for the Liancheng duck and has presented compelling evidence that the white plumage characteristic of this breed is attributable to the PMEL gene. Overall, these findings offer significant insights and direction for future studies and breeding programs aimed at understanding and manipulating avian plumage coloration.
Title: A high-quality assembly revealing the PMEL gene for the unique plumage phenotype in Liancheng ducks. Authors: Zhen Wang, Zhanbao Guo, Hongfei Liu, Tong Liu, Dapeng Liu, Simeng Yu, Hehe Tang, He Zhang, Qiming Mou, Bo Zhang, Junting Cao, Martine Schroyen, Shuisheng Hou, Zhengkui Zhou. Pub Date: 2025-01-06 | DOI: 10.1093/gigascience/giae114. Journal: GigaScience, vol. 14 (2025). PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11727711/pdf/
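The assembly statistics and the linkage disequilibrium (LD) analysis in this abstract rest on two standard computations. The Python sketch below shows a textbook contig N50 and a genotype-based r² between two SNPs; it illustrates those common definitions under the assumption of 0/1/2 genotype dosages, is not the authors' pipeline, and the function names are hypothetical.

```python
from __future__ import annotations


def n50(lengths: list[int]) -> int:
    """Length L such that contigs of length >= L together cover at least half the assembly."""
    if not lengths:
        raise ValueError("no contigs given")
    total = sum(lengths)
    covered = 0
    for length in sorted(lengths, reverse=True):  # longest contigs first
        covered += length
        if 2 * covered >= total:  # integer comparison avoids float rounding
            return length
    raise AssertionError("unreachable: the running sum always reaches the total")


def genotype_r2(snp_a: list[int], snp_b: list[int]) -> float:
    """Squared Pearson correlation of genotype dosages (0/1/2) at two SNPs.

    A common proxy for LD r^2 when haplotypes are not phased.
    """
    n = len(snp_a)
    if n != len(snp_b) or n < 2:
        raise ValueError("need two equally long genotype vectors with >= 2 samples")
    mean_a = sum(snp_a) / n
    mean_b = sum(snp_b) / n
    cov = sum((a - mean_a) * (b - mean_b) for a, b in zip(snp_a, snp_b))
    var_a = sum((a - mean_a) ** 2 for a in snp_a)
    var_b = sum((b - mean_b) ** 2 for b in snp_b)
    if var_a == 0 or var_b == 0:
        raise ValueError("monomorphic SNP: r^2 is undefined")
    return cov * cov / (var_a * var_b)
```

For contig lengths [2, 3, 4, 5, 6] (total 20), the two longest contigs already cover 11 ≥ 10, so `n50` returns 5. The same function applied to scaffold lengths gives the scaffold N50.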
Pub Date: 2025-01-06 | DOI: 10.1093/gigascience/giae113
Yuqi Liu, Abdulkadir Elmas, Kuan-Lin Huang
Background: Cancer mutations are often assumed to alter proteins and thereby promote tumorigenesis. However, how mutations affect protein expression, in addition to gene expression, has rarely been investigated systematically. This matters because mRNA and protein levels are frequently only moderately correlated, owing to factors such as translation efficiency and protein degradation. Proteogenomic datasets from large tumor cohorts provide an opportunity to systematically analyze the effects of somatic mutations on mRNA and protein abundance and to identify mutations with distinct impacts at these molecular levels.
Results: We conduct a comprehensive analysis of mutation impacts on mRNA- and protein-level expression in 953 cancer cases with paired genomic and global proteomic profiling across 6 cancer types. Protein-level impacts are validated for 47.2% of the somatic expression quantitative trait loci (seQTLs), including CDH1 and MSH3 truncations, as well as other mutations in likely "long-tail" driver genes. Using a statistical pipeline we devised for identifying somatic protein-specific QTLs (spsQTLs), we reveal several gene mutations, including NF1 and MAP2K4 truncations and TP53 missense mutations, whose disproportionate influence on protein abundance is not readily explained by transcriptomics. Cross-validation with data from massively parallel assays of variant effects (MAVE) shows that TP53 missense mutations associated with high tumor TP53 protein levels are more likely to be experimentally confirmed as functional.
Conclusion: This study reveals that somatic mutations can exhibit distinct impacts on mRNA and protein levels, underscoring the necessity of integrating proteogenomic data to comprehensively identify functionally significant cancer mutations. These insights provide a framework for prioritizing mutations for further functional validation and therapeutic targeting.
Title: Mutation impact on mRNA versus protein expression across human cancers. Journal: GigaScience, vol. 14 (2025). PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11702362/pdf/
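A protein-specific effect can be illustrated by asking whether mutated tumors carry more or less protein than their mRNA level predicts. The Python sketch below does this for a single gene by fitting an ordinary least-squares line of protein on mRNA in wild-type samples and averaging the residuals of mutated samples. It is a simplified illustration under that assumption, not the authors' spsQTL pipeline (which the abstract does not detail), and all names are hypothetical.

```python
from __future__ import annotations


def fit_line(x: list[float], y: list[float]) -> tuple[float, float]:
    """Ordinary least-squares intercept and slope for y = a + b * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("mRNA values are constant; slope is undefined")
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def protein_specific_shift(mrna: list[float], protein: list[float], mutated: list[bool]) -> float:
    """Mean protein residual of mutated samples after removing the mRNA-explained part.

    The mRNA-protein line is fitted on wild-type samples only (their residuals
    average to 0), so a positive (negative) value means mutated tumors carry
    more (less) protein than their mRNA level predicts.
    """
    wt_x = [x for x, m in zip(mrna, mutated) if not m]
    wt_y = [y for y, m in zip(protein, mutated) if not m]
    if len(wt_x) < 2 or not any(mutated):
        raise ValueError("need >= 2 wild-type samples and >= 1 mutated sample")
    intercept, slope = fit_line(wt_x, wt_y)
    residuals = [y - (intercept + slope * x) for x, y, m in zip(mrna, protein, mutated) if m]
    return sum(residuals) / len(residuals)
```

With mRNA values [1, 2, 3, 4, 1, 2, 3, 4], protein values [2, 4, 6, 8, 5, 7, 9, 11], and the last four samples mutated, the wild-type fit is protein = 2 × mRNA and the function returns 3.0. A real analysis would add covariates, many genes, and multiple-testing control, all omitted here.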
Pub Date: 2025-01-06 | DOI: 10.1093/gigascience/giae107
Camila L Goclowski, Julia Jakiela, Tyler Collins, Saskia Hiltemann, Morgan Howells, Marisa Loach, Jonathan Manning, Pablo Moreno, Alex Ostrovsky, Helena Rasche, Mehmet Tekman, Graeme Tyson, Pavankumar Videm, Wendi Bacon
Background: Bioinformatics is fundamental to biomedical sciences, but its mastery presents a steep learning curve for bench biologists and clinicians. Learning to code while analyzing data is difficult. The curve may be flattened by separating these two aspects and providing intermediate steps for budding bioinformaticians. Single-cell analysis is in great demand from biologists and biomedical scientists, as evidenced by the proliferation of training events, materials, and collaborative global efforts like the Human Cell Atlas. However, iterative analyses lacking reinstantiation, coupled with unstandardized pipelines, have made effective single-cell training a moving target.
Findings: To address these challenges, we present a Multi-Interface Galaxy Hands-on Training Suite (MIGHTS) for single-cell RNA sequencing (scRNA-seq) analysis, which offers parallel analytical methods using a graphical interface (buttons) or code. With clear, interoperable materials, MIGHTS facilitates smooth transitions between environments. Bridging the biologist-programmer gap, MIGHTS emphasizes interdisciplinary communication for effective learning at all levels. Real-world data analysis in MIGHTS promotes critical thinking and best practices, while FAIR data principles ensure validation of results. MIGHTS is freely available, hosted on the Galaxy Training Network, and leverages Galaxy interfaces for analyses in both settings. Given the ongoing popularity of Python-based (Scanpy) and R-based (Seurat & Monocle) scRNA-seq analyses, MIGHTS enables analyses using both.
Conclusions: MIGHTS consists of 11 tutorials with recordings, slide decks, and interactive visualizations, and it has a demonstrated track record of sustainability through regular updates and community collaborations. Parallel pathways in MIGHTS enable concurrent training of scientists at any programming level, addressing the heterogeneous needs of novice bioinformaticians.
Title: Galaxy as a gateway to bioinformatics: Multi-Interface Galaxy Hands-on Training Suite (MIGHTS) for scRNA-seq. Journal: GigaScience, vol. 14 (2025). PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11707610/pdf/
Pub Date: 2025-01-06 | DOI: 10.1093/gigascience/giae082
Yichun Feng, Lu Zhou, Chao Ma, Yikai Zheng, Ruikun He, Yixue Li
Background: In recent years, large language models (LLMs) have shown promise in various domains, notably in biomedical sciences. However, their real-world application is often limited by issues like erroneous outputs and hallucinatory responses.
Results: We developed the knowledge graph-based thought (KGT) framework, which integrates LLMs with knowledge graphs (KGs) and uses verifiable information from the KGs to refine the LLMs' initial responses, significantly reducing factual errors in reasoning. The KGT framework is highly adaptable and performs well across various open-source LLMs. Notably, KGT can facilitate the discovery of new uses for existing drugs through potential drug-cancer associations and can assist in predicting drug resistance by analyzing relevant biomarkers and genetic mechanisms. To evaluate knowledge graph question answering in biomedicine, we used a pan-cancer knowledge graph to build a benchmark named pan-cancer question answering.
Conclusions: The KGT framework substantially improves the accuracy and utility of LLMs in the biomedical field. This study serves as a proof of concept, demonstrating its exceptional performance in biomedical question answering.
Title: Knowledge graph-based thought: a knowledge graph-enhanced LLM framework for pan-cancer question answering. Journal: GigaScience, vol. 14 (2025). PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11702363/pdf/
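KGT grounds LLM answers in facts retrieved from a knowledge graph. A minimal sketch of a generic retrieve-then-prompt step is shown below in Python, assuming entities have already been recognized in the question. The `TripleStore` class, the prompt wording, and the `drug_A`/`gene_B`/`cancer_C` triples are hypothetical placeholders, the LLM call itself is omitted, and this is not the KGT implementation.

```python
from __future__ import annotations

from collections import defaultdict
from typing import Iterable, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)


class TripleStore:
    """Tiny in-memory index of KG triples, keyed by entity name (case-insensitive)."""

    def __init__(self, triples: Iterable[Triple]) -> None:
        self._by_entity: dict[str, list[Triple]] = defaultdict(list)
        for triple in triples:
            subject, _, obj = triple
            self._by_entity[subject.lower()].append(triple)
            if obj.lower() != subject.lower():  # do not index self-loops twice
                self._by_entity[obj.lower()].append(triple)

    def facts_about(self, entity: str) -> list[Triple]:
        """All triples in which `entity` is the subject or the object."""
        return list(self._by_entity.get(entity.lower(), []))


def build_grounded_prompt(question: str, entities: list[str], store: TripleStore) -> str:
    """Assemble an LLM prompt that restricts the answer to retrieved KG facts."""
    facts: list[Triple] = []
    for entity in entities:
        facts.extend(store.facts_about(entity))
    facts = list(dict.fromkeys(facts))  # drop duplicates, keep first-seen order
    fact_lines = [f"- {s} {p} {o}" for s, p, o in facts] or ["- (no matching facts)"]
    return (
        "Answer using only the facts below; say so if they are insufficient.\n"
        "Facts:\n" + "\n".join(fact_lines) + f"\nQuestion: {question}"
    )
```

Passing the question and the entities `["drug_A", "cancer_C"]` retrieves both placeholder triples, so the model sees the path drug_A → gene_B → cancer_C as explicit evidence. This one-hop lookup is deliberately simple; multi-hop reasoning would need graph traversal on top of it.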
Pub Date: 2025-01-06 | DOI: 10.1093/gigascience/giae118
Chun Liu, Jianyu Zhang, Ranran Xu, Jinhui Lv, Zhu Qiao, Mingzhou Bai, Shancen Zhao, Lijuan Luo, Guodao Liu, Pandao Liu
Background: Drought is a major limiting factor for plant survival and crop productivity. Stylosanthes angustifolia, a pioneer plant, exhibits remarkable drought tolerance, yet the molecular mechanisms driving its drought resistance remain largely unexplored.
Results: We present a chromosome-scale reference genome of S. angustifolia that provides insights into its genome evolution and drought tolerance mechanisms. The assembled genome is 645.88 Mb in size and contains 319.98 Mb of repetitive sequences and 36,857 protein-coding genes. Its high quality is supported by a BUSCO completeness of 99.26% and a long terminal repeat (LTR) assembly index of 19.49. Evolutionary analyses revealed that S. angustifolia shares a whole-genome duplication (WGD) event with other legumes but has not undergone a recent WGD. Additionally, S. angustifolia underwent gene expansion through tandem duplication approximately 12.31 million years ago. Through integrative multiomics analyses, we identified 4 gene families (xanthoxin dehydrogenase, 2-hydroxyisoflavanone dehydratase, patatin-related phospholipase A, and stachyose synthetase) that underwent tandem duplication and were significantly upregulated under drought stress. These gene families contribute to the biosynthesis of abscisic acid, genistein, daidzein, jasmonic acid, and stachyose, thereby enhancing drought tolerance.
Conclusions: The genome assembly of S. angustifolia represents a significant advancement in understanding the genetic mechanisms underlying drought tolerance in this pioneer plant species. This genomic resource provides critical insights into the evolution of drought resistance and offers valuable genetic information for breeding programs aimed at improving drought resistance in crops.
Title: A chromosome-scale genome assembly of the pioneer plant Stylosanthes angustifolia: insights into genome evolution and drought adaptation. Journal: GigaScience, vol. 14 (2025). DOI link: https://doi.org/10.1093/gigascience/giae118
Pub Date: 2025-01-06 | DOI: 10.1093/gigascience/giae115
Yanfeng Zhou, Chenhe Wang, Binhu Wang, Dongpo Xu, Xizhao Zhang, You Ge, Shulun Jiang, Fujiang Tang, Chunhai Chen, Xuemei Li, Jianbo Jian, Yang You
The Asian icefish, Protosalanx chinensis, has extensively colonized waters across China for decades, owing to its ecological and physiological significance as well as its economic importance as a fishery resource. Here, we assembled a telomere-to-telomere (T2T) genome for P. chinensis by combining PacBio HiFi long reads, ultra-long Oxford Nanopore (ONT) reads, and Hi-C data. Telomeres were identified at both ends of the contigs/chromosomes. Expanded gene families associated with circadian entrainment suggest that P. chinensis may be highly sensitive to photoperiod. Contracted immune-related gene families and positively selected DNA repair genes in P. chinensis point to selection pressure during adaptive evolution. Population genetic analysis of 254 individuals from 8 locations characterized their genetic diversity and genomic footprints. Samples from natural seawater showed the highest diversity and differed from the freshwater and introduced populations. Genes in the divergent regions were related to the osmotic pressure system, suggesting adaptation to alkalinity and salinity. Thus, the T2T genome and genetic variation data are valuable resources for studying genomic footprints in P. chinensis, shedding light on its evolution, comparative genomics, and the genetic differences between natural and introduced populations.
Title: Telomere-to-telomere genome and resequencing of 254 individuals reveal evolution, genomic footprints in Asian icefish, Protosalanx chinensis. GigaScience, vol. 14 (2025). Full text (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11707609/pdf/
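The icefish abstract reports telomeres at both ends of the contigs/chromosomes. The Python sketch below shows roughly what such a check involves. It scans a terminal window at each end of a contig for the vertebrate telomeric repeat TTAGGG or its reverse complement CCCTAA. This is not the authors' pipeline. The window size, the coverage threshold, and the function names are hypothetical choices. The exact-match count also ignores sequencing errors and repeat variants, which dedicated telomere-finding tools do handle.

```python
# Vertebrate telomeric repeat and its reverse complement.
TELO_FWD = "TTAGGG"
TELO_REV = "CCCTAA"


def telomeric_fraction(window: str, motif: str) -> float:
    """Fraction of `window` covered by non-overlapping exact copies of `motif`."""
    window = window.upper()
    if not window:
        return 0.0
    return window.count(motif) * len(motif) / len(window)


def has_telomeres(contig: str, window: int = 1000,
                  min_fraction: float = 0.5) -> tuple:
    """Return (left_end, right_end) flags marking which contig ends look telomeric.

    `window` (bp) and `min_fraction` are illustrative thresholds, not published values.
    """
    left = contig[:window]
    right = contig[-window:]
    # Test both motifs at each end so the check does not depend on
    # which strand the assembler reported.
    left_ok = max(telomeric_fraction(left, TELO_FWD),
                  telomeric_fraction(left, TELO_REV)) >= min_fraction
    right_ok = max(telomeric_fraction(right, TELO_FWD),
                   telomeric_fraction(right, TELO_REV)) >= min_fraction
    return left_ok, right_ok
```

Testing both motifs at both ends keeps the sketch strand-agnostic. A real T2T check would also report the length of the repeat array, not just whether an end passes a threshold.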
Pub Date: 2025-01-06 | DOI: 10.1093/gigascience/giae106
Alenka Hafner, Victoria DeLeo, Cecilia H Deng, Christine G Elsik, Damarius S Fleming, Peter W Harrison, Theodore S Kalbfleisch, Bruna Petry, Boas Pucker, Elsa H Quezada-Rodríguez, Christopher K Tuggle, James E Koltes
The scientific community has long benefited from the opportunities provided by data reuse. To identify the challenges and bottlenecks to data reuse in the agricultural research community and to propose solutions for them, a data reuse working group was established within the AgBioData consortium. Here, we identify challenges related to data standards, metadata deficiencies, data interoperability, data ownership, data availability, user skill level, resource availability, and equity, with a specific focus on agricultural genomics research. We propose solutions that stakeholders could implement to mitigate and overcome these challenges, and we offer an optimistic perspective on the future of genomics and transcriptomics data reuse.
Title: Data reuse in agricultural genomics research: challenges and recommendations. GigaScience, vol. 14 (2025). Full text (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11727710/pdf/
Pub Date: 2025-01-06 | DOI: 10.1093/gigascience/giae112
Yang Zhou, Jiazheng Jin, Xuemei Li, Gregory Gedman, Sarah Pelan, Arang Rhie, Chuan Jiang, Olivier Fedrigo, Kerstin Howe, Adam M Phillippy, Erich D Jarvis, Frank Grutzner, Qi Zhou, Guojie Zhang
Background: A thorough analysis of genome evolution is fundamental to understanding biodiversity. The iconic monotremes (platypus and echidna) feature extraordinary biology, and they also exhibit rearrangements in several chromosomes, especially in the sex chromosome chain. However, the lack of a chromosome-level echidna genome has limited insight into genome evolution in monotremes, in particular into their complex of multiple sex chromosomes.
Results: Here, we present a new chromosome-level genome of the short-beaked echidna (Tachyglossus aculeatus) based on long reads. This genome allowed us to infer chromosomal rearrangements in the monotreme ancestor (2n = 64) and in each extant species. Analysis of the more complete sex chromosomes uncovered homology between 1 Y chromosome and multiple X chromosomes. This suggests that the ancestral X underwent reciprocal translocation with ancestral autosomes to form the complex. We also identified dozens of ampliconic genes on the sex chromosomes, including several ancestral genes expressed during male meiosis, which suggests selective constraints on the pairing of the multiple sex chromosomes.
Conclusion: The new echidna genome provides an important basis for further study of the unique biology and conservation of this species.
Title: Chromosome-level echidna genome illuminates evolution of multiple sex chromosome system in monotremes. GigaScience, vol. 14 (2025). Full text (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11710854/pdf/
Background: Spatial transcriptomic (ST) data can be used to decipher spatial domains, identify differentially expressed genes, and infer cellular trajectories. Doing these tasks accurately holds significant potential for improving our understanding of tissue organization and biological function. However, most spatial clustering methods can neither decipher complex structures in ST data nor fully exploit the features embedded in different layers.
Results: This article introduces STMSGAL, a novel framework for analyzing ST data that incorporates a graph attention autoencoder and multiscale deep subspace clustering. First, STMSGAL constructs ctaSNN, a cell type-aware shared nearest neighbor graph, using Louvain clustering based exclusively on gene expression profiles. Next, it integrates the expression profiles and ctaSNN to generate latent representations of spots using the graph attention autoencoder and multiscale deep subspace clustering. Finally, STMSGAL performs spatial clustering, differential expression analysis, and trajectory inference, which supports thorough data exploration and interpretation. STMSGAL was evaluated against 7 methods (SCANPY, SEDR, CCST, DeepST, GraphST, STAGATE, and SiGra). The evaluation used four 10x Genomics Visium datasets, 1 mouse visual cortex STARmap dataset, and 2 Stereo-seq mouse embryo datasets. STMSGAL performed strongly on the Davies-Bouldin, Calinski-Harabasz, S_Dbw, and ARI metrics. STMSGAL significantly improved the identification of layer structures across ST data with different spatial resolutions. It also accurately delineated spatial domains in 2 breast cancer tissues, adult mouse brain (FFPE), and mouse embryos.
Conclusions: STMSGAL can serve as an essential tool for bridging the analysis of cellular spatial organization and disease pathology, offering valuable insights for researchers in the field.
Title: Unveiling patterns in spatial transcriptomics data: a novel approach utilizing graph attention autoencoder and multiscale deep subspace clustering network. Authors: Liqian Zhou, Xinhuai Peng, Min Chen, Xianzhi He, Geng Tian, Jialiang Yang, Lihong Peng. DOI: 10.1093/gigascience/giae103. GigaScience, vol. 14 (2025-01-06). Full text (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11727722/pdf/
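STMSGAL's first step builds ctaSNN, a cell type-aware shared nearest neighbor (SNN) graph. The NumPy sketch below shows only the generic SNN idea. Two spots are linked with a weight equal to the Jaccard overlap of their k-nearest-neighbor sets, computed on expression profiles. It does not reproduce STMSGAL's cell type-aware refinement with Louvain clusters, the graph attention autoencoder, or the authors' parameters. The names `knn_indices` and `snn_graph`, the value of `k`, and the pruning threshold `min_jaccard` are all hypothetical choices.

```python
import numpy as np


def knn_indices(X: np.ndarray, k: int) -> np.ndarray:
    """Indices of the k nearest neighbours (Euclidean) of each row, excluding the row itself."""
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T  # squared pairwise distances
    np.fill_diagonal(d2, np.inf)                    # a spot is not its own neighbour
    return np.argsort(d2, axis=1)[:, :k]


def snn_graph(X: np.ndarray, k: int = 10, min_jaccard: float = 1 / 15) -> np.ndarray:
    """Dense symmetric SNN adjacency matrix for spots (rows) x genes (columns).

    Weight(i, j) = |N(i) & N(j)| / |N(i) | N(j)|, where N(i) is spot i's k-NN set
    plus i itself. Weights below the hypothetical `min_jaccard` are pruned to 0.
    """
    n = X.shape[0]
    nn = knn_indices(X, k)
    # Membership matrix: M[i, j] = 1 if j belongs to N(i).
    M = np.zeros((n, n))
    M[np.repeat(np.arange(n), k), nn.ravel()] = 1.0
    np.fill_diagonal(M, 1.0)
    shared = M @ M.T                                 # |N(i) & N(j)|
    size = M.sum(axis=1)
    union = size[:, None] + size[None, :] - shared   # |N(i) | N(j)|, always >= 1
    W = shared / union
    W[W < min_jaccard] = 0.0                         # drop weak links
    np.fill_diagonal(W, 0.0)                         # no self-loops
    return W
```

A dense n × n matrix keeps the example short. Datasets with tens of thousands of spots would instead need a sparse k-NN search and a sparse adjacency matrix. A community detection method such as Louvain would then run on the resulting weighted graph.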