GigaScience: latest articles, page 10

Mutation impact on mRNA versus protein expression across human cancers.

IF 11.8 | Zone 2 (Biology) | Q1 MULTIDISCIPLINARY SCIENCES

GigaScience

Pub Date : 2025-01-06 DOI: 10.1093/gigascience/giae113

Yuqi Liu, Abdulkadir Elmas, Kuan-Lin Huang

Background: Cancer mutations are often assumed to alter proteins, thus promoting tumorigenesis. However, how mutations affect protein expression, in addition to gene expression, has rarely been systematically investigated. This is significant because mRNA and protein levels frequently show only moderate correlation, driven by factors such as translation efficiency and protein degradation. Proteogenomic datasets from large tumor cohorts provide an opportunity to systematically analyze the effects of somatic mutations on mRNA and protein abundance and to identify mutations with distinct impacts on these molecular levels.

Results: We conduct a comprehensive analysis of mutation impacts on mRNA- and protein-level expression in 953 cancer cases with paired genomic and global proteomic profiling across 6 cancer types. Protein-level impacts are validated for 47.2% of the somatic expression quantitative trait loci (seQTLs), including CDH1 and MSH3 truncations, as well as other mutations from likely "long-tail" driver genes. Devising a statistical pipeline for identifying somatic protein-specific QTLs (spsQTLs), we reveal several gene mutations, including NF1 and MAP2K4 truncations and TP53 missense mutations, that show a disproportionate influence on protein abundance not readily explained by transcriptomics. Cross-validation against data from massively parallel assays of variant effects (MAVE) shows that TP53 missense mutations associated with high tumor TP53 protein levels are more likely to be experimentally confirmed as functional.

Conclusion: This study reveals that somatic mutations can exhibit distinct impacts on mRNA and protein levels, underscoring the necessity of integrating proteogenomic data to comprehensively identify functionally significant cancer mutations. These insights provide a framework for prioritizing mutations for further functional validation and therapeutic targeting.
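The idea of a mutation whose effect on protein abundance is "not readily explained by transcriptomics" can be illustrated with a small sketch. This is not the authors' spsQTL pipeline, which the abstract does not describe in detail. It only assumes one simple framing: fit protein abundance against mRNA abundance in wild-type samples, then ask whether mutated samples sit above or below that line. The function names and the toy numbers are hypothetical.

```python
from statistics import mean
from typing import List, Tuple


def fit_line(x: List[float], y: List[float]) -> Tuple[float, float]:
    """Ordinary least squares for y = intercept + slope * x."""
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


def protein_specific_shift(
    mrna: List[float], protein: List[float], mutated: List[bool]
) -> float:
    """Mean protein residual of mutated samples minus that of wild-type samples.

    The mRNA-to-protein line is fitted on wild-type samples only (an assumption
    of this sketch), so a non-zero result means the mutated samples carry more
    or less protein than their mRNA level alone would predict.
    """
    wild_type = [(m, p) for m, p, mut in zip(mrna, protein, mutated) if not mut]
    intercept, slope = fit_line([m for m, _ in wild_type], [p for _, p in wild_type])
    residuals = [p - (intercept + slope * m) for m, p in zip(mrna, protein)]
    mut_res = [r for r, mut in zip(residuals, mutated) if mut]
    wt_res = [r for r, mut in zip(residuals, mutated) if not mut]
    return mean(mut_res) - mean(wt_res)


# Toy data: wild-type protein follows 2 * mRNA + 1; the two mutated samples
# have normal mRNA levels but 3 units less protein than that line predicts.
mrna = [1.0, 2.0, 3.0, 4.0, 2.0, 3.0]
protein = [3.0, 5.0, 7.0, 9.0, 2.0, 4.0]
mutated = [False, False, False, False, True, True]
shift = protein_specific_shift(mrna, protein, mutated)
```

A real analysis would also need a significance test and covariates such as cancer type; the sketch leaves both out.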

Volume 14. Full-text PDF (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11702362/pdf/

Citations: 0

High-quality phenotypic and genotypic dataset of barley genebank core collection to unlock untapped genetic diversity.

IF 11.8 | Zone 2 (Biology) | Q1 MULTIDISCIPLINARY SCIENCES

GigaScience

Pub Date : 2025-01-06 DOI: 10.1093/gigascience/giae121

Zhihui Yuan, Maximilian Rembe, Martin Mascher, Nils Stein, Axel Himmelbach, Murukarthick Jayakodi, Andreas Börner, Klaus Oldach, Ahmed Jahoor, Jens Due Jensen, Julia Rudloff, Viktoria-Elisabeth Dohrendorf, Luisa Pauline Kuhfus, Emmanuelle Dyrszka, Matthieu Conte, Frederik Hinz, Salim Trouchaud, Jochen C Reif, Samira El Hanafi

Background: Genebanks around the globe serve as valuable repositories of genetic diversity, offering not only access to a broad spectrum of plant material but also critical resources for enhancing crop resilience, advancing scientific research, and supporting global food security. To this end, traditional genebanks are evolving into biodigital resource centers where the integration of phenotypic and genotypic data for accessions can drive more informed decision-making, optimize resource allocation, and unlock new opportunities for plant breeding and research. However, the curation and availability of interoperable phenotypic and genotypic data for genebank accessions are still in their infancy and represent an obstacle to rapid scientific discoveries in this field. Therefore, effectively promoting FAIR (i.e., findable, accessible, interoperable, and reusable) access to these data is vital for maximizing the potential of genebanks and driving progress in agricultural innovation.

Findings: Here we provide whole genome sequencing data of 812 barley (Hordeum vulgare L.) plant genetic resources and 298 European elite materials released between 1949 and 2021, as well as the phenotypic data for 4 disease resistance traits and 3 agronomic traits. The robustness of the investigated traits and the interoperability of genomic and phenotypic data were assessed in the current publication, aiming to make this panel publicly available as a resource for future genetic research in barley.

Conclusions: The data showed broad phenotypic variability and high association mapping potential, offering a key resource for identifying genebank donors with untapped genes to advance barley breeding while safeguarding genetic diversity.

Volume 14. Full-text PDF (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11811526/pdf/

Citations: 0

Micromix: web infrastructure for visualizing and remixing microbial 'omics data.

IF 11.8 | Zone 2 (Biology) | Q1 MULTIDISCIPLINARY SCIENCES

GigaScience

Pub Date : 2025-01-06 DOI: 10.1093/gigascience/giae120

Regan J Hayward, Titus Ebbecke, Hanna Fricke, Vo Quang Nguyen, Lars Barquist

Micromix is a flexible web platform for sharing and integrating microbial omics data, including RNA sequencing and transposon-insertion sequencing. Currently, the lack of solutions for making data web-accessible results in omics data being fragmented across supplementary spreadsheets or languishing as raw read data in public repositories. Micromix solves this problem and can be easily deployed on a standard web server or using cloud services. It is organism-agnostic, accommodates data and annotations from various sources, and allows filtering based on KEGG pathways, Gene Ontology terms, and curated gene sets. Visualizations are provided through a plug-in system that integrates existing visualization services and allows rapid development of new services, with available plug-ins currently supporting interactive heatmap and clustering functions. Users can upload their own data in a variety of formats to perform integrative analyses in the context of existing datasets. To support collaborative research, Micromix allows sharing of interactive sessions that maintain defined filtering and/or visualization options. We demonstrate the utility of Micromix with case studies focusing on the SPI-2 pathogenicity island in Salmonella enterica and polysaccharide utilization loci in Bacteroides thetaiotaomicron, showcasing the platform's capabilities for integrating, filtering, and visualizing diverse functional genomic datasets. Micromix is available at http://micromix.systems.

Volume 14. Full-text PDF (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11788673/pdf/

Citations: 0

M6Allele: a toolkit for detection of allele-specific RNA N6-methyladenosine modifications.

IF 11.8 | Zone 2 (Biology) | Q1 MULTIDISCIPLINARY SCIENCES

GigaScience

Pub Date : 2025-01-06 DOI: 10.1093/gigascience/giaf040

Yin Zhang, Lin Tang, Shengyao Zhi, Bosu Hu, Zhixiang Zuo, Jian Ren, Yubin Xie, Xiaotong Luo

Background: Allelic gene-specific regulatory events are crucial mechanisms in organisms, pivotal to many fundamental biological processes such as embryonic development and chromosome inactivation. Allelic gene imbalance manifests at both RNA expression and epigenetic levels. Recent research has unveiled allele-specific regulation of RNA N6-methyladenosine (m6A), emphasizing the need for its precise identification. However, prevailing approaches primarily screen for allele-specific genetic variations associated with m6A and do not truly identify allelic m6A events. Therefore, a novel algorithm dedicated to identifying allele-specific m6A (ASm6A) signals is still needed to comprehensively understand the regulatory mechanism of ASm6A.

Findings: To address this limitation, we have developed a meta-analysis approach using hierarchical Bayesian models to accurately detect ASm6A events at the peak level from MeRIP-seq data. For user convenience, we introduce a unified analysis pipeline named M6Allele, streamlining the assessment of significant ASm6A across single and paired samples. Applying M6Allele to MeRIP-seq data analysis of pulmonary fibrosis and lung adenocarcinoma reveals enrichment of ASm6A events in key regulatory genes associated with these diseases, suggesting their potential involvement in disease regulation.

Conclusions: Our work provides a method for precisely identifying ASm6A events at the peak level, elucidates the interplay of m6A with human health and disease genetics, and opens a new perspective for disease research. The M6Allele software is freely available at https://github.com/RenLabBioinformatics/M6Allele under the MIT license.

Volume 14. Full-text PDF (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12087454/pdf/

Citations: 0

RWRtoolkit: multi-omic network analysis using random walks on multiplex networks in any species.

IF 11.8 | Zone 2 (Biology) | Q1 MULTIDISCIPLINARY SCIENCES

GigaScience

Pub Date : 2025-01-06 DOI: 10.1093/gigascience/giaf028

David Kainer, Matthew Lane, Kyle A Sullivan, J Izaak Miller, Mikaela Cashman, Mallory Morgan, Ashley Cliff, Jonathon Romero, Angelica Walker, D Dakota Blair, Hari Chhetri, Yongqin Wang, Mirko Pavicic, Anna Furches, Jaclyn Noshay, Meghan Drake, A J Ireland, Ali Missaoui, Yun Kang, John C Sedbrook, Paramvir Dehal, Shane Canon, Daniel Jacobson

We introduce RWRtoolkit, a package for multiplex network generation, exploration, and statistical analysis, built for R and command-line users. RWRtoolkit enables the efficient exploration of large and highly complex biological networks generated from custom experimental data and/or from publicly available datasets, and is species agnostic. A range of functions can be used to find topological distances between biological entities, determine relationships within sets of interest, search for topological context around sets of interest, and statistically evaluate the strength of relationships within and between sets. The command-line interface is designed for parallelization on high-performance cluster systems, which enables high-throughput analysis such as permutation testing. Several tools in the package have also been made available for use in reproducible workflows via the KBase web application.
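For readers unfamiliar with the random walk with restart named in the title, the following is a minimal single-layer sketch in Python. It is not RWRtoolkit's API (the package targets R and the command line) and it ignores the multiplex, multi-layer aspect entirely. The function name, the toy graph, and the restart probability of 0.7 are all assumptions chosen for illustration.

```python
from typing import Dict, List


def random_walk_with_restart(
    adjacency: Dict[str, List[str]],
    seeds: List[str],
    restart: float = 0.7,
    tol: float = 1e-12,
    max_iter: int = 1000,
) -> Dict[str, float]:
    """Stationary visiting probabilities of a walker that, at every step,
    jumps back to a seed node with probability `restart` and otherwise
    moves to a uniformly chosen neighbour.

    `adjacency` must list every node as a key (undirected graphs list each
    edge in both directions).
    """
    nodes = sorted(adjacency)
    seed_set = sorted(set(seeds))
    p0 = {n: 0.0 for n in nodes}
    for s in seed_set:
        p0[s] = 1.0 / len(seed_set)

    p = dict(p0)
    for _ in range(max_iter):
        nxt = {n: restart * p0[n] for n in nodes}
        for n in nodes:
            neighbours = adjacency[n]
            if not neighbours:
                # Isolated node: send its walking mass back to the seeds
                # so that the probabilities still sum to 1.
                for s in seed_set:
                    nxt[s] += (1.0 - restart) * p[n] / len(seed_set)
                continue
            share = (1.0 - restart) * p[n] / len(neighbours)
            for m in neighbours:
                nxt[m] += share
        delta = sum(abs(nxt[n] - p[n]) for n in nodes)
        p = nxt
        if delta < tol:
            break
    return p


# Toy path graph A - B - C - D with the walk seeded at A.
graph = {"A": ["B"], "B": ["A", "C"], "C": ["B", "D"], "D": ["C"]}
scores = random_walk_with_restart(graph, seeds=["A"])
ranked = sorted(scores, key=scores.get, reverse=True)
```

Nodes topologically closer to the seed end up with higher scores, which is what makes the ranking usable as a notion of distance from a set of interest.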

Volume 14. Full-text PDF (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12020474/pdf/

Citations: 0

A telomere-to-telomere genome assembly of koi carp (Cyprinus carpio) using long reads and Hi-C technology.

IF 11.8 | Zone 2 (Biology) | Q1 MULTIDISCIPLINARY SCIENCES

GigaScience

Pub Date : 2025-01-06 DOI: 10.1093/gigascience/giaf087

Jiandong Yuan, Jiang Li, Jun Yong, Xuewu Liao, Huijuan Guo, Yongchao Niu

Background: The common carp (Cyprinus carpio) is a key species in global freshwater aquaculture. One of its variants, the koi carp, is particularly prized for its aesthetic appeal. However, the lack of a high-quality genome has limited genetic research and breeding efforts for common carp and koi carp.

Findings: This study presents a gap-free genome for the Taisho Sansyoku koi carp strain (C. carpio). The assembly achieved a total size of 1,555.86 Mb with a contig N50 of 30.45 Mb, comprising 50 gap-free pseudochromosomes ranging in length from 20.70 to 49.02 Mb. The BUSCO completeness score reached 99.20%, and the Genome Continuity Inspector score was 85.82, indicating high genome integrity and accuracy. Notably, 83 of the 100 expected telomeres (two per pseudochromosome) were detected, and 33 chromosomes carry telomeres at both ends. Comparative genomic analysis showed that the expanded gene families and unique genes play essential roles in various biological traits, such as energy metabolism, endocrine regulation, cell proliferation, and immune response, potentially related to multiple metabolic diseases and health conditions. The positively selected genes are linked to various biological processes, such as metalloendopeptidase activity, which plays a significant role in the central nervous system and is associated with diseases.

Conclusions: The koi carp genome assembly (CC 4.0) fills a critical gap in understanding the biology and adaptation of common carp. It provides an invaluable resource for molecular-guided breeding and genetic enhancement strategies, underscoring the importance of common carp and koi carp in aquaculture and ecological research.
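The contig N50 quoted in the findings is a standard contiguity statistic: the length of the contig at which, going from longest to shortest, the running total first reaches half of the assembly size. A minimal Python sketch of that definition follows. The function name and the toy contig lengths are illustrative only and are not taken from the assembly itself.

```python
from typing import Iterable


def n50(lengths: Iterable[int]) -> int:
    """Return the N50 of a collection of contig (or scaffold) lengths."""
    ordered = sorted(lengths, reverse=True)
    if not ordered:
        raise ValueError("n50() needs at least one length")
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length
    raise AssertionError("unreachable: running total always reaches half")


# Toy assembly of six contigs, 290 units in total. The two longest contigs
# together cover 150 units, which passes the halfway mark of 145.
toy_contigs = [80, 70, 50, 40, 30, 20]
toy_n50 = n50(toy_contigs)  # 70
```

A higher N50 relative to total size means the assembly is made of fewer, longer pieces, which is why it is reported alongside completeness scores such as BUSCO.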

Volume 14. Full-text PDF (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12395963/pdf/

Citations: 0

SPEX: A modular end-to-end platform for high-plex tissue spatial omics analysis.

IF 11.8 | Zone 2 (Biology) | Q1 MULTIDISCIPLINARY SCIENCES

GigaScience

Pub Date : 2025-01-06 DOI: 10.1093/gigascience/giaf090

Xiao Li, Ximo Pechuan-Jorge, Tyler Risom, Conrad Foo, Alexander Prilipko, Artem Zubkov, Caleb Chan, Patrick Chang, Frank Peale, James Ziai, Sandra Rost, Derrek Hibar, Lisa McGinnis, Evgeniy Tabatsky, Xin Ye, Hector Corrada Bravo, Zhen Shi, Malgorzata Nowicka, Jon Scherdin, James Cowan, Jennifer Giltnane, Darya Orlova, Rajiv Jesudason

Recent advancements in transcriptomics and proteomics have opened the possibility of spatially resolved molecular characterization of tissue architecture, with the promise of enabling a deeper understanding of tissue biology in either homeostasis or disease. The wealth of data generated by these technologies has recently driven the development of a wide range of computational methods. Applying and integrating these methods across the full spatial omics analysis process requires advanced coding fluency, which presents a hurdle for widespread adoption by the biology research community. To address this, we introduce SPEX (Spatial Expression Explorer), a web-based analysis platform that employs modular analysis pipeline design, accessible through a user-friendly interface. SPEX's infrastructure allows for streamlined access to open-source image data management systems, analysis modules, and fully integrated data visualization solutions. Analysis modules include essential steps covering image processing, single-cell analysis, and spatial analysis. We demonstrate SPEX's ability to facilitate the discovery of biological insights in spatially resolved omics datasets from healthy tissue to tumor samples.


Citations: 0

Extraction of biological terms using large language models enhances the usability of metadata in the BioSample database.

IF 11.8 Zone 2 (Biology) Q1 MULTIDISCIPLINARY SCIENCES

GigaScience

Pub Date : 2025-01-06 DOI: 10.1093/gigascience/giaf070

Shuya Ikeda, Zhaonan Zou, Hidemasa Bono, Yuki Moriya, Shuichi Kawashima, Toshiaki Katayama, Shinya Oki, Tazro Ohta

BioSample is a repository of experimental sample metadata. It is a comprehensive archive that enables searches of experiments, regardless of type. However, there is substantial variability in the submitted metadata due to the difficulty in defining comprehensive rules for describing them and the limited user awareness of best practices in creating them. This inconsistency poses considerable challenges to the findability and reusability of archived data. Given the scale of BioSample, which hosts over 40 million records, manual curation is impractical. Automatic rule-based ontology mapping methods have been proposed to address this issue, but their effectiveness is limited by the heterogeneity of the metadata. Recently, large language models (LLMs) have gained attention in natural language processing and are promising tools for automating metadata curation. In this study, we evaluated the performance of LLMs in extracting cell line names from BioSample descriptions using a gold-standard dataset derived from ChIP-Atlas, a secondary database of epigenomics experiment data in which samples were manually curated. The LLM-assisted methods outperformed traditional approaches, achieving higher accuracy and coverage. We further extended them to extract information about experimentally manipulated genes from metadata when manual curation had not yet been applied in ChIP-Atlas. This also yielded successful results, including the facilitation of more precise filtering of the data and the prevention of possible misinterpretations caused by the inclusion of unintended data. These findings underscore the potential of LLMs in improving the findability and reusability of experimental data in general, which would considerably reduce the user workload and enable more effective scientific data management.
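The evaluation described above, scoring extracted cell line names against a manually curated gold standard, can be sketched roughly as follows. This is a minimal illustration, not the authors' pipeline: the function names, the sample IDs, the name-normalization rule, and the exact definitions of "accuracy" (correct answers among answered samples) and "coverage" (answered samples among all gold-standard samples) are all assumptions made for this example.

```python
def normalize(name):
    """Lowercase and keep only letters and digits, so 'K-562' and 'k562' match.

    This normalization rule is an assumption for illustration only.
    """
    if name is None:
        return None
    key = "".join(ch for ch in name.lower() if ch.isalnum())
    return key or None


def evaluate(predictions, gold):
    """Score extracted names against curated names.

    predictions: dict mapping sample ID -> extracted name, or None if no answer
    gold:        dict mapping sample ID -> manually curated name
    Returns coverage (answered / total) and accuracy (correct / answered).
    """
    answered = 0
    correct = 0
    for sample_id, truth in gold.items():
        predicted = normalize(predictions.get(sample_id))
        if predicted is None:
            continue  # the extractor gave no answer for this sample
        answered += 1
        if predicted == normalize(truth):
            correct += 1
    total = len(gold)
    return {
        "coverage": answered / total if total else 0.0,
        "accuracy": correct / answered if answered else 0.0,
    }


# Hypothetical sample IDs and values, used only to exercise the function.
gold = {"S1": "HeLa", "S2": "K-562", "S3": "MCF-7", "S4": "HepG2"}
predictions = {"S1": "hela", "S2": "K562", "S3": "HEK293", "S4": None}
scores = evaluate(predictions, gold)
print(scores)
```

Reporting the two numbers separately keeps an extractor that answers rarely but correctly distinguishable from one that answers often but poorly.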



Citations: 0

The genome of Hippophae salicifolia provides new insights into the sexual differentiation of sea buckthorn.

IF 11.8 Zone 2 (Biology) Q1 MULTIDISCIPLINARY SCIENCES

GigaScience

Pub Date : 2025-01-06 DOI: 10.1093/gigascience/giaf046

Mingyue Chen, Xingyu Yang, Lan Xun, Zhenlin Qu, Shihai Yang, Yunqiang Yang, Yongping Yang

Background: Dioecy, a common reproductive strategy in angiosperms, has evolved independently in various plant lineages, and this has resulted in the evolution of diverse sex chromosome systems and sex determination mechanisms. Hippophae is a genus of dioecious plants with an XY sex determination system, but the molecular underpinnings of this process have not yet been clarified. Most previously published sea buckthorn genome data have been derived from females, yet genomic data on males are critically important for clarifying our understanding of sex determination in this genus. Comparative genomic analyses of male and female sea buckthorn plants can shed light on the origins and evolution of sex. These studies can also enhance our understanding of the molecular mechanisms underlying sexual differentiation and provide novel insights and data for future research on sexual reproduction in plants.

Results: We conducted an in-depth analysis of the genomes of 2 sea buckthorn species, including a male Hippophae gyantsensis, a female Hippophae salicifolia, and 2 haplotypes of male H. salicifolia. The genome size of H. gyantsensis was 704.35 Mb, and that of the female H. salicifolia was 788.28 Mb. The sizes of the 2 haplotype genomes were 1,139.99 Mb and 1,097.34 Mb. The sex-determining region (SDR) of H. salicifolia was 29.71 Mb and contained 249 genes. A comparative analysis of the haplotypes of Chr02 of H. salicifolia revealed that the Y chromosome was shorter than the X chromosome. Chromosomal evolution analysis indicated that Hippophae has experienced significant chromosomal rearrangements following 2 whole-genome duplication events, and the fusion of 2 chromosomes has potentially led to the early formation of sex chromosomes in sea buckthorn. Multiple structural variations between Y and X sex-linked regions might have facilitated the rapid evolution of sex chromosomes in H. salicifolia. Comparison of the transcriptome data of male and female flower buds from H. gyantsensis and H. salicifolia revealed 11 genes specifically expressed in males. Three of these were identified as candidate genes involved in the sex determination of sea buckthorn. These findings will aid future studies of the sex determination mechanisms in sea buckthorn.

Conclusion: A comparative genomic analysis was performed to identify the SDR in H. salicifolia. The origins and evolutionary trajectories of sex chromosomes within Hippophae were also determined. Three potential candidate genes associated with sea buckthorn sex determination were identified. Overall, our findings will aid future studies aimed at clarifying the mechanisms of sex determination.



Citations: 0

Chromosome-scale assemblies of three Ormosia species: repetitive sequences distribution and structural rearrangement.

IF 11.8 Zone 2 (Biology) Q1 MULTIDISCIPLINARY SCIENCES

GigaScience

Pub Date : 2025-01-06 DOI: 10.1093/gigascience/giaf047

Zheng-Feng Wang, En-Ping Yu, Lin Fu, Hua-Ge Deng, Wei-Guang Zhu, Feng-Xia Xu, Hong-Lin Cao

Background: The genus Ormosia belongs to the Fabaceae family; almost all Ormosia species are endemic to China, which is considered one of the centers of this genus. Thus, genomic studies on the genus are needed to better understand species evolution and ensure the conservation and utilization of these species. We performed a chromosome-scale assembly of O. purpureiflora and updated the chromosome-scale assemblies of O. emarginata and O. semicastrata for comparative genomics.

Findings: The genome assembly sizes of the 3 species ranged from 1.42 to 1.58 Gb, with O. purpureiflora being the largest. Repetitive sequences accounted for 74.0-76.3% of the genomes, and the predicted gene counts ranged from 50,517 to 55,061. Benchmarking Universal Single-Copy Orthologs (BUSCO) analysis indicated 97.0-98.4% genome completeness, whereas the long terminal repeat (LTR) assembly index values ranged from 13.66 to 17.56, meeting the "reference genome" quality standard. Gene completeness, assessed using BUSCO and OMArk, ranged from 95.1% to 96.3% and from 97.1% to 98.1%, respectively. Characterizing genome architectures further revealed that inversions were the main structural rearrangements in Ormosia. In terms of numbers, density distributions of repetitive elements showed that Helitron and terminal inverted repeat (TIR) elements and Gypsy and unknown LTR retrotransposons (LTR-RTs) were concentrated in different regions of the chromosomes, whereas Copia LTR-RTs were generally evenly distributed along the chromosomes in Ormosia. Compared with the sister species Lupinus albus, Ormosia species had lower numbers and percentages of resistance (R) genes and transcription factor genes. Genes related to alkaloid, terpene, and flavonoid biosynthesis were found to be duplicated through tandem or proximal duplications. Notably, some genes associated with growth and defense were absent in O. purpureiflora. By resequencing 153 genotypes (∼30 Gb of data per sample) from 6 O. purpureiflora (sub)populations, we identified 40,146 single nucleotide polymorphisms. Consistent with its very small populations, O. purpureiflora exhibited low genetic diversity.

Conclusions: The Ormosia genome assemblies provide valuable resources for studying the evolution, conservation, and potential utility of both Ormosia and Fabaceae species.
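To put the reported SNP count in perspective, a back-of-envelope density can be computed from two figures in the abstract: 40,146 SNPs and the 1.58 Gb assembly size stated as the largest, that of O. purpureiflora. This is only a rough sketch: dividing by total assembly size is an assumption made here for illustration (it ignores filtering and uncallable regions), it is not a standard diversity statistic such as nucleotide diversity, and the abstract does not say the authors computed it.

```python
def snps_per_mb(snp_count, assembly_size_gb):
    """Average number of SNPs per megabase, taking 1 Gb = 1,000 Mb."""
    return snp_count / (assembly_size_gb * 1000)


# Figures taken from the abstract: 40,146 SNPs, 1.58 Gb assembly.
density = snps_per_mb(40_146, 1.58)
print(f"{density:.1f} SNPs per Mb")  # prints "25.4 SNPs per Mb"
```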



Citations: 0