GigaScience: Latest Articles

An ecosystem for producing and sharing metadata within the web of FAIR Data.
IF 11.8 | Zone 2, Biology | Q1 MULTIDISCIPLINARY SCIENCES | Pub Date: 2025-01-06 | DOI: 10.1093/gigascience/giae111
Daniel Jacob, François Ehrenmann, Romain David, Joseph Tran, Cathleen Mirande-Ney, Philippe Chaumeil

Background: Descriptive metadata are vital for reporting, discovering, leveraging, and mobilizing research datasets. However, resolving metadata issues as part of a data management plan can be complex for data producers. To organize and document data, various descriptive metadata must be created. Furthermore, when sharing data, it is important to ensure metadata interoperability in line with FAIR (Findable, Accessible, Interoperable, Reusable) principles. Given the practical nature of these challenges, there is a need for management tools that can assist data managers effectively. Additionally, these tools should meet the needs of data producers and be user-friendly, requiring minimal training.

Results: We developed Maggot (Metadata Aggregation on Data Storage), a web-based tool to locally manage a data catalog using high-level metadata. The main goal was to facilitate easy data dissemination and deposition in data repositories. With Maggot, users can easily generate and attach high-level metadata to datasets, allowing for seamless sharing in a collaborative environment. This approach aligns with many data management plans as it effectively addresses challenges related to data organization, documentation, storage, and the sharing of metadata based on FAIR principles within and beyond the collaborative group. Furthermore, Maggot enables metadata crosswalks (i.e., generated metadata can be converted to the schema used by a specific data repository or be exported using a format suitable for data collection by third-party applications).
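
A metadata crosswalk, as described above, maps each field of a local schema onto the field and value shape that a target repository expects. The following is a minimal sketch. The local record, the `CROSSWALK` table, and the target keys are hypothetical; the target keys are only loosely modeled on DataCite-style JSON, and none of this reflects Maggot's actual configuration or export code.

```python
from typing import Any, Callable

# Hypothetical crosswalk table: local field -> (target key, value transform).
# Target keys are loosely modeled on DataCite-style JSON for illustration only.
CROSSWALK: dict[str, tuple[str, Callable[[Any], Any]]] = {
    "title": ("titles", lambda v: [{"title": v}]),
    "authors": ("creators", lambda v: [{"name": name} for name in v]),
    "keywords": ("subjects", lambda v: [{"subject": kw} for kw in v]),
    "year": ("publicationYear", str),
}


def crosswalk(record: dict[str, Any]) -> dict[str, Any]:
    """Convert a local metadata record into the (hypothetical) target schema.

    Fields with no entry in CROSSWALK are skipped rather than guessed.
    """
    out: dict[str, Any] = {}
    for field, value in record.items():
        if field in CROSSWALK:
            target_key, transform = CROSSWALK[field]
            out[target_key] = transform(value)
    return out


if __name__ == "__main__":
    local_record = {  # hypothetical example record
        "title": "Leaf metabolome time series",
        "authors": ["Doe, Jane", "Roe, Richard"],
        "keywords": ["metabolomics", "FAIR"],
        "year": 2024,
        "internal_note": "not exported",
    }
    print(crosswalk(local_record))
```

Unmapped fields such as `internal_note` are dropped rather than guessed, so the export stays conservative. A production crosswalk would typically also validate its output against the target schema.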

Conclusion: The primary purpose of Maggot is to streamline the collection of high-level metadata using carefully chosen schemas and standards. Additionally, it simplifies data accessibility via metadata, typically a requirement for publicly funded projects. As a result, Maggot can be utilized to promote effective local management with the goal of facilitating data sharing while adhering to the FAIR principles. Furthermore, it can contribute to the preparation of the future EOSC FAIR Web of Data within the European Open Science Cloud framework.

Citations: 0
Guidance framework to apply best practices in ecological data analysis: lessons learned from building Galaxy-Ecology.
IF 11.8 | Zone 2, Biology | Q1 MULTIDISCIPLINARY SCIENCES | Pub Date: 2025-01-06 | DOI: 10.1093/gigascience/giae122
Coline Royaux, Jean-Baptiste Mihoub, Marie Jossé, Dominique Pelletier, Olivier Norvez, Yves Reecht, Anne Fouilloux, Helena Rasche, Saskia Hiltemann, Bérénice Batut, Eléaume Marc, Pauline Seguineau, Guillaume Massé, Alan Amossé, Claire Bissery, Romain Lorrilliere, Alexis Martin, Yves Bas, Thimothée Virgoulay, Valentin Chambon, Elie Arnaud, Elisa Michon, Clara Urfer, Eloïse Trigodet, Marie Delannoy, Gregoire Loïs, Romain Julliard, Björn Grüning, Yvan Le Bras

Numerous conceptual frameworks exist for best practices in research data and analysis (e.g., Open Science and FAIR principles). In practice, there is a need for further progress to improve transparency, reproducibility, and confidence in ecology. Here, we propose a practical and operational framework for researchers and experts in ecology to achieve best practices for building analytical procedures from individual research projects to production-level analytical pipelines. We introduce the concept of atomization to identify analytical steps that support generalization by allowing us to go beyond single analyses. The term atomization is employed to convey the idea of single analytical steps as "atoms" composing an analytical procedure. When generalized, "atoms" can be used in more than a single case analysis. These guidelines were established during the development of the Galaxy-Ecology initiative, a web platform dedicated to data analysis in ecology. Galaxy-Ecology allows us to demonstrate a way to reach higher levels of reproducibility in ecological sciences by increasing the accessibility and reusability of analytical workflows once atomized and generalized.
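
The notion of "atoms" can be illustrated with a toy sketch. Each analytical step is a small, parameterized function with explicit inputs and outputs, and a workflow is an ordered composition of such steps. Everything below (step names, fields, data) is hypothetical. Galaxy-Ecology itself assembles workflows from Galaxy tools rather than Python functions.

```python
from functools import reduce
from typing import Callable

# An "atom" takes a table (list of row dicts) and returns a new table.
Step = Callable[[list[dict]], list[dict]]


def drop_missing(field: str) -> Step:
    """Atom factory: drop rows where `field` is missing or None."""
    def step(rows: list[dict]) -> list[dict]:
        return [r for r in rows if r.get(field) is not None]
    return step


def add_density(count_field: str, effort_field: str) -> Step:
    """Atom factory: add density = count / sampling effort to every row."""
    def step(rows: list[dict]) -> list[dict]:
        return [{**r, "density": r[count_field] / r[effort_field]} for r in rows]
    return step


def workflow(*steps: Step) -> Step:
    """Compose atoms left to right into one reusable procedure."""
    return lambda rows: reduce(lambda acc, step: step(acc), steps, rows)


# The same atoms, parameterized differently, serve two hypothetical surveys.
fish_density = workflow(drop_missing("count"), add_density("count", "area_m2"))
bird_density = workflow(drop_missing("n_birds"), add_density("n_birds", "transect_km"))

print(fish_density([{"count": 12, "area_m2": 4.0}, {"count": None, "area_m2": 2.0}]))
```

Each atom is parameterized rather than hard-coded to one dataset, so the same two steps serve both surveys. This is the kind of generalization beyond a single case analysis that the framework aims for.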

Citations: 0
A high-quality assembly revealing the PMEL gene for the unique plumage phenotype in Liancheng ducks.
IF 11.8 | Zone 2, Biology | Q1 MULTIDISCIPLINARY SCIENCES | Pub Date: 2025-01-06 | DOI: 10.1093/gigascience/giae114
Zhen Wang, Zhanbao Guo, Hongfei Liu, Tong Liu, Dapeng Liu, Simeng Yu, Hehe Tang, He Zhang, Qiming Mou, Bo Zhang, Junting Cao, Martine Schroyen, Shuisheng Hou, Zhengkui Zhou

Background: Plumage coloration is a distinctive trait in ducks. The Liancheng duck, characterized by white plumage together with a black beak and black webbed feet, is an excellent subject for such studies. However, understanding of the genetic mechanisms underlying duck plumage coloration remains limited. To address this, the Liancheng duck genome (GCA_039998735.1) was assembled de novo using HiFi reads, and F2 segregating populations were generated from crosses between Liancheng and Pekin ducks. The aim was to identify the genetic mechanism of white plumage in Liancheng ducks.

Results: In this study, a 1.29-Gb Liancheng duck genome was assembled de novo, with a contig N50 of 12.17 Mb and a scaffold N50 of 83.98 Mb. Beyond the epistatic effect of the MITF gene, a genome-wide association study (GWAS) pinpointed a 0.8-Mb genomic region encompassing the PMEL gene. This gene encodes a pigment cell-specific protein that is essential for the formation of fibrillar sheets within melanosomes, the organelles responsible for pigmentation. Additionally, linkage disequilibrium analysis revealed 2 candidate single-nucleotide polymorphisms (Chr33: 5,303,994A>G; 5,303,997A>G) that might alter PMEL transcription, potentially influencing plumage coloration in Liancheng ducks.
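
The contig and scaffold N50 values quoted above summarize assembly contiguity. N50 is the largest length L such that sequences of length at least L together cover at least half of the total assembly length. The following is a minimal sketch of the standard calculation; the lengths in the example are made up.

```python
def n50(lengths: list[int]) -> int:
    """Return the N50 of a collection of sequence lengths (contigs or scaffolds).

    Lengths are sorted in decreasing order and summed until the running total
    reaches at least half of the overall length; the length reached is the N50.
    """
    if not lengths:
        raise ValueError("no sequence lengths given")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:  # integer-safe "at least half" check
            return length
    raise AssertionError("unreachable for non-empty input")


# Toy example: total = 35; 8 + 7 + 6 = 21 is the first running sum >= 17.5.
print(n50([2, 3, 4, 5, 6, 7, 8]))  # 6
```

The same function gives the contig N50 or the scaffold N50 depending on which set of lengths it is given.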

Conclusions: Our study has assembled a high-quality genome for the Liancheng duck and has presented compelling evidence that the white plumage characteristic of this breed is attributable to the PMEL gene. Overall, these findings offer significant insights and direction for future studies and breeding programs aimed at understanding and manipulating avian plumage coloration.

Citations: 0
NanoMnT: an STR analysis tool for Oxford Nanopore sequencing data driven by a comprehensive analysis of error profile in STR regions.
IF 11.8 | Zone 2, Biology | Q1 MULTIDISCIPLINARY SCIENCES | Pub Date: 2025-01-06 | DOI: 10.1093/gigascience/giaf013
Gyumin Park, Hyunsu An, Han Luo, Jihwan Park

Oxford Nanopore Technology (ONT) sequencing is a third-generation sequencing technology that enables cost-effective long-read sequencing, with broad applications in biological research. However, its high sequencing error rate in low-complexity regions hampers its use in short tandem repeat (STR)-related research. To address this, we generated a comprehensive STR error profile of ONT by analyzing publicly available Nanopore sequencing datasets. We show that the sequencing error rate is influenced not only by STR length but also by the repeat unit and the flanking sequences of STR regions. Interestingly, certain flanking sequences were associated with higher sequencing accuracy, suggesting that some STR loci are more suitable for Nanopore sequencing than others. While base quality scores of substitution errors within STR regions were lower than those of correctly sequenced bases, no such pattern was observed for indel errors. Furthermore, using the most recent basecaller version and the super accuracy model significantly improved STR sequencing accuracy. Finally, we present NanoMnT, a lightweight Python tool that corrects STR sequencing errors in sequencing data and estimates STR allele sizes. NanoMnT leverages the characteristics of ONT when estimating STR allele size and outperforms existing tools on STRs with 1-bp and 2-bp repeat units. By integrating our findings, we improved STR allele estimation accuracy for A×10 repeats from 55% to 78%, and up to 85% when loci with unfavorable flanking sequences were excluded. Using NanoMnT, we demonstrate the utility of our findings by identifying microsatellite instability status in cancer sequencing data. NanoMnT is publicly available at https://github.com/18parkky/NanoMnT.
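
For a mononucleotide STR such as an A×10 tract, a naive baseline takes the longest run of the repeat base in each read spanning the locus. The allele size is then the most common run length across reads. The sketch below implements only this baseline. It assumes reads are already trimmed to the STR plus short flanks, and it is not NanoMnT's algorithm; per the abstract, NanoMnT also corrects errors and exploits ONT-specific characteristics.

```python
import re
from collections import Counter


def longest_run(seq: str, base: str) -> int:
    """Length of the longest uninterrupted run of `base` in `seq` (case-insensitive)."""
    runs = re.findall(f"{re.escape(base.upper())}+", seq.upper())
    return max((len(run) for run in runs), default=0)


def estimate_allele(reads: list[str], base: str = "A") -> int:
    """Naive mononucleotide STR size: the most common longest-run length across reads.

    Assumes every read has been trimmed to the STR plus short flanks that do not
    themselves contain runs of `base`. Ties go to the shorter length (arbitrary).
    """
    if not reads:
        raise ValueError("no reads given")
    counts = Counter(longest_run(read, base) for read in reads)
    top = max(counts.values())
    return min(size for size, n in counts.items() if n == top)


# Hypothetical reads over an A x 10 locus with one deletion and one insertion error.
reads = ["GCT" + "A" * n + "CGT" for n in (10, 10, 9, 10, 11)]
print(estimate_allele(reads))  # 10
```

Taking the mode across reads absorbs occasional indel errors. It cannot, however, correct a systematic bias at loci with unfavorable flanking sequences, which is the error pattern the abstract describes.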

Citations: 0
Genomes reveal pervasive distant hybridization in nature among cyprinid fishes.
IF 11.8 | Zone 2, Biology | Q1 MULTIDISCIPLINARY SCIENCES | Pub Date: 2025-01-06 | DOI: 10.1093/gigascience/giae117
Li Ren, Xiaolong Tu, Mengxue Luo, Qizhi Liu, Jialin Cui, Xin Gao, Hong Zhang, Yakui Tai, Yiyan Zeng, Mengdan Li, Chang Wu, Wuhui Li, Jing Wang, Dongdong Wu, Shaojun Liu

Background: Genomic data have unveiled a fascinating aspect of the evolutionary past, showing that the mingling of different species through hybridization has left its mark on the histories of numerous life forms. However, the relationship between hybridization events and the origins of cyprinid fishes remains unclear.

Results: In this study, we generated de novo assembled genomes of 8 cyprinid fishes and conducted phylogenetic analyses on 24 species. Widespread allele sharing across species boundaries was observed within 7 subfamilies of cyprinid fishes. Based on a systematic analysis of multiple tissues, we found that the testis exhibited a conserved pattern of divergence between the herbivorous Megalobrama amblycephala and the carnivorous Culter alburnus, suggesting a potential link to incomplete reproductive isolation. Significant differences in the expression of 4 genes (dpp2, ctrl, psb7, and ppce) in the liver and intestine, accompanied by variations in enzyme activities, indicated swift divergence in digestive enzyme secretion. Moreover, we identified introgressed genes linked to organ development in sympatric fishes with analogous feeding habits within the Cultrinae and Leuciscinae subfamilies.
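
The abstract does not name the statistic used to detect allele sharing across species boundaries. A common choice for this question is Patterson's D, also called the ABBA-BABA test. The following is a minimal sketch, assuming biallelic sites polarized so that the outgroup carries the ancestral allele "0", and one haploid sample per taxon.

```python
def patterson_d(p1: str, p2: str, p3: str, outgroup: str) -> float:
    """Patterson's D (ABBA-BABA) from four aligned haploid 0/1 allele strings.

    Assumes the topology (((P1, P2), P3), O) and that '0' is the ancestral allele.
    Sites where the outgroup does not carry '0' are skipped. D > 0 suggests excess
    allele sharing between P2 and P3; D < 0, between P1 and P3.
    """
    if not (len(p1) == len(p2) == len(p3) == len(outgroup)):
        raise ValueError("sequences must have equal length")
    abba = baba = 0
    for a, b, c, o in zip(p1, p2, p3, outgroup):
        if o != "0":
            continue
        if (a, b, c) == ("0", "1", "1"):
            abba += 1  # P2 and P3 share the derived allele
        elif (a, b, c) == ("1", "0", "1"):
            baba += 1  # P1 and P3 share the derived allele
    if abba + baba == 0:
        raise ValueError("no informative ABBA/BABA sites")
    return (abba - baba) / (abba + baba)


# Toy data: two ABBA sites, one BABA site, one uninformative site, so D = (2 - 1) / 3.
print(patterson_d("0010", "1100", "1110", "0000"))
```

In practice, D is computed over genome-wide sites, and its significance is usually assessed with a block jackknife, which this sketch omits.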

Conclusions: Our findings highlight the significant role played by incomplete reproductive isolation and frequent gene flow events, particularly those associated with the development of digestive organs, in driving speciation among cyprinid fishes in diverse freshwater ecosystems.

Citations: 0
Interspecific hybridization in Brassica species leads to changes in agronomic traits through the regulation of gene expression by chromatin accessibility and DNA methylation.
IF 11.8 | Zone 2, Biology | Q1 MULTIDISCIPLINARY SCIENCES | Pub Date: 2025-01-06 | DOI: 10.1093/gigascience/giaf029
Chengtao Quan, Qin Zhang, Xiaoni Zhang, Kexin Chai, Guoting Cheng, Chaozhi Ma, Cheng Dai

Interspecific hybridization is a common method in plant breeding to combine traits from different species, resulting in allopolyploidization and significant genetic and epigenetic changes. However, our understanding of genome-wide chromatin and gene expression dynamics during allopolyploidization remains limited. This study generated two Brassica allotriploid hybrids via interspecific hybridization. We observed that accessible chromatin regions (ACRs) and DNA methylation interact to regulate gene expression after interspecific hybridization, ultimately influencing the agronomic traits of the hybrids. In total, 234,649 ACRs were identified in the parental lines and hybrids. Hybridization changed the distribution and abundance of ACRs, particularly within and near genes. Genes associated with proximal ACRs were more highly expressed than those associated with distal and genic ACRs. More than half of the novel ACRs drove transgressive gene expression in the hybrids. The transgressively upregulated genes were significantly enriched for metal ion binding, especially magnesium, calcium, and potassium ion binding. We also identified Bna.bZIP11 in the single-parent activation ACR, which binds to BnaA06.UF3GT to promote anthocyanin accumulation in F1 hybrids. DNA methylation plays a role in repressing gene expression, and unmethylated ACRs are more transcriptionally active. Additionally, A-subgenome ACRs were associated with genome dosage rather than DNA methylation. The interplay among DNA methylation, transposable elements, and sRNA contributes to the dynamic landscape of ACRs during interspecific hybridization, resulting in distinct gene expression patterns across the genome.
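
The genic, proximal, and distal categories depend on where an ACR sits relative to the nearest gene. The following is a minimal sketch of one such classification. It assumes a 2-kb window for "proximal" and ignores strand. The threshold, the rule, and the example coordinates are illustrative assumptions, not necessarily the study's definitions.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Interval:
    chrom: str
    start: int  # 0-based, inclusive
    end: int    # exclusive


def classify_acr(acr: Interval, genes: list[Interval], window: int = 2000) -> str:
    """Label an ACR as 'genic', 'proximal', or 'distal' relative to a gene set.

    Assumed rule: 'genic' if it overlaps any gene body, 'proximal' if its gap to
    the nearest gene on the same chromosome is at most `window` bp, else 'distal'.
    """
    nearest: Optional[int] = None
    for gene in genes:
        if gene.chrom != acr.chrom:
            continue
        if acr.start < gene.end and gene.start < acr.end:
            return "genic"
        # No overlap: measure the gap on whichever side the gene lies.
        gap = gene.start - acr.end if acr.end <= gene.start else acr.start - gene.end
        nearest = gap if nearest is None else min(nearest, gap)
    if nearest is not None and nearest <= window:
        return "proximal"
    return "distal"


genes = [Interval("A01", 10_000, 15_000)]  # hypothetical gene model
for acr in (Interval("A01", 12_000, 12_500), Interval("A01", 8_500, 9_000), Interval("A01", 1_000, 1_500)):
    print(acr.start, classify_acr(acr, genes))
```

The linear scan keeps the sketch short. For hundreds of thousands of ACRs, a sorted or indexed interval lookup would be the usual choice.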

Citations: 0
Lifting the curse from high-dimensional data: automated projection pursuit clustering for a variety of biological data modalities.
IF 11.8 Zone 2 (Biology) Q1 MULTIDISCIPLINARY SCIENCES Pub Date: 2025-01-06 DOI: 10.1093/gigascience/giaf052
Claire Simpson, Evgeniy Tabatsky, Zainab Rahil, Devon J Eddins, Sasha Tkachev, Florian Georgescauld, Derek Papalegis, Martin Culka, Tyler Levy, Ivan Gregoretti, Connor Meehan, Chiara Schiller, Kresimir Bestak, Denis Schapiro, Andrei Chernyshev, Guenther Walther, Eliver E B Ghosn, Darya Orlova

Unsupervised clustering is a powerful machine-learning technique widely used to analyze high-dimensional biological data. It plays a crucial role in uncovering patterns, structures, and inherent relationships within complex datasets without relying on predefined labels. In the context of biology, high-dimensional data may include transcriptomics, proteomics, and a variety of single-cell omics data. Most existing clustering algorithms operate directly in the high-dimensional space, and their performance may be negatively affected by the phenomenon known as the curse of dimensionality. Here, we show an alternative clustering approach that alleviates the curse by sequentially projecting high-dimensional data into a low-dimensional representation. We validated the effectiveness of our approach, named automated projection pursuit (APP), across various biological data modalities, including flow and mass cytometry data, scRNA-seq, multiplex imaging data, and T-cell receptor repertoire data. APP efficiently recapitulated experimentally validated cell-type definitions and revealed new biologically meaningful patterns.
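The abstract describes APP only at a high level: high-dimensional data are sequentially projected into low-dimensional representations, and clusters are found there. As a rough illustration of the general projection-pursuit idea, and not of APP's actual algorithm, the toy sketch below works on 2-D points. It scans candidate 1-D directions, scores each projection by how well a single cut splits it (within-group over total sum of squares, a simple stand-in for a proper bimodality index), and recursively splits along the best direction until no projection looks clearly bimodal. The function names, the angle grid, and the `max_ratio` and `min_size` thresholds are illustrative assumptions.

```python
import math


def best_split_1d(values):
    """Return (threshold, within_ss / total_ss) for the best 2-way cut of 1-D values."""
    xs = sorted(values)
    n = len(xs)
    if n < 2:
        return None, 1.0
    mean = sum(xs) / n
    total_ss = sum((x - mean) ** 2 for x in xs)
    if total_ss == 0:
        return None, 1.0
    # Prefix sums give each side's sum of squares in O(1) per candidate cut.
    pre, pre_sq = [0.0], [0.0]
    for x in xs:
        pre.append(pre[-1] + x)
        pre_sq.append(pre_sq[-1] + x * x)
    best_thr, best_ratio = None, 1.0
    for k in range(1, n):  # left = xs[:k], right = xs[k:]
        if xs[k - 1] == xs[k]:
            continue  # cannot cut between identical values
        ls, lq = pre[k], pre_sq[k]
        rs, rq = pre[n] - ls, pre_sq[n] - lq
        within = (lq - ls * ls / k) + (rq - rs * rs / (n - k))
        ratio = within / total_ss
        if ratio < best_ratio:
            best_thr, best_ratio = (xs[k - 1] + xs[k]) / 2, ratio
    return best_thr, best_ratio


def project(points, angle):
    """Project 2-D points onto the unit direction (cos angle, sin angle)."""
    c, s = math.cos(angle), math.sin(angle)
    return [x * c + y * s for x, y in points]


def pursue(points, n_angles=180, max_ratio=0.1, min_size=5):
    """Recursively split points along the most separable 1-D projection; return clusters."""
    if len(points) < 2 * min_size:
        return [points]
    best_ratio, best_angle, best_thr = 1.0, None, None
    for i in range(n_angles):
        angle = math.pi * i / n_angles
        thr, ratio = best_split_1d(project(points, angle))
        if thr is not None and ratio < best_ratio:
            best_ratio, best_angle, best_thr = ratio, angle, thr
    # Stop when no direction shows a clear two-group structure.
    if best_angle is None or best_ratio > max_ratio:
        return [points]
    proj = project(points, best_angle)
    left = [p for p, v in zip(points, proj) if v <= best_thr]
    right = [p for p, v in zip(points, proj) if v > best_thr]
    if len(left) < min_size or len(right) < min_size:
        return [points]
    return (pursue(left, n_angles, max_ratio, min_size)
            + pursue(right, n_angles, max_ratio, min_size))
```

On three tight, well-separated blobs the recursion stops at three clusters, whereas a single uniform grid is left unsplit because its best cut only reaches a ratio of roughly 0.25. Real single-cell data would need a search over directions in many dimensions and a more robust split criterion.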

Citations: 0
A Case for estradiol: younger brains in women with earlier menarche and later menopause.
IF 11.8 Zone 2 (Biology) Q1 MULTIDISCIPLINARY SCIENCES Pub Date: 2025-01-06 DOI: 10.1093/gigascience/giaf060
Eileen Luders, Inger Sundström Poromaa, Claudia Barth, Christian Gaser

The transition to menopause is marked by a gradual decline in estradiol. Concurrently, the risk of dementia in women increases around menopause, suggesting that estradiol (or the lack thereof) plays a role in the development of dementia and other age-related neuropathologies. Here, we set out to investigate whether there is a link between brain aging and estradiol-associated events, such as menarche and menopause. For this purpose, we applied a well-validated machine learning approach to analyze both cross-sectional and longitudinal data from a sample of 1,006 postmenopausal women who underwent structural magnetic resonance imaging twice, approximately 2 years apart. We observed less brain aging in women with an earlier menarche, a later menopause, and a longer reproductive span (i.e., the time interval between menarche and menopause). These effects were evident both cross-sectionally and longitudinally, supporting the notion that estradiol has neuroprotective properties and contributes to brain preservation. However, further research is required because the observed effects were small, estradiol was not directly measured, and other factors may modulate female brain health. Future studies might benefit from incorporating actual estradiol (and other hormone) measures, as well as considering genetic predispositions and lifestyle factors alongside indicators of brain aging, to deepen our understanding of estradiol's role in maintaining brain health. Additionally, including more diverse study populations (e.g., varying in ethnicity, socioeconomic status, and health status) in follow-up research would enhance the generalizability and applicability of these findings.

Citations: 0
A near telomere-to-telomere genome assembly of the Jinhua pig: enabling more accurate genetic research.
IF 11.8 Zone 2 (Biology) Q1 MULTIDISCIPLINARY SCIENCES Pub Date: 2025-01-06 DOI: 10.1093/gigascience/giaf048
Caiyun Cao, Jian Miao, Qinqin Xie, Jiabao Sun, Hong Cheng, Zhenyang Zhang, Fen Wu, Shuang Liu, Xiaowei Ye, Huanfa Gong, Zhe Zhang, Qishan Wang, Yuchun Pan, Zhen Wang

Background: Pigs are crucial sources of meat and protein, valuable animal models, and potential donors for xenotransplantation. However, the existing pig reference genome is incomplete: it remains split into thousands of segments and lacks centromeres and telomeres, which limits our understanding of important traits associated with these genomic regions.

Findings: We present a near-complete genome assembly for the Jinhua pig (JH-T2T) and provide a set of diploid Jinhua reference genomes, constructed using PacBio HiFi reads, Oxford Nanopore (ONT) long reads, and Hi-C reads. The assembly includes all 18 autosomes and the X and Y sex chromosomes, with only 6 gaps. Annotation identified 46.90% repetitive sequence, 33 telomeres, 17 centromeres, and 23,924 high-confidence genes. Compared with Sscrofa11.1, JH-T2T closes nearly all gaps, extends the sequence by 177 Mb, predicts more intact telomeres and centromeres, and gains 799 genes while losing 114. It also improves read mapping rates for both Western and Chinese local pigs, outperforming Sscrofa11.1 as a reference genome. Additionally, this comprehensive assembly will facilitate large-scale variant detection.

Conclusions: This study produced a near-gapless assembly of the pig genome and a set of haploid Jinhua reference genomes. Our findings represent a significant advance in pig genomics, providing a robust resource for genetic research, breeding programs, and biomedical applications.
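The gap and telomere counts reported for JH-T2T come from the authors' own annotation pipeline, which the abstract does not describe. A common way to reason about such numbers is that an assembly gap is a run of `N` bases, and a telomere appears as tandem copies of the vertebrate repeat `TTAGGG` at the end of a chromosome (read as `CCCTAA` at its start in the forward orientation). The sketch below applies those two heuristics to FASTA records. The function names, the 1,000-bp window, and the minimum copy number are illustrative assumptions, and this is not how JH-T2T was annotated.

```python
import re

TELOMERE_3P = "TTAGGG"  # vertebrate telomeric repeat at the chromosome end (forward strand)
TELOMERE_5P = "CCCTAA"  # the same repeat as it reads at the chromosome start


def read_fasta(lines):
    """Parse FASTA lines into {record_name: sequence}; the name is the first header word."""
    records, name, chunks = {}, None, []
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if name is not None:
                records[name] = "".join(chunks)
            name, chunks = (line[1:].split() or [""])[0], []
        elif line:
            chunks.append(line)
    if name is not None:
        records[name] = "".join(chunks)
    return records


def count_gaps(seq):
    """Count runs of N/n bases, each treated as one assembly gap."""
    return len(re.findall(r"[Nn]+", seq))


def has_telomeres(seq, window=1000, min_copies=10):
    """Heuristic: does each end of the sequence carry enough telomeric repeat copies?"""
    start, end = seq[:window].upper(), seq[-window:].upper()
    return start.count(TELOMERE_5P) >= min_copies, end.count(TELOMERE_3P) >= min_copies


def summarize(records):
    """Map each record to (gap_count, telomere_at_start, telomere_at_end)."""
    return {name: (count_gaps(seq), *has_telomeres(seq)) for name, seq in records.items()}
```

A chromosome that is truly telomere-to-telomere would report `(0, True, True)` under these heuristics; dedicated tools handle repeat variants, strand, and window size far more carefully.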

Citations: 0
GTestimate: improving relative gene expression estimation in scRNA-seq using the Good-Turing estimator.
IF 11.8 Zone 2 (Biology) Q1 MULTIDISCIPLINARY SCIENCES Pub Date: 2025-01-06 DOI: 10.1093/gigascience/giaf084
Martin Fahrenberger, Christopher Esk, Jürgen A Knoblich, Arndt von Haeseler

Background: Single-cell RNA-seq suffers from unwanted technical variation between cells, caused by complex experimental protocols and shallow sequencing depth. Many conventional normalization methods try to remove this variation by calculating the relative gene expression per cell. However, they rely on the maximum likelihood estimator, which is not ideal for this application.

Results: We present GTestimate, a new normalization method based on the Good-Turing estimator, which improves upon conventional normalization methods by accounting for unobserved genes. To validate GTestimate, we developed a novel cell-targeted PCR amplification approach (cta-seq), which enables ultra-deep sequencing of single cells. Based on these data, we show that the Good-Turing estimator improves relative gene expression estimation and cell-cell distance estimation. Finally, we use GTestimate's compatibility with Seurat workflows to explore 4 example datasets and show how it can improve downstream results.

Conclusion: By choosing a more suitable estimator for the relative gene expression per cell, we were able to improve scRNA-seq normalization, with potentially large implications for downstream results. GTestimate is available as an easy-to-use R package and is compatible with a variety of workflows, which should enable widespread adoption.
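The abstract contrasts the maximum-likelihood estimate of relative expression (a gene's count divided by the cell's total count) with the Good-Turing estimator, which sets aside probability mass for genes that were not observed. The sketch below shows the textbook, unsmoothed Turing formula: a count r is adjusted to r* = (r + 1) N_{r+1} / N_r, where N_r is the number of genes observed exactly r times, and N_1 / N is reserved for unseen genes. It is an illustrative assumption, not GTestimate's implementation or API. Practical estimators (for example Gale and Sampson's Simple Good-Turing) smooth the N_r values before applying the formula, and the fallback to raw counts used here is a simplification.

```python
from collections import Counter


def mle_relative_expression(counts):
    """Maximum-likelihood estimate: each gene's share of the cell's total count."""
    total = sum(counts.values())
    if total == 0:
        return {}
    return {gene: c / total for gene, c in counts.items()}


def good_turing_relative_expression(counts):
    """Unsmoothed Good-Turing relative expression for observed genes.

    Returns (estimates, p_unseen), where p_unseen is the mass reserved for
    genes with zero counts, so sum(estimates) + p_unseen == 1.
    """
    observed = {gene: c for gene, c in counts.items() if c > 0}
    total = sum(observed.values())
    if total == 0:
        return {}, 1.0
    n_r = Counter(observed.values())      # N_r: number of genes seen exactly r times
    p_unseen = n_r[1] / total             # Turing's estimate of unseen mass, N_1 / N
    adjusted = {}
    for gene, r in observed.items():
        if n_r[r + 1] > 0:
            adjusted[gene] = (r + 1) * n_r[r + 1] / n_r[r]  # r* = (r+1) N_{r+1} / N_r
        else:
            adjusted[gene] = r            # naive fallback where N_{r+1} is zero
    # Rescale so the observed genes share exactly 1 - p_unseen.
    scale = (1 - p_unseen) / sum(adjusted.values())
    return {gene: v * scale for gene, v in adjusted.items()}, p_unseen
```

For a toy cell with counts A=1, B=1, C=2, D=5 (and one zero-count gene), 2/9 of the mass is reserved for unseen genes and each singleton drops from the MLE's 1/9 to 7/81. With the unsmoothed formula, a cell in which every gene is a singleton would assign all mass to unseen genes, which is one reason smoothing matters in practice.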

Citations: 0