
GigaScience: Latest Articles

Federated knowledge retrieval elevates large language model performance on biomedical benchmarks.
IF 11.8 | Zone 2, Biology | Q1 MULTIDISCIPLINARY SCIENCES | Pub Date: 2026-01-21 | DOI: 10.1093/gigascience/giag007
Janet Joy, Andrew I Su

Background: Large language models (LLMs) have significantly advanced natural language processing in biomedical research; however, their reliance on implicit, statistical representations often results in factual inaccuracies or hallucinations, posing significant concerns in high-stakes biomedical contexts.

Results: To overcome these limitations, we developed BioThings Explorer-Retrieval-Augmented Generation (BTE-RAG), a Retrieval-Augmented Generation framework that integrates the reasoning capabilities of advanced language models with explicit mechanistic evidence sourced from BTE, an API federation of more than sixty authoritative biomedical knowledge sources. We systematically compared BTE-RAG with traditional LLM-only methods across three benchmark datasets that we created from DrugMechDB. These datasets specifically targeted gene-centric mechanisms (798 questions), metabolite effects (201 questions), and drug-biological process relationships (842 questions). On the gene-centric task, BTE-RAG increased accuracy from 51% to 75.8% for GPT-4o mini and from 69.8% to 78.6% for GPT-4o. In metabolite-focused questions, the proportion of responses with cosine similarity scores of at least 0.90 rose by 82% for GPT-4o mini and 77% for GPT-4o. While overall accuracy remained similar on the drug-biological process benchmark, the retrieval method enhanced response concordance, increasing the number of high-agreement answers with GPT-4o by more than 10% (from 129 to 144). We additionally evaluated BTE-RAG alongside GeneGPT-based models on the GeneTuring gene-disease association benchmark and on our mechanistic gene benchmark, demonstrating that the BTE-RAG layer consistently improves accuracy relative to alternative approaches.

Conclusion: Federated knowledge retrieval provides transparent improvements in accuracy for LLMs, establishing BTE-RAG as a valuable and practical tool for mechanistic exploration and translational biomedical research.
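
The retrieve-then-generate idea and the cosine-similarity scoring described above can be sketched as follows. This is a minimal illustration under stated assumptions, not BTE-RAG's implementation: the in-memory triple store, the entity names (DrugX, GeneY, ProcessZ) and the prompt wording are hypothetical placeholders, and real answers would be compared as vectors from a text-embedding model that the abstract does not name.

```python
import math
from typing import List, Sequence, Tuple

# A knowledge-graph edge: (subject, predicate, object). Hypothetical placeholder data only.
Triple = Tuple[str, str, str]


def retrieve(triples: Sequence[Triple], entities: Sequence[str]) -> List[Triple]:
    """Return triples whose subject or object matches a query entity (case-insensitive)."""
    wanted = {e.lower() for e in entities}
    return [t for t in triples if t[0].lower() in wanted or t[2].lower() in wanted]


def build_prompt(question: str, evidence: Sequence[Triple]) -> str:
    """Prepend retrieved evidence to the question so the LLM can ground its answer."""
    lines = [f"- {s} {p} {o}" for s, p, o in evidence] or ["- (no evidence retrieved)"]
    return "Evidence:\n" + "\n".join(lines) + f"\n\nQuestion: {question}\nAnswer:"


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two embedding vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def share_at_least(pairs: Sequence[Tuple[Sequence[float], Sequence[float]]],
                   threshold: float = 0.90) -> float:
    """Fraction of (answer, reference) embedding pairs with similarity >= threshold."""
    if not pairs:
        raise ValueError("no pairs to score")
    return sum(cosine(a, b) >= threshold for a, b in pairs) / len(pairs)


def relative_change(before: float, after: float) -> float:
    """Relative change between two counts, e.g. high-agreement answers."""
    return (after - before) / before


if __name__ == "__main__":
    kg = [("DrugX", "inhibits", "GeneY"), ("GeneY", "regulates", "ProcessZ")]
    print(build_prompt("How does DrugX affect ProcessZ?", retrieve(kg, ["DrugX", "ProcessZ"])))
    print(round(relative_change(129, 144), 3))
```

As a sanity check on the reported figures, going from 129 to 144 high-agreement answers is a relative increase of 15/129, roughly 11.6%, consistent with the "greater than 10%" statement.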

Full-text PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12888809/pdf/
Citations: 0
Giant chromosomes of a tiny plant: the complete telomere-to-telomere genome assembly of the simple thalloid liverwort Apopellia endiviifolia (Jungermanniopsida, Marchantiophyta).
IF 11.8 | Zone 2, Biology | Q1 MULTIDISCIPLINARY SCIENCES | Pub Date: 2026-01-21 | DOI: 10.1093/gigascience/giaf145
Joanna Szablińska-Piernik, Paweł Sulima, Jakub Sawicki

Background: The liverwort Apopellia endiviifolia, a dioicous, simple thalloid species, is notable for its cryptic diversity, habitat adaptability, and genomic innovation, and it represents a clade that is sister to all other Jungermanniopsida. These features make A. endiviifolia an essential model for exploring speciation mechanisms and the evolution of genome structures within liverworts.

Findings: We present the genome assembly of a haploid A. endiviifolia isolate with a total size of 2,914,960,273 bp and an N50 of 468,157,909 bp, demonstrating high completeness (99.2% BUSCO) and high consensus quality (quality value 47.6). The assembly consisted of 9 chromosomes, which included 18 telomeres and 9 centromeres (1.9 to 5 Mbp in length). RNA sequencing-based annotation identified 34,615 genes, predominantly protein coding. Transposable elements included long terminal repeat (LTR) elements (12.16%) and 57 Helitrons. Among the retroelements, the Copia and Gypsy superfamilies made up 8.94% and 2.95% of the genome, respectively. The Ty3/Gypsy superfamily was significantly enriched in centromeric regions. The average GC content ranged from 38.8% to 39.6%, and gene density ranged from 5.52 to 9.78 genes per 500 kbp. Synteny analysis with related liverwort species revealed complex chromosomal relationships, indicating extensive genome rearrangements among species.

Conclusions: This study provides the first high-quality reference genome assembly of the haploid liverwort A. endiviifolia. The assembly and annotation offer valuable resources for investigating liverwort evolution, centromere biology, and genome expansion in simple thalloid liverworts.
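
N50 and QV, as reported above, are standard assembly statistics. The sketch below shows their conventional definitions (it is not the authors' tooling): N50 is the length L such that sequences of length at least L contain at least half of the assembled bases, and QV is a Phred-scaled consensus quality, QV = -10 log10(per-base error rate). Under that definition, QV 47.6 corresponds to roughly one error per 57.5 kbp.

```python
from typing import Sequence


def n50(lengths: Sequence[int]) -> int:
    """Length L such that sequences of length >= L cover at least half the assembly."""
    if not lengths:
        raise ValueError("no sequence lengths given")
    total = sum(lengths)
    running = 0
    # Walk from the longest sequence down, accumulating assembled bases.
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    raise AssertionError("unreachable for non-negative lengths")


def qv_to_error_rate(qv: float) -> float:
    """Invert the Phred-scaled consensus quality QV = -10 * log10(error rate)."""
    return 10 ** (-qv / 10)


if __name__ == "__main__":
    print(n50([2, 3, 4, 5, 6]))  # 5
    error = qv_to_error_rate(47.6)
    print(f"{error:.2e} errors per base, about 1 per {1 / error:,.0f} bp")
```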

Full-text PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12885004/pdf/
Citations: 0
nf-core/proteinfamilies: A scalable pipeline for the generation of protein families.
IF 11.8 | Zone 2, Biology | Q1 MULTIDISCIPLINARY SCIENCES | Pub Date: 2026-01-21 | DOI: 10.1093/gigascience/giag009
Evangelos Karatzas, Martin Beracochea, Fotis A Baltoumas, Eleni Aplakidou, Lorna Richardson, James A Fellows Yates, Daniel Lundin

The growth of metagenomics-derived amino acid sequence data has transformed our understanding of protein function, microbial diversity, and evolutionary relationships. However, the vast majority of these proteins remain functionally uncharacterized. Grouping the millions of such uncharacterized sequences with the few experimentally characterized ones allows annotations to be transferred, while inspecting conserved residues in multiple sequence alignments can provide clues to function, even in the absence of existing functional information. To address the challenges associated with this data surge and the need to group sequences, we present a scalable, open-source, parametrizable Nextflow pipeline (nf-core/proteinfamilies) that generates nascent protein families or assigns new proteins to existing families. The computational benchmarks demonstrated that resource usage scales approximately linearly with input size, and the biological benchmarks showed that the generated protein families closely resemble manually curated families in widely used databases.
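
Grouping sequences into families and assigning new proteins to existing ones can be illustrated with a toy greedy clustering sketch. It is hypothetical and deliberately simplified: shared 3-mers (Jaccard similarity) stand in for the alignment-based similarity that real clustering tools use, the 0.5 threshold is arbitrary, and none of the function names come from nf-core/proteinfamilies.

```python
from typing import Dict, List, Optional, Set


def kmers(seq: str, k: int = 3) -> Set[str]:
    """Set of overlapping k-mers in a protein sequence (empty if shorter than k)."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Jaccard similarity of two k-mer sets; 0.0 when both are empty."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def greedy_cluster(seqs: Dict[str, str], threshold: float = 0.5, k: int = 3) -> Dict[str, List[str]]:
    """Greedy incremental clustering: longest sequences are visited first, and each
    sequence joins the first representative it resembles or founds a new family."""
    clusters: Dict[str, List[str]] = {}
    rep_kmers: Dict[str, Set[str]] = {}
    for name in sorted(seqs, key=lambda n: len(seqs[n]), reverse=True):
        query = kmers(seqs[name], k)
        for rep, rep_set in rep_kmers.items():
            if jaccard(query, rep_set) >= threshold:
                clusters[rep].append(name)
                break
        else:  # no representative was similar enough
            clusters[name] = [name]
            rep_kmers[name] = query
    return clusters


def assign_to_family(seq: str, representatives: Dict[str, str],
                     threshold: float = 0.5, k: int = 3) -> Optional[str]:
    """Return the most similar family representative, or None if below threshold."""
    query = kmers(seq, k)
    best_rep, best_score = None, -1.0
    for rep, rep_seq in representatives.items():
        score = jaccard(query, kmers(rep_seq, k))
        if score > best_score:
            best_rep, best_score = rep, score
    return best_rep if best_score >= threshold else None
```

Visiting longer sequences first mirrors the common greedy strategy of letting the most complete sequence represent each family; a sequence that matches no representative is left unassigned so it can seed a nascent family.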

Citations: 0
Generalizable machine learning models for rapid antimicrobial resistance prediction in unseen health care settings.
IF 11.8 | Zone 2, Biology | Q1 MULTIDISCIPLINARY SCIENCES | Pub Date: 2026-01-21 | DOI: 10.1093/gigascience/giaf156
Diane Duroux, Paul P Meyer, Giovanni Visonà, Niko Beerenwinkel

Background: The deployment of machine learning in clinical settings is often hindered by the limited generalizability of the models. Models that perform well during development tend to underperform in new environments, limiting their clinical utility. This issue affects models designed for the rapid identification of antimicrobial resistance, which is essential to guide treatment decisions. Traditional susceptibility tests can take up to 3 days, whereas integrating matrix-assisted laser desorption/ionization time-of-flight (MALDI-TOF) mass spectrometry with machine learning has the potential to reduce this to 1 day. However, model performance declines drastically in hospitals or time frames outside the training data.

Results: To improve robustness, we develop advanced feature representations: masked autoencoders (MAEs) for MALDI-TOF spectra, and chemical language models and SELF-referencing embedded strings (SELFIES) for antimicrobials. Cross-validated on data from 4 medical institutions, our models demonstrate improved performance and stability. The MAE and SELFIES encodings increase the area under the precision-recall curve by 4% when evaluated on unseen time periods, while the MAE and Molformer language model encodings improve it by 10% when applied across different hospitals.

Conclusions: These results underscore the value of combining deep learning with chemical and spectral information to build generalizable, high-impact clinical artificial intelligence.
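
Evaluating on hospitals a model has never seen, and scoring with the area under the precision-recall curve, can be sketched as follows. This is a minimal illustration assuming one held-out site per fold and the step-wise average-precision estimate of AUPRC; the abstract does not specify the exact protocol, and the site labels are placeholders.

```python
from typing import Iterator, List, Sequence, Tuple


def leave_one_site_out(sites: Sequence[str]) -> Iterator[Tuple[str, List[int], List[int]]]:
    """Yield (held_out_site, train_indices, test_indices), one fold per site,
    so every test fold comes from a site absent from training."""
    for held_out in sorted(set(sites)):
        train = [i for i, s in enumerate(sites) if s != held_out]
        test = [i for i, s in enumerate(sites) if s == held_out]
        yield held_out, train, test


def average_precision(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """Step-wise AUPRC estimate: mean precision at the rank of each positive.

    Ranks by descending score; ties keep input order, a simplification
    compared with tie-aware implementations.
    """
    n_pos = sum(y_true)
    if n_pos == 0:
        raise ValueError("average precision is undefined without positive labels")
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    hits, total = 0, 0.0
    for rank, i in enumerate(order, start=1):
        if y_true[i]:
            hits += 1
            total += hits / rank  # precision at this positive's rank
    return total / n_pos


if __name__ == "__main__":
    sites = ["hospital_A", "hospital_A", "hospital_B", "hospital_C"]
    for site, train, test in leave_one_site_out(sites):
        print(site, train, test)
    print(average_precision([1, 0, 1, 0], [0.9, 0.8, 0.7, 0.1]))
```

AUPRC is a natural choice when resistant isolates are the minority class, because it ignores the large number of easy true negatives that can inflate ROC-based scores.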

Full-text PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12908719/pdf/
Citations: 0
pyRootHair: Machine learning accelerated software for high-throughput phenotyping of plant root hair traits.
IF 11.8 | Zone 2, Biology | Q1 MULTIDISCIPLINARY SCIENCES | Pub Date: 2026-01-21 | DOI: 10.1093/gigascience/giaf141
Ian Tsang, Lawrence Percival-Alwyn, Stephen Rawsthorne, James Cockram, Fiona Leigh, Jonathan A Atkinson

Background: Root hairs play a key role in plant nutrient and water uptake. Historically, root hair traits have largely been quantified manually. As such, this process has been laborious and low-throughput. However, given their importance for plant health and development, high-throughput quantification of root hair morphology could help underpin rapid advances in the genetic understanding of these traits. With recent increases in the accessibility and availability of artificial intelligence (AI) and machine learning techniques, the development of tools to automate plant phenotyping processes has been greatly accelerated.

Results: We present pyRootHair, a high-throughput, AI-powered software application that automates root hair trait extraction from microscope images of plant roots grown on agar plates. pyRootHair can batch process over 600 images per hour without manual input from the end user. In this study, we deploy pyRootHair on a panel of 24 diverse wheat (Triticum aestivum and Triticum turgidum ssp. durum) cultivars and uncover substantial, previously unresolved variation in many root hair traits. We show that overall root hair profiles fall into 2 distinct shape categories and that different root hair traits often correlate with each other. We also demonstrate that pyRootHair can be deployed on a range of plant species, including oat (Avena sativa), rice (Oryza sativa), teff (Eragrostis tef), and tomato (Solanum lycopersicum).

Conclusions: The application of pyRootHair enables users to rapidly screen a large number of plant germplasm resources for variation in root hair morphology, supporting high-resolution measurements and high-throughput data analysis. This facilitates downstream investigation of the impacts of root hair genetic control and morphological variation on plant performance. pyRootHair is installable via PyPI (https://pypi.org/project/pyRootHair/) and can be accessed on GitHub at https://github.com/iantsang779/pyRootHair.
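
The idea of a root hair profile along the root can be illustrated with a hypothetical sketch that turns a binary segmentation mask into per-position hair lengths. It is not pyRootHair's algorithm or API: it assumes each mask row is a cross-section perpendicular to the root axis on one side of the root, that 1 marks root-hair pixels, and that the pixel size in micrometres is known from microscope calibration.

```python
from typing import Dict, List, Sequence


def hair_length_profile(mask: Sequence[Sequence[int]], um_per_px: float) -> List[float]:
    """Approximate root hair length (um) at each position along the root axis.

    Counts foreground pixels per row, which approximates length only if hairs
    run roughly perpendicular to the root (a simplifying assumption).
    """
    return [sum(row) * um_per_px for row in mask]


def summarize_profile(profile: Sequence[float]) -> Dict[str, float]:
    """Simple illustrative traits: maximum length, where it occurs, and mean length."""
    if not profile:
        raise ValueError("empty profile")
    peak = max(profile)
    return {
        "max_length_um": peak,
        "peak_row_index": float(list(profile).index(peak)),
        "mean_length_um": sum(profile) / len(profile),
    }


if __name__ == "__main__":
    toy_mask = [[1, 1, 0, 0], [1, 1, 1, 0], [1, 0, 0, 0]]  # hypothetical 3-row mask
    profile = hair_length_profile(toy_mask, um_per_px=2.0)
    print(profile, summarize_profile(profile))
```

Profiles computed this way for many images could then be compared or clustered by shape, which is the kind of analysis behind the 2 profile categories mentioned in the abstract.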

Full-text PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12824728/pdf/
Citations: 0
Genome Assembly of Three Shrub Mangroves in the Genus Acanthus Reveals Two Polyploidy Events and Expansion of Genes Linked to Root Adaptation in Coastal Habitats.
IF 11.8 | Zone 2, Biology | Q1 MULTIDISCIPLINARY SCIENCES | Pub Date: 2026-01-21 | DOI: 10.1093/gigascience/giaf162
Wanapinun Nawae, Chaiwat Naktang, Peeraphat Paenpong, Duangjai Sangsrakru, Thippawan Yoocha, Sonicha U-Thoomporn, Wasitthee Kongkachana, Poonsri Wanthongchai, Suchart Yamprasai, Chonlawit Samart, Sithichoke Tangphatsornruang, Wirulda Pootakham

Background: The genomes of mangrove Acanthus species have not been reported, despite their ecological and medicinal importance. Here, we generated reference genomes for three shrub mangroves in the genus Acanthus to clarify their whole-genome duplication and hybridization events and identify genomic features underlying their evolution.

Results: Using PacBio and Hi-C data, we generated a chromosome-scale genome assembly of the recently identified allotetraploid species Acanthus tetraploideus (2n = 96). The genomes of diploid progenitors, Acanthus ilicifolius and Acanthus ebracteatus (2n = 48), were assembled from single-tube long fragment read data. We identified an Acanthus-specific whole-genome duplication (WGD) event that occurred ∼43 million years ago (Mya). Ancestral karyotype reconstruction revealed a shift in haploid chromosome number from 11 to 24 in the progenitors, following the WGD and subsequent chromosomal fission events. The hybridization that formed A. tetraploideus was estimated to have occurred 0.7-1.8 Mya. Phylogenomic and synteny analyses clearly showed that A. tetraploideus inherited subgenomes SG1 and SG2 from A. ilicifolius and A. ebracteatus, respectively. Gene structure and retention analyses revealed a smaller and more structurally flexible genome in A. ebracteatus and SG2 compared with A. ilicifolius and SG1. Gene family and machine learning analyses identified expansions in protein families related to Casparian strip formation, root development, and salt stress response. Several of these families were expanded in A. ilicifolius and SG1 but contracted in A. ebracteatus and SG2. These genomic patterns might have contributed to the establishment of A. tetraploideus within the habitat of A. ebracteatus. For all three species, population analysis revealed clear genetic divergence between samples from the eastern and western coasts of Thailand.

Conclusions: These genome assemblies clarify the polyploidy and hybridization history of Acanthus and highlight gene family changes potentially associated with coastal root adaptation and habitat establishment in intertidal environments. This study provides valuable genomic resources and insights into the evolutionary adaptation of plants to intertidal environments.
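
The chromosome counts given in the Results are mutually consistent, as a quick check shows. The step from 22 to 24 is a net figure: it assumes that fissions minus any fusions equal two, which the wording "subsequent chromosomal fission events" supports but does not spell out.

```latex
\begin{aligned}
n_{\text{after WGD}} &= 2 \times 11 = 22,\\
\Delta n &= 24 - 22 = 2 \quad (\text{net chromosome gain from fission}),\\
2n_{\text{A. tetraploideus}} &= 2\,(n_{\mathrm{SG1}} + n_{\mathrm{SG2}}) = 2\,(24 + 24) = 96.
\end{aligned}
```

The last line uses the haploid number 24 of each diploid progenitor (2n = 48), one contributing subgenome SG1 and the other SG2.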

Full-text PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12903786/pdf/
Citations: 0
Harnessing artificial intelligence for genomic variant prediction: advances, challenges, and future directions.
IF 11.8 Zone 2 (Biology) Q1 MULTIDISCIPLINARY SCIENCES Pub Date: 2026-01-21 DOI: 10.1093/gigascience/giag004
Indah Pakpahan, Mentari Sihombing, Haohan Liu, Mengyao Wang, Zheng Su, Mingyan Fang

Accurate genetic variant interpretation is crucial for disease research and the development of targeted therapies. Artificial intelligence is transforming this field by integrating computational methodologies across structural biology, evolutionary analysis, and multimodal genomic data. This review examines the evolution from traditional rule-based systems and statistical models to contemporary machine learning, deep learning, and protein language models, while addressing critical challenges in variant classification. Key obstacles include data heterogeneity, limited interpretability, and the persistence of variants of uncertain significance; these obstacles underscore the need for explainable artificial intelligence frameworks and more inclusive genomic databases to improve predictive accuracy across diverse populations. Based on an assessment of current variant impact predictors, we propose strategies for improved predictor selection, effective multi-omics data integration, and optimized computational workflows. These recommendations aim to improve the accuracy of variant interpretation in both research settings and clinical practice, ultimately contributing to advances in personalized medicine.

Citations: 0
Charting immune variation through genetics and single-cell genomics.
IF 11.8 Zone 2 (Biology) Q1 MULTIDISCIPLINARY SCIENCES Pub Date: 2026-01-21 DOI: 10.1093/gigascience/giaf161
Joseph E Powell

Large-scale single-cell genomics projects have revolutionized our understanding of human immune variation. Yet most studies to date have been Eurocentric, limited in cell-type resolution, or restricted to a single data modality. The newly published Chinese Immune Multi-Omics Atlas helps address these gaps by profiling 428 healthy Chinese adults using a multiomics single-cell approach that combines single-cell RNA sequencing and single-cell chromatin accessibility sequencing across over 10 million immune cells. This integrated strategy enabled the identification of 73 distinct immune cell subsets and the construction of cell-type-specific gene regulatory networks linking noncoding enhancers to target genes. The atlas delineated hundreds of enhancer modules (eRegulons), highlighting both established and novel regulators of immune cell identity. By aligning transcriptomic and epigenomic maps, Yin et al. show how expanding both the ancestral diversity and data modalities of immune cell genomics can reveal new biology and provide a valuable addition to global reference cell atlases.

Citations: 0
RNA-SeqEZPZ: a point-and-click pipeline for comprehensive transcriptomics analysis with interactive visualizations.
IF 11.8 Zone 2 (Biology) Q1 MULTIDISCIPLINARY SCIENCES Pub Date: 2026-01-21 DOI: 10.1093/gigascience/giaf133
Cenny Taslim, Yuan Zhang, Galen Rask, Genevieve C Kendall, Emily R Theisen

Background: RNA sequencing (RNA-seq) analysis has become a routine task in numerous genomic research labs, driven by the reduced cost of bulk RNA sequencing experiments. These studies generate billions of reads that require easy-to-run, comprehensive, and reproducible analysis. However, many labs rely on in-house scripts, which can be challenging for bench scientists to use and hinder standardization and reproducibility. While existing RNA-seq pipelines attempt to address these challenges, they often lack a complete end-to-end user interface.

Findings: To bridge this gap, we developed RNA-SeqEZPZ, an automated pipeline with a user-friendly point-and-click interface that enables rigorous and reproducible RNA-seq analysis without requiring programming or bioinformatics expertise. For advanced users, the pipeline can also be run from the command line, allowing steps to be customized for specific applications. The innovation of this pipeline lies in the combination of 3 key features: (i) all software is packaged within a Singularity container, eliminating installation issues; (ii) it offers a graphical, point-and-click interface covering every step from raw FASTQ files through differential expression and pathway analysis; and (iii) it includes a Nextflow implementation, enabling scalability and portability for seamless execution across various platforms, including job submission to cloud and cluster computing environments. Additionally, RNA-SeqEZPZ generates a comprehensive statistical report and offers an optional batch adjustment to minimize noise from technical variation across replicates. Reports can also be reviewed by a bioinformatician to ensure the overall quality of the analysis.
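
The abstract mentions an optional batch adjustment but does not say which method RNA-SeqEZPZ applies. As a minimal illustration of the general idea only, and not the pipeline's implementation, the Python sketch below log-transforms raw counts with a pseudocount of 1 and removes each batch's mean offset per gene. The function name `center_batches`, the gene and batch labels, and the toy counts are all hypothetical. Simple mean-centering is only safe when biological conditions are balanced across batches; established tools such as limma's `removeBatchEffect` or sva's `ComBat_seq` model the experimental design explicitly.

```python
import math
from collections import defaultdict
from statistics import mean


def center_batches(counts, batches):
    """Per-batch mean-centering of log2(count + 1) values (hypothetical helper).

    counts:  dict mapping gene ID -> list of raw counts, one per sample
    batches: list of batch labels, one per sample (same order as counts)
    Returns a dict mapping gene ID -> list of adjusted log2 values.
    """
    adjusted = {}
    for gene, row in counts.items():
        if len(row) != len(batches):
            raise ValueError(f"{gene}: expected {len(batches)} samples, got {len(row)}")
        # Log-transform with a pseudocount so zero counts stay finite.
        logs = [math.log2(c + 1) for c in row]
        overall = mean(logs)
        # Group this gene's values by batch and compute each batch mean.
        by_batch = defaultdict(list)
        for value, batch in zip(logs, batches):
            by_batch[batch].append(value)
        batch_means = {b: mean(vals) for b, vals in by_batch.items()}
        # Subtract the batch offset, then restore the gene's overall mean.
        adjusted[gene] = [v - batch_means[b] + overall for v, b in zip(logs, batches)]
    return adjusted
```

For example, `center_batches({"geneA": [1, 3, 7, 15]}, ["run1", "run1", "run2", "run2"])` turns log2 values of 1, 2, 3, 4 into 2.0, 3.0, 2.0, 3.0: the 2-unit shift between runs is removed while the 1-unit difference within each run is preserved.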

Conclusions: RNA-SeqEZPZ is a robust, accessible, and scalable solution for comprehensive RNA-seq analysis, enabling researchers to focus on biological insights rather than computational challenges.

Citations: 0
The genomes of 5 mantises provide insights into sex chromosome evolution and Mantodea phylogeny clarification.
IF 11.8 Zone 2 (Biology) Q1 MULTIDISCIPLINARY SCIENCES Pub Date: 2026-01-21 DOI: 10.1093/gigascience/giaf158
Hangwei Liu, Lihong Lei, Fan Jiang, Bo Zhang, Hengchao Wang, Yutong Zhang, Hanbo Zhao, Guirong Wang, Wei Fan

Background: Praying mantises, members of the order Mantodea, play important roles in agriculture, medicine, bionics, and entertainment. However, the scarcity of genomic resources has hindered extensive studies on mantis evolution and behavior.

Results: Here, we present the chromosome-scale reference genomes of 5 mantis species: the European mantis (Mantis religiosa), Chinese mantis (Tenodera sinensis), triangle dead leaf mantis (Deroplatys truncata), orchid mantis (Hymenopus coronatus), and metallic mantis (Metallyticus violacea). The assembled genome sizes range from ∼2.3 to 4.2 Gb, with contig N50 sizes of 1-109 Mb and 85%-99% of the sequence anchored to chromosomes. The number of annotated protein-coding genes ranges from 17,804 to 19,017, with BUSCO completeness of 96.7%-98.4%. We found that transposable element expansion is the major force governing genome size in Mantodea, and our results suggest that translocations between the X chromosome and an autosome have occurred in the lineage of the family Mantidae. In addition, the M. violacea lineage has accumulated fewer substitutions than the lineages of the other mantises. Finally, our genome-wide analyses showed that D. truncata is sister to H. coronatus rather than to M. religiosa and T. sinensis, which helps resolve phylogenetic controversies surrounding the genus Deroplatys.
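
The contig N50 values of 1-109 Mb quoted above follow the standard definition: sort contigs from longest to shortest and report the length of the contig at which the running total first reaches half of the total assembly length. A minimal Python sketch of that definition follows; the function name `n50` and the toy lengths are illustrative and not taken from the paper.

```python
from typing import Sequence


def n50(contig_lengths: Sequence[int]) -> int:
    """Return the N50 of a collection of contig lengths.

    N50 is the length L such that contigs of length >= L together
    cover at least half of the total assembly length.
    """
    if not contig_lengths:
        raise ValueError("need at least one contig length")
    lengths = sorted(contig_lengths, reverse=True)
    half = sum(lengths) / 2
    running = 0
    # Walk from the longest contig down, accumulating length.
    for length in lengths:
        running += length
        if running >= half:
            return length
    # Defensive fallback; not reached when all lengths are non-negative.
    return lengths[-1]
```

For example, `n50([100, 80, 50, 40, 30])` returns 80: the total is 300, and 100 + 80 = 180 is the first running total to reach 150.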

Conclusions: The high-quality genome assemblies of the 5 mantises provide a valuable resource for evolutionary studies of Mantodea and for the genetic improvement and breeding of beneficial biological control agents.

Citations: 0