Evaluating long-read assemblers to assemble several aphididae genomes.

Briefings in Bioinformatics (IF 7.7; Zone 2, Biology; JCR Q1, Biochemical Research Methods). Pub Date: 2025-03-04. DOI: 10.1093/bib/bbaf105
Nicolaas F V Burger, Vittorio F Nicolis, Anna-Maria Botha
Briefings in Bioinformatics, volume 26, issue 2. Publication type: Journal Article. Full-text PDF (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11904405/pdf/
Citations: 0

Abstract

Aphids are a speciose family of the Hemiptera comprising >5500 species. They have adapted to feed on multiple plant species and occur on every continent on Earth. Although aphids are economically devastating, very few aphid genomes have been sequenced and assembled, and those that have been suffer from low contiguity because the genomes are repeat-rich and AT-rich. With third-generation sequencing becoming more affordable and approaching the quality of second-generation sequencing, producing more contiguous aphid genome assemblies is becoming a reality. As the list of available long-read assemblers grows, choosing an assembly tool becomes more complicated. In this study, six recently released long-read assemblers (Canu, Flye, Hifiasm, Mecat2, Raven, and Wtdbg2) were evaluated on several quality and contiguity metrics after assembling four populations (or biotypes) of the same species (Russian wheat aphid, Diuraphis noxia) and two unrelated aphid species that have publicly available long-read sequences. Not all assemblers fared equally well across the different read sets, but overall Hifiasm and Canu performed best. The best assemblies for each read set were also merged using quickmerge, which in some cases produced superior assemblies and in others introduced more errors. Ab initio gene calls from assemblies of the same read set also showed less similarity than expected. Overall, the quality control pipeline followed during the assembly resulted in chromosome-level assemblies with minimal structural or quality artefacts.
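
The abstract does not say which contiguity metrics were used. As an illustration only, the sketch below computes N50, a widely used contiguity statistic, from a list of contig lengths. The function name `n50` and the example lengths are assumptions made for this sketch and are not taken from the study.

```python
from typing import Iterable


def n50(contig_lengths: Iterable[int]) -> int:
    """Return the N50 of an assembly.

    N50 is the length of the shortest contig in the smallest set of
    longest contigs that together cover at least half of the total
    assembly length.
    """
    lengths = sorted(contig_lengths, reverse=True)
    if not lengths:
        raise ValueError("at least one contig length is required")

    half_total = sum(lengths) / 2
    running = 0
    for length in lengths:
        running += length
        if running >= half_total:
            return length
    # Only reached if every length is zero.
    return lengths[-1]


# Hypothetical contig lengths (in bp) for one assembly.
example_lengths = [2, 2, 2, 3, 3, 4, 8, 8]
example_n50 = n50(example_lengths)
```

A higher N50 for the same read set generally indicates a more contiguous assembly, but it says nothing about correctness, which is why contiguity is usually reported alongside separate quality metrics.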

Source journal: Briefings in Bioinformatics (Biology: Biochemical Research Methods)
CiteScore: 13.20
Self-citation rate: 13.70%
Articles published: 549
Review time: 6 months
About the journal: Briefings in Bioinformatics is an international journal serving as a platform for researchers and educators in the life sciences. It also appeals to mathematicians, statisticians, and computer scientists applying their expertise to biological challenges. The journal focuses on reviews tailored for users of databases and analytical tools in contemporary genetics, molecular and systems biology. It stands out by offering practical assistance and guidance to non-specialists in computerized methodologies. Covering a wide range from introductory concepts to specific protocols and analyses, the papers address bacterial, plant, fungal, animal, and human data. The journal's detailed subject areas include genetic studies of phenotypes and genotypes, mapping, DNA sequencing, expression profiling, gene expression studies, microarrays, alignment methods, protein profiles and HMMs, lipids, metabolic and signaling pathways, structure determination and function prediction, phylogenetic studies, and education and training.
Latest articles from the journal
EpGAT: integrating epigenetics and 3D genome structure to predict alternative splicing and polyadenylation.
Could statistical potential models achieve comparable or better performance than deep learning models?
Integrating feature selection with unsupervised deep embedding for clustering single-cell RNA-seq data.
Master of Metals2: a graph neural network based architecture for the prediction of zinc binding sites in protein structures.
ORANGE: a machine learning approach for modeling tissue-specific aging from transcriptomic data.