SeqKit2: A Swiss army knife for sequence and alignment processing

iMeta (IF 23.7, Q1, Microbiology). Pub Date: 2024-04-05. DOI: 10.1002/imt2.191
Wei Shen, Botond Sipos, Liuyang Zhao

Abstract

In the era of ubiquitous high-throughput sequencing studies, there is a growing need for analysis tools that are not just performant but also comprehensive and user-friendly enough to cater to both novice and advanced users. This article introduces SeqKit2, the next iteration of the widely used sequence analysis tool SeqKit, featuring expanded functionality, performance optimizations, and support for additional compression methods. Retaining a pragmatic subcommand architecture, SeqKit2 represents a substantial enhancement: it adds 19 subcommands, expanding the overall repertoire to 38 in eight categories. The new subcommands add functionality such as amplicon processing and robust, error-tolerant parsing of sequence records. In addition, three subcommands designed for real-time analysis are added for periodic monitoring of properties of FASTQ and Binary Alignment/Map (BAM) records and for real-time streaming from multiple sequence files. SeqKit2 is benchmarked against the previous version of SeqKit and against Bioawk, Seqtk, and SeqFu. It consistently outperforms its predecessor, albeit with marginally higher memory usage, while maintaining competitive runtimes against the other tools. With its broad functionality, proven usability, and ongoing development driven by user feedback, we hope that bioinformaticians will find SeqKit2 useful as a “Swiss army knife” of sequence and alignment processing, equally adept at facilitating ad hoc analyses and integrating seamlessly into larger pipelines.
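
To make the idea of "error-tolerant parsing of sequence records" concrete, the following is a minimal Python sketch of one possible approach: read FASTQ text through a four-line sliding window, emit records that look well formed, and resynchronise by one line when a record is broken. This is an illustration only, not SeqKit2's implementation or algorithm; the function name `tolerant_fastq` and the validity checks (header starts with `@`, separator starts with `+`, sequence and quality have equal length) are assumptions made for this example.

```python
from typing import Iterable, Iterator, List, Tuple

# (name, sequence, quality); a hypothetical record type for this sketch
Record = Tuple[str, str, str]


def tolerant_fastq(lines: Iterable[str]) -> Iterator[Record]:
    """Yield well-formed 4-line FASTQ records and skip malformed ones.

    Simplifications (assumptions of this sketch): records span exactly
    four lines, blank lines are ignored, and empty sequences are not
    supported.
    """
    window: List[str] = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if not line:
            continue  # ignore blank lines
        window.append(line)
        if len(window) < 4:
            continue  # need four lines before a record can be checked
        header, seq, plus, qual = window
        if header.startswith("@") and plus.startswith("+") and len(seq) == len(qual):
            yield header[1:], seq, qual
            window = []  # record consumed, start a fresh window
        else:
            # Malformed: drop the oldest line and try to resynchronise
            window.pop(0)


if __name__ == "__main__":
    # The second record is broken: 3 bases but 4 quality characters.
    text = "@r1\nACGT\n+\nIIII\n@r2\nACG\n+\nIIII\n@r3\nGG\n+\nII\n"
    for name, seq, qual in tolerant_fastq(text.splitlines()):
        print(name, seq, qual)
```

With the sample input above, the sketch keeps `r1` and `r3` and silently drops the inconsistent `r2`. A production parser would also need to handle quality strings that begin with `@`, multi-line records, and reporting of what was discarded, none of which is attempted here.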
