DeepCorr:基于深度学习的新型 3GS 长读数纠错方法

IF 3.5 4区 计算机科学 Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE PeerJ Computer Science Pub Date : 2024-07-26 DOI:10.7717/peerj-cs.2160
Rongshu Wang, Jianhua Chen
{"title":"DeepCorr:基于深度学习的新型 3GS 长读数纠错方法","authors":"Rongshu Wang, Jianhua Chen","doi":"10.7717/peerj-cs.2160","DOIUrl":null,"url":null,"abstract":"Long reads generated by third-generation sequencing (3GS) technologies are involved in many biological analyses and play a vital role due to their ultra-long read length. However, the high error rate affects the downstream process. DeepCorr, a novel error correction algorithm for data from both PacBio and ONT platforms based on deep learning is proposed. The core algorithm adopts a recurrent neural network to capture the long-term dependencies in the long reads to convert the problem of long-read error correction to a multi-classification task. It first aligns the high-precision short reads to long reads to generate the corresponding feature vectors and labels, then feeds these vectors to the neural network, and finally trains the model for prediction and error correction. DeepCorr produces untrimmed corrected long reads and improves the alignment identity while maintaining the length advantage. It can capture and make full use of the dependencies to polish those bases that are not aligned by any short read. DeepCorr achieves better performance than that of the state-of-the-art error correction methods on real-world PacBio and ONT benchmark data sets and consumes fewer computing resources. It is a comprehensive deep learning-based tool that enables one to correct long reads accurately.","PeriodicalId":54224,"journal":{"name":"PeerJ Computer Science","volume":"58 1","pages":""},"PeriodicalIF":3.5000,"publicationDate":"2024-07-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"DeepCorr: a novel error correction method for 3GS long reads based on deep learning\",\"authors\":\"Rongshu Wang, Jianhua Chen\",\"doi\":\"10.7717/peerj-cs.2160\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Long reads generated by third-generation sequencing (3GS) technologies are involved in many biological analyses and play a vital role due to their ultra-long read length. However, the high error rate affects the downstream process. DeepCorr, a novel error correction algorithm for data from both PacBio and ONT platforms based on deep learning is proposed. The core algorithm adopts a recurrent neural network to capture the long-term dependencies in the long reads to convert the problem of long-read error correction to a multi-classification task. It first aligns the high-precision short reads to long reads to generate the corresponding feature vectors and labels, then feeds these vectors to the neural network, and finally trains the model for prediction and error correction. DeepCorr produces untrimmed corrected long reads and improves the alignment identity while maintaining the length advantage. It can capture and make full use of the dependencies to polish those bases that are not aligned by any short read. DeepCorr achieves better performance than that of the state-of-the-art error correction methods on real-world PacBio and ONT benchmark data sets and consumes fewer computing resources. It is a comprehensive deep learning-based tool that enables one to correct long reads accurately.\",\"PeriodicalId\":54224,\"journal\":{\"name\":\"PeerJ Computer Science\",\"volume\":\"58 1\",\"pages\":\"\"},\"PeriodicalIF\":3.5000,\"publicationDate\":\"2024-07-26\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"PeerJ Computer Science\",\"FirstCategoryId\":\"94\",\"ListUrlMain\":\"https://doi.org/10.7717/peerj-cs.2160\",\"RegionNum\":4,\"RegionCategory\":\"计算机科学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q2\",\"JCRName\":\"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"PeerJ Computer Science","FirstCategoryId":"94","ListUrlMain":"https://doi.org/10.7717/peerj-cs.2160","RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}
引用次数: 0

摘要

第三代测序(3GS)技术产生的长读数因其超长读数长度而在许多生物分析中发挥着重要作用。然而,高错误率会影响下游流程。DeepCorr 是一种基于深度学习的新型纠错算法,适用于 PacBio 和 ONT 平台的数据。其核心算法采用递归神经网络捕捉长读数中的长期依赖关系,将长读数纠错问题转化为多分类任务。它首先将高精度短读与长读对齐,生成相应的特征向量和标签,然后将这些向量输入神经网络,最后训练模型进行预测和纠错。DeepCorr 可以生成未经修剪的校正长读数,并在保持长度优势的同时提高比对识别率。它可以捕捉并充分利用依赖关系,打磨那些没有被任何短读数配准的碱基。在实际的 PacBio 和 ONT 基准数据集上,DeepCorr 比最先进的纠错方法取得了更好的性能,而且消耗的计算资源更少。它是一种基于深度学习的综合工具,能准确校正长读数。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
查看原文
分享 分享
微信好友 朋友圈 QQ好友 复制链接
本刊更多论文
DeepCorr: a novel error correction method for 3GS long reads based on deep learning
Long reads generated by third-generation sequencing (3GS) technologies are involved in many biological analyses and play a vital role due to their ultra-long read length. However, the high error rate affects the downstream process. DeepCorr, a novel error correction algorithm for data from both PacBio and ONT platforms based on deep learning is proposed. The core algorithm adopts a recurrent neural network to capture the long-term dependencies in the long reads to convert the problem of long-read error correction to a multi-classification task. It first aligns the high-precision short reads to long reads to generate the corresponding feature vectors and labels, then feeds these vectors to the neural network, and finally trains the model for prediction and error correction. DeepCorr produces untrimmed corrected long reads and improves the alignment identity while maintaining the length advantage. It can capture and make full use of the dependencies to polish those bases that are not aligned by any short read. DeepCorr achieves better performance than that of the state-of-the-art error correction methods on real-world PacBio and ONT benchmark data sets and consumes fewer computing resources. It is a comprehensive deep learning-based tool that enables one to correct long reads accurately.
求助全文
通过发布文献求助,成功后即可免费获取论文全文。 去求助
来源期刊
PeerJ Computer Science
PeerJ Computer Science Computer Science-General Computer Science
CiteScore
6.10
自引率
5.30%
发文量
332
审稿时长
10 weeks
期刊介绍: PeerJ Computer Science is the new open access journal covering all subject areas in computer science, with the backing of a prestigious advisory board and more than 300 academic editors.
期刊最新文献
A model integrating attention mechanism and generative adversarial network for image style transfer. Detecting rumors in social media using emotion based deep learning approach. Harnessing AI and analytics to enhance cybersecurity and privacy for collective intelligence systems. Improving synthetic media generation and detection using generative adversarial networks. Intelligent accounting optimization method based on meta-heuristic algorithm and CNN.
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
现在去查看 取消
×
提示
确定
0
微信
客服QQ
Book学术公众号 扫码关注我们
反馈
×
意见反馈
请填写您的意见或建议
请填写您的手机或邮箱
已复制链接
已复制链接
快去分享给好友吧!
我知道了
×
扫码分享
扫码分享
Book学术官方微信
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术
文献互助 智能选刊 最新文献 互助须知 联系我们:info@booksci.cn
Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。
Copyright © 2023 Book学术 All rights reserved.
ghs 京公网安备 11010802042870号 京ICP备2023020795号-1