用于从头转录组组装的短读序列聚类

Samaneh Saadat, Z. Safikhani, K. Badie, M. Sadeghi
{"title":"用于从头转录组组装的短读序列聚类","authors":"Samaneh Saadat, Z. Safikhani, K. Badie, M. Sadeghi","doi":"10.22059/PBS.2014.50305","DOIUrl":null,"url":null,"abstract":"Given the importance of transcriptome analysis in various biological studies and considering thevast amount of whole transcriptome sequencing data, it seems necessary to develop analgorithm to assemble transcriptome data. In this study we propose an algorithm fortranscriptome assembly in the absence of a reference genome. First, the contiguous sequencesare generated using de Bruijn graph with different k-mer lengths. Then, the eclectic mixtures ofsequences are gathered in order to form the final sequences. Lastly, the contiguous sequencesare clustered and the isoform groups are provided. This proposed algorithm is capable ofgenerating long contiguous sequences and accurately clustering them into isoform groups.Toevaluate our algorithm, we applied it to a simulated RNA-seq dataset of rat transcriptome and areal RNA-seq experiment of the loricaria gr. cataphracta transcriptome. The correctness of theassembled contigs was more than 95%, and our algorithm was able to reconstruct over 70% ofthe transcripts at more than 80% of the transcripts’ lengths. This study demonstrates thatapplying a sophisticated merging method improves transcriptome assembly. The source code isavailable upon request by contacting the corresponding author by email.","PeriodicalId":20726,"journal":{"name":"Progress in Biological Sciences","volume":"114 1","pages":"43-52"},"PeriodicalIF":0.0000,"publicationDate":"2014-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Clustering of Short Read Sequences for de novo Transcriptome Assembly\",\"authors\":\"Samaneh Saadat, Z. Safikhani, K. Badie, M. Sadeghi\",\"doi\":\"10.22059/PBS.2014.50305\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Given the importance of transcriptome analysis in various biological studies and considering thevast amount of whole transcriptome sequencing data, it seems necessary to develop analgorithm to assemble transcriptome data. In this study we propose an algorithm fortranscriptome assembly in the absence of a reference genome. First, the contiguous sequencesare generated using de Bruijn graph with different k-mer lengths. Then, the eclectic mixtures ofsequences are gathered in order to form the final sequences. Lastly, the contiguous sequencesare clustered and the isoform groups are provided. This proposed algorithm is capable ofgenerating long contiguous sequences and accurately clustering them into isoform groups.Toevaluate our algorithm, we applied it to a simulated RNA-seq dataset of rat transcriptome and areal RNA-seq experiment of the loricaria gr. cataphracta transcriptome. The correctness of theassembled contigs was more than 95%, and our algorithm was able to reconstruct over 70% ofthe transcripts at more than 80% of the transcripts’ lengths. This study demonstrates thatapplying a sophisticated merging method improves transcriptome assembly. The source code isavailable upon request by contacting the corresponding author by email.\",\"PeriodicalId\":20726,\"journal\":{\"name\":\"Progress in Biological Sciences\",\"volume\":\"114 1\",\"pages\":\"43-52\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2014-05-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Progress in Biological Sciences\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.22059/PBS.2014.50305\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Progress in Biological Sciences","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.22059/PBS.2014.50305","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 0

摘要

鉴于转录组分析在各种生物学研究中的重要性,并考虑到大量的全转录组测序数据,似乎有必要开发一种算法来组装转录组数据。在这项研究中,我们提出了一种在没有参考基因组的情况下转录组组装的算法。首先,利用不同k-mer长度的de Bruijn图生成连续序列。然后,折衷的混合序列被收集起来,以形成最终的序列。最后,对连续序列进行聚类并给出同型群。该算法能够生成长连续序列,并准确地将其聚类为同形组。为了评估我们的算法,我们将其应用于大鼠转录组的模拟RNA-seq数据集和loricaria gr. cataphracta转录组的实际RNA-seq实验。组装的contigs的正确性超过95%,我们的算法能够在超过80%的转录本长度上重建超过70%的转录本。这项研究表明,应用复杂的合并方法可以改善转录组组装。源代码可通过电子邮件联系相应的作者。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
查看原文
分享 分享
微信好友 朋友圈 QQ好友 复制链接
本刊更多论文
Clustering of Short Read Sequences for de novo Transcriptome Assembly
Given the importance of transcriptome analysis in various biological studies and considering thevast amount of whole transcriptome sequencing data, it seems necessary to develop analgorithm to assemble transcriptome data. In this study we propose an algorithm fortranscriptome assembly in the absence of a reference genome. First, the contiguous sequencesare generated using de Bruijn graph with different k-mer lengths. Then, the eclectic mixtures ofsequences are gathered in order to form the final sequences. Lastly, the contiguous sequencesare clustered and the isoform groups are provided. This proposed algorithm is capable ofgenerating long contiguous sequences and accurately clustering them into isoform groups.Toevaluate our algorithm, we applied it to a simulated RNA-seq dataset of rat transcriptome and areal RNA-seq experiment of the loricaria gr. cataphracta transcriptome. The correctness of theassembled contigs was more than 95%, and our algorithm was able to reconstruct over 70% ofthe transcripts at more than 80% of the transcripts’ lengths. This study demonstrates thatapplying a sophisticated merging method improves transcriptome assembly. The source code isavailable upon request by contacting the corresponding author by email.
求助全文
通过发布文献求助,成功后即可免费获取论文全文。 去求助
来源期刊
自引率
0.00%
发文量
0
期刊最新文献
Evaluation of the enzymatic activity of halotolerant fungi isolated from Tehran's forest parks Transected sciatic nerve repair using poly L-lactic-co-glycolic acid nanofiber scaffolds with nerve growth factor in rat Physico-chemical and biological properties of electrospun poly-L-lactic acid/hydroxyapatite nano-composite scaffold for bone regeneration Agricultural wastes: Prospective Entomocides in Management of Cowpea Bruchids Callosobruchus maculatus (Fab.) [Coleoptera: Chrysomelidae] Interspecies interactions of halophilic and halotlerant actinomycetes
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
现在去查看 取消
×
提示
确定
0
微信
客服QQ
Book学术公众号 扫码关注我们
反馈
×
意见反馈
请填写您的意见或建议
请填写您的手机或邮箱
已复制链接
已复制链接
快去分享给好友吧!
我知道了
×
扫码分享
扫码分享
Book学术官方微信
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术
文献互助 智能选刊 最新文献 互助须知 联系我们:info@booksci.cn
Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。
Copyright © 2023 Book学术 All rights reserved.
ghs 京公网安备 11010802042870号 京ICP备2023020795号-1