技术和生物学变异对RNA-Seq数据集从头组装影响的综合分析。

IF 2.3 Q3 BIOCHEMICAL RESEARCH METHODS Bioinformatics and Biology Insights Pub Date : 2024-12-05 eCollection Date: 2024-01-01 DOI:10.1177/11779322241274957
Gonzalez Sergio Alberto, Rivarola Maximo, Ribone Andres, Lew Sergio, Paniego Norma
{"title":"技术和生物学变异对RNA-Seq数据集从头组装影响的综合分析。","authors":"Gonzalez Sergio Alberto, Rivarola Maximo, Ribone Andres, Lew Sergio, Paniego Norma","doi":"10.1177/11779322241274957","DOIUrl":null,"url":null,"abstract":"<p><p>De novo assembly of transcriptomes from species without reference genome remains a common problem in functional genomics. While methods and algorithms for transcriptome assembly are continually being developed and published, the quality of de novo assemblies using short reads depends on the complexity of the transcriptome and is limited by several types of errors. One problem to overcome is the research gap regarding the best method to use in each study to obtain high-quality de novo assembly. Currently, there are no established protocols for solving the assembly problem considering the transcriptome complexity. In addition, the accuracy of quality metrics used to evaluate assemblies remains unclear. In this study, we investigate and discuss how different variables accounting for the complexity of RNA-Seq data influence assembly results independently of the software used. For this purpose, we simulated transcriptomic short-read sequence datasets from high-quality full-length predicted transcript models with varying degrees of complexity. Subsequently, we conducted de novo assemblies using different assembly programs, and compared and classified the results using both reference-dependent and independent metrics. These metrics were assessed both individually and combined through multivariate analysis. The degree of alternative splicing and the fragment size of the paired-end reads were identified as the variables with the greatest influence on the assembly results. Moreover, read length and fragment size had different influences on the reconstruction of longer and shorter transcripts. These results underscore the importance of understanding the composition of the transcriptome under study, and making experimental design decisions related to the need to work with reads and fragments of different sizes. In addition, the choice of assembly software will positively impact the final assembly outcome. This selection will affect the completeness of represented genes and assembled isoforms, as well as contribute to error reduction.</p>","PeriodicalId":9065,"journal":{"name":"Bioinformatics and Biology Insights","volume":"18 ","pages":"11779322241274957"},"PeriodicalIF":2.3000,"publicationDate":"2024-12-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11622296/pdf/","citationCount":"0","resultStr":"{\"title\":\"Comprehensive Analysis of the Influence of Technical and Biological Variations on De Novo Assembly of RNA-Seq Datasets.\",\"authors\":\"Gonzalez Sergio Alberto, Rivarola Maximo, Ribone Andres, Lew Sergio, Paniego Norma\",\"doi\":\"10.1177/11779322241274957\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<p><p>De novo assembly of transcriptomes from species without reference genome remains a common problem in functional genomics. While methods and algorithms for transcriptome assembly are continually being developed and published, the quality of de novo assemblies using short reads depends on the complexity of the transcriptome and is limited by several types of errors. One problem to overcome is the research gap regarding the best method to use in each study to obtain high-quality de novo assembly. Currently, there are no established protocols for solving the assembly problem considering the transcriptome complexity. In addition, the accuracy of quality metrics used to evaluate assemblies remains unclear. In this study, we investigate and discuss how different variables accounting for the complexity of RNA-Seq data influence assembly results independently of the software used. For this purpose, we simulated transcriptomic short-read sequence datasets from high-quality full-length predicted transcript models with varying degrees of complexity. Subsequently, we conducted de novo assemblies using different assembly programs, and compared and classified the results using both reference-dependent and independent metrics. These metrics were assessed both individually and combined through multivariate analysis. The degree of alternative splicing and the fragment size of the paired-end reads were identified as the variables with the greatest influence on the assembly results. Moreover, read length and fragment size had different influences on the reconstruction of longer and shorter transcripts. These results underscore the importance of understanding the composition of the transcriptome under study, and making experimental design decisions related to the need to work with reads and fragments of different sizes. In addition, the choice of assembly software will positively impact the final assembly outcome. This selection will affect the completeness of represented genes and assembled isoforms, as well as contribute to error reduction.</p>\",\"PeriodicalId\":9065,\"journal\":{\"name\":\"Bioinformatics and Biology Insights\",\"volume\":\"18 \",\"pages\":\"11779322241274957\"},\"PeriodicalIF\":2.3000,\"publicationDate\":\"2024-12-05\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11622296/pdf/\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Bioinformatics and Biology Insights\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1177/11779322241274957\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"2024/1/1 0:00:00\",\"PubModel\":\"eCollection\",\"JCR\":\"Q3\",\"JCRName\":\"BIOCHEMICAL RESEARCH METHODS\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Bioinformatics and Biology Insights","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1177/11779322241274957","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"2024/1/1 0:00:00","PubModel":"eCollection","JCR":"Q3","JCRName":"BIOCHEMICAL RESEARCH METHODS","Score":null,"Total":0}
引用次数: 0

摘要

没有参考基因组的物种转录组从头组装仍然是功能基因组学中的一个常见问题。虽然转录组组装的方法和算法不断被开发和发表,但使用短读段的从头组装的质量取决于转录组的复杂性,并受到几种类型错误的限制。需要克服的一个问题是,关于在每个研究中使用最佳方法以获得高质量的从头组装的研究差距。考虑到转录组的复杂性,目前尚无解决组装问题的既定方案。此外,用于评估组件的质量度量的准确性仍然不清楚。在本研究中,我们调查并讨论了考虑RNA-Seq数据复杂性的不同变量如何独立于所使用的软件影响组装结果。为此,我们模拟了来自高质量全长预测转录模型的转录组短读序列数据集,这些模型具有不同程度的复杂性。随后,我们使用不同的装配程序进行了从头组装,并使用参考依赖和独立指标对结果进行了比较和分类。通过多变量分析对这些指标进行单独评估和组合评估。选择剪接的程度和成对末端reads的片段大小是影响组装结果的最大变量。此外,读取长度和片段大小对较长和较短转录本的重建有不同的影响。这些结果强调了理解所研究的转录组的组成的重要性,以及做出与不同大小的reads和片段相关的实验设计决策的重要性。此外,装配软件的选择将对最终的装配结果产生积极的影响。这种选择将影响所代表基因和组装同种异构体的完整性,并有助于减少错误。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
查看原文
分享 分享
微信好友 朋友圈 QQ好友 复制链接
本刊更多论文
Comprehensive Analysis of the Influence of Technical and Biological Variations on De Novo Assembly of RNA-Seq Datasets.

De novo assembly of transcriptomes from species without reference genome remains a common problem in functional genomics. While methods and algorithms for transcriptome assembly are continually being developed and published, the quality of de novo assemblies using short reads depends on the complexity of the transcriptome and is limited by several types of errors. One problem to overcome is the research gap regarding the best method to use in each study to obtain high-quality de novo assembly. Currently, there are no established protocols for solving the assembly problem considering the transcriptome complexity. In addition, the accuracy of quality metrics used to evaluate assemblies remains unclear. In this study, we investigate and discuss how different variables accounting for the complexity of RNA-Seq data influence assembly results independently of the software used. For this purpose, we simulated transcriptomic short-read sequence datasets from high-quality full-length predicted transcript models with varying degrees of complexity. Subsequently, we conducted de novo assemblies using different assembly programs, and compared and classified the results using both reference-dependent and independent metrics. These metrics were assessed both individually and combined through multivariate analysis. The degree of alternative splicing and the fragment size of the paired-end reads were identified as the variables with the greatest influence on the assembly results. Moreover, read length and fragment size had different influences on the reconstruction of longer and shorter transcripts. These results underscore the importance of understanding the composition of the transcriptome under study, and making experimental design decisions related to the need to work with reads and fragments of different sizes. In addition, the choice of assembly software will positively impact the final assembly outcome. This selection will affect the completeness of represented genes and assembled isoforms, as well as contribute to error reduction.

求助全文
通过发布文献求助,成功后即可免费获取论文全文。 去求助
来源期刊
Bioinformatics and Biology Insights
Bioinformatics and Biology Insights BIOCHEMICAL RESEARCH METHODS-
CiteScore
6.80
自引率
1.70%
发文量
36
审稿时长
8 weeks
期刊介绍: Bioinformatics and Biology Insights is an open access, peer-reviewed journal that considers articles on bioinformatics methods and their applications which must pertain to biological insights. All papers should be easily amenable to biologists and as such help bridge the gap between theories and applications.
期刊最新文献
Cliotide U1, a Novel Antimicrobial Peptide Isolated From Urtica Dioica Leaves. Resistify: A Novel NLR Classifier That Reveals Helitron-Associated NLR Expansion in Solanaceae. Advancing Regulatory Genomics With Machine Learning. Bioinformatic Annotation of Transposon DNA Processing Genes on the Long-Read Genome Assembly of Caenorhabditis elegans. Comprehensive Analysis of CRISPR-Cas Systems and Their Influence on Antibiotic Resistance in Salmonella enterica Strains.
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
现在去查看 取消
×
提示
确定
0
微信
客服QQ
Book学术公众号 扫码关注我们
反馈
×
意见反馈
请填写您的意见或建议
请填写您的手机或邮箱
已复制链接
已复制链接
快去分享给好友吧!
我知道了
×
扫码分享
扫码分享
Book学术官方微信
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术
文献互助 智能选刊 最新文献 互助须知 联系我们:info@booksci.cn
Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。
Copyright © 2023 Book学术 All rights reserved.
ghs 京公网安备 11010802042870号 京ICP备2023020795号-1