通过机器学习从植物基因组中自动管理LTR反转录转座子文库。

IF 1.5 Q3 MATHEMATICAL & COMPUTATIONAL BIOLOGY Journal of Integrative Bioinformatics Pub Date : 2022-07-12 eCollection Date: 2022-09-01 DOI:10.1515/jib-2021-0036
Simon Orozco-Arias, Mariana S Candamil-Cortes, Paula A Jaimes, Estiven Valencia-Castrillon, Reinel Tabares-Soto, Gustavo Isaza, Romain Guyot
{"title":"通过机器学习从植物基因组中自动管理LTR反转录转座子文库。","authors":"Simon Orozco-Arias,&nbsp;Mariana S Candamil-Cortes,&nbsp;Paula A Jaimes,&nbsp;Estiven Valencia-Castrillon,&nbsp;Reinel Tabares-Soto,&nbsp;Gustavo Isaza,&nbsp;Romain Guyot","doi":"10.1515/jib-2021-0036","DOIUrl":null,"url":null,"abstract":"<p><p>Transposable elements are mobile sequences that can move and insert themselves into chromosomes, activating under internal or external stimuli, giving the organism the ability to adapt to the environment. Annotating transposable elements in genomic data is currently considered a crucial task to understand key aspects of organisms such as phenotype variability, species evolution, and genome size, among others. Because of the way they replicate, LTR retrotransposons are the most common transposable elements in plants, accounting in some cases for up to 80% of all DNA information. To annotate these elements, a reference library is usually created, a curation process is performed, eliminating TE fragments and false positives and then annotated in the genome using the homology method. However, the curation process can take weeks, requires extensive manual work and the execution of multiple time-consuming bioinformatics software. Here, we propose a machine learning-based approach to perform this process automatically on plant genomes, obtaining up to 91.18% F1-score. This approach was tested with four plant species, obtaining up to 93.6% F1-score (<i>Oryza granulata</i>) in only 22.61 s, where bioinformatics methods took approximately 6 h. This acceleration demonstrates that the ML-based approach is efficient and could be used in massive sequencing projects.</p>","PeriodicalId":53625,"journal":{"name":"Journal of Integrative Bioinformatics","volume":null,"pages":null},"PeriodicalIF":1.5000,"publicationDate":"2022-07-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9521825/pdf/","citationCount":"0","resultStr":"{\"title\":\"Automatic curation of LTR retrotransposon libraries from plant genomes through machine learning.\",\"authors\":\"Simon Orozco-Arias,&nbsp;Mariana S Candamil-Cortes,&nbsp;Paula A Jaimes,&nbsp;Estiven Valencia-Castrillon,&nbsp;Reinel Tabares-Soto,&nbsp;Gustavo Isaza,&nbsp;Romain Guyot\",\"doi\":\"10.1515/jib-2021-0036\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<p><p>Transposable elements are mobile sequences that can move and insert themselves into chromosomes, activating under internal or external stimuli, giving the organism the ability to adapt to the environment. Annotating transposable elements in genomic data is currently considered a crucial task to understand key aspects of organisms such as phenotype variability, species evolution, and genome size, among others. Because of the way they replicate, LTR retrotransposons are the most common transposable elements in plants, accounting in some cases for up to 80% of all DNA information. To annotate these elements, a reference library is usually created, a curation process is performed, eliminating TE fragments and false positives and then annotated in the genome using the homology method. However, the curation process can take weeks, requires extensive manual work and the execution of multiple time-consuming bioinformatics software. Here, we propose a machine learning-based approach to perform this process automatically on plant genomes, obtaining up to 91.18% F1-score. This approach was tested with four plant species, obtaining up to 93.6% F1-score (<i>Oryza granulata</i>) in only 22.61 s, where bioinformatics methods took approximately 6 h. This acceleration demonstrates that the ML-based approach is efficient and could be used in massive sequencing projects.</p>\",\"PeriodicalId\":53625,\"journal\":{\"name\":\"Journal of Integrative Bioinformatics\",\"volume\":null,\"pages\":null},\"PeriodicalIF\":1.5000,\"publicationDate\":\"2022-07-12\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9521825/pdf/\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Journal of Integrative Bioinformatics\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1515/jib-2021-0036\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"2022/9/1 0:00:00\",\"PubModel\":\"eCollection\",\"JCR\":\"Q3\",\"JCRName\":\"MATHEMATICAL & COMPUTATIONAL BIOLOGY\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Journal of Integrative Bioinformatics","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1515/jib-2021-0036","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"2022/9/1 0:00:00","PubModel":"eCollection","JCR":"Q3","JCRName":"MATHEMATICAL & COMPUTATIONAL BIOLOGY","Score":null,"Total":0}
引用次数: 0

摘要

转座因子是一种可移动的序列,它可以移动并插入到染色体中,在内部或外部刺激下激活,使生物体具有适应环境的能力。在基因组数据中标注转座因子目前被认为是理解生物体关键方面的关键任务,如表型变异性、物种进化和基因组大小等。由于它们复制的方式,LTR逆转录转座子是植物中最常见的转座子,在某些情况下占所有DNA信息的80%。为了标注这些元素,通常创建一个参考文库,执行一个管理过程,消除TE片段和假阳性,然后使用同源性方法在基因组中进行标注。然而,管理过程可能需要数周时间,需要大量的手工工作和多个耗时的生物信息学软件的执行。在这里,我们提出了一种基于机器学习的方法来对植物基因组自动执行这一过程,获得高达91.18%的f1得分。该方法在4种植物中进行了测试,仅用22.61 s就获得了93.6%的f1分数(Oryza granulata),而生物信息学方法大约需要6小时。这表明基于ml的方法是有效的,可以用于大规模测序项目。
本文章由计算机程序翻译,如有差异,请以英文原文为准。

摘要图片

摘要图片

摘要图片

查看原文
分享 分享
微信好友 朋友圈 QQ好友 复制链接
本刊更多论文
Automatic curation of LTR retrotransposon libraries from plant genomes through machine learning.

Transposable elements are mobile sequences that can move and insert themselves into chromosomes, activating under internal or external stimuli, giving the organism the ability to adapt to the environment. Annotating transposable elements in genomic data is currently considered a crucial task to understand key aspects of organisms such as phenotype variability, species evolution, and genome size, among others. Because of the way they replicate, LTR retrotransposons are the most common transposable elements in plants, accounting in some cases for up to 80% of all DNA information. To annotate these elements, a reference library is usually created, a curation process is performed, eliminating TE fragments and false positives and then annotated in the genome using the homology method. However, the curation process can take weeks, requires extensive manual work and the execution of multiple time-consuming bioinformatics software. Here, we propose a machine learning-based approach to perform this process automatically on plant genomes, obtaining up to 91.18% F1-score. This approach was tested with four plant species, obtaining up to 93.6% F1-score (Oryza granulata) in only 22.61 s, where bioinformatics methods took approximately 6 h. This acceleration demonstrates that the ML-based approach is efficient and could be used in massive sequencing projects.

求助全文
通过发布文献求助,成功后即可免费获取论文全文。 去求助
来源期刊
Journal of Integrative Bioinformatics
Journal of Integrative Bioinformatics Medicine-Medicine (all)
CiteScore
3.10
自引率
5.30%
发文量
27
审稿时长
12 weeks
期刊最新文献
MCMVDRP: a multi-channel multi-view deep learning framework for cancer drug response prediction. Leonhard Med, a trusted research environment for processing sensitive research data. Exploring animal behaviour multilayer networks in immersive environments - a conceptual framework. Inferences on the evolution of the ascorbic acid synthesis pathway in insects using Phylogenetic Tree Collapser (PTC), a tool for the automated collapsing of phylogenetic trees using taxonomic information. Specifications of standards in systems and synthetic biology: status, developments, and tools in 2024.
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
现在去查看 取消
×
提示
确定
0
微信
客服QQ
Book学术公众号 扫码关注我们
反馈
×
意见反馈
请填写您的意见或建议
请填写您的手机或邮箱
已复制链接
已复制链接
快去分享给好友吧!
我知道了
×
扫码分享
扫码分享
Book学术官方微信
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术
文献互助 智能选刊 最新文献 互助须知 联系我们:info@booksci.cn
Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。
Copyright © 2023 Book学术 All rights reserved.
ghs 京公网安备 11010802042870号 京ICP备2023020795号-1