Plant MicroRNA Prediction by Supervised Machine Learning Using C5.0 Decision Trees.

Journal of Nucleic Acids, 2012 (Epub 2012-11-07). DOI: 10.1155/2012/652979
Philip H Williams, Rod Eyles, Georg Weiller

Abstract

MicroRNAs (miRNAs) are non-protein-coding RNAs, 20 to 22 nucleotides long, that attenuate protein production. Novel miRNAs are being sought in several types of sequence data, including genomic and transcriptomic sequences. A variety of machine learning methods have successfully predicted miRNA precursors, mature miRNAs, and other non-protein-coding sequences. However, tools such as mirTools, miRDeep2, and miRanalyzer require read counts to be supplied with the input sequences, which restricts their use to deep-sequencing data. Our aim was to train a predictor on a cross-section of species that would accurately predict miRNAs outside the training set. We wanted a system that did not require read counts for prediction and could therefore be applied to short sequences extracted from genomic, EST, or RNA-seq sources. We developed a miRNA-predictive decision-tree model by supervised machine learning. It requires only that the corresponding genome or transcriptome sequence be available within a window that includes the precursor candidate, so that the required sequence features can be computed. Among the most important features for training the predictor are the miRNA:miRNA* duplex energy and the number of mismatches in the duplex. We present a cross-species plant miRNA predictor with 84.08% sensitivity and 98.53% specificity, based on rigorous leave-one-out validation.
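The abstract names the number of mismatches in the miRNA:miRNA* duplex as one of the most important features. The following is a minimal sketch of how such a count could be computed, assuming an idealised, ungapped duplex with 2-nt 3' overhangs on both strands and treating G:U wobbles as paired. Real duplexes contain bulges, and the authors' feature extraction is not described in the abstract, so this is not their method. The function `duplex_mismatches` and its `overhang` parameter are hypothetical. The duplex free energy, the other feature named, would normally come from an RNA folding tool (for example RNAduplex in the ViennaRNA package) and is not reproduced here.

```python
# Watson-Crick pairs plus the G:U wobble, which is commonly treated as paired.
PAIRED = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def duplex_mismatches(mirna: str, star: str, overhang: int = 2) -> int:
    """Count non-pairing positions in an idealised miRNA:miRNA* duplex.

    Hypothetical simplification: no bulges or gaps, strands of similar length,
    and both strands carry a 3' overhang of `overhang` nt, so mirna[i] faces
    star[len(star) - 1 - overhang - i]. Both strands are given 5'->3';
    DNA input (T) is converted to RNA (U).
    """
    m = mirna.upper().replace("T", "U")
    s = star.upper().replace("T", "U")
    paired_length = min(len(m), len(s)) - overhang
    if paired_length <= 0:
        raise ValueError("strands too short for the requested overhang")
    mismatches = 0
    for i in range(paired_length):
        j = len(s) - 1 - overhang - i  # partner position on the antiparallel strand
        if (m[i], s[j]) not in PAIRED:
            mismatches += 1
    return mismatches


if __name__ == "__main__":
    mirna = "ACGUACGUACGUACGUACGU"  # toy 20-nt sequence, not a real miRNA
    star = "GUACGUACGUACGUACGUUU"   # perfect partner with a 2-nt 3' overhang
    print(duplex_mismatches(mirna, star))                        # 0
    print(duplex_mismatches(mirna, star[:5] + "C" + star[6:]))  # 1 (an A:C mismatch)
```

Counting G:U as paired is a design choice; a stricter feature could count wobbles separately.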
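The reported figures follow the standard definitions sensitivity = TP / (TP + FN) and specificity = TN / (TN + FP). The sketch below shows generic leave-one-out evaluation: each example is predicted by a model trained on all the others, and the pooled predictions are scored. The learner `train_stump` is a hypothetical one-split threshold rule used only to keep the example self-contained. It is not C5.0, which grows multi-level trees and offers options such as boosting and rule sets. The feature pairs (duplex energy in kcal/mol, mismatch count) are invented toy values, not data from the study. The abstract does not say whether single sequences or whole species were held out; for a cross-species test, the same loop could be run over species groups instead of individual examples.

```python
from typing import Callable, List, Sequence, Tuple

FeatureVector = Sequence[float]
Predictor = Callable[[FeatureVector], bool]
Learner = Callable[[List[FeatureVector], List[bool]], Predictor]


def sensitivity_specificity(y_true: Sequence[bool], y_pred: Sequence[bool]) -> Tuple[float, float]:
    """Return (TP / (TP + FN), TN / (TN + FP))."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t and p)
    fn = sum(1 for t, p in pairs if t and not p)
    tn = sum(1 for t, p in pairs if not t and not p)
    fp = sum(1 for t, p in pairs if not t and p)
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one positive and one negative example")
    return tp / (tp + fn), tn / (tn + fp)


def leave_one_out(X: List[FeatureVector], y: List[bool], train: Learner) -> List[bool]:
    """Predict each example with a model trained on every other example."""
    predictions = []
    for i in range(len(X)):
        model = train(X[:i] + X[i + 1:], y[:i] + y[i + 1:])
        predictions.append(model(X[i]))
    return predictions


def train_stump(X: List[FeatureVector], y: List[bool]) -> Predictor:
    """Hypothetical stand-in learner (NOT C5.0): a single threshold split.

    Tries every feature and every midpoint between adjacent distinct values,
    predicting 'miRNA' when feature <= threshold, and keeps the split with
    the fewest training errors (first one found wins ties).
    """
    best = None  # (errors, feature_index, threshold)
    for f in range(len(X[0])):
        values = sorted({x[f] for x in X})
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            errors = sum((x[f] <= t) != label for x, label in zip(X, y))
            if best is None or errors < best[0]:
                best = (errors, f, t)
    if best is None:  # all features constant: predict the majority label
        majority = 2 * sum(y) >= len(y)
        return lambda x: majority
    _, f, t = best
    return lambda x: x[f] <= t


if __name__ == "__main__":
    # Invented toy vectors: (duplex energy in kcal/mol, duplex mismatches).
    X = [(-35.0, 1), (-32.0, 2), (-30.0, 0), (-15.0, 6), (-12.0, 5), (-10.0, 7)]
    y = [True, True, True, False, False, False]
    preds = leave_one_out(X, y, train_stump)
    print(sensitivity_specificity(y, preds))  # (1.0, 1.0)
```

Because every prediction comes from a model that never saw that example, leave-one-out estimates how the predictor behaves on sequences outside its training set, which is the property the authors set out to test.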
