Effective clustering of microRNA sequences by N-grams and feature weighting

Yuan Yi, J. Guan, Shuigeng Zhou
Published in: 2012 IEEE 6th International Conference on Systems Biology (ISB), 2012-09-27
DOI: 10.1109/ISB.2012.6314137 (https://doi.org/10.1109/ISB.2012.6314137)
Citations: 2

Abstract

MicroRNAs (miRNAs) are small RNAs that act, together with the Argonaute family of proteins, as important post-transcriptional regulators of target mRNAs in animals, plants, and other organisms. Since miRNAs were first recognized as a distinct class of small RNA molecules in the early 1990s, tens of thousands of them have been identified experimentally or computationally. Current miRNA research focuses on the functions of individual miRNAs, which usually result in gene silencing and repression. With the rapid growth in the number of known miRNAs, biologists have manually organized them into biologically meaningful families to facilitate further study. Because members of the same family tend to share similar biochemical functions, a high-quality family organization can shed light on the functions of uncharacterized miRNAs. However, manually grouping large numbers of miRNAs is both time-consuming and expensive. In this paper, we employ a clustering method based on N-grams and feature weighting to automatically group miRNAs into separate clusters (families). The method is evaluated on datasets constructed from the online miRNA database miRBase. Experimental results show that it successfully distinguishes most miRNA families and outperforms both the traditional K-means clustering algorithm and the average-link clustering approach.
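The abstract does not state the value of N, the weighting scheme, or the distance measure used in the paper. As a rough illustration only, the sketch below shows one common way to turn an RNA sequence into an N-gram frequency vector and to compare two such vectors with per-feature weights. The function names `ngram_vector` and `weighted_distance`, the default `n=3`, the weighted squared Euclidean distance, and the toy sequences are all assumptions made for this example, not details taken from the paper.

```python
from collections import Counter
from itertools import product

ALPHABET = "ACGU"  # RNA nucleotides


def ngram_vector(seq, n=3):
    """Return the relative frequency of every possible N-gram over ACGU.

    The vector has 4**n entries in lexicographic order of the N-grams.
    N-grams containing other symbols are counted in the total but have
    no slot in the vector.
    """
    vocab = ["".join(p) for p in product(ALPHABET, repeat=n)]
    counts = Counter(seq[i:i + n] for i in range(len(seq) - n + 1))
    total = max(len(seq) - n + 1, 1)  # avoid division by zero for short input
    return [counts[g] / total for g in vocab]


def weighted_distance(u, v, weights):
    """Weighted squared Euclidean distance between two feature vectors."""
    return sum(w * (a - b) ** 2 for w, a, b in zip(weights, u, v))


if __name__ == "__main__":
    # Toy sequences for illustration; these are not miRBase entries.
    a = ngram_vector("ACGUACGUACGU", n=2)
    b = ngram_vector("ACGUACGAACGU", n=2)
    uniform = [1.0] * len(a)  # placeholder weights; a real method would learn them
    print(weighted_distance(a, b, uniform))
```

With uniform weights this reduces to the plain squared Euclidean distance; a feature-weighting method would instead assign larger weights to the N-grams that best separate clusters.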