Clustering DNA Sequences of Aspergillus Fumigatus Using Incremental Multiple Medoids

T. Ajayan, P. Sony, Janu R. Panicker, S. Shailesh
2015 Fifth International Conference on Advances in Computing and Communications (ICACC), published 2015-09-01
DOI: 10.1109/ICACC.2015.19 (https://doi.org/10.1109/ICACC.2015.19)
Citations: 1

Abstract

Clustering DNA sequences of Aspergillus fumigatus groups a set of sequences into clusters such that similarity among sequences in the same cluster is high, while similarity among sequences in different clusters is low. The main objective is to obtain a more refined clustering technique in order to analyze biological data and to group DNA sequences into clusters more easily. CD-HIT and DNACLUST are two existing approaches used in bioinformatics for clustering sequences. The major disadvantage of both approaches is that the longest sequence is selected as the cluster representative. Because DNA sequences are enormous in number, traditional clustering algorithms are infeasible for analysis. To handle such large sets of DNA sequences, a modified version of incremental clustering using multiple medoids is proposed. The key idea is to find multiple representative sequences, or medoids, to represent each cluster in a chunk; the final DNA analysis is then carried out on the medoids identified from all the chunks. The main advantage of this incremental clustering is that it uses multiple medoids to represent each cluster in each chunk, which captures the pattern structure more accurately. It not only overcomes the disadvantages of existing techniques but also has a mechanism to use the DNA sequence relationships among the identified medoids as side information to help the final DNA sequence clustering. The proposed incremental approach outperforms existing clustering approaches in terms of clustering accuracy.
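The chunk-then-merge idea in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the abstract does not state the similarity measure, the per-chunk clustering routine, or how the side information among medoids is used, so the sketch assumes plain Levenshtein distance, a simple alternating k-medoids with farthest-first initialisation, and it omits the side-information step entirely. The function names, the parameters `k`, `chunk_size` and `m`, and the toy sequences are all hypothetical.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance (assumed stand-in for the paper's similarity measure)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                 # deletion
                            curr[j - 1] + 1,             # insertion
                            prev[j - 1] + (ca != cb)))   # substitution or match
        prev = curr
    return prev[-1]


def k_medoids(seqs, k, iters=20):
    """Simple alternating k-medoids; returns a list of clusters (lists of sequences)."""
    # Farthest-first initialisation keeps the sketch deterministic.
    medoids = [seqs[0]]
    while len(medoids) < min(k, len(set(seqs))):
        medoids.append(max(seqs, key=lambda s: min(edit_distance(s, md) for md in medoids)))
    for _ in range(iters):
        clusters = [[] for _ in medoids]
        for s in seqs:
            nearest = min(range(len(medoids)), key=lambda i: edit_distance(s, medoids[i]))
            clusters[nearest].append(s)
        # New medoid of a cluster: the member with the smallest total distance to the others.
        new_medoids = [min(c, key=lambda x: sum(edit_distance(x, y) for y in c))
                       for c in clusters]
        if new_medoids == medoids:
            break
        medoids = new_medoids
    return clusters


def top_medoids(cluster, m):
    """The m most central members of a cluster (its multiple medoids)."""
    return sorted(cluster, key=lambda x: sum(edit_distance(x, y) for y in cluster))[:m]


def incremental_multi_medoid(seqs, k, chunk_size, m):
    """Cluster each chunk, keep m medoids per chunk cluster, then cluster the medoids."""
    medoids = []
    for start in range(0, len(seqs), chunk_size):
        chunk = seqs[start:start + chunk_size]
        for cluster in k_medoids(chunk, k):
            medoids.extend(top_medoids(cluster, m))
    final = k_medoids(medoids, k)  # final clustering sees only the retained medoids
    label_of = {s: i for i, c in enumerate(final) for s in c}
    # Each sequence inherits the final label of its nearest retained medoid.
    return [label_of[min(medoids, key=lambda md: edit_distance(s, md))] for s in seqs]


# Toy data: an A/T-rich family and a C/G-rich family, split over two chunks.
seqs = ["AAAAAAAA", "AAAAAAAT", "AAAAAATA", "CCCCCCCC", "CCCCCCCG", "CCCCCCGC",
        "AAAATAAA", "AAATAAAA", "AATAAAAA", "CCCCGCCC", "CCCGCCCC", "CCGCCCCC"]
labels = incremental_multi_medoid(seqs, k=2, chunk_size=6, m=2)
print(labels)  # [0, 0, 0, 1, 1, 1, 0, 0, 0, 1, 1, 1]
```

Only `m` medoids per chunk cluster are carried forward, so the final clustering step works on a small summary rather than on every sequence. That is the sense in which the approach is incremental. Representing each cluster by its most central members, rather than by its longest sequence, is the contrast the abstract draws with CD-HIT and DNACLUST.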