A Novel Approach for Clustering High Dimensional Data Using Kernal Hubness

M. Amina, Farook K. Syed
2015 Fifth International Conference on Advances in Computing and Communications (ICACC), published 2015-09-01
DOI: 10.1109/ICACC.2015.67 (https://doi.org/10.1109/ICACC.2015.67)
Citations: 1

Abstract

Clustering high-dimensional data, which now arises in almost every field, is becoming a very tedious process. The key disadvantage of high-dimensional data is the curse of dimensionality. As the dimensionality of datasets grows, the data points become sparse and the density of any region drops, which makes the data difficult to cluster and degrades the performance of traditional clustering algorithms. To address these difficulties, hubness-based algorithms were introduced. These algorithms build on the way hubness influences the distribution of data points among k-nearest-neighbor lists. Hubness is an unsupervised measure that identifies which points appear in the k-nearest-neighbor lists of other points more frequently than the rest of the dataset. Three algorithms are mainly used for hub-based clustering: K-hubs, Hubness Proportional Clustering (HPC), and Hubness Proportional K-Means (HPKM). The K-hubs algorithm is used to initialize the hubs for the clusters. The HPC algorithm is used to group the probabilistic data models. The HPKM algorithm integrates hubness-based centroid selection with the partitioning process. These algorithms are used to increase the efficiency and predictive accuracy of the system. Their main drawback is that the number of iterations increases as the dimensionality increases. To overcome this drawback, a new algorithm is proposed that combines kernel mapping with the hubness phenomenon. The proposed algorithm detects arbitrarily shaped clusters in the dataset and improves cluster quality by minimizing the intra-cluster distance and maximizing the inter-cluster distance.
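
The hubness score described above can be illustrated with a small sketch. This is not the authors' algorithm: it only counts how often each point appears in the k-nearest-neighbor lists of the other points, with distances measured in a kernel-induced feature space. The Gaussian (RBF) kernel, the gamma value, the function names, and the toy points are all assumptions made for illustration; the abstract does not say which kernel is used.

```python
import math

def rbf_kernel(x, y, gamma=1.0):
    """Gaussian (RBF) kernel. The kernel choice and gamma are assumptions."""
    sq = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq)

def kernel_distance_sq(x, y, kernel):
    """Squared distance in feature space: K(x,x) - 2K(x,y) + K(y,y)."""
    return kernel(x, x) - 2.0 * kernel(x, y) + kernel(y, y)

def hubness_scores(points, k, kernel=rbf_kernel):
    """For each point, count its appearances in other points' k-NN lists."""
    n = len(points)
    scores = [0] * n
    for i in range(n):
        # Kernel distances from point i to every other point (self excluded).
        dists = [(kernel_distance_sq(points[i], points[j], kernel), j)
                 for j in range(n) if j != i]
        dists.sort()
        # Each of the k nearest neighbors of point i gains one occurrence.
        for _, j in dists[:k]:
            scores[j] += 1
    return scores

# Hypothetical toy data: two small groups of 2-D points.
points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (1.0, 1.0), (1.1, 1.0)]
scores = hubness_scores(points, k=2)
print(scores)
```

Points with the highest scores are the hubs. A hub-based method such as K-hubs could use them as candidate cluster centers, though how the proposed algorithm does so is not specified in the abstract.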