Online incremental clustering with distance metric learning for high dimensional data

S. Okada, T. Nishida
{"title":"Online incremental clustering with distance metric learning for high dimensional data","authors":"S. Okada, T. Nishida","doi":"10.1109/IJCNN.2011.6033478","DOIUrl":null,"url":null,"abstract":"In this paper, we present a novel incremental clustering algorithm which assigns of a set of observations into clusters and learns the distance metric iteratively in an incremental manner. The proposed algorithm SOINN-AML is composed based on the Self-organizing Incremental Neural Network (Shen et al 2006), which represents the distribution of unlabeled data and reports a reasonable number of clusters. SOINN adopts a competitive Hebbian rule for each input signal, and distance between nodes is measured using the Euclidean distance. Such algorithms rely on the distance metric for the input data patterns. Distance Metric Learning (DML) learns a distance metric for the high dimensional input space of data that preserves the distance relation among the training data. DML is not performed for input space of data in SOINN based approaches. SOINN-AML learns input space of data by using the Adaptive Distance Metric Learning (AML) algorithm which is one of the DML algorithms. It improves the incremental clustering performance of the SOINN algorithm by optimizing the distance metric in the case that input data space is high dimensional. In experimental results, we evaluate the performance by using two artificial datasets, seven real datasets from the UCI dataset and three real image datasets. We have found that the proposed algorithm outperforms conventional algorithms including SOINN (Shen et al 2006) and Enhanced SOINN (Shen et al 2007). The improvement of clustering accuracy (NMI) is between 0.03 and 0.13 compared to state of the art SOINN based approaches.","PeriodicalId":415833,"journal":{"name":"The 2011 International Joint Conference on Neural Networks","volume":"10 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2011-10-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"17","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"The 2011 International Joint Conference on Neural Networks","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/IJCNN.2011.6033478","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 17

Abstract

In this paper, we present a novel incremental clustering algorithm that assigns a set of observations to clusters and learns the distance metric iteratively in an incremental manner. The proposed algorithm, SOINN-AML, is built on the Self-Organizing Incremental Neural Network (Shen et al., 2006), which represents the distribution of unlabeled data and reports a reasonable number of clusters. SOINN applies a competitive Hebbian rule to each input signal, and the distance between nodes is measured with the Euclidean distance. Such algorithms rely on the distance metric chosen for the input data patterns. Distance Metric Learning (DML) learns a distance metric for the high-dimensional input space that preserves the distance relations among the training data. In SOINN-based approaches, DML is not performed on the input space. SOINN-AML learns the input space using the Adaptive Distance Metric Learning (AML) algorithm, one of the DML algorithms, and improves the incremental clustering performance of SOINN by optimizing the distance metric when the input data space is high dimensional. In our experiments, we evaluate performance on two artificial datasets, seven real datasets from the UCI repository, and three real image datasets. We find that the proposed algorithm outperforms conventional algorithms, including SOINN (Shen et al., 2006) and Enhanced SOINN (Shen et al., 2007). The improvement in clustering accuracy (NMI) ranges from 0.03 to 0.13 over state-of-the-art SOINN-based approaches.
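The abstract describes the core idea as swapping the Euclidean distance in SOINN's competitive (winner-matching) step for a metric learned by a DML algorithm. The paper itself provides no code; the following is a minimal Python sketch of that idea, not the authors' implementation: a SOINN-style incremental clusterer whose node distances use a learned Mahalanobis matrix `M`. The class name, the `threshold` parameter, the learning rate, and the `set_metric` hook are hypothetical placeholders; the AML metric update, node insertion thresholds, and edge aging of SOINN are omitted.

```python
import numpy as np

class IncrementalClusterer:
    """Sketch of a SOINN-style clusterer: nodes compete for each input signal,
    and distances use a learned metric matrix M instead of plain Euclidean."""

    def __init__(self, dim, threshold=1.0):
        self.nodes = []             # node position vectors
        self.M = np.eye(dim)        # learned metric; identity = Euclidean distance
        self.threshold = threshold  # hypothetical similarity threshold for new-node insertion

    def set_metric(self, M):
        """Plug in a positive semi-definite matrix produced by a DML algorithm
        (AML in the paper; any PSD matrix works in this sketch)."""
        self.M = M

    def _dist(self, x, y):
        d = x - y
        return float(np.sqrt(d @ self.M @ d))  # Mahalanobis-style distance

    def update(self, x):
        """Process one input signal incrementally (competitive step only)."""
        if len(self.nodes) < 2:
            self.nodes.append(x.copy())
            return
        dists = [self._dist(x, n) for n in self.nodes]
        winner = int(np.argmin(dists))
        if dists[winner] > self.threshold:
            self.nodes.append(x.copy())  # far from all existing nodes: create a new node
        else:
            # adapt the winner toward the input (fixed rate for illustration)
            self.nodes[winner] += 0.1 * (x - self.nodes[winner])

# Example usage on random high-dimensional signals
rng = np.random.default_rng(0)
clusterer = IncrementalClusterer(dim=5)
for signal in rng.normal(size=(200, 5)):
    clusterer.update(signal)
```

With `M` set to the identity this reduces to the Euclidean matching of plain SOINN; supplying a learned `M` via `set_metric` is what, per the abstract, lets the metric reflect the structure of high-dimensional input data.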