Hierarchical Clustering using Reversible Binary Cellular Automata for High-Dimensional Data

Baby C. J., Kamalika Bhattacharjee
arXiv - CS - Formal Languages and Automata Theory, published 2024-08-05. DOI: arxiv-2408.02250 (https://doi.org/arxiv-2408.02250)

Abstract

This work proposes a hierarchical clustering algorithm for high-dimensional datasets using the cyclic space of reversible finite cellular automata. In cellular automaton (CA) based clustering, two objects that belong to the same cycle are considered closely related and are placed in the same cluster. However, if a high-dimensional dataset is clustered using the cycles of a single CA, closely related objects may fall into different cycles. This paper identifies the relationship between objects in two different cycles based on the median of all elements in each cycle, so that they can be grouped in the next stage. Further, to minimize the number of intermediate clusters, which in turn reduces the computational cost, a rule selection strategy is used to find the best rules based on information propagation and cycle structure. After the dataset is encoded using a frequency-based encoding in which consecutive data elements maintain a minimum Hamming distance in encoded form, the proposed clustering algorithm iterates over three stages to cluster the data elements into the number of clusters specified by the user. This algorithm can be applied in various fields, including healthcare, sports, chemical research, and agriculture. When verified over standard benchmark datasets with various performance metrics, our algorithm is on par with the existing algorithms with quadratic time complexity.
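The core idea, that objects lying on the same cycle of a reversible CA share a cluster, can be illustrated with a minimal Python sketch. It is not the authors' implementation: the choice of elementary rule 170 with a periodic boundary, the 4-bit toy encodings, and the function names `ca_step`, `cycle_partition`, `cluster_by_cycle`, and `hamming` are all assumptions made for illustration. The median-based merging of cycles, the rule selection strategy, and the frequency-based encoding described in the abstract are not reproduced here.

```python
from itertools import product


def ca_step(config, rule):
    """One synchronous update of an elementary (3-neighbour) binary CA.

    Assumes a periodic boundary; `rule` is the Wolfram rule number.
    """
    n = len(config)
    return tuple(
        (rule >> ((config[(i - 1) % n] << 2)
                  | (config[i] << 1)
                  | config[(i + 1) % n])) & 1
        for i in range(n)
    )


def cycle_partition(n, rule):
    """Split all 2**n configurations into the cycles of the CA.

    Raises ValueError if the rule is not a bijection at this size,
    i.e. if the CA is not reversible for n cells.
    """
    configs = list(product((0, 1), repeat=n))
    images = {c: ca_step(c, rule) for c in configs}
    if len(set(images.values())) != len(configs):
        raise ValueError(f"rule {rule} is not reversible for n={n}")
    seen, cycles = set(), []
    for start in configs:
        if start in seen:
            continue
        cycle, cur = [], start
        # In a reversible CA every configuration lies on exactly one cycle,
        # so walking forward from an unseen state returns to that state.
        while cur not in seen:
            seen.add(cur)
            cycle.append(cur)
            cur = images[cur]
        cycles.append(cycle)
    return cycles


def cluster_by_cycle(encoded, rule):
    """Label each encoded object with the index of the cycle it lies on."""
    n = len(encoded[0])
    label_of = {c: k
                for k, cyc in enumerate(cycle_partition(n, rule))
                for c in cyc}
    return [label_of[tuple(e)] for e in encoded]


def hamming(a, b):
    """Hamming distance between two equal-length bit tuples."""
    return sum(x != y for x, y in zip(a, b))


if __name__ == "__main__":
    # Hypothetical 4-bit encodings of three objects.
    data = [(0, 0, 0, 1), (0, 0, 1, 0), (0, 1, 0, 1)]
    print(cluster_by_cycle(data, rule=170))
    print(hamming(data[0], data[1]))
```

Under rule 170 each cell copies its right neighbour, so a cycle is the set of rotations of a bit string. The first two toy objects therefore land on the same cycle and the third does not, which also shows the limitation noted in the abstract: a single CA splits the space into many cycles, so related objects can end up apart and need a later merging stage.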