Data Analysis Using Representation Theory and Clustering Algorithms

Suboh Alkhushayni, T. Choi, Du’a Alzaleq
Journal: WSEAS Transactions on Computers
Published: 2020-12-18
DOI: 10.37394/23205.2020.19.38
Citations: 6

Abstract

This work aims to expand knowledge of data analysis through both persistent homology and representations of directed graphs. Specifically, we examined how homology cluster groups can be analyzed using agglomerative hierarchical clustering algorithms. Additionally, the Wine data set, which is available in RStudio, was analyzed using several clustering algorithms, including hierarchical clustering, K-Means clustering, and PAM clustering. The goal of the analysis was to determine which clustering method is appropriate for a given numerical data set. By testing the data, we sought the optimal clustering algorithm, comparing agglomerative hierarchical clustering with three methods: K-Means, PAM, and Random Forest. By comparing each model's accuracy value with the cultivar coefficients, we concluded that K-Means methods are the most helpful when working with numerical variables. On the other hand, PAM clustering and Gower with random forest are the most beneficial approaches when working with categorical variables. All these tests can determine the optimal number of clusters for a given data set, provided the proper analysis is carried out. Building on this project, our method can be applied in several areas, such as clinical practice and business. For example, in a clinical setting, patients can be grouped by shared disease, required therapy, and other characteristics. Likewise, in business, groups can be formed based on marginal profit, marginal cost, or other economic indicators.
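As a rough illustration of the kind of comparison the abstract describes (clustering numerical data and checking the clusters against known cultivar labels), the following is a minimal sketch in plain Python. It is not the authors' R code and does not use the Wine data: the points, the `cultivars` labels, the evenly spaced initialization, and the use of cluster purity as a stand-in for the "accuracy value" are all assumptions made for this example.

```python
import math
from collections import Counter


def kmeans(points, k, max_iterations=50):
    """Lloyd's k-means with a simple deterministic start (an assumption of this sketch)."""
    # Start from k evenly spaced points in the input order.
    step = len(points) // k
    centroids = [points[i * step] for i in range(k)]
    labels = None
    for _ in range(max_iterations):
        # Assignment step: each point goes to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments are stable, so the algorithm has converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels


def purity(cluster_labels, true_labels):
    """Fraction of points that belong to the majority true class of their cluster."""
    matched = 0
    for c in set(cluster_labels):
        classes = [t for lab, t in zip(cluster_labels, true_labels) if lab == c]
        matched += Counter(classes).most_common(1)[0][1]
    return matched / len(true_labels)


# Hypothetical toy data: three well-separated groups standing in for cultivars.
points = [
    (1.0, 1.0), (1.2, 0.9), (0.8, 1.1), (1.1, 1.3),
    (5.0, 5.0), (5.2, 4.8), (4.9, 5.1), (5.1, 5.3),
    (9.0, 1.0), (9.2, 0.8), (8.9, 1.2), (9.1, 1.1),
]
cultivars = ["A"] * 4 + ["B"] * 4 + ["C"] * 4

labels = kmeans(points, k=3)
score = purity(labels, cultivars)
print(labels, score)
```

On real data the same purity score could be computed for hierarchical or PAM cluster assignments as well, which would give one possible basis for comparing the methods.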