Application of nonlinear clustering optimization algorithm in web data mining of cloud computing

Nonlinear Engineering - Modeling and Application | IF 2.4 | JCR Q2 (Engineering, Mechanical) | Pub Date: 2023-01-01 | DOI: 10.1515/nleng-2022-0239

Yan Zhang


Citations: 0

Abstract

To improve data mining and clustering performance, and thereby the efficiency of the cloud computing platform, the author proposes a bionic optimized clustering data extraction algorithm based on the cloud computing platform. The Gaussian distribution function graph makes it possible to judge more intuitively how tightly each category is aggregated and how the data points of a category are distributed. Data on a cloud computing platform are large in volume and high in dimension, and when computing the distance between every sample point and the center points, the optimization function must be re-executed after each center point update. The author evaluates clustering mainly with the PBM index and the DB index. The simulation data are taken from the Iris dataset in the UCI repository, with N = 500 samples selected for simulation. The experimental results show that the PBM value changes very little when P is not greater than 15, and that at P = 20 the PBM performance of all four clustering algorithms decreases significantly. When the sample size is increased from 50,000 to 100,000, the DB performance of the proposed algorithm changes little and the DB value tends to stabilize. In terms of clustering running time, K-means has a clear advantage, DBSCAN is the most time-consuming, and wolf pack clustering and Mean-shift fall in between. In practice, the number of samples used in each training round can be adjusted dynamically according to actual needs, which improves the applicability of the wolf pack clustering algorithm in specific scenarios. For data clustering in cloud computing, the algorithm also shows better PBM and DB performance than the common clustering algorithms it is compared with.
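The two evaluation measures named in the abstract can be illustrated with a small sketch. The code below is not the author's implementation: it assumes the standard textbook definitions of the Davies-Bouldin (DB) index and the PBM index, uses Euclidean distance, and runs on synthetic two-cluster data invented purely for illustration. The function names (`db_index`, `pbm_index`, `centroids_of`) are hypothetical. A lower DB value and a higher PBM value indicate a better partition.

```python
import numpy as np

def centroids_of(X, labels):
    """Return the sorted cluster ids and the mean point of each cluster."""
    ids = np.unique(labels)
    return ids, np.array([X[labels == k].mean(axis=0) for k in ids])

def db_index(X, labels):
    """Davies-Bouldin index (lower is better). Requires at least 2 clusters."""
    ids, C = centroids_of(X, labels)
    # S[i]: mean distance from the points of cluster i to its centroid.
    S = np.array([np.linalg.norm(X[labels == k] - C[i], axis=1).mean()
                  for i, k in enumerate(ids)])
    K = len(ids)
    worst = []
    for i in range(K):
        # Worst-case similarity of cluster i to any other cluster.
        ratios = [(S[i] + S[j]) / np.linalg.norm(C[i] - C[j])
                  for j in range(K) if j != i]
        worst.append(max(ratios))
    return float(np.mean(worst))

def pbm_index(X, labels):
    """PBM index (higher is better). Requires at least 2 clusters."""
    ids, C = centroids_of(X, labels)
    K = len(ids)
    # E_1: total distance of all points to the global centroid.
    e1 = np.linalg.norm(X - X.mean(axis=0), axis=1).sum()
    # E_K: total distance of all points to their own cluster centroid.
    ek = sum(np.linalg.norm(X[labels == k] - C[i], axis=1).sum()
             for i, k in enumerate(ids))
    # D_K: largest distance between any two cluster centroids.
    dk = max(np.linalg.norm(C[i] - C[j])
             for i in range(K) for j in range(i + 1, K))
    return float(((e1 / ek) * dk / K) ** 2)

# Synthetic data (an assumption for this sketch): two tight, well-separated blobs.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 0.1, (50, 2)), rng.normal(5.0, 0.1, (50, 2))])
good = np.array([0] * 50 + [1] * 50)   # labels that match the blobs
bad = np.arange(100) % 2               # labels that mix the blobs

print("DB  good/bad:", db_index(X, good), db_index(X, bad))
print("PBM good/bad:", pbm_index(X, good), pbm_index(X, bad))
```

On this toy data, the labeling that matches the blobs should score a lower DB and a higher PBM than the mixed labeling, which is the sense in which the abstract speaks of "better" PBM and DB performance.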


Source journal

Nonlinear Engineering - Modeling and Application

CiteScore: 6.20

Self-citation rate: 3.60%

Review time: 44 weeks

Journal description: The Journal of Nonlinear Engineering aims to be a platform for sharing original research results in theoretical, experimental, practical, and applied nonlinear phenomena within engineering. It serves as a forum to exchange ideas and applications of nonlinear problems across various engineering disciplines. Articles are considered for publication if they explore nonlinearities in engineering systems, offering realistic mathematical modeling, utilizing nonlinearity for new designs, stabilizing systems, understanding system behavior through nonlinearity, optimizing systems based on nonlinear interactions, and developing algorithms to harness and leverage nonlinear elements.