Big-Data Clustering with Genetic Algorithm
Afsaneh Mortezanezhad, Ebrahim Daneshifar
2019 5th Conference on Knowledge Based Engineering and Innovation (KBEI), 2019-02-01
DOI: 10.1109/KBEI.2019.8735076 (https://doi.org/10.1109/KBEI.2019.8735076)
Citations: 7
Abstract
Data emerging from the Internet of Things (IoT) usually exhibits wide variety and is large in volume, i.e., it is Big Data. Clustering is a routine preprocessing step in such systems and is normally performed with evolutionary algorithms. In this paper, we propose a new automatic clustering algorithm based on the Genetic Algorithm (GA) that does not require the number of clusters to be known in advance. The proposed algorithm uses a very short chromosome encoding, together with corresponding crossover and mutation operators, which leads to very good clustering performance. Our algorithm uses an unsupervised learning paradigm to group the data points into clusters. To demonstrate its performance, we evaluate it on balanced and unbalanced real-world data consisting of 13-tuple data vectors, and on an artificially generated random data set of 1,000,000 samples. In both cases, our algorithm outperforms the other algorithms.
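The abstract does not specify the chromosome encoding, the operators, or the fitness function, so the following is only a minimal sketch of the general idea it describes: a GA that evolves the number of clusters along with the clusters themselves. It is not the authors' algorithm. Everything concrete here is an assumption made for illustration: the chromosome is a variable-length list of cluster centers (short compared with one gene per data point), fitness is the negative Davies-Bouldin index, selection is simple truncation with elitism, and all function names and parameters (`ga_cluster`, `k_max`, `sigma`, and so on) are hypothetical.

```python
import math
import random


def assign(points, centers):
    """Label each point with the index of its nearest center."""
    return [min(range(len(centers)), key=lambda k: math.dist(p, centers[k]))
            for p in points]


def fitness(points, centers):
    """Negative Davies-Bouldin index, so higher is better (assumed criterion)."""
    labels = assign(points, centers)
    groups = [[p for p, l in zip(points, labels) if l == k]
              for k in range(len(centers))]
    groups = [g for g in groups if g]  # ignore centers that attract no points
    if len(groups) < 2:
        return float("-inf")
    cents = [tuple(sum(c) / len(g) for c in zip(*g)) for g in groups]
    scatter = [sum(math.dist(p, c) for p in g) / len(g)
               for g, c in zip(groups, cents)]
    total = 0.0
    for i in range(len(groups)):
        total += max(
            (scatter[i] + scatter[j]) / max(math.dist(cents[i], cents[j]), 1e-12)
            for j in range(len(groups)) if j != i
        )
    return -total / len(groups)


def crossover(a, b, rng, k_max):
    """Join a prefix of one parent to a suffix of the other; length may change."""
    child = a[:rng.randint(1, len(a))] + b[rng.randint(1, len(b)):]
    return child[:k_max] if len(child) >= 2 else a[:]


def mutate(centers, points, rng, k_max, sigma=0.5):
    """Add a center, drop a center, or jitter one center with Gaussian noise."""
    centers = list(centers)
    op = rng.choice(("add", "drop", "move"))
    if op == "add" and len(centers) < k_max:
        centers.append(rng.choice(points))
    elif op == "drop" and len(centers) > 2:
        centers.pop(rng.randrange(len(centers)))
    else:
        i = rng.randrange(len(centers))
        centers[i] = tuple(x + rng.gauss(0, sigma) for x in centers[i])
    return centers


def ga_cluster(points, k_max=6, pop_size=20, generations=40, seed=0):
    """Evolve center lists of length 2..k_max; return (best centers, labels)."""
    rng = random.Random(seed)
    pop = [rng.sample(points, rng.randint(2, k_max)) for _ in range(pop_size)]
    for _ in range(generations):
        ranked = sorted(pop, key=lambda c: fitness(points, c), reverse=True)
        elite = ranked[:max(2, pop_size // 4)]  # survivors are kept unchanged
        children = []
        while len(elite) + len(children) < pop_size:
            a, b = rng.sample(elite, 2)
            child = crossover(a, b, rng, k_max)
            children.append(mutate(child, points, rng, k_max))
        pop = elite + children
    best = max(pop, key=lambda c: fitness(points, c))
    return best, assign(points, best)
```

Calling `ga_cluster(points)` on a list of equal-length numeric tuples (at least `k_max` of them) returns the best center list found and one label per point; the number of clusters is simply `len(best)`. This pure-Python version recomputes all point-to-center distances for every chromosome in every generation, so it is suitable only for small data sets, not for the million-sample scale evaluated in the paper.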