A survey of genetic algorithms for clustering: Taxonomy and empirical analysis
Authors: Hermes Robles-Berumen, Amelia Zafra, Sebastián Ventura
Journal: Swarm and Evolutionary Computation, volume 91, Article 101720 (impact factor 8.2; JCR Q1, Computer Science, Artificial Intelligence)
Publication date: 2024-09-24
DOI: 10.1016/j.swevo.2024.101720
URL: https://www.sciencedirect.com/science/article/pii/S221065022400258X
Clustering, an unsupervised learning technique, aims to group patterns into clusters where similar patterns are grouped together, while dissimilar ones are placed in different clusters. This task can present itself as a complex optimization problem due to the extensive search space generated by all potential data partitions. Genetic Algorithms (GAs) have emerged as efficient tools for addressing this task. Consequently, significant advancements and numerous proposals have been developed in this field.
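To give a sense of how large the search space of data partitions becomes, the short sketch below counts the ways to split n patterns into k non-empty clusters using Stirling numbers of the second kind. This is a standard combinatorial identity offered as an illustration, not something taken from the survey; the function names `stirling2` and `bell` are hypothetical.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def stirling2(n, k):
    """Number of ways to partition n labelled patterns into k non-empty clusters."""
    if n == k:
        return 1  # includes the empty partition of zero patterns
    if k == 0 or k > n:
        return 0
    # The n-th pattern either forms a new cluster on its own,
    # or joins one of the k clusters of a partition of the other n - 1.
    return stirling2(n - 1, k - 1) + k * stirling2(n - 1, k)


def bell(n):
    """Total number of partitions of n patterns when k is not fixed in advance."""
    return sum(stirling2(n, k) for k in range(n + 1))
```

For example, `stirling2(10, 3)` is 9330 and `bell(10)` is 115975, so even ten patterns already admit more than a hundred thousand partitions when the number of clusters is unknown.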
This work offers a comprehensive and critical review of state-of-the-art mono-objective GAs for partitional clustering. From a theoretical standpoint, it examines 22 well-known proposals in detail, covering their encoding strategies, objective functions, genetic operators, local search methods, and parent selection strategies, and proposes a specific taxonomy based on this information. From a practical standpoint, a detailed experimental study is conducted to discern the advantages and disadvantages of the approaches. Specifically, 22 cluster validation indices are used to compare the performance of the clustering techniques across 94 datasets with diverse configurations, including the number of classes, the separation between classes, and pattern dimensionality. The results reveal interesting findings, such as the key role of local search in optimizing results and reducing the search space. Additionally, representations based on centroids and labels demonstrate greater efficiency, whereas crossover and mutation operators prove less relevant. Ultimately, while the results are satisfactory, real-world clustering problems introduce additional complexity, especially for algorithms that aim to determine the number of clusters, resulting in diminished performance and the need to explore new approaches. Code, datasets, and instructions for running the algorithms in the LEAL library are available in an associated repository to facilitate future experiments in this environment.
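The components named in the abstract (a centroid-based encoding, an objective function, parent selection, crossover, mutation, and local search) can be illustrated with a minimal sketch. It is an assumption-laden toy, not any of the 22 surveyed proposals and not the LEAL library's API: the objective is the sum of squared errors, the local search is a single k-means iteration, the number of clusters k is fixed, and every function name and parameter below is hypothetical.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def sse(points, centroids):
    """Objective (lower is better): squared distance of each point to its nearest centroid."""
    return sum(min(sq_dist(p, c) for c in centroids) for p in points)


def kmeans_step(points, centroids):
    """Local search: one k-means iteration (assign points, then recompute centroids)."""
    groups = [[] for _ in centroids]
    for p in points:
        nearest = min(range(len(centroids)), key=lambda i: sq_dist(p, centroids[i]))
        groups[nearest].append(p)
    refined = []
    for c, g in zip(centroids, groups):
        if g:
            refined.append(tuple(sum(col) / len(g) for col in zip(*g)))
        else:
            refined.append(c)  # an empty cluster keeps its previous centroid
    return refined


def tournament(pop, fitness, rng, size=2):
    """Parent selection: best of `size` randomly drawn individuals."""
    contenders = rng.sample(range(len(pop)), size)
    return pop[min(contenders, key=lambda i: fitness[i])]


def evolve(points, k, pop_size=20, generations=30, mutation_rate=0.1, seed=0):
    """Toy centroid-encoded GA; returns (best_sse, best_centroids)."""
    rng = random.Random(seed)
    # Each individual is a list of k centroids, initialised from k distinct data points.
    pop = [rng.sample(points, k) for _ in range(pop_size)]
    for _ in range(generations):
        fitness = [sse(points, ind) for ind in pop]
        best = pop[min(range(pop_size), key=lambda i: fitness[i])]
        children = [best]  # elitism: the best individual survives unchanged
        while len(children) < pop_size:
            a = tournament(pop, fitness, rng)
            b = tournament(pop, fitness, rng)
            child = [rng.choice(pair) for pair in zip(a, b)]  # uniform crossover
            if rng.random() < mutation_rate:
                child[rng.randrange(k)] = rng.choice(points)  # reset one centroid
            children.append(kmeans_step(points, child))
        pop = children
    fitness = [sse(points, ind) for ind in pop]
    return min(zip(fitness, pop), key=lambda pair: pair[0])
```

Calling `evolve(points, k=2)` on a list of coordinate tuples returns the lowest objective value found together with the corresponding centroids. Removing the `kmeans_step` call, or setting `mutation_rate` to zero, gives a crude way to probe the relative contribution of local search and the genetic operators on a small dataset.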
About the journal:
Swarm and Evolutionary Computation is a pioneering peer-reviewed journal focused on the latest research and advancements in nature-inspired intelligent computation using swarm and evolutionary algorithms. It covers theoretical, experimental, and practical aspects of these paradigms and their hybrids, promoting interdisciplinary research. The journal prioritizes the publication of high-quality, original articles that push the boundaries of evolutionary computation and swarm intelligence. Additionally, it welcomes survey papers on current topics and novel applications. Topics of interest include but are not limited to: Genetic Algorithms, Genetic Programming, Evolution Strategies, Evolutionary Programming, Differential Evolution, Artificial Immune Systems, Particle Swarms, Ant Colony, Bacterial Foraging, Artificial Bees, Firefly Algorithm, Harmony Search, Artificial Life, Digital Organisms, Estimation of Distribution Algorithms, Stochastic Diffusion Search, Quantum Computing, Nano Computing, Membrane Computing, Human-centric Computing, Hybridization of Algorithms, Memetic Computing, Autonomic Computing, Self-organizing Systems, and Combinatorial, Discrete, Binary, Constrained, Multi-objective, Multi-modal, Dynamic, and Large-scale Optimization.