Title: Reflecting on a Decade of Evolution: MapReduce‐Based Advances in Partitioning‐Based, Hierarchical‐Based, and Density‐Based Clustering (2013–2023)
Author: Tanvir Habib Sardar
Journal: WIREs Data Mining and Knowledge Discovery, 14 1
Published: 2024-10-21
DOI: https://doi.org/10.1002/widm.1566
Abstract
Traditional clustering algorithms are ill suited to large real‐world datasets, or big data, because of their computational cost and limited scalability. In response, research over the last decade has moved toward distributed clustering using the MapReduce framework. This study conducts a bibliometric review to assess and measure the patterns and trends in MapReduce‐based partitioning, hierarchical, and density clustering algorithms over the past decade (2013–2023). A comprehensive, digital text‐mining‐based search technique with multiple field‐specific keywords, inclusion measures, and exclusion criteria is used to retrieve the research landscape from the Scopus database. The Scopus data are analyzed using the VOSviewer software tool and coded using the R statistical analysis tool. The analysis identifies, among other measures, the number of scholarly articles; the diversity of article sources and their impact and growth patterns; the most influential authors and co‐authors; the most cited articles; the most contributing affiliations and countries and their collaborations; and the use of different keywords and their impact. The study further examines the articles and reports the methodologies used to design MapReduce‐based counterparts of traditional partitioning, hierarchical, and density clustering algorithms, along with their optimizations and hybridizations. Finally, the study lists the main research challenges encountered over the past decade for MapReduce‐based partitioning, hierarchical, and density clustering, and suggests possible areas for future research in this field.