Data Stream Clustering: Introducing Recursively Extendable Aggregation Functions for Incremental Cluster Fusion Processes
A. Urio-Larrea; H. Camargo; G. Lucca; T. Asmus; C. Marco-Detchart; L. Schick; C. Lopez-Molina; J. Andreu-Perez; H. Bustince; G. P. Dimuro
IEEE Transactions on Cybernetics, vol. 55, no. 3, pp. 1421-1435. Published 2025-02-05. Journal Article.
DOI: 10.1109/TCYB.2025.3527862
https://ieeexplore.ieee.org/document/10874210/
Abstract
In data stream (DS) learning, the system must extract knowledge from data that are generated continuously, usually at high speed and in large volumes, which makes it impossible to store the entire data set for batch processing. Hence, machine learning models must be built incrementally, processing incoming examples as they arrive and updating the model so that it remains compatible with the current data. In fuzzy DS clustering, the model can either absorb incoming data into existing clusters or initiate a new cluster. As the volume of data increases, clusters may overlap to the point where it is convenient to merge two or more of them into one. A cluster comparison measure (CM) should then be applied, also incrementally, to decide whether such clusters should be combined. This defines an incremental fusion process based on aggregation functions that can aggregate incoming inputs without storing all previous inputs. The objective of this article is to solve, on a formal basis, the fuzzy DS clustering problem of incrementally comparing fuzzy clusters. First, we formalize and operationalize incremental fusion processes of fuzzy clusters by introducing recursively extendable (RE) aggregation functions, studying construction methods and different classes of such functions. Second, we propose two approaches to comparing clusters based on RE aggregation functions: 1) similarity and 2) overlap between clusters. Finally, we analyze the effect of those incremental CMs on the online and offline phases of the well-known fuzzy clustering algorithm d-FuzzStream, showing that our new approach outperforms the original algorithm and performs better than or comparably to other state-of-the-art DS clustering algorithms found in the literature.
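To make the idea of aggregating inputs without storing earlier ones concrete, here is a minimal Python sketch that uses the arithmetic mean as a familiar example. It is an illustration only, not the article's definition of RE aggregation functions or its cluster comparison measures; the class name `RunningMean`, the helper `batch_mean`, and the sample values are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class RunningMean:
    """Arithmetic mean updated one input at a time (illustrative assumption,
    not the construction used in the article)."""
    count: int = 0
    value: float = 0.0

    def update(self, x: float) -> float:
        # Fold the new input into the stored aggregate; only the current
        # value and the number of inputs seen so far are kept.
        self.count += 1
        self.value += (x - self.value) / self.count
        return self.value


def batch_mean(xs):
    # Reference computation that needs the whole list in memory.
    return sum(xs) / len(xs)


# Hypothetical stream of values in [0, 1], e.g. membership degrees.
stream = [0.2, 0.9, 0.4, 0.7]

agg = RunningMean()
for x in stream:
    agg.update(x)

print(agg.value, batch_mean(stream))
```

The incremental result agrees with the batch result up to floating-point rounding, while the memory used stays constant regardless of how many inputs have arrived.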
About the journal:
The scope of the IEEE Transactions on Cybernetics includes computational approaches to the field of cybernetics. Specifically, the transactions welcomes papers on communication and control across machines or among machines, humans, and organizations. The scope includes such areas as computational intelligence, computer vision, neural networks, genetic algorithms, machine learning, fuzzy systems, cognitive systems, decision making, and robotics, to the extent that they contribute to the theme of cybernetics or demonstrate an application of cybernetics principles.