Neighborhood relation-based incremental label propagation algorithm for partially labeled hybrid data
Authors: Wenhao Shu, Dongtao Cao, Wenbin Qian, Shipeng Li
Journal: Machine Learning (impact factor 4.3; JCR Q2, Computer Science, Artificial Intelligence)
Published: 2024-06-19
DOI: 10.1007/s10994-024-06560-9 (https://doi.org/10.1007/s10994-024-06560-9)
Citations: 0
Abstract
Label propagation can rapidly predict the labels of unlabeled objects from a small amount of given label information, which can improve the performance of subsequent machine learning tasks. Most existing label propagation methods are designed for static data. In many applications, however, real datasets that contain multiple feature value types and massive numbers of unlabeled objects change dynamically over time, and applying these methods to dynamic partially labeled hybrid data is costly because they recalculate from scratch every time the data changes. To improve efficiency, this paper develops a novel incremental label propagation algorithm based on neighborhood relations (ILPN). Specifically, we first construct graph structures using neighborhood relations to eliminate unnecessary label information. We then design a new label propagation strategy that takes into account the weights assigned to each class, so that it does not rely on a probabilistic transition matrix to fix the structure for propagation. On this basis, we develop a new label propagation algorithm called neighborhood relation-based label propagation (LPN). For dynamic partially labeled hybrid data, we integrate incremental learning into LPN and develop an updating mechanism that performs incremental label propagation over the previous propagation results and graph structures rather than recalculating from scratch. Finally, extensive experiments on UCI datasets show that the proposed LPN algorithm can outperform other label propagation algorithms in speed while maintaining accuracy. On simulated dynamic data in particular, the incremental algorithm ILPN is more efficient than non-incremental methods as the partially labeled hybrid data varies.
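To make the pipeline described in the abstract concrete, the following is a minimal illustrative sketch, not the authors' LPN or ILPN. The abstract does not give the neighborhood definition, the class-weighting scheme, or the update rules, so everything here is an assumption: the function names (hybrid_distance, build_graph, propagate, add_object), the radius parameter delta, the mixed numeric/categorical distance, and plain neighbor voting as a stand-in for the paper's class weights. It only shows the general shape of the idea: build a neighborhood graph over hybrid data, propagate labels over it, and handle a new object by touching only its own neighborhood while keeping earlier results.

```python
from collections import Counter


def hybrid_distance(a, b, numeric_idx):
    """Assumed mixed distance: |x - y| for numeric features (pre-scaled to
    [0, 1]), 0/1 mismatch for categorical ones, averaged over features."""
    total = 0.0
    for i, (x, y) in enumerate(zip(a, b)):
        if i in numeric_idx:
            total += abs(x - y)
        else:
            total += 0.0 if x == y else 1.0
    return total / len(a)


def build_graph(data, numeric_idx, delta):
    """Connect every pair of objects whose distance is at most delta."""
    graph = {i: [] for i in range(len(data))}
    for i in range(len(data)):
        for j in range(i + 1, len(data)):
            if hybrid_distance(data[i], data[j], numeric_idx) <= delta:
                graph[i].append(j)
                graph[j].append(i)
    return graph


def propagate(graph, labels):
    """Give each unlabeled node the most common label among its labeled
    neighbors, repeating until no node changes. Plain voting is a stand-in
    for the class-weighted strategy mentioned in the abstract."""
    labels = dict(labels)  # work on a copy; None marks an unlabeled node
    while True:
        updates = {}
        for node, nbrs in graph.items():
            if labels[node] is not None:
                continue
            votes = Counter(labels[n] for n in nbrs if labels[n] is not None)
            if votes:
                updates[node] = votes.most_common(1)[0][0]
        if not updates:
            return labels
        labels.update(updates)


def add_object(data, graph, labels, new_obj, numeric_idx, delta):
    """Hypothetical incremental step: compute only the new object's
    neighborhood, extend the existing graph in place, and reuse the labels
    already propagated instead of rebuilding everything."""
    idx = len(data)
    nbrs = [j for j in range(idx)
            if hybrid_distance(new_obj, data[j], numeric_idx) <= delta]
    data.append(new_obj)
    graph[idx] = nbrs
    for j in nbrs:
        graph[j].append(idx)
    return propagate(graph, {**labels, idx: None})


# Toy hybrid data: feature 0 is numeric, feature 1 is categorical.
data = [(0.10, "a"), (0.15, "a"), (0.90, "b"), (0.85, "b")]
numeric_idx = {0}
delta = 0.2

graph = build_graph(data, numeric_idx, delta)
labels = propagate(graph, {0: "pos", 1: None, 2: "neg", 3: None})

# A new unlabeled object arrives; only its neighborhood is computed.
labels = add_object(data, graph, labels, (0.20, "a"), numeric_idx, delta)
```

In this toy run the incremental step computes four distances for the new object rather than recomputing all pairwise distances, which is the kind of saving the abstract attributes to incremental updating. How ILPN handles deletions, label changes, or relabeling of previously propagated objects is not described in the abstract and is not modeled here.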
About the journal:
Machine Learning serves as a global platform dedicated to computational approaches in learning. The journal reports substantial findings on diverse learning methods applied to various problems, offering support through empirical studies, theoretical analysis, or connections to psychological phenomena. It demonstrates the application of learning methods to solve significant problems and aims to enhance the conduct of machine learning research with a focus on verifiable and replicable evidence in published papers.