Scalable and Inductive Semi-supervised Classifier with Sample Weighting Based on Graph Topology
Authors: Fadi Dornaika, Zoulfikar Ibrahim, Alirezah Bosaghzadeh
Journal: ACM Transactions on Knowledge Discovery from Data
Publication date: 2024-01-30
DOI: 10.1145/3643645 (https://doi.org/10.1145/3643645)
Graph-based semi-supervised learning (GSSL) has recently attracted significant interest in machine learning and pattern recognition. Although existing methods have made progress, three main limitations remain. First, the graphs used in these approaches are usually predefined regardless of the task at hand. Second, because they rely on graphs, almost all approaches cannot handle data with a very large number of unlabeled samples; processing large datasets with GSSL can be infeasible in terms of computational resources. Third, the imbalance in the topology of the samples is very often not taken into account. In this paper, we present a scalable and inductive GSSL method. First, we extend the graph topology imbalance paradigm to large databases. Second, we use the computed weights of the labeled samples in the label-matching term of the global objective function. This leads to a unified, scalable semi-supervised learning model that simultaneously estimates the labels of the unlabeled data, the projection of the feature space onto the label space, and the graph matrix of anchors. In the proposed scheme, the labels and features of the anchors are integrated to construct the anchor graph adaptively. Experiments were conducted on four large databases: NORB, RCV1, Covtype, and MNIST. They show that the proposed method outperforms existing scalable semi-supervised learning models.
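The abstract does not spell out the objective function, so the following is only a minimal sketch of the generic anchor-graph idea it builds on, not the authors' model. It assumes anchors are already given (for example, cluster centers), builds a row-normalized sample-to-anchor matrix `Z`, and fits anchor label scores with a weighted label-matching term plus a plain ridge penalty. The function names, the Gaussian kernel, the `weights` argument (standing in for topology-derived weights of labeled samples), and the ridge term are all illustrative assumptions; the graph-smoothness and adaptive graph-construction parts of the paper are omitted.

```python
import numpy as np

def anchor_affinities(X, anchors, k=3, sigma=1.0):
    """Sample-to-anchor matrix Z (n x m); each row has k nonzeros summing to 1."""
    # Squared Euclidean distance between every sample and every anchor.
    d2 = ((X[:, None, :] - anchors[None, :, :]) ** 2).sum(axis=2)
    # Column indices of the k nearest anchors for each sample.
    nearest = np.argsort(d2, axis=1)[:, :k]
    rows = np.arange(X.shape[0])[:, None]
    d_near = d2[rows, nearest]
    # Gaussian kernel on the k nearest anchors; subtracting the row minimum
    # avoids underflow and leaves the normalized weights unchanged.
    w = np.exp(-(d_near - d_near.min(axis=1, keepdims=True)) / (2.0 * sigma ** 2))
    Z = np.zeros_like(d2)
    Z[rows, nearest] = w / w.sum(axis=1, keepdims=True)
    return Z

def predict_labels(Z, Y_labeled, labeled_idx, weights=None, reg=1e-3):
    """Weighted label fitting on anchors, then labeling of all samples.

    Minimizes sum_i w_i * ||Z[i] @ A - Y[i]||^2 + reg * ||A||^2 over the
    anchor score matrix A (m x c). `weights` is a hypothetical per-sample
    weight vector for the labeled samples (uniform if None).
    """
    Zl = Z[labeled_idx]
    if weights is None:
        w = np.ones(len(labeled_idx))
    else:
        w = np.asarray(weights, dtype=float)
    # Normal equations of the weighted ridge problem.
    G = Zl.T @ (w[:, None] * Zl) + reg * np.eye(Z.shape[1])
    A = np.linalg.solve(G, Zl.T @ (w[:, None] * Y_labeled))
    # Every sample (labeled or not) gets the class with the highest score.
    return (Z @ A).argmax(axis=1)
```

Because all samples are expressed through a small number of anchors, the linear system solved here is of size m x m (number of anchors) rather than n x n, which is what makes anchor-based schemes attractive for large datasets. A new sample can be labeled the same way by computing its row of `Z`, which is the sense in which such a sketch is inductive.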
About the journal:
TKDD welcomes papers on a full range of research in the knowledge discovery and analysis of diverse forms of data. Such subjects include, but are not limited to: scalable and effective algorithms for data mining and big data analysis, mining brain networks, mining data streams, mining multi-media data, mining high-dimensional data, mining text, Web, and semi-structured data, mining spatial and temporal data, data mining for community generation, social network analysis, and graph structured data, security and privacy issues in data mining, visual, interactive and online data mining, pre-processing and post-processing for data mining, robust and scalable statistical methods, data mining languages, foundations of data mining, KDD framework and process, and novel applications and infrastructures exploiting data mining technology including massively parallel processing and cloud computing platforms. TKDD encourages papers that explore the above subjects in the context of large distributed networks of computers, parallel or multiprocessing computers, or new data devices. TKDD also encourages papers that describe emerging data mining applications that cannot be satisfied by the current data mining technology.