GKF-PUAL: A group kernel-free approach to positive-unlabeled learning with variable selection
Information Sciences, published 2024-10-22. DOI: 10.1016/j.ins.2024.121574
https://www.sciencedirect.com/science/article/pii/S0020025524014889
Citations: 0
Abstract
Variable selection is important for classification of data with many irrelevant predicting variables, but it has not yet been well studied in positive-unlabeled (PU) learning, where classifiers have to be trained without labelled-negative instances. In this paper, we propose a group kernel-free PU classifier with asymmetric loss (GKF-PUAL) to achieve quadratic PU classification with group-lasso regularisation embedded for variable selection. We also propose a five-block algorithm to solve the optimization problem of GKF-PUAL. Our experimental results reveal the superiority of GKF-PUAL in both PU classification and variable selection, improving the baseline PUAL by more than 10% in F1-score across four benchmark datasets and removing over 70% of irrelevant variables on six benchmark datasets. The code for GKF-PUAL is at https://github.com/tkks22123/GKF-PUAL.
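To make the group-lasso idea in the abstract concrete, here is a minimal sketch of a group penalty and its proximal operator (group soft-thresholding). It is not the authors' five-block algorithm. The row-wise grouping, the function names, and the threshold `tau` are assumptions for illustration: each row of `G` is taken to hold all coefficients of a quadratic classifier that are tied to one input variable, so a variable is dropped when its whole row shrinks to zero.

```python
import numpy as np


def group_lasso_penalty(G):
    """Sum of Euclidean norms of the rows of G (one row per variable; assumed grouping)."""
    return float(np.linalg.norm(G, axis=1).sum())


def group_soft_threshold(G, tau):
    """Proximal operator of tau * group_lasso_penalty, applied row by row.

    Rows with norm <= tau are set exactly to zero; other rows are shrunk
    towards zero by a factor (1 - tau / norm).
    """
    norms = np.linalg.norm(G, axis=1, keepdims=True)
    # Guard against division by zero for rows that are already all zeros.
    scale = np.maximum(0.0, 1.0 - tau / np.maximum(norms, 1e-12))
    return scale * G


# Hypothetical coefficients: variable 0 is strongly used, variable 1 barely.
G = np.array([[3.0, 4.0, 0.0],
              [0.1, 0.1, 0.1]])
shrunk = group_soft_threshold(G, tau=1.0)
selected = np.flatnonzero(np.linalg.norm(shrunk, axis=1) > 0)
print(shrunk)
print("selected variables:", selected)
```

Because the penalty acts on whole rows rather than on individual coefficients, a weakly used variable loses its linear and quadratic terms together, which is what makes the regulariser suitable for variable selection in a quadratic model.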
About the journal:
Information Sciences (Informatics and Computer Science, Intelligent Systems, Applications) is an international journal that publishes original and creative research findings in the field of information sciences. We also feature a limited number of timely tutorial and survey contributions.
Our journal aims to cater to a diverse audience, including researchers, developers, managers, strategic planners, graduate students, and anyone interested in staying up-to-date with cutting-edge research in information science, knowledge engineering, and intelligent systems. While readers are expected to share a common interest in information science, they come from varying backgrounds such as engineering, mathematics, statistics, physics, computer science, cell biology, molecular biology, management science, cognitive science, neurobiology, behavioral sciences, and biochemistry.