Persistent Laplacian-enhanced algorithm for scarcely labeled data classification

Impact factor: 2.9; Zone 3, Computer Science; JCR Q2 (Computer Science, Artificial Intelligence); Journal: Machine Learning; Publication date: 2024-09-13; DOI: 10.1007/s10994-024-06616-w

Gokul Bhusal, Ekaterina Merkurjev, Guo-Wei Wei


Citations: 0

Abstract

The success of many machine learning (ML) methods depends crucially on having large amounts of labeled data. However, obtaining enough labeled data can be expensive, time-consuming, and subject to ethical constraints in many applications. One approach that has shown tremendous value in addressing this challenge is semi-supervised learning (SSL). This technique uses both labeled and unlabeled data during training, typically with far less labeled data than unlabeled data, since unlabeled data is often relatively easy and inexpensive to obtain. SSL methods are particularly useful in applications where labeling data is especially expensive, such as medical analysis, natural language processing, or speech recognition. A subset of SSL methods that has achieved great success in various domains consists of algorithms that integrate graph-based techniques. These procedures are popular because of the large amount of information provided by the graphical framework. In this work, we propose an algebraic topology-based semi-supervised method called persistent Laplacian-enhanced graph MBO, which integrates persistent spectral graph theory with the classical Merriman–Bence–Osher (MBO) scheme. Specifically, we use a filtration procedure to generate a sequence of chain complexes and associated families of simplicial complexes, from which we construct a family of persistent Laplacians. Overall, the procedure is very efficient, requires much less labeled data to perform well than many ML techniques, and can be adapted for both small and large datasets. We evaluate the performance of our method on classification, and the results indicate that the technique outperforms other existing semi-supervised algorithms.
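To make the graph MBO ingredient mentioned in the abstract more concrete, the following is a minimal sketch of a generic graph MBO iteration with a label-fidelity term: a few diffusion steps with a graph Laplacian, followed by thresholding each node to its most likely class. It is not the authors' persistent Laplacian-enhanced method. It uses a single ordinary (unnormalized) graph Laplacian rather than a family of persistent Laplacians, and the function names (`gaussian_weights`, `graph_mbo`), the Gaussian similarity, and all parameter values (`sigma`, `dt`, `mu`, `n_diffuse`, `n_iter`) are assumptions chosen for illustration only.

```python
import numpy as np


def gaussian_weights(X, sigma=1.0):
    """Dense Gaussian similarity matrix with a zero diagonal (illustrative choice)."""
    sq_dist = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    W = np.exp(-sq_dist / (2.0 * sigma ** 2))
    np.fill_diagonal(W, 0.0)
    return W


def graph_mbo(W, labels, n_classes, dt=0.1, mu=30.0, n_diffuse=3, n_iter=20):
    """Generic graph MBO sketch. `labels` holds class ids, with -1 for unlabeled nodes."""
    n = W.shape[0]
    L = np.diag(W.sum(axis=1)) - W              # unnormalized graph Laplacian
    known = labels >= 0                         # mask of labeled nodes
    Y = np.zeros((n, n_classes))
    Y[known, labels[known]] = 1.0               # one-hot targets on labeled nodes

    U = np.full((n, n_classes), 1.0 / n_classes)  # uniform start on unlabeled nodes
    U[known] = Y[known]

    tau = dt / n_diffuse
    A = np.eye(n) + tau * L                     # implicit Euler diffusion operator
    rows = np.arange(n)
    for _ in range(n_iter):
        # Diffusion with fidelity: (I + tau L) U_new = U + tau * mu * chi * (Y - U)
        for _ in range(n_diffuse):
            rhs = U + tau * mu * known[:, None] * (Y - U)
            U = np.linalg.solve(A, rhs)
        # Thresholding: project every row onto the nearest one-hot vector
        winners = U.argmax(axis=1)
        U = np.zeros_like(U)
        U[rows, winners] = 1.0
    return U.argmax(axis=1)


# Usage: two well-separated clusters with a single labeled point in each.
rng = np.random.default_rng(0)
X = np.vstack([
    rng.normal(0.0, 0.3, size=(10, 2)),
    rng.normal(0.0, 0.3, size=(10, 2)) + np.array([10.0, 0.0]),
])
labels = -np.ones(20, dtype=int)
labels[0], labels[10] = 0, 1
pred = graph_mbo(gaussian_weights(X), labels, n_classes=2)
print(pred)
```

The dense solve keeps the sketch short. For larger graphs, implementations of this kind typically rely on sparse weight matrices or a truncated eigendecomposition of the Laplacian instead, which is one place where a different choice of Laplacian could be substituted.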




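Source journal: Machine Learning (Engineering and Technology; Computer Science: Artificial Intelligence)

CiteScore: 11.00

Self-citation rate: 2.70%

Articles published: 162

Review time: 3 months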

Journal description: Machine Learning serves as a global platform dedicated to computational approaches in learning. The journal reports substantial findings on diverse learning methods applied to various problems, offering support through empirical studies, theoretical analysis, or connections to psychological phenomena. It demonstrates the application of learning methods to solve significant problems and aims to enhance the conduct of machine learning research with a focus on verifiable and replicable evidence in published papers.