Enhancing Semi-Supervised Learning via Representative and Diverse Sample Selection

Qian Shao, Jiangrui Kang, Qiyuan Chen, Zepeng Li, Hongxia Xu, Yiwen Cao, Jiajuan Liang, Jian Wu
arXiv - CS - Machine Learning. Published 2024-09-18. DOI: arxiv-2409.11653 (https://doi.org/arxiv-2409.11653)

Abstract

Semi-Supervised Learning (SSL) has become a preferred paradigm in many deep learning tasks because it reduces the need for human labor. Previous studies primarily focus on effectively utilizing the labeled and unlabeled data to improve performance. However, we observe that the choice of which samples to label also significantly impacts performance, particularly under extremely low-budget settings. The sample selection task in SSL has long been under-explored. To fill this gap, we propose a Representative and Diverse Sample Selection approach (RDSS). By adopting a modified Frank-Wolfe algorithm to minimize a novel criterion, $\alpha$-Maximum Mean Discrepancy ($\alpha$-MMD), RDSS selects a representative and diverse subset of the unlabeled data for annotation. We demonstrate that minimizing $\alpha$-MMD enhances the generalization ability of low-budget learning. Experimental results show that RDSS consistently improves the performance of several popular SSL frameworks and outperforms the state-of-the-art sample selection approaches used in Active Learning (AL) and Semi-Supervised Active Learning (SSAL), even with constrained annotation budgets.
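The abstract does not define $\alpha$-MMD or the modified Frank-Wolfe update, so the following is only a rough sketch of the general idea, not the authors' algorithm. It uses plain kernel-herding-style greedy selection, which is commonly viewed as a Frank-Wolfe procedure on an MMD objective. The Gaussian kernel, the assumed form of the criterion in `alpha_mmd_sq`, the way `alpha` weights the representativeness term, and all function names are illustrative assumptions.

```python
import numpy as np

def rbf_kernel(a, b, gamma=1.0):
    """Gaussian kernel matrix k(a_i, b_j) = exp(-gamma * ||a_i - b_j||^2)."""
    sq = (a ** 2).sum(1)[:, None] + (b ** 2).sum(1)[None, :] - 2.0 * a @ b.T
    return np.exp(-gamma * np.maximum(sq, 0.0))

def alpha_mmd_sq(K, idx, alpha=1.0):
    """ASSUMED criterion (not taken from the abstract):
    alpha^2 * mean(K) + mean(K[idx, idx]) - 2 * alpha * mean(K[:, idx]).
    With alpha = 1 this is the usual biased squared MMD between the
    full pool and the selected subset."""
    idx = np.asarray(idx)
    return (alpha ** 2 * K.mean()
            + K[np.ix_(idx, idx)].mean()
            - 2.0 * alpha * K[:, idx].mean())

def greedy_select(X, m, alpha=1.0, gamma=1.0):
    """Herding-style greedy selection of m distinct pool indices.

    Each step picks the point that is most similar to the whole pool
    (representativeness, weighted by the hypothetical alpha) and least
    similar to the points already chosen (diversity)."""
    n = len(X)
    K = rbf_kernel(X, X, gamma)
    mean_embed = K.mean(axis=1)      # (1/n) * sum_j k(x_i, x_j)
    sum_selected = np.zeros(n)       # sum over chosen s of k(x_i, x_s)
    available = np.ones(n, dtype=bool)
    chosen = []
    for t in range(m):
        score = alpha * mean_embed - sum_selected / (t + 1)
        score[~available] = -np.inf  # select without replacement
        i = int(np.argmax(score))
        chosen.append(i)
        available[i] = False
        sum_selected += K[:, i]
    return chosen
```

Calling `greedy_select(X, m)` on an `(n, d)` feature array returns the indices of `m` samples to send for annotation; `alpha_mmd_sq` can then be used to compare the selected subset against, for example, a random subset of the same size.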