Cost Effective Multi-label Active Learning via Querying Subexamples

Xia Chen, Guoxian Yu, C. Domeniconi, J. Wang, Zhao Li, Z. Zhang
DOI: 10.1109/ICDM.2018.00109
Published in: 2018 IEEE International Conference on Data Mining (ICDM), November 2018
Citations: 13

Abstract

Multi-label active learning addresses the scarce-labeled-example problem by querying the most valuable unlabeled examples, or example-label pairs, to achieve better performance with limited query cost. Current multi-label active learning methods require scrutiny of the whole example in order to obtain its annotation. In contrast, one can find positive evidence with respect to a label by examining specific patterns (i.e., subexamples), rather than the whole example, thus making the annotation process more efficient. Based on this observation, we propose a novel two-stage cost-effective multi-label active learning framework, called CMAL. In the first stage, a novel example-label pair selection strategy is introduced. Our strategy leverages the label correlation and label-space sparsity of multi-label examples to select the most uncertain example-label pairs. Specifically, an unknown relevant label of an example can be inferred from the correlated labels already assigned to that example, thus reducing the uncertainty of the unknown label. In addition, the larger the number of relevant examples of a particular label, the smaller the uncertainty of that label. In the second stage, CMAL queries the most plausible positive subexample-label pairs of the selected example-label pairs. Comprehensive experiments on multi-label datasets collected from different domains demonstrate the effectiveness of our proposed approach for cost-effective queries. We also show that leveraging label correlation and label sparsity contributes to saving costs.
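The first-stage selection idea described above can be sketched in code. The following is a minimal illustrative sketch, not the authors' actual algorithm: it assumes a base uncertainty that peaks at the decision boundary, then discounts it using (a) correlation evidence from labels already assigned to the same example and (b) label frequency, so that sparser, less-supported labels score as more uncertain. All function names, the exact form of the discounts, and the weights `alpha` and `beta` are hypothetical choices made for illustration.

```python
import numpy as np

def pair_uncertainty(probs, observed, corr, rel_freq, alpha=0.5, beta=0.5):
    """Illustrative uncertainty score for example-label pairs.

    probs:    (n, q) predicted probability that example i is relevant to label l
    observed: (n, q) mask, 1 where the pair is already annotated
    corr:     (q, q) nonnegative label-correlation matrix
    rel_freq: (q,)   fraction of labeled examples relevant to each label
    alpha, beta:     illustrative weights for the two discounts
    """
    # Base uncertainty: largest when the predicted probability is near 0.5.
    base = 1.0 - np.abs(2.0 * probs - 1.0)

    # Correlation discount: a label correlated with labels already known to be
    # relevant for the same example is less uncertain (crude proxy below).
    known_relevant = observed * (probs > 0.5)
    corr_support = known_relevant @ corr              # (n, q) evidence per pair
    corr_support = corr_support / (corr_support.max() + 1e-12)

    # Sparsity discount: more frequent labels are treated as less uncertain.
    score = base * (1.0 - alpha * corr_support) * (1.0 - beta * rel_freq[None, :])

    # Never re-query pairs that are already annotated.
    return np.where(observed == 1, -np.inf, score)

def select_pairs(probs, observed, corr, rel_freq, budget):
    """Return the `budget` highest-scoring (example, label) index pairs."""
    score = pair_uncertainty(probs, observed, corr, rel_freq)
    flat = np.argsort(score, axis=None)[::-1][:budget]
    return [np.unravel_index(i, score.shape) for i in flat]
```

A second stage would then, for each selected pair, present the annotator with the most plausible positive subexample (e.g., a region or segment of the example) rather than the whole example; that step depends on the data modality and is omitted here.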