Hierarchical Active Learning with Overlapping Regions
Zhipeng Luo, Milos Hauskrecht
Proceedings of the ... ACM International Conference on Information & Knowledge Management. ACM International Conference on Information and Knowledge Management, vol. 2020, pp. 1045-1054
Publication date: 2020-10-01
DOI: 10.1145/3340531.3412022 (https://doi.org/10.1145/3340531.3412022)
Citation count: 1

Abstract

Learning classification models from real-world data often requires substantial human effort devoted to instance annotation. Because this process can be very time-consuming and costly, finding effective ways to reduce the annotation cost is critical for building such models. To address this problem, we explore a new type of human feedback: region-based feedback. Briefly, a region is defined as a hypercubic subspace of the input data space and represents a subpopulation of data instances; the region's label is a human assessment of the class proportion in that subpopulation. Using learning from label proportions algorithms, one can learn instance-based classifiers from such labeled regions. The key challenge is that infinitely many regions can be defined and queried in a given data space. To minimize the number and complexity of region-based queries, we propose and develop a hierarchical active learning solution that incrementally builds a concise hierarchy of regions. Furthermore, to avoid building a possibly class-irrelevant region hierarchy, we propose to grow multiple different hierarchies in parallel and to expand the more informative ones. Through experiments on numerous data sets, we demonstrate that methods using region-based feedback can learn very good classifiers from very few, simple queries, and hence are highly effective in reducing the human annotation effort needed to build classification models.
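
As an illustration only, the notions of a hypercubic region, its class-proportion label, and one refinement step in a region hierarchy might be sketched as follows. The names `Region` and `class_proportion`, the half-open interval convention, and the use of ground-truth labels to stand in for the human assessment are assumptions made for this sketch, not details taken from the paper. The learning-from-label-proportions step and the strategy for choosing which region to query are omitted.

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple


@dataclass(frozen=True)
class Region:
    """Axis-aligned box: one half-open [low, high) interval per dimension.

    Hypothetical helper type for illustration; not the paper's data structure.
    """
    bounds: Tuple[Tuple[float, float], ...]

    def contains(self, x: Sequence[float]) -> bool:
        # A point is inside when every coordinate falls in its interval.
        return all(lo <= v < hi for v, (lo, hi) in zip(x, self.bounds))

    def split(self, dim: int, threshold: float) -> Tuple["Region", "Region"]:
        # Refine the region into two children along a single dimension.
        lo, hi = self.bounds[dim]
        if not lo < threshold < hi:
            raise ValueError("threshold must lie strictly inside the interval")
        left, right = list(self.bounds), list(self.bounds)
        left[dim] = (lo, threshold)
        right[dim] = (threshold, hi)
        return Region(tuple(left)), Region(tuple(right))


def class_proportion(region: Region,
                     X: Sequence[Sequence[float]],
                     y: Sequence[int]) -> Optional[float]:
    """Fraction of positive (label 1) instances that fall inside the region.

    Assumption: this simulates the annotator's proportion estimate from
    known binary labels. Returns None when the region holds no instances.
    """
    inside = [label for x, label in zip(X, y) if region.contains(x)]
    if not inside:
        return None
    return sum(inside) / len(inside)


# Toy data: two features in [0, 1), binary labels separated on feature 0.
X = [(0.1, 0.2), (0.2, 0.8), (0.3, 0.5), (0.7, 0.1), (0.8, 0.9), (0.9, 0.4)]
y = [0, 0, 0, 1, 1, 1]

root = Region(((0.0, 1.0), (0.0, 1.0)))
left, right = root.split(dim=0, threshold=0.5)

print(class_proportion(root, X, y))   # 0.5
print(class_proportion(left, X, y))   # 0.0
print(class_proportion(right, X, y))  # 1.0
```

In this toy case a single proportion query on the root is uninformative (0.5), while the two child regions are pure, which is the kind of refinement a region hierarchy is meant to capture with few queries.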
