Hierarchical Active Learning with Overlapping Regions.

Proceedings of the ... ACM International Conference on Information & Knowledge Management. ACM International Conference on Information and Knowledge Management. Pub Date: 2020-10-01. DOI: 10.1145/3340531.3412022

Zhipeng Luo, Milos Hauskrecht

Volume 2020, pages 1045-1054.

Citations: 1

Abstract

Learning classification models from real-world data often requires substantial human effort devoted to instance annotation. Because this process can be very time-consuming and costly, finding effective ways to reduce annotation cost is critical for building such models. To address this problem, we explore a new type of human feedback: region-based feedback. Briefly, a region is defined as a hypercubic subspace of the input data space and represents a subpopulation of data instances; the region's label is a human assessment of the class proportion of that subpopulation. Using learning-from-label-proportions algorithms, one can learn instance-based classifiers from such labeled regions. The key challenge is that there can be infinitely many regions one can define and query in a given data space. To minimize the number and complexity of region-based queries, we propose and develop a hierarchical active learning solution that incrementally builds a concise hierarchy of regions. Furthermore, to avoid building a possibly class-irrelevant region hierarchy, we propose growing multiple different hierarchies in parallel and expanding the more informative ones. Through experiments on numerous data sets, we demonstrate that methods using region-based feedback can learn very good classifiers from very few and simple queries, and hence are highly effective in reducing the human annotation effort needed to build classification models.
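To make the notion of a region and its label concrete, the following is a minimal Python sketch, not the authors' implementation. It assumes binary classes and axis-aligned boxes. The names `Region`, `split`, and `class_proportion` are hypothetical, and the human annotator is simulated by computing the proportion from known labels, which would not be available in a real annotation setting.

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple


@dataclass(frozen=True)
class Region:
    """Axis-aligned box with closed per-dimension bounds (hypothetical helper)."""
    lows: Tuple[float, ...]
    highs: Tuple[float, ...]

    def contains(self, x: Sequence[float]) -> bool:
        # A point is inside if every coordinate lies within its bounds.
        return all(lo <= v <= hi for lo, v, hi in zip(self.lows, x, self.highs))

    def split(self, dim: int) -> Tuple["Region", "Region"]:
        # Halve the box along one dimension, giving two child regions
        # that could serve as the next level of a region hierarchy.
        mid = (self.lows[dim] + self.highs[dim]) / 2
        left_highs = self.highs[:dim] + (mid,) + self.highs[dim + 1:]
        right_lows = self.lows[:dim] + (mid,) + self.lows[dim + 1:]
        return Region(self.lows, left_highs), Region(right_lows, self.highs)


def class_proportion(region: Region, X, y) -> Optional[float]:
    """Fraction of positive instances inside the region.

    Assumption: this stands in for the human's proportion assessment.
    Returns None when the region contains no instances.
    """
    inside = [label for x, label in zip(X, y) if region.contains(x)]
    return sum(inside) / len(inside) if inside else None


# Toy data: four 2-D points with binary labels (illustrative only).
X = [(0.1, 0.2), (0.4, 0.9), (0.6, 0.1), (0.9, 0.8)]
y = [0, 0, 1, 1]

root = Region((0.0, 0.0), (1.0, 1.0))
left, right = root.split(0)

root_label = class_proportion(root, X, y)    # 0.5: the root region is mixed
left_label = class_proportion(left, X, y)    # 0.0: all negatives
right_label = class_proportion(right, X, y)  # 1.0: all positives
```

In this toy case a single split along the first dimension turns one uninformative label (0.5) into two pure ones, which illustrates why refining regions selectively can yield useful supervision from few queries.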

