用不完美的标签进行教学

Proceedings of The Web Conference 2020 Pub Date : 2020-04-20 DOI:10.1145/3366423.3380099

Yao Zhou, A. R. Nelakurthi, Ross Maciejewski, Wei Fan, Jingrui He

{"title":"用不完美的标签进行教学","authors":"Yao Zhou, A. R. Nelakurthi, Ross Maciejewski, Wei Fan, Jingrui He","doi":"10.1145/3366423.3380099","DOIUrl":null,"url":null,"abstract":"The need for annotated labels to train machine learning models led to a surge in crowdsourcing - collecting labels from non-experts. Instead of annotating from scratch, given an imperfect labeled set, how can we leverage the label information obtained from amateur crowd workers to improve the data quality? Furthermore, is there a way to teach the amateur crowd workers using this imperfect labeled set in order to improve their labeling performance? In this paper, we aim to answer both questions via a novel interactive teaching framework, which uses visual explanations to simultaneously teach and gauge the confidence level of the crowd workers. Motivated by the huge demand for fine-grained label information in real-world applications, we start from the realistic and yet challenging assumption that neither the teacher nor the crowd workers are perfect. Then, we propose an adaptive scheme that could improve both of them through a sequence of interactions: the teacher teaches the workers using labeled data, and in return, the workers provide labels and the associated confidence level based on their own expertise. In particular, the teacher performs teaching using an empirical risk minimizer learned from an imperfect labeled set; the workers are assumed to have a forgetting behavior during learning and their learning rate depends on the interpretation difficulty of the teaching item. Furthermore, depending on the level of confidence when the workers perform labeling, we also show that the empirical risk minimizer used by the teacher is a reliable and realistic substitute of the unknown target concept by utilizing the unbiased surrogate loss. Finally, the performance of the proposed framework is demonstrated through experiments on multiple real-world image and text data sets.","PeriodicalId":20754,"journal":{"name":"Proceedings of The Web Conference 2020","volume":"129 1","pages":""},"PeriodicalIF":0.0000,"publicationDate":"2020-04-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"12","resultStr":"{\"title\":\"Crowd Teaching with Imperfect Labels\",\"authors\":\"Yao Zhou, A. R. Nelakurthi, Ross Maciejewski, Wei Fan, Jingrui He\",\"doi\":\"10.1145/3366423.3380099\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"The need for annotated labels to train machine learning models led to a surge in crowdsourcing - collecting labels from non-experts. Instead of annotating from scratch, given an imperfect labeled set, how can we leverage the label information obtained from amateur crowd workers to improve the data quality? Furthermore, is there a way to teach the amateur crowd workers using this imperfect labeled set in order to improve their labeling performance? In this paper, we aim to answer both questions via a novel interactive teaching framework, which uses visual explanations to simultaneously teach and gauge the confidence level of the crowd workers. Motivated by the huge demand for fine-grained label information in real-world applications, we start from the realistic and yet challenging assumption that neither the teacher nor the crowd workers are perfect. Then, we propose an adaptive scheme that could improve both of them through a sequence of interactions: the teacher teaches the workers using labeled data, and in return, the workers provide labels and the associated confidence level based on their own expertise. In particular, the teacher performs teaching using an empirical risk minimizer learned from an imperfect labeled set; the workers are assumed to have a forgetting behavior during learning and their learning rate depends on the interpretation difficulty of the teaching item. Furthermore, depending on the level of confidence when the workers perform labeling, we also show that the empirical risk minimizer used by the teacher is a reliable and realistic substitute of the unknown target concept by utilizing the unbiased surrogate loss. Finally, the performance of the proposed framework is demonstrated through experiments on multiple real-world image and text data sets.\",\"PeriodicalId\":20754,\"journal\":{\"name\":\"Proceedings of The Web Conference 2020\",\"volume\":\"129 1\",\"pages\":\"\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2020-04-20\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"12\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Proceedings of The Web Conference 2020\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1145/3366423.3380099\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of The Web Conference 2020","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3366423.3380099","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 12

摘要

对标注标签来训练机器学习模型的需求导致了众包的激增——从非专家那里收集标签。给定一个不完美的标记集，我们如何利用从业余人群工作者那里获得的标签信息来提高数据质量，而不是从头开始注释?此外，是否有一种方法可以教业余人群工作者使用这个不完美的标签集来提高他们的标签性能?在本文中，我们的目标是通过一种新的交互式教学框架来回答这两个问题，该框架使用视觉解释来同时教授和衡量群体工作者的信心水平。由于现实应用中对细粒度标签信息的巨大需求，我们从一个现实但具有挑战性的假设开始，即教师和人群工作者都不是完美的。然后，我们提出了一个自适应方案，可以通过一系列互动来改善两者:教师使用标记的数据来教授工人，作为回报，工人根据自己的专业知识提供标签和相关的置信度。特别是，教师使用从不完美标记集学习到的经验风险最小化器进行教学;假设工作者在学习过程中存在遗忘行为，其学习速度取决于教学项目的解释难度。此外，根据工人进行标记时的信心水平，我们还表明，教师使用的经验风险最小值是利用无偏替代损失的未知目标概念的可靠和现实的替代品。最后，通过多个真实世界图像和文本数据集的实验证明了该框架的性能。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文

微信好友朋友圈 QQ好友复制链接

本刊更多论文

Crowd Teaching with Imperfect Labels

The need for annotated labels to train machine learning models led to a surge in crowdsourcing - collecting labels from non-experts. Instead of annotating from scratch, given an imperfect labeled set, how can we leverage the label information obtained from amateur crowd workers to improve the data quality? Furthermore, is there a way to teach the amateur crowd workers using this imperfect labeled set in order to improve their labeling performance? In this paper, we aim to answer both questions via a novel interactive teaching framework, which uses visual explanations to simultaneously teach and gauge the confidence level of the crowd workers. Motivated by the huge demand for fine-grained label information in real-world applications, we start from the realistic and yet challenging assumption that neither the teacher nor the crowd workers are perfect. Then, we propose an adaptive scheme that could improve both of them through a sequence of interactions: the teacher teaches the workers using labeled data, and in return, the workers provide labels and the associated confidence level based on their own expertise. In particular, the teacher performs teaching using an empirical risk minimizer learned from an imperfect labeled set; the workers are assumed to have a forgetting behavior during learning and their learning rate depends on the interpretation difficulty of the teaching item. Furthermore, depending on the level of confidence when the workers perform labeling, we also show that the empirical risk minimizer used by the teacher is a reliable and realistic substitute of the unknown target concept by utilizing the unbiased surrogate loss. Finally, the performance of the proposed framework is demonstrated through experiments on multiple real-world image and text data sets.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

Proceedings of The Web Conference 2020

自引率

0.00%

发文量