Crowd Teaching with Imperfect Labels

Yao Zhou, A. R. Nelakurthi, Ross Maciejewski, Wei Fan, Jingrui He
{"title":"Crowd Teaching with Imperfect Labels","authors":"Yao Zhou, A. R. Nelakurthi, Ross Maciejewski, Wei Fan, Jingrui He","doi":"10.1145/3366423.3380099","DOIUrl":null,"url":null,"abstract":"The need for annotated labels to train machine learning models led to a surge in crowdsourcing - collecting labels from non-experts. Instead of annotating from scratch, given an imperfect labeled set, how can we leverage the label information obtained from amateur crowd workers to improve the data quality? Furthermore, is there a way to teach the amateur crowd workers using this imperfect labeled set in order to improve their labeling performance? In this paper, we aim to answer both questions via a novel interactive teaching framework, which uses visual explanations to simultaneously teach and gauge the confidence level of the crowd workers. Motivated by the huge demand for fine-grained label information in real-world applications, we start from the realistic and yet challenging assumption that neither the teacher nor the crowd workers are perfect. Then, we propose an adaptive scheme that could improve both of them through a sequence of interactions: the teacher teaches the workers using labeled data, and in return, the workers provide labels and the associated confidence level based on their own expertise. In particular, the teacher performs teaching using an empirical risk minimizer learned from an imperfect labeled set; the workers are assumed to have a forgetting behavior during learning and their learning rate depends on the interpretation difficulty of the teaching item. Furthermore, depending on the level of confidence when the workers perform labeling, we also show that the empirical risk minimizer used by the teacher is a reliable and realistic substitute of the unknown target concept by utilizing the unbiased surrogate loss. Finally, the performance of the proposed framework is demonstrated through experiments on multiple real-world image and text data sets.","PeriodicalId":20754,"journal":{"name":"Proceedings of The Web Conference 2020","volume":"129 1","pages":""},"PeriodicalIF":0.0000,"publicationDate":"2020-04-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"12","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of The Web Conference 2020","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3366423.3380099","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 12

Abstract

The need for annotated labels to train machine learning models led to a surge in crowdsourcing - collecting labels from non-experts. Instead of annotating from scratch, given an imperfect labeled set, how can we leverage the label information obtained from amateur crowd workers to improve the data quality? Furthermore, is there a way to teach the amateur crowd workers using this imperfect labeled set in order to improve their labeling performance? In this paper, we aim to answer both questions via a novel interactive teaching framework, which uses visual explanations to simultaneously teach and gauge the confidence level of the crowd workers. Motivated by the huge demand for fine-grained label information in real-world applications, we start from the realistic and yet challenging assumption that neither the teacher nor the crowd workers are perfect. Then, we propose an adaptive scheme that could improve both of them through a sequence of interactions: the teacher teaches the workers using labeled data, and in return, the workers provide labels and the associated confidence level based on their own expertise. In particular, the teacher performs teaching using an empirical risk minimizer learned from an imperfect labeled set; the workers are assumed to have a forgetting behavior during learning and their learning rate depends on the interpretation difficulty of the teaching item. Furthermore, depending on the level of confidence when the workers perform labeling, we also show that the empirical risk minimizer used by the teacher is a reliable and realistic substitute of the unknown target concept by utilizing the unbiased surrogate loss. Finally, the performance of the proposed framework is demonstrated through experiments on multiple real-world image and text data sets.
查看原文
分享 分享
微信好友 朋友圈 QQ好友 复制链接
本刊更多论文
用不完美的标签进行教学
对标注标签来训练机器学习模型的需求导致了众包的激增——从非专家那里收集标签。给定一个不完美的标记集,我们如何利用从业余人群工作者那里获得的标签信息来提高数据质量,而不是从头开始注释?此外,是否有一种方法可以教业余人群工作者使用这个不完美的标签集来提高他们的标签性能?在本文中,我们的目标是通过一种新的交互式教学框架来回答这两个问题,该框架使用视觉解释来同时教授和衡量群体工作者的信心水平。由于现实应用中对细粒度标签信息的巨大需求,我们从一个现实但具有挑战性的假设开始,即教师和人群工作者都不是完美的。然后,我们提出了一个自适应方案,可以通过一系列互动来改善两者:教师使用标记的数据来教授工人,作为回报,工人根据自己的专业知识提供标签和相关的置信度。特别是,教师使用从不完美标记集学习到的经验风险最小化器进行教学;假设工作者在学习过程中存在遗忘行为,其学习速度取决于教学项目的解释难度。此外,根据工人进行标记时的信心水平,我们还表明,教师使用的经验风险最小值是利用无偏替代损失的未知目标概念的可靠和现实的替代品。最后,通过多个真实世界图像和文本数据集的实验证明了该框架的性能。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 去求助
来源期刊
自引率
0.00%
发文量
0
期刊最新文献
Gone, Gone, but Not Really, and Gone, But Not forgotten: A Typology of Website Recoverability Those who are left behind: A chronicle of internet access in Cuba Towards Automated Technologies in the Referencing Quality of Wikidata Companion of The Web Conference 2022, Virtual Event / Lyon, France, April 25 - 29, 2022 WWW '21: The Web Conference 2021, Virtual Event / Ljubljana, Slovenia, April 19-23, 2021
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
现在去查看 取消
×
提示
确定
0
微信
客服QQ
Book学术公众号 扫码关注我们
反馈
×
意见反馈
请填写您的意见或建议
请填写您的手机或邮箱
已复制链接
已复制链接
快去分享给好友吧!
我知道了
×
扫码分享
扫码分享
Book学术官方微信
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术
文献互助 智能选刊 最新文献 互助须知 联系我们:info@booksci.cn
Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。
Copyright © 2023 Book学术 All rights reserved.
ghs 京公网安备 11010802042870号 京ICP备2023020795号-1