Snorkel MeTaL: Weak Supervision for Multi-Task Learning.

Alex Ratner, Braden Hancock, Jared Dunnmon, Roger Goldman, Christopher Ré
Proceedings of the Second Workshop on Data Management for End-to-End Machine Learning (2nd : 2018 : Houston, Tex.), vol. 2018. Published 2018-06-01.
DOI: https://doi.org/10.1145/3209889.3209898
PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6436830/pdf/nihms-993812.pdf

Abstract

Many real-world machine learning problems are challenging to tackle for two reasons: (i) they involve multiple sub-tasks at different levels of granularity; and (ii) they require large volumes of labeled training data. We propose Snorkel MeTaL, an end-to-end system for multi-task learning that leverages weak supervision provided at multiple levels of granularity by domain expert users. In MeTaL, a user specifies a problem consisting of multiple, hierarchically related sub-tasks (for example, classifying a document at multiple levels of granularity) and then provides labeling functions for each sub-task as weak supervision. MeTaL learns a re-weighted model of these labeling functions and uses the combined signal to train a hierarchical multi-task network that is automatically compiled from the structure of the sub-tasks. Using MeTaL on a radiology report triage task and a fine-grained news classification task, we achieve average gains of 11.2 accuracy points over a baseline supervised approach and 9.5 accuracy points over the predictions of the user-provided labeling functions.
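
To make the idea of re-weighting labeling functions concrete, the following is a minimal sketch in plain Python, not the Snorkel MeTaL API. It assumes a single coarse sub-task with two classes, three hypothetical keyword-based labeling functions, and hand-set accuracies; MeTaL learns such weights from the labeling functions' outputs, whereas here they are fixed and combined with a simple log-odds weighted vote swapped in for illustration.

```python
import math
from collections import defaultdict

ABSTAIN = 0  # assumed convention: 0 means the labeling function does not vote

# Hypothetical labeling functions for one coarse sub-task (1 = sports, 2 = finance).
def lf_mentions_score(doc):
    return 1 if "score" in doc.lower() else ABSTAIN

def lf_mentions_market(doc):
    return 2 if "market" in doc.lower() else ABSTAIN

def lf_mentions_team(doc):
    return 1 if "team" in doc.lower() else ABSTAIN

def weighted_vote(doc, lfs, accuracies):
    """Combine votes, weighting each by the log-odds of its assumed accuracy."""
    scores = defaultdict(float)
    for lf, acc in zip(lfs, accuracies):
        vote = lf(doc)
        if vote != ABSTAIN:
            scores[vote] += math.log(acc / (1.0 - acc))
    if not scores:
        return ABSTAIN  # every labeling function abstained
    return max(scores, key=scores.get)

LFS = [lf_mentions_score, lf_mentions_market, lf_mentions_team]
# Hand-set accuracies, an assumption for illustration only (not learned).
ACCURACIES = [0.6, 0.9, 0.6]

doc = "The team posted a record score as the market closed."
label = weighted_vote(doc, LFS, ACCURACIES)
```

In this example two of the three labeling functions vote for class 1, so an unweighted majority vote would pick class 1, but the single higher-accuracy function outweighs them and the combined label is class 2. That difference is the point of re-weighting, under the stated assumptions.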
