通过标量注释解决校准评估中的分选问题

Zhengping Jiang, Anqi Liu, Benjamnin Van Durme
{"title":"通过标量注释解决校准评估中的分选问题","authors":"Zhengping Jiang, Anqi Liu, Benjamnin Van Durme","doi":"10.1162/tacl_a_00636","DOIUrl":null,"url":null,"abstract":"Abstract Computational linguistics models commonly target the prediction of discrete—categorical—labels. When assessing how well-calibrated these model predictions are, popular evaluation schemes require practitioners to manually determine a binning scheme: grouping labels into bins to approximate true label posterior. The problem is that these metrics are sensitive to binning decisions. We consider two solutions to the binning problem that apply at the stage of data annotation: collecting either distributed (redundant) labels or direct scalar value assignment. In this paper, we show that although both approaches address the binning problem by evaluating instance-level calibration, direct scalar assignment is significantly more cost-effective. We provide theoretical analysis and empirical evidence to support our proposal for dataset creators to adopt scalar annotation protocols to enable a higher-quality assessment of model calibration.","PeriodicalId":506323,"journal":{"name":"Transactions of the Association for Computational Linguistics","volume":"46 8","pages":"120-136"},"PeriodicalIF":0.0000,"publicationDate":"2024-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Addressing the Binning Problem in Calibration Assessment through Scalar Annotations\",\"authors\":\"Zhengping Jiang, Anqi Liu, Benjamnin Van Durme\",\"doi\":\"10.1162/tacl_a_00636\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Abstract Computational linguistics models commonly target the prediction of discrete—categorical—labels. When assessing how well-calibrated these model predictions are, popular evaluation schemes require practitioners to manually determine a binning scheme: grouping labels into bins to approximate true label posterior. The problem is that these metrics are sensitive to binning decisions. We consider two solutions to the binning problem that apply at the stage of data annotation: collecting either distributed (redundant) labels or direct scalar value assignment. In this paper, we show that although both approaches address the binning problem by evaluating instance-level calibration, direct scalar assignment is significantly more cost-effective. We provide theoretical analysis and empirical evidence to support our proposal for dataset creators to adopt scalar annotation protocols to enable a higher-quality assessment of model calibration.\",\"PeriodicalId\":506323,\"journal\":{\"name\":\"Transactions of the Association for Computational Linguistics\",\"volume\":\"46 8\",\"pages\":\"120-136\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2024-02-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Transactions of the Association for Computational Linguistics\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1162/tacl_a_00636\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Transactions of the Association for Computational Linguistics","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1162/tacl_a_00636","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 0

摘要

摘要 计算语言学模型通常以离散分类标签预测为目标。在评估这些模型预测的校准程度时,流行的评估方案要求从业人员手动确定分选方案:将标签分组,以近似真实标签后验。问题在于,这些指标对分选决策非常敏感。我们考虑了两种适用于数据注释阶段的分档问题解决方案:收集分布式(冗余)标签或直接标量值赋值。在本文中,我们表明尽管这两种方法都是通过评估实例级校准来解决分选问题,但直接标量赋值的成本效益要高得多。我们提供了理论分析和经验证据,以支持我们的建议,即数据集创建者应采用标量注释协议,以便对模型校准进行更高质量的评估。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
查看原文
分享 分享
微信好友 朋友圈 QQ好友 复制链接
本刊更多论文
Addressing the Binning Problem in Calibration Assessment through Scalar Annotations
Abstract Computational linguistics models commonly target the prediction of discrete—categorical—labels. When assessing how well-calibrated these model predictions are, popular evaluation schemes require practitioners to manually determine a binning scheme: grouping labels into bins to approximate true label posterior. The problem is that these metrics are sensitive to binning decisions. We consider two solutions to the binning problem that apply at the stage of data annotation: collecting either distributed (redundant) labels or direct scalar value assignment. In this paper, we show that although both approaches address the binning problem by evaluating instance-level calibration, direct scalar assignment is significantly more cost-effective. We provide theoretical analysis and empirical evidence to support our proposal for dataset creators to adopt scalar annotation protocols to enable a higher-quality assessment of model calibration.
求助全文
通过发布文献求助,成功后即可免费获取论文全文。 去求助
来源期刊
自引率
0.00%
发文量
0
期刊最新文献
Beyond Boundaries: A Human-like Approach for Question Answering over Structured and Unstructured Information Sources Addressing the Binning Problem in Calibration Assessment through Scalar Annotations An Energy-based Model for Word-level AutoCompletion in Computer-aided Translation Addressing the Binning Problem in Calibration Assessment through Scalar Annotations An Energy-based Model for Word-level AutoCompletion in Computer-aided Translation
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
现在去查看 取消
×
提示
确定
0
微信
客服QQ
Book学术公众号 扫码关注我们
反馈
×
意见反馈
请填写您的意见或建议
请填写您的手机或邮箱
已复制链接
已复制链接
快去分享给好友吧!
我知道了
×
扫码分享
扫码分享
Book学术官方微信
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术
文献互助 智能选刊 最新文献 互助须知 联系我们:info@booksci.cn
Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。
Copyright © 2023 Book学术 All rights reserved.
ghs 京公网安备 11010802042870号 京ICP备2023020795号-1