Modeling disagreement in automatic data labeling for semi-supervised learning in Clinical Natural Language Processing.

Frontiers in Artificial Intelligence (IF 3, Q2, Computer Science, Artificial Intelligence). Pub Date: 2024-10-02. eCollection Date: 2024-01-01. DOI: 10.3389/frai.2024.1374162
Hongshu Liu, Nabeel Seedat, Julia Ive
Full-text PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11480042/pdf/

Abstract

Introduction: Computational models that provide accurate estimates of their uncertainty are crucial for managing the risks associated with decision-making in healthcare contexts. This is especially true because many state-of-the-art systems are trained on data that have been labeled automatically (self-supervised mode) and tend to overfit.

Methods: In this study, we investigate the quality of uncertainty estimates from a range of current state-of-the-art predictive models applied to the problem of observation detection in radiology reports. This problem remains understudied for Natural Language Processing in the healthcare domain.

Results: We demonstrate that Gaussian Processes (GPs) provide superior performance in quantifying the risks of three uncertainty labels based on the negative log predictive probability (NLPP) evaluation metric and mean maximum predicted confidence levels (MMPCL), whilst retaining strong predictive performance.

Discussion: Our conclusions highlight the utility of probabilistic models applied to "noisy" labels and suggest that similar methods could be useful for Natural Language Processing (NLP) based automated labeling tasks.
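
The abstract names two evaluation metrics, NLPP and MMPCL, without defining them. The following is a minimal sketch of how such metrics are commonly computed, assuming NLPP is the mean negative log of the probability assigned to the true label and MMPCL is the mean of the highest predicted class probability. These definitions, the function names, and the toy three-class data are illustrative assumptions, not the authors' implementation.

```python
import math

def nlpp(probs, labels, eps=1e-12):
    """Mean negative log predictive probability of the true labels.

    Assumed definition: average of -log p(true label) over examples.
    Lower is better; confident wrong predictions are penalized heavily.
    """
    total = 0.0
    for p, y in zip(probs, labels):
        # Clamp to eps so a zero probability does not raise a math error.
        total += -math.log(max(p[y], eps))
    return total / len(labels)

def mmpcl(probs):
    """Mean maximum predicted confidence level.

    Assumed definition: average of the largest class probability
    per example, regardless of whether the prediction is correct.
    """
    return sum(max(p) for p in probs) / len(probs)

# Hypothetical predictions over three classes (toy data, not from the paper).
example_probs = [
    [0.7, 0.2, 0.1],
    [0.1, 0.8, 0.1],
    [0.3, 0.3, 0.4],
]
example_labels = [0, 1, 0]

if __name__ == "__main__":
    print("NLPP:", nlpp(example_probs, example_labels))
    print("MMPCL:", mmpcl(example_probs))
```

Under these assumed definitions, a model that is confidently wrong (as in the third toy example, where the true label gets probability 0.3) raises NLPP, while MMPCL only reports how confident the model is on average, so the two are most informative when read together.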

Source journal metrics: CiteScore 6.10; self-citation rate 2.50%; articles published 272; review time 13 weeks.