Auditing Learned Associations in Deep Learning Approaches to Extract Race and Ethnicity from Clinical Text.

AMIA Annual Symposium Proceedings. Pub Date: 2024-01-11; eCollection Date: 2023-01-01
Oliver J Bear Don't Walk IV, Adrienne Pichon, Harry Reyes Nieva, Tony Sun, Jaan Altosaar, Karthik Natarajan, Adler Perotte, Peter Tarczy-Hornoch, Dina Demner-Fushman, Noémie Elhadad
{"title":"审核深度学习方法中的学习关联,从临床文本中提取种族和民族。","authors":"Oliver J Bear Don't Walk Iv, Adrienne Pichon, Harry Reyes Nieva, Tony Sun, Jaan Altosaar, Karthik Natarajan, Adler Perotte, Peter Tarczy-Hornoch, Dina Demner-Fushman, Noémie Elhadad","doi":"","DOIUrl":null,"url":null,"abstract":"<p><p>Complete and accurate race and ethnicity (RE) patient information is important for many areas of biomedical informatics research, such as defining and characterizing cohorts, performing quality assessments, and identifying health inequities. Patient-level RE data is often inaccurate or missing in structured sources, but can be supplemented through clinical notes and natural language processing (NLP). While NLP has made many improvements in recent years with large language models, bias remains an often-unaddressed concern, with research showing that harmful and negative language is more often used for certain racial/ethnic groups than others. We present an approach to audit the learned associations of models trained to identify RE information in clinical text by measuring the concordance between model-derived salient features and manually identified RE-related spans of text. We show that while models perform well on the surface, there exist concerning learned associations and potential for future harms from RE-identification models if left unaddressed.</p>","PeriodicalId":72180,"journal":{"name":"AMIA ... Annual Symposium proceedings. AMIA Symposium","volume":null,"pages":null},"PeriodicalIF":0.0000,"publicationDate":"2024-01-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10785932/pdf/","citationCount":"0","resultStr":"{\"title\":\"Auditing Learned Associations in Deep Learning Approaches to Extract Race and Ethnicity from Clinical Text.\",\"authors\":\"Oliver J Bear Don't Walk Iv, Adrienne Pichon, Harry Reyes Nieva, Tony Sun, Jaan Altosaar, Karthik Natarajan, Adler Perotte, Peter Tarczy-Hornoch, Dina Demner-Fushman, Noémie Elhadad\",\"doi\":\"\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<p><p>Complete and accurate race and ethnicity (RE) patient information is important for many areas of biomedical informatics research, such as defining and characterizing cohorts, performing quality assessments, and identifying health inequities. Patient-level RE data is often inaccurate or missing in structured sources, but can be supplemented through clinical notes and natural language processing (NLP). While NLP has made many improvements in recent years with large language models, bias remains an often-unaddressed concern, with research showing that harmful and negative language is more often used for certain racial/ethnic groups than others. We present an approach to audit the learned associations of models trained to identify RE information in clinical text by measuring the concordance between model-derived salient features and manually identified RE-related spans of text. We show that while models perform well on the surface, there exist concerning learned associations and potential for future harms from RE-identification models if left unaddressed.</p>\",\"PeriodicalId\":72180,\"journal\":{\"name\":\"AMIA ... Annual Symposium proceedings. 
AMIA Symposium\",\"volume\":null,\"pages\":null},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2024-01-11\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10785932/pdf/\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"AMIA ... Annual Symposium proceedings. AMIA Symposium\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"2023/1/1 0:00:00\",\"PubModel\":\"eCollection\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"AMIA ... Annual Symposium proceedings. AMIA Symposium","FirstCategoryId":"1085","ListUrlMain":"","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"2023/1/1 0:00:00","PubModel":"eCollection","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 0

Abstract


Complete and accurate race and ethnicity (RE) patient information is important for many areas of biomedical informatics research, such as defining and characterizing cohorts, performing quality assessments, and identifying health inequities. Patient-level RE data is often inaccurate or missing in structured sources, but can be supplemented through clinical notes and natural language processing (NLP). While NLP has made many improvements in recent years with large language models, bias remains an often-unaddressed concern, with research showing that harmful and negative language is more often used for certain racial/ethnic groups than others. We present an approach to audit the learned associations of models trained to identify RE information in clinical text by measuring the concordance between model-derived salient features and manually identified RE-related spans of text. We show that while models perform well on the surface, there exist concerning learned associations and potential for future harms from RE-identification models if left unaddressed.
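The audit hinges on one measurable quantity: how well the tokens a model treats as salient line up with the spans human annotators marked as RE-related. The snippet below is a minimal sketch of such a concordance check, not the paper's implementation; the helper names, the toy note, and the choice of a top-k saliency cutoff are all assumptions. It scores the top-k salient tokens against gold span annotations with token-level precision, recall, and F1.

```python
# Minimal sketch of the concordance audit described above. Assumptions (not
# from the paper): per-token saliency scores already exist (from any
# attribution method, e.g., gradient x input), gold RE spans are token-index
# ranges, and "salient" means the top-k attribution scores.
from typing import List, Set, Tuple


def top_k_salient(saliency: List[float], k: int) -> Set[int]:
    """Indices of the k tokens with the largest attribution scores."""
    ranked = sorted(range(len(saliency)), key=lambda i: saliency[i], reverse=True)
    return set(ranked[:k])


def span_tokens(spans: List[Tuple[int, int]]) -> Set[int]:
    """Expand half-open (start, end) token spans into a set of indices."""
    return {i for start, end in spans for i in range(start, end)}


def concordance(saliency: List[float],
                gold_spans: List[Tuple[int, int]],
                k: int) -> Tuple[float, float, float]:
    """Precision/recall/F1 of top-k salient tokens vs. annotated RE spans.

    Low precision means the model leans on tokens *outside* the text a
    human would point to for race/ethnicity -- the kind of learned
    association the audit is meant to surface.
    """
    salient, gold = top_k_salient(saliency, k), span_tokens(gold_spans)
    overlap = len(salient & gold)
    p = overlap / len(salient) if salient else 0.0
    r = overlap / len(gold) if gold else 0.0
    f1 = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f1


if __name__ == "__main__":
    # Hypothetical note: "pt is a 54 yo african american woman with hx of htn"
    # Annotators marked tokens 5-6 ("african american") as the RE span.
    saliency = [0.01, 0.02, 0.05, 0.03, 0.04, 0.90,
                0.85, 0.10, 0.02, 0.40, 0.03, 0.35]
    gold_spans = [(5, 7)]
    p, r, f1 = concordance(saliency, gold_spans, k=4)
    print(f"precision={p:.2f} recall={r:.2f} f1={f1:.2f}")
    # Here precision is 0.50: half of the most salient tokens ("hx", "htn")
    # sit outside the annotated span, hinting at a learned association
    # between RE labels and clinical conditions.
```

A fuller audit would presumably aggregate these scores per racial/ethnic group, since the abstract's concern is precisely that learned associations, and hence concordance, can differ across groups.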
