That's the Wrong Lung! Evaluating and Improving the Interpretability of Unsupervised Multimodal Encoders for Medical Data.

Denis Jered McInerney, Geoffrey Young, Jan-Willem van de Meent, Byron C Wallace
{"title":"That's the Wrong Lung! Evaluating and Improving the Interpretability of Unsupervised Multimodal Encoders for Medical Data.","authors":"Denis Jered McInerney,&nbsp;Geoffrey Young,&nbsp;Jan-Willem van de Meent,&nbsp;Byron C Wallace","doi":"","DOIUrl":null,"url":null,"abstract":"<p><p>Pretraining multimodal models on Electronic Health Records (EHRs) provides a means of learning representations that can transfer to downstream tasks with minimal supervision. Recent multimodal models induce soft local alignments between image regions and sentences. This is of particular interest in the medical domain, where alignments might highlight regions in an image relevant to specific phenomena described in free-text. While past work has suggested that attention \"heatmaps\" can be interpreted in this manner, there has been little evaluation of such alignments. We compare alignments from a state-of-the-art multimodal (image and text) model for EHR with human annotations that link image regions to sentences. Our main finding is that the text has an often weak or unintuitive influence on attention; alignments do not consistently reflect basic anatomical information. Moreover, synthetic modifications - such as substituting \"left\" for \"right\" - do not substantially influence highlights. Simple techniques such as allowing the model to opt out of attending to the image and few-shot finetuning show promise in terms of their ability to improve alignments with very little or no supervision. We make our code and checkpoints open-source.</p>","PeriodicalId":74540,"journal":{"name":"Proceedings of the Conference on Empirical Methods in Natural Language Processing. Conference on Empirical Methods in Natural Language Processing","volume":"2022 ","pages":"3626-3648"},"PeriodicalIF":0.0000,"publicationDate":"2022-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10124183/pdf/nihms-1890274.pdf","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the Conference on Empirical Methods in Natural Language Processing. Conference on Empirical Methods in Natural Language Processing","FirstCategoryId":"1085","ListUrlMain":"","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

Abstract

Pretraining multimodal models on Electronic Health Records (EHRs) provides a means of learning representations that can transfer to downstream tasks with minimal supervision. Recent multimodal models induce soft local alignments between image regions and sentences. This is of particular interest in the medical domain, where alignments might highlight regions in an image relevant to specific phenomena described in free-text. While past work has suggested that attention "heatmaps" can be interpreted in this manner, there has been little evaluation of such alignments. We compare alignments from a state-of-the-art multimodal (image and text) model for EHR with human annotations that link image regions to sentences. Our main finding is that the text has an often weak or unintuitive influence on attention; alignments do not consistently reflect basic anatomical information. Moreover, synthetic modifications - such as substituting "left" for "right" - do not substantially influence highlights. Simple techniques such as allowing the model to opt out of attending to the image and few-shot finetuning show promise in terms of their ability to improve alignments with very little or no supervision. We make our code and checkpoints open-source.
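The soft local alignment under evaluation can be pictured with a short sketch. The snippet below is a minimal illustration, not the authors' released code: it scores one report sentence against a grid of image-region embeddings with cosine similarity and a softmax (one common formulation of soft alignment), includes an optional learned "no attention" sentinel so the model can opt out of attending to the image, and runs the "left"/"right" swap probe described above. All names here (`alignment_heatmap`, `encode_fn`, the temperature value) are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def alignment_heatmap(sentence_emb, region_embs, no_attend_emb=None, temperature=0.1):
    """Soft alignment of one sentence over image regions.

    sentence_emb:  (d,) embedding of a report sentence.
    region_embs:   (R, d) embeddings of R image regions (e.g., a 7x7 grid -> R=49).
    no_attend_emb: optional (d,) learned sentinel; when present, the model can
        route probability mass here instead of to any image region ("opt out").
    Returns an (R,) attention vector over regions (sentinel mass is dropped,
    so the vector sums to < 1 when the model opts out).
    """
    sims = F.cosine_similarity(region_embs, sentence_emb.unsqueeze(0), dim=-1)
    if no_attend_emb is not None:
        opt_out = F.cosine_similarity(
            no_attend_emb.unsqueeze(0), sentence_emb.unsqueeze(0), dim=-1
        )
        sims = torch.cat([sims, opt_out])
    attn = F.softmax(sims / temperature, dim=-1)
    return attn[: region_embs.shape[0]]  # keep only the image-region slots

def left_right_probe(encode_fn, sentence, region_embs):
    """Swap 'left' <-> 'right' in a sentence and compare the resulting heatmaps.

    `encode_fn` maps text -> (d,) embedding. A faithful alignment should shift
    laterally under the swap; a near-zero difference is the failure mode the
    paper reports.
    """
    # Simultaneous swap via a placeholder so "left" -> "right" -> "left" doesn't cascade.
    swapped = (
        sentence.replace("left", "\x00").replace("right", "left").replace("\x00", "right")
    )
    h_orig = alignment_heatmap(encode_fn(sentence), region_embs)
    h_swap = alignment_heatmap(encode_fn(swapped), region_embs)
    return (h_orig - h_swap).abs().sum().item()  # L1 distance between heatmaps
```

Under this probe, swapping laterality words should move attention mass to the opposite side of the image; the paper's finding is that heatmaps like `h_orig` and `h_swap` are often nearly identical, i.e., the probe distance stays near zero.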
