VecoCare: Visit Sequences-Clinical Notes Joint Learning for Diagnosis Prediction in Healthcare Data

Yongxin Xu, Kai Yang, Chaohe Zhang, Peinie Zou, Zhiyuan Wang, Hongxin Ding, Junfeng Zhao, Yasha Wang, Bing Xie
Published in: International Joint Conference on Artificial Intelligence, 2023-08-01
DOI: 10.24963/ijcai.2023/547 (https://doi.org/10.24963/ijcai.2023/547)
Citations: 1

Abstract

Because the electronic health record (EHR) data available in practical diagnosis prediction scenarios are often insufficient, most existing work learns patient representations from either structured EHR data (e.g., temporal medical events and lab test results) or unstructured data (e.g., clinical notes). However, synthesizing the rich information in both modalities remains underexplored. First, heterogeneous semantic biases between the two modalities heavily hinder the synthesis of their representation spaces, which is critical for diagnosis prediction. Second, the mixed quality of some clinical notes leads to inadequate representations of the patients to be predicted. Third, typical attention mechanisms mainly aggregate information from similar patients and ignore important auxiliary information from others. To tackle these challenges, we propose a novel visit sequences-clinical notes joint learning approach, dubbed VecoCare. It performs a Gromov-Wasserstein Distance (GWD)-based contrastive learning task and an adaptive masked language model task in a sequential pre-training manner to reduce heterogeneous semantic biases. After pre-training, VecoCare further aggregates information from both similar and dissimilar patients through a dual-channel retrieval mechanism. We conduct diagnosis prediction experiments on two real-world datasets, and the results indicate that VecoCare outperforms state-of-the-art approaches. Moreover, the findings discovered by VecoCare are consistent with medical research.
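
For readers unfamiliar with the Gromov-Wasserstein distance named in the abstract, the following is a minimal sketch of the standard discrete, squared-loss GW objective. It compares two embedding sets through their within-set distance matrices, so the two modalities do not need to share a space. The abstract does not say how VecoCare builds its contrastive loss on top of GWD, so the function names (`pairwise_dist`, `gw_cost`), the Euclidean ground distance, and the toy data are all assumptions made for illustration. The sketch only evaluates the objective for a given coupling `T`; the actual GW distance is the minimum of this quantity over all couplings with the prescribed marginals.

```python
import numpy as np

def pairwise_dist(X):
    """Euclidean distance matrix between the rows of X (one modality)."""
    diff = X[:, None, :] - X[None, :, :]
    return np.sqrt((diff ** 2).sum(-1))

def gw_cost(C1, C2, T):
    """Squared-loss GW objective for a fixed coupling T.

    Computes sum_{i,j,k,l} (C1[i,k] - C2[j,l])^2 * T[i,j] * T[k,l].
    C1 is (n, n), C2 is (m, m), T is (n, m).
    """
    # Broadcast to a 4-D tensor indexed by (i, j, k, l).
    L = (C1[:, None, :, None] - C2[None, :, None, :]) ** 2
    return float(np.einsum("ijkl,ij,kl->", L, T, T))

# Hypothetical toy data: four "visit" embeddings and a shifted copy
# standing in for "note" embeddings with the same internal geometry.
visits = np.arange(12, dtype=float).reshape(4, 3)
notes = visits + 5.0

C_visits = pairwise_dist(visits)
C_notes = pairwise_dist(notes)

aligned = np.eye(4) / 4          # couples each visit with its own note
uniform = np.ones((4, 4)) / 16   # ignores the correspondence

cost_aligned = gw_cost(C_visits, C_notes, aligned)
cost_uniform = gw_cost(C_visits, C_notes, uniform)
```

Because a translation leaves pairwise distances unchanged, the aligned coupling gives a cost of zero here, while the uniform coupling gives a strictly positive cost. The 4-D tensor makes the cost explicit but scales as O(n²m²) in memory, so it is only suitable for small examples.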