从多源数据中学习医学概念表征

Tian Bai, Brian L Egleston, Richard Bleicher, Slobodan Vucetic
{"title":"从多源数据中学习医学概念表征","authors":"Tian Bai, Brian L Egleston, Richard Bleicher, Slobodan Vucetic","doi":"10.24963/ijcai.2019/680","DOIUrl":null,"url":null,"abstract":"<p><p>Representing words as low dimensional vectors is very useful in many natural language processing tasks. This idea has been extended to medical domain where medical codes listed in medical claims are represented as vectors to facilitate exploratory analysis and predictive modeling. However, depending on a type of a medical provider, medical claims can use medical codes from different ontologies or from a combination of ontologies, which complicates learning of the representations. To be able to properly utilize such multi-source medical claim data, we propose an approach that represents medical codes from different ontologies in the same vector space. We first modify the Pointwise Mutual Information (PMI) measure of similarity between the codes. We then develop a new negative sampling method for word2vec model that implicitly factorizes the modified PMI matrix. The new approach was evaluated on the code cross-reference problem, which aims at identifying similar codes across different ontologies. In our experiments, we evaluated cross-referencing between ICD-9 and CPT medical code ontologies. Our results indicate that vector representations of codes learned by the proposed approach provide superior cross-referencing when compared to several existing approaches.</p>","PeriodicalId":73334,"journal":{"name":"IJCAI : proceedings of the conference","volume":"2019 ","pages":"4897-4903"},"PeriodicalIF":0.0000,"publicationDate":"2019-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7047512/pdf/nihms-1558151.pdf","citationCount":"0","resultStr":"{\"title\":\"Medical Concept Representation Learning from Multi-source Data.\",\"authors\":\"Tian Bai, Brian L Egleston, Richard Bleicher, Slobodan Vucetic\",\"doi\":\"10.24963/ijcai.2019/680\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<p><p>Representing words as low dimensional vectors is very useful in many natural language processing tasks. This idea has been extended to medical domain where medical codes listed in medical claims are represented as vectors to facilitate exploratory analysis and predictive modeling. However, depending on a type of a medical provider, medical claims can use medical codes from different ontologies or from a combination of ontologies, which complicates learning of the representations. To be able to properly utilize such multi-source medical claim data, we propose an approach that represents medical codes from different ontologies in the same vector space. We first modify the Pointwise Mutual Information (PMI) measure of similarity between the codes. We then develop a new negative sampling method for word2vec model that implicitly factorizes the modified PMI matrix. The new approach was evaluated on the code cross-reference problem, which aims at identifying similar codes across different ontologies. In our experiments, we evaluated cross-referencing between ICD-9 and CPT medical code ontologies. Our results indicate that vector representations of codes learned by the proposed approach provide superior cross-referencing when compared to several existing approaches.</p>\",\"PeriodicalId\":73334,\"journal\":{\"name\":\"IJCAI : proceedings of the conference\",\"volume\":\"2019 \",\"pages\":\"4897-4903\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2019-07-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7047512/pdf/nihms-1558151.pdf\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"IJCAI : proceedings of the conference\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.24963/ijcai.2019/680\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"IJCAI : proceedings of the conference","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.24963/ijcai.2019/680","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 0

摘要

在许多自然语言处理任务中,将单词表示为低维向量非常有用。这一想法已扩展到医疗领域,医疗索赔中列出的医疗代码被表示为向量,以促进探索性分析和预测建模。然而,根据医疗服务提供者的类型,医疗报销单可能使用来自不同本体或本体组合的医疗代码,这就使得表征的学习变得复杂。为了能够正确利用这种多源医疗索赔数据,我们提出了一种在同一向量空间中表示来自不同本体的医疗代码的方法。我们首先修改了代码间相似性的点式互信息(PMI)度量。然后,我们为 word2vec 模型开发了一种新的负采样方法,该方法可对修改后的 PMI 矩阵进行隐式因式分解。我们在代码交叉引用问题上对新方法进行了评估,该问题旨在识别不同本体中的相似代码。在实验中,我们评估了 ICD-9 和 CPT 医疗代码本体之间的交叉引用。我们的结果表明,与现有的几种方法相比,拟议方法学习的代码矢量表示提供了更优越的交叉引用效果。
本文章由计算机程序翻译,如有差异,请以英文原文为准。

摘要图片

摘要图片

查看原文
分享 分享
微信好友 朋友圈 QQ好友 复制链接
本刊更多论文
Medical Concept Representation Learning from Multi-source Data.

Representing words as low dimensional vectors is very useful in many natural language processing tasks. This idea has been extended to medical domain where medical codes listed in medical claims are represented as vectors to facilitate exploratory analysis and predictive modeling. However, depending on a type of a medical provider, medical claims can use medical codes from different ontologies or from a combination of ontologies, which complicates learning of the representations. To be able to properly utilize such multi-source medical claim data, we propose an approach that represents medical codes from different ontologies in the same vector space. We first modify the Pointwise Mutual Information (PMI) measure of similarity between the codes. We then develop a new negative sampling method for word2vec model that implicitly factorizes the modified PMI matrix. The new approach was evaluated on the code cross-reference problem, which aims at identifying similar codes across different ontologies. In our experiments, we evaluated cross-referencing between ICD-9 and CPT medical code ontologies. Our results indicate that vector representations of codes learned by the proposed approach provide superior cross-referencing when compared to several existing approaches.

求助全文
通过发布文献求助,成功后即可免费获取论文全文。 去求助
来源期刊
自引率
0.00%
发文量
0
期刊最新文献
Predictive Modeling with Temporal Graphical Representation on Electronic Health Records. Adapt to Adaptation: Learning Personalization for Cross-Silo Federated Learning. Stabilizing and Enhancing Link Prediction through Deepened Graph Auto-Encoders. RCA: A Deep Collaborative Autoencoder Approach for Anomaly Detection Improving Attention Mechanism in Graph Neural Networks via Cardinality Preservation.
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
现在去查看 取消
×
提示
确定
0
微信
客服QQ
Book学术公众号 扫码关注我们
反馈
×
意见反馈
请填写您的意见或建议
请填写您的手机或邮箱
已复制链接
已复制链接
快去分享给好友吧!
我知道了
×
扫码分享
扫码分享
Book学术官方微信
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术
文献互助 智能选刊 最新文献 互助须知 联系我们:info@booksci.cn
Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。
Copyright © 2023 Book学术 All rights reserved.
ghs 京公网安备 11010802042870号 京ICP备2023020795号-1