Unsupervised Extraction of Diagnosis Codes from EMRs Using Knowledge-Based and Extractive Text Summarization Techniques.

Ramakanth Kavuluru, Sifei Han, Daniel Harris
Published in: Advances in Artificial Intelligence. Canadian Society for Computational Studies of Intelligence. Conference
Publication date: 2013-05-01
DOI: 10.1007/978-3-642-38457-8_7 (https://doi.org/10.1007/978-3-642-38457-8_7)
Citation count: 21

Abstract

Diagnosis codes are extracted from medical records for billing and reimbursement, and for secondary uses such as quality control and cohort identification. In the US, these codes come from the standard terminology ICD-9-CM, which is derived from the International Classification of Diseases (ICD). ICD-9 codes are generally extracted by trained human coders, who read all artifacts available in a patient's medical record and follow specific coding guidelines. To assist coders in this manual process, this paper proposes an unsupervised ensemble approach to automatically extract ICD-9 diagnosis codes from the textual narratives included in electronic medical records (EMRs). Earlier attempts at automatic extraction focused on individual documents such as radiology reports and discharge summaries. Here we use a more realistic dataset and extract ICD-9 codes from the EMRs of 1000 inpatient visits at the University of Kentucky Medical Center. Using named entity recognition (NER), graph-based concept mapping of medical concepts, and extractive text summarization techniques, we achieve an example-based average recall of 0.42 with an average precision of 0.47; compared with a baseline that uses only NER, we observe a 12% improvement in recall with the graph-based approach and a 7% improvement in precision with the extractive text summarization approach. Although diagnosis codes are complex concepts often expressed in text with significant long-range, non-local dependencies, our present work shows the potential of unsupervised methods in extracting a portion of codes. As such, our findings are especially relevant for code extraction tasks where obtaining large amounts of training data is difficult.
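
The "example-based" averages reported above can be read as per-record scores averaged over all records. The sketch below uses the usual multi-label definition of example-based precision and recall; the abstract does not spell out the exact formula, so treating this as the paper's metric is an assumption. The function name, the visit identifiers, and the code sets are hypothetical and are not taken from the paper's dataset.

```python
from typing import Dict, Set, Tuple


def example_based_scores(gold: Dict[str, Set[str]],
                         predicted: Dict[str, Set[str]]) -> Tuple[float, float]:
    """Return (average precision, average recall), averaged over records.

    Each key identifies one record (for example, one inpatient visit) and
    maps to the set of diagnosis codes assigned to it.
    """
    if not gold:
        return 0.0, 0.0
    precisions, recalls = [], []
    for visit_id, gold_codes in gold.items():
        pred_codes = predicted.get(visit_id, set())
        correct = len(gold_codes & pred_codes)
        # Assumed convention: an empty prediction or gold set scores 0.
        precisions.append(correct / len(pred_codes) if pred_codes else 0.0)
        recalls.append(correct / len(gold_codes) if gold_codes else 0.0)
    n = len(gold)
    return sum(precisions) / n, sum(recalls) / n


# Hypothetical toy data: two visits with illustrative ICD-9 style codes.
gold = {
    "visit_a": {"428.0", "250.00", "401.9"},
    "visit_b": {"486", "496"},
}
predicted = {
    "visit_a": {"428.0", "401.9", "585.9"},
    "visit_b": {"486"},
}

avg_precision, avg_recall = example_based_scores(gold, predicted)
print(f"precision={avg_precision:.3f} recall={avg_recall:.3f}")
```

In this toy case, visit_a scores 2/3 on both measures and visit_b scores 1 on precision and 1/2 on recall, so the averages are 5/6 and 7/12. Averaging per record, rather than pooling all codes, gives every visit equal weight regardless of how many codes it carries.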
