Identifying relevant medical reports from an assorted report collection using the multinomial naïve Bayes classifier and the UMLS.

Vijayaraghavan Bashyam, Craig Morioka, Suzie El-Saden, Alex At Bui, Ricky K Taira
Indian journal of medical informatics, 2(1), published 2007-01-01. Journal Article.
Full text: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9592058/pdf/

Abstract

A patient's electronic medical record contains a large number of medical reports and imaging studies. Identifying the information relevant to a diagnosis can be a time-consuming process that can easily overwhelm the physician. Summarizing key clinical information for physicians evaluating brain tumor patients is an ongoing research project at our institution. Identifying documents associated with brain tumors is an important step in collecting the data relevant for summarization. Current electronic medical record systems lack the meta-information that is useful for structuring heterogeneous medical information. Thus, reports relevant to a particular task cannot be easily retrieved from a structured database, which necessitates content analysis methods for identifying them. This paper describes a system designed to identify brain tumor related reports from an assorted collection of clinical reports. A large collection of clinical reports was obtained from our university hospital database. A domain expert manually annotated the documents, classifying them into 'related' and 'unrelated' categories. A multinomial naïve Bayes classifier was trained to use word-level and UMLS concept-level features from the reports to identify brain tumor related reports in the collection. The system was trained on 90% and tested on 10% of the manually annotated corpus, and results from ten-fold cross-validation are reported. Performance was best (F-score 94.7) when the system was trained using both word-level and UMLS concept-level features. Using UMLS concepts improved classifier accuracy.
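The classification approach summarized in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the toy reports, the `TOY_CONCEPTS` phrase table (a stand-in for a real UMLS concept lookup, with made-up concept labels), and all function names are assumptions introduced for illustration. It shows a multinomial naïve Bayes classifier with add-one smoothing over combined word and concept tokens, an F-score helper, and a simple split that holds out one tenth of the data per fold.

```python
import math
import re
from collections import Counter, defaultdict

# Hypothetical phrase-to-concept table standing in for a UMLS lookup.
# The labels are made up for illustration; they are not real UMLS identifiers.
TOY_CONCEPTS = {
    "glioblastoma": "CONCEPT_BRAIN_TUMOR",
    "meningioma": "CONCEPT_BRAIN_TUMOR",
    "brain tumor": "CONCEPT_BRAIN_TUMOR",
    "fracture": "CONCEPT_BONE_INJURY",
}


def extract_features(text):
    """Return lowercase word tokens plus toy concept tokens for one report."""
    lowered = text.lower()
    words = re.findall(r"[a-z]+", lowered)
    concepts = [label for phrase, label in TOY_CONCEPTS.items() if phrase in lowered]
    return words + concepts


class MultinomialNB:
    """Multinomial naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, docs, labels):
        self.class_counts = Counter(labels)
        self.token_counts = defaultdict(Counter)
        for doc, label in zip(docs, labels):
            self.token_counts[label].update(extract_features(doc))
        self.vocab = {t for counts in self.token_counts.values() for t in counts}
        return self

    def predict(self, doc):
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_counts.items():
            counts = self.token_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(n_docs / total_docs)  # log prior
            for token in extract_features(doc):
                if token in self.vocab:  # skip tokens never seen in training
                    score += math.log((counts[token] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


def f_score(gold, predicted, positive="related"):
    """Harmonic mean of precision and recall for the positive class."""
    tp = sum(g == positive and p == positive for g, p in zip(gold, predicted))
    fp = sum(g != positive and p == positive for g, p in zip(gold, predicted))
    fn = sum(g == positive and p != positive for g, p in zip(gold, predicted))
    if tp == 0:
        return 0.0
    precision, recall = tp / (tp + fp), tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def k_fold_splits(n_items, k=10):
    """Yield (train_idx, test_idx) pairs; each fold holds out about 1/k of the items."""
    for fold in range(k):
        test_idx = [i for i in range(n_items) if i % k == fold]
        train_idx = [i for i in range(n_items) if i % k != fold]
        yield train_idx, test_idx


# Toy reports invented for this sketch; they are not from the paper's corpus.
train_docs = [
    "MRI shows enhancing mass consistent with glioblastoma",
    "Follow-up imaging of left frontal meningioma",
    "Radiograph shows wrist fracture after fall",
    "Chest radiograph clear, no acute disease",
]
train_labels = ["related", "related", "unrelated", "unrelated"]
model = MultinomialNB().fit(train_docs, train_labels)
```

In this sketch, a report that mentions several tumor phrases contributes the same concept token more than once, so concept tokens act as additional counted features alongside the words. With `k=10`, each fold trains on roughly 90% of the items and tests on the remaining 10%, mirroring the split described in the abstract.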
