Text Analysis of Radiology Reports with Signs of Intracranial Hemorrhage on Brain CT Scans Using the Decision Tree Algorithm.

Sovremennye Tehnologii v Medicine (IF 1.1, Q4, Medicine, Research & Experimental). Pub Date: 2022-01-01. DOI: 10.17691/stm2022.14.6.04
A N Khoruzhaya, D V Kozlov, K M Arzamasov, E I Kremneva
Full text (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10171057/pdf/

Abstract

The aim of the study is to create, train, and test an algorithm that analyzes brain CT text reports with a decision tree model and solves a simple binary classification task: the presence or absence of signs of intracranial hemorrhage (ICH).

Materials and methods: The source data were exported from the Unified Radiological Information Service of the Unified Medical Information and Analytical System (URIS UMIAS) and comprised 34,188 non-contrast brain CT studies performed in 56 inpatient medical facilities. Data analysis and preprocessing were carried out with NLTK (Natural Language Toolkit, version 3.6.5), a library for symbolic and statistical natural language processing, and scikit-learn, a machine learning library that includes tools for classification tasks. CT studies were selected automatically using 14 ICH-related keywords together with 33 stop-phrases in which those keywords denote the absence of ICH, and the selection was then verified by experts. From a sample of 3980 report descriptions (protocols), two classes were formed: reports that describe ICH and reports that do not. The binary classification problem was solved with a decision tree model. To evaluate model performance, the reports were split randomly in a 7:3 ratio: of the 3980 reports, 2786 were assigned to the training set and 1194 to the test set.
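The keyword screening and the 7:3 split described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' code: the abstract does not list the 14 keywords or the 33 stop-phrases, so `KEYWORDS` and `STOP_PHRASES` hold hypothetical English placeholders, and the helper names `is_ich_candidate` and `split_7_to_3` are invented for this example. The expert verification step is not modeled.

```python
import random

# Hypothetical placeholders: the study used 14 ICH-related keywords and
# 33 stop-phrases, but the abstract does not enumerate them.
KEYWORDS = ["hemorrhage", "hematoma"]
STOP_PHRASES = ["no signs of hemorrhage", "no hematoma"]


def is_ich_candidate(report: str) -> bool:
    """Return True if a keyword occurs outside every stop-phrase."""
    text = report.lower()
    # Blank out negated mentions first so they cannot trigger a keyword match.
    for phrase in STOP_PHRASES:
        text = text.replace(phrase, " ")
    return any(word in text for word in KEYWORDS)


def split_7_to_3(items, seed=0):
    """Shuffle a copy of items and split it 7:3 into (train, test)."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    # Integer arithmetic avoids floating-point rounding at the cut point.
    cut = len(shuffled) * 7 // 10
    return shuffled[:cut], shuffled[cut:]


# With 3980 items the cut reproduces the set sizes reported in the abstract.
train, test = split_7_to_3(range(3980))
print(len(train), len(test))
```

For 3980 reports the integer cut gives 2786 training and 1194 test items, which matches the counts in the abstract; which reports land in which set depends on the seed, a detail the abstract does not specify.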

Results: On the test set, the designed and trained algorithm classified CT reports as "with signs of ICH" or "without signs of ICH" with a sensitivity of 0.94, a specificity of 0.88, and an F-score of 0.83.
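For readers less familiar with these metrics, the sketch below shows how sensitivity, specificity, and the F-score are derived from a confusion matrix. It assumes the reported F-score is the F1 score (the harmonic mean of precision and sensitivity), which the abstract does not state explicitly. The counts in the usage line are hypothetical and are not the study's confusion matrix, which the abstract does not report.

```python
def binary_metrics(tp: int, fn: int, fp: int, tn: int) -> dict:
    """Compute sensitivity, specificity, precision, and F1 from counts."""
    sensitivity = tp / (tp + fn)  # share of ICH reports correctly flagged
    specificity = tn / (tn + fp)  # share of non-ICH reports correctly passed
    precision = tp / (tp + fp)    # share of flagged reports that truly show ICH
    # F1 is assumed here; it is the harmonic mean of precision and sensitivity.
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "f1": f1,
    }


# Hypothetical counts chosen only to illustrate the arithmetic.
metrics = binary_metrics(tp=40, fn=10, fp=5, tn=45)
print({name: round(value, 3) for name, value in metrics.items()})
```

Because precision depends on the ratio of positive to negative reports, two test sets with the same sensitivity and specificity can still produce different F-scores.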

Conclusion: The developed and trained algorithm for analyzing radiology reports demonstrated high accuracy on brain CT reports with signs of intracranial hemorrhage. It can be used to solve binary classification problems and to build corresponding data sets. However, its use is limited by the need to review CT studies manually for quality control.
