A N Khoruzhaya, D V Kozlov, K M Arzamasov, E I Kremneva
Sovremennye Tehnologii v Medicine, 2022. DOI: 10.17691/stm2022.14.6.04. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10171057/pdf/
Text Analysis of Radiology Reports with Signs of Intracranial Hemorrhage on Brain CT Scans Using the Decision Tree Algorithm.
The aim of the study is to create, train, and test an algorithm that analyzes the text of brain CT reports using a decision tree model and performs binary classification by the presence or absence of signs of intracranial hemorrhage (ICH).
Materials and methods: The initial data set was an export from the Unified Radiological Information Service of the Unified Medical Information and Analytical System (URIS UMIAS) containing 34,188 non-contrast brain CT studies from 56 inpatient medical facilities. Data analysis and preprocessing were carried out with NLTK (Natural Language Toolkit, version 3.6.5), a library for symbolic and statistical natural language processing, and scikit-learn, a machine learning library with tools for classification tasks. CT studies were selected automatically using 14 ICH-related keywords and 33 stop-phrases in which keywords denote the absence of ICH; the selection was then verified by experts. A sample of 3,980 report texts was divided into two classes: reports describing ICH and reports without such descriptions. The binary classification problem was solved with a decision tree model. To evaluate model performance, the reports were randomly split in a 7:3 ratio: of the 3,980 reports, 2,786 were assigned to the training set and 1,194 to the test set.
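The keyword and stop-phrase selection and the 7:3 split described above can be sketched roughly as follows. This is only an illustration using the Python standard library, not the authors' code: the abstract does not list the 14 keywords or 33 stop-phrases, so `KEYWORDS`, `STOP_PHRASES`, `flag_report`, and `split_reports` are hypothetical names and values, and the English terms stand in for whatever terms the original reports used.

```python
import random

# Hypothetical examples only: the study used 14 keywords and 33 stop-phrases
# that are not given in the abstract.
KEYWORDS = ("hemorrhage", "hematoma", "bleeding")
STOP_PHRASES = ("no signs of hemorrhage", "no hematoma", "without bleeding")


def flag_report(text):
    """Return True if an ICH keyword appears outside every stop-phrase."""
    lowered = text.lower()
    # Blank out negated phrases first so their keywords are not counted.
    for phrase in STOP_PHRASES:
        lowered = lowered.replace(phrase, " ")
    return any(keyword in lowered for keyword in KEYWORDS)


def split_reports(reports, test_fraction=0.3, seed=0):
    """Shuffle a copy of the reports and return (train, test) lists."""
    shuffled = list(reports)
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


if __name__ == "__main__":
    print(flag_report("Subdural hematoma on the left."))
    print(flag_report("No signs of hemorrhage. No hematoma."))
    train, test = split_reports(range(3980))
    print(len(train), len(test))
```

Simple substring matching like this cannot handle negations that are not in the stop-phrase list, which is one reason a selection of this kind still needs expert verification before the labels are used to train a classifier.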
Results: On the test set, the trained algorithm classified CT reports as "with signs of ICH" or "without signs of ICH" with a sensitivity of 0.94, a specificity of 0.88, and an F-score of 0.83.
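For reference, the three reported metrics follow from the confusion matrix of a binary classifier. The sketch below uses the standard definitions; the abstract does not give the underlying counts, so the numbers in the usage example are invented for illustration and are not the study's data. The function name `binary_metrics` is likewise an assumption.

```python
def binary_metrics(tp, fp, tn, fn):
    """Compute sensitivity, specificity, and F1 from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)   # share of true ICH reports that were found
    specificity = tn / (tn + fp)   # share of non-ICH reports correctly rejected
    precision = tp / (tp + fp)     # share of flagged reports that are true ICH
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    return {"sensitivity": sensitivity, "specificity": specificity, "f1": f1}


if __name__ == "__main__":
    # Invented counts, not taken from the paper.
    print(binary_metrics(tp=90, fp=40, tn=160, fn=10))
```

Because the F-score depends on precision, it can be noticeably lower than both sensitivity and specificity when the positive class is the smaller one.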
Conclusion: The developed algorithm for analyzing radiology reports identified brain CT reports with signs of intracranial hemorrhage with high accuracy and can be used to solve binary classification problems and to assemble corresponding data sets. Its main limitation is that the selected CT studies still require manual review for quality control.