Interpretable machine learning text classification for clinical computed tomography reports – a case study of temporal bone fracture

Tong Ling, Luo Jake, Jazzmyne Adams, Kristen Osinski, Xiaoyu Liu, David Friedland

Computer Methods and Programs in Biomedicine Update, Volume 3, Article 100104, 2023. DOI: 10.1016/j.cmpbup.2023.100104
Abstract
Background
Machine learning (ML) has demonstrated success in classifying patients' diagnostic outcomes from free-text clinical notes. However, because these models are complex, interpreting the mechanism behind their classification results remains difficult.
Methods
We investigated interpretable representations of text-based machine learning classification models. We built models to classify temporal bone fractures from 164 temporal bone computed tomography (CT) text reports, using the XGBoost, Support Vector Machine, Logistic Regression, and Random Forest algorithms. To interpret the models, we used two methods, sketched in the code below: (1) we calculated an average word frequency score (WFS) for keywords, which captures the gap in keyword frequency between positively and negatively classified cases; and (2) we used Local Interpretable Model-Agnostic Explanations (LIME) to show word-level contributions to the fracture classification.
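A minimal, hypothetical sketch of this kind of pipeline in Python is shown below. The toy reports, the TF-IDF features, and the specific reading of the WFS (difference in mean per-report term frequency between positively and negatively classified cases) are illustrative assumptions, not details taken from the paper; only the choice of a random forest classifier and the use of LIME's text explainer follow the abstract.

```python
# Hypothetical sketch: TF-IDF + random forest text classifier, one plausible WFS
# computation, and LIME word-level explanations. Data and the exact WFS formula
# are illustrative assumptions, not taken from the paper.
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer
from sklearn.pipeline import make_pipeline
from lime.lime_text import LimeTextExplainer

# Toy stand-ins for the 164 de-identified temporal bone CT reports.
reports = [
    "longitudinal fracture of the right temporal bone with ossicular disruption",
    "temporal bone structures intact, no fracture identified",
    "transverse fracture through the otic capsule",
    "normal mastoid air cells, no acute abnormality",
]
labels = [1, 0, 1, 0]  # 1 = fracture, 0 = non-fracture

# One of the four classifiers reported in the abstract (random forest).
clf = make_pipeline(TfidfVectorizer(),
                    RandomForestClassifier(n_estimators=200, random_state=0))
clf.fit(reports, labels)
pred = clf.predict(reports)

# (1) Word frequency score, read here as the difference in mean per-report term
#     frequency between positively and negatively classified cases.
counts = CountVectorizer()
term_freq = counts.fit_transform(reports).toarray()
vocab = counts.get_feature_names_out()
wfs = term_freq[pred == 1].mean(axis=0) - term_freq[pred == 0].mean(axis=0)
top_keywords = sorted(zip(vocab, wfs), key=lambda kv: kv[1], reverse=True)[:5]
print(top_keywords)

# (2) LIME: word-level contributions to the classification of a single report.
explainer = LimeTextExplainer(class_names=["non-fracture", "fracture"])
exp = explainer.explain_instance(reports[0], clf.predict_proba, num_features=5)
print(exp.as_list())  # [(word, weight), ...]
```

A convenient property of wrapping the vectorizer and classifier in a scikit-learn pipeline is that `clf.predict_proba` accepts a list of raw strings, which is exactly the classifier-function signature LIME's text explainer expects.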
Results
In temporal bone fracture classification, the random forest model achieved an average F1-score of 0.93. The WFS revealed differences in keyword usage between fracture and non-fracture cases, and LIME visualized each keyword's contribution to the classification results. In the evaluation of the LIME-based interpretation, the highest interpretation accuracy was 0.97.
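The abstract does not state the exact evaluation protocol behind the average F1-score; the sketch below shows one common way to obtain such an average, cross-validated F1 on the same toy setup as above. The data, fold count, and scoring choices are illustrative assumptions.

```python
# Hypothetical sketch: average F1-score via cross-validation. Data, fold count,
# and scoring are illustrative; the paper's protocol is not given in the abstract.
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline

reports = [
    "longitudinal fracture of the right temporal bone",
    "temporal bone structures intact, no fracture identified",
    "transverse fracture through the otic capsule",
    "normal mastoid air cells, no acute abnormality",
]
labels = [1, 0, 1, 0]  # 1 = fracture, 0 = non-fracture

clf = make_pipeline(TfidfVectorizer(), RandomForestClassifier(random_state=0))
scores = cross_val_score(clf, reports, labels, cv=2, scoring="f1")
print(scores.mean())  # average F1 across folds
```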
Conclusion
The interpretable text explainer can improve physicians' understanding of machine learning predictions. By providing simple visualizations, it can increase trust in computerized models and support more transparent computerized decision-making in clinical settings.