Interpretable machine learning text classification for clinical computed tomography reports – a case study of temporal bone fracture

Tong Ling, Luo Jake, Jazzmyne Adams, Kristen Osinski, Xiaoyu Liu, David Friedland

Computer Methods and Programs in Biomedicine Update, Volume 3, Article 100104, 2023. DOI: 10.1016/j.cmpbup.2023.100104
Abstract
Background
Machine learning (ML) has demonstrated success in classifying patients' diagnostic outcomes from free-text clinical notes. However, because these models are complex, interpreting the mechanism behind their classification results remains difficult.
Methods
We investigated interpretable representations of text-based machine learning classification models. We built models to classify temporal bone fractures from 164 temporal bone computed tomography (CT) text reports, using the XGBoost, Support Vector Machine, Logistic Regression, and Random Forest algorithms. To interpret the models, we used two methods, sketched in the code below: (1) we calculated an average word frequency score (WFS) for keywords, which captures the gap in keyword frequency between positively and negatively classified cases; and (2) we used Local Interpretable Model-Agnostic Explanations (LIME) to show word-level contributions to the fracture classification.
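A minimal, hypothetical sketch of this kind of pipeline in Python is shown below. The toy reports, the TF-IDF features, and the specific reading of the WFS (difference in mean per-report term frequency between positively and negatively classified cases) are illustrative assumptions, not details taken from the paper; only the choice of a random forest classifier and the use of LIME's text explainer follow the abstract.

```python
# Hypothetical sketch: TF-IDF + random forest text classifier, one plausible WFS
# computation, and LIME word-level explanations. Data and the exact WFS formula
# are illustrative assumptions, not taken from the paper.
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer
from sklearn.pipeline import make_pipeline
from lime.lime_text import LimeTextExplainer

# Toy stand-ins for the 164 de-identified temporal bone CT reports.
reports = [
    "longitudinal fracture of the right temporal bone with ossicular disruption",
    "temporal bone structures intact, no fracture identified",
    "transverse fracture through the otic capsule",
    "normal mastoid air cells, no acute abnormality",
]
labels = [1, 0, 1, 0]  # 1 = fracture, 0 = non-fracture

# One of the four classifiers reported in the abstract (random forest).
clf = make_pipeline(TfidfVectorizer(),
                    RandomForestClassifier(n_estimators=200, random_state=0))
clf.fit(reports, labels)
pred = clf.predict(reports)

# (1) Word frequency score, read here as the difference in mean per-report term
#     frequency between positively and negatively classified cases.
counts = CountVectorizer()
term_freq = counts.fit_transform(reports).toarray()
vocab = counts.get_feature_names_out()
wfs = term_freq[pred == 1].mean(axis=0) - term_freq[pred == 0].mean(axis=0)
top_keywords = sorted(zip(vocab, wfs), key=lambda kv: kv[1], reverse=True)[:5]
print(top_keywords)

# (2) LIME: word-level contributions to the classification of a single report.
explainer = LimeTextExplainer(class_names=["non-fracture", "fracture"])
exp = explainer.explain_instance(reports[0], clf.predict_proba, num_features=5)
print(exp.as_list())  # [(word, weight), ...]
```

A convenient property of wrapping the vectorizer and classifier in a scikit-learn pipeline is that `clf.predict_proba` accepts a list of raw strings, which is exactly the classifier-function signature LIME's text explainer expects.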
Results
In temporal bone fracture classification, the random forest model achieved an average F1-score of 0.93. The WFS revealed differences in keyword usage between fracture and non-fracture cases, and LIME visualized each keyword's contribution to the classification results. In the evaluation of the LIME-based interpretation, the highest interpretation accuracy was 0.97.
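The abstract does not state the exact evaluation protocol behind the average F1-score; the sketch below shows one common way to obtain such an average, cross-validated F1 on the same toy setup as above. The data, fold count, and scoring choices are illustrative assumptions.

```python
# Hypothetical sketch: average F1-score via cross-validation. Data, fold count,
# and scoring are illustrative; the paper's protocol is not given in the abstract.
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline

reports = [
    "longitudinal fracture of the right temporal bone",
    "temporal bone structures intact, no fracture identified",
    "transverse fracture through the otic capsule",
    "normal mastoid air cells, no acute abnormality",
]
labels = [1, 0, 1, 0]  # 1 = fracture, 0 = non-fracture

clf = make_pipeline(TfidfVectorizer(), RandomForestClassifier(random_state=0))
scores = cross_val_score(clf, reports, labels, cv=2, scoring="f1")
print(scores.mean())  # average F1 across folds
```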
Conclusion
The interpretable text explainer can improve physicians' understanding of machine learning predictions. By providing simple visualizations, it can increase trust in computerized models and support more transparent computerized decision-making in clinical settings.