Mohsen Askar, Lars Småbrekke, Einar Holsbø, Lars Ailo Bongo, Kristian Svendsen
Title: Using network analysis modularity to group health code systems and decrease dimensionality in machine learning models
Journal: Exploratory Research in Clinical and Social Pharmacy, volume 14, Article 100463
Publication date: 2024-06-01
DOI: 10.1016/j.rcsop.2024.100463
URL: https://www.sciencedirect.com/science/article/pii/S266727662400060X
Abstract
Background
Machine learning (ML) prediction models in healthcare and pharmacy-related research face challenges with encoding high-dimensional Healthcare Coding Systems (HCSs) such as ICD, ATC, and DRG codes, given the trade-off between reducing model dimensionality and minimizing information loss.
Objectives
To investigate using Network Analysis modularity as a method to group HCSs to improve encoding in ML models.
Methods
The MIMIC-III dataset was used to create a multimorbidity network in which the nodes are ICD-9 codes and each edge is weighted by the number of patients sharing that pair of ICD-9 codes. A modularity detection algorithm was applied at different resolution thresholds to generate 6 sets of modules. The impact of four grouping strategies on the performance of predicting 90-day Intensive Care Unit readmissions was assessed. The strategies compared were: 1) binary encoding of codes, 2) encoding codes grouped by network modules, 3) grouping codes to the highest level of the ICD-9 hierarchy, and 4) grouping using the single-level Clinical Classification Software (CCS). The same methodology was also applied to encode DRG codes, but the comparison was limited to a single modularity threshold versus binary encoding.
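The network construction and module-based encoding described above can be illustrated with a minimal sketch. The abstract does not name the modularity detection algorithm or the authors' Python package, so everything here is an assumption for illustration: the toy patients, the hand-assigned partition `module_of`, and the function names are hypothetical. The sketch does not search for modules; it only scores a given partition with the standard weighted modularity formula (including a resolution parameter) and then encodes patients by module. A community detection algorithm such as Louvain would search for the partition that maximizes this score.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_edges(patient_codes):
    """Edge weight for (a, b) = number of patients who carry both codes."""
    edges = Counter()
    for codes in patient_codes.values():
        for a, b in combinations(sorted(set(codes)), 2):
            edges[(a, b)] += 1
    return edges


def modularity(edges, module_of, resolution=1.0):
    """Weighted modularity of a partition:
    Q = sum over modules c of [ L_c / m - resolution * (d_c / (2m))^2 ],
    where m is the total edge weight, L_c the edge weight inside module c,
    and d_c the summed weighted degree of the nodes in module c.
    """
    m = sum(edges.values())
    internal = Counter()
    degree = Counter()
    for (a, b), w in edges.items():
        degree[module_of[a]] += w
        degree[module_of[b]] += w
        if module_of[a] == module_of[b]:
            internal[module_of[a]] += w
    return sum(
        internal[c] / m - resolution * (degree[c] / (2 * m)) ** 2
        for c in degree
    )


def encode_by_module(patient_codes, module_of):
    """One binary feature per module: 1 if the patient has any code in it."""
    modules = sorted(set(module_of.values()))
    encoded = {}
    for pid, codes in patient_codes.items():
        present = {module_of[c] for c in codes if c in module_of}
        encoded[pid] = [int(mod in present) for mod in modules]
    return encoded


# Hypothetical toy data: patient id -> ICD-9 codes (not taken from MIMIC-III).
patients = {
    "p1": ["250.00", "401.9", "272.4"],
    "p2": ["250.00", "401.9"],
    "p3": ["401.9", "272.4"],
    "p4": ["486", "518.81"],
    "p5": ["486", "518.81", "401.9"],
}

# Hypothetical partition, standing in for the output of a detection algorithm.
module_of = {"250.00": 0, "401.9": 0, "272.4": 0, "486": 1, "518.81": 1}

edges = cooccurrence_edges(patients)
q = modularity(edges, module_of, resolution=1.0)
encoded = encode_by_module(patients, module_of)

print(f"modularity of the toy partition: {q:.4f}")
for pid, row in encoded.items():
    print(pid, row)
```

In this toy case five code columns collapse into two module columns. In a sketch like this, raising `resolution` penalizes large modules more heavily, which is one way a detection algorithm could be pushed toward finer-grained sets of modules.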
The performance was assessed using Logistic Regression, Support Vector Machine with a non-linear kernel, and Gradient Boosting Machines algorithms. Accuracy, Precision, Recall, AUC, and F1-score with 95% confidence intervals were reported.
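The abstract does not state how the 95% confidence intervals were obtained. One common option, assumed here purely for illustration, is a percentile bootstrap over the test set. The sketch below uses hypothetical labels and predictions and covers accuracy, precision, recall, and F1-score; AUC would additionally require predicted scores rather than hard labels.

```python
import random


def metrics(y_true, y_pred):
    """Accuracy, precision, recall, and F1 for binary labels (1 = positive)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {
        "accuracy": (tp + tn) / len(y_true),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


def bootstrap_ci(y_true, y_pred, metric, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap interval: resample cases with replacement,
    recompute the metric each time, and take the empirical quantiles."""
    rng = random.Random(seed)
    n = len(y_true)
    stats = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        sample = metrics([y_true[i] for i in idx], [y_pred[i] for i in idx])
        stats.append(sample[metric])
    stats.sort()
    lower = stats[int((alpha / 2) * n_boot)]
    upper = stats[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper


# Hypothetical readmission labels and model predictions (not study data).
y_true = [1, 0, 1, 1, 0, 0, 1, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0, 1, 0]

point = metrics(y_true, y_pred)
ci_low, ci_high = bootstrap_ci(y_true, y_pred, "accuracy")
print(f"accuracy = {point['accuracy']:.3f}, "
      f"95% CI = ({ci_low:.3f}, {ci_high:.3f})")
```

With only ten cases the interval is very wide; on a realistically sized test set the same procedure gives much tighter bounds.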
Results
Models using modularity encoding outperformed models using binary encoding of ungrouped codes. Accuracy improved across all algorithms, ranging from 0.736 to 0.78 for modularity encoding compared with 0.727 to 0.779 for binary encoding. AUC, recall, and precision also improved across almost all algorithms. In comparison with other grouping approaches, modularity encoding generally showed slightly higher performance in AUC, ranging from 0.813 to 0.837, and precision, ranging from 0.752 to 0.782.
Conclusions
Modularity encoding enhances the performance of ML models in pharmacy research by effectively reducing dimensionality while retaining necessary information. Across the three algorithms used, models using modularity encoding showed superior or comparable performance to other encoding approaches. Modularity encoding offers further advantages: it can be used for both hierarchical and non-hierarchical HCSs, it is clinically relevant, and it can enhance the clinical interpretation of ML models. A Python package has been developed to facilitate the use of the approach in future research.