"Using network analysis modularity to group health code systems and decrease dimensionality in machine learning models"

IF 1.8 · Q3 · PHARMACOLOGY & PHARMACY · Exploratory research in clinical and social pharmacy · Pub Date: 2024-06-01 · DOI: 10.1016/j.rcsop.2024.100463
Mohsen Askar, Lars Småbrekke, Einar Holsbø, Lars Ailo Bongo, Kristian Svendsen
Citations: 0

Abstract

“Using network analysis modularity to group health code systems and decrease dimensionality in machine learning models”

Background

Machine learning (ML) prediction models in healthcare and pharmacy-related research face challenges with encoding high-dimensional Healthcare Coding Systems (HCSs) such as ICD, ATC, and DRG codes, given the trade-off between reducing model dimensionality and minimizing information loss.

Objectives

To investigate using Network Analysis modularity as a method to group HCSs to improve encoding in ML models.

Methods

The MIMIC-III dataset was used to create a multimorbidity network in which ICD-9 codes are the nodes and each edge is weighted by the number of patients who share that pair of ICD-9 codes. A modularity detection algorithm was applied at different resolution thresholds to generate 6 sets of modules. The impact of four grouping strategies on the performance of predicting 90-day Intensive Care Unit readmissions was assessed. The grouping strategies compared were: 1) binary encoding of codes, 2) encoding codes grouped by network modules, 3) grouping codes to the highest level of the ICD-9 hierarchy, and 4) grouping using the single-level Clinical Classification Software (CCS). The same methodology was also applied to encode DRG codes, but there the comparison was limited to a single modularity threshold versus binary encoding.
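The network construction and module-based encoding described above can be illustrated with a minimal sketch. It is not the authors' implementation or their Python package: the toy patient data, the function names, and the hand-written partition `module_of` are all hypothetical, and the choice of one binary feature per module is an assumption (the abstract does not say how module membership is encoded). The `modularity` function computes the standard weighted modularity score with a resolution factor; in practice a community detection routine would search for the partition rather than take it as given.

```python
from collections import Counter
from itertools import combinations

# Toy data (hypothetical): patient id -> set of ICD-9 codes recorded for that patient.
patients = {
    "p1": {"4280", "42731", "5849"},
    "p2": {"4280", "42731"},
    "p3": {"25000", "2724"},
    "p4": {"25000", "2724", "4280"},
}

def cooccurrence_edges(patient_codes):
    """Edge weight = number of patients who have both codes of a pair."""
    weights = Counter()
    for codes in patient_codes.values():
        for a, b in combinations(sorted(codes), 2):
            weights[(a, b)] += 1
    return weights

def modularity(edges, module_of, resolution=1.0):
    """Weighted modularity Q of a given partition, with a resolution factor."""
    total = sum(edges.values())   # m: total edge weight in the network
    within = Counter()            # edge weight that falls inside each module
    strength = Counter()          # summed weighted degree of each module
    for (a, b), w in edges.items():
        strength[module_of[a]] += w
        strength[module_of[b]] += w
        if module_of[a] == module_of[b]:
            within[module_of[a]] += w
    return sum(
        within[m] / total - resolution * (strength[m] / (2 * total)) ** 2
        for m in strength
    )

def encode_by_module(patient_codes, module_of):
    """One binary feature per module: 1 if the patient has any code in it (assumed scheme)."""
    modules = sorted(set(module_of.values()))
    return {
        pid: [int(any(module_of[c] == m for c in codes)) for m in modules]
        for pid, codes in patient_codes.items()
    }

edges = cooccurrence_edges(patients)
# Hypothetical partition; a modularity detection algorithm would normally produce this.
module_of = {"4280": 0, "42731": 0, "5849": 0, "25000": 1, "2724": 1}
q = modularity(edges, module_of)
features = encode_by_module(patients, module_of)
```

Here five code-level columns collapse into two module-level columns, which is the dimensionality reduction the approach aims for. Raising `resolution` penalizes large modules more heavily, so a detection algorithm run at a higher resolution tends to return more, smaller modules.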

The performance was assessed using Logistic Regression, Support Vector Machine with a non-linear kernel, and Gradient Boosting Machines algorithms. Accuracy, Precision, Recall, AUC, and F1-score with 95% confidence intervals were reported.
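As a rough illustration of how threshold-based metrics with 95% confidence intervals might be computed, the sketch below uses a percentile bootstrap over the test-set predictions. The abstract does not state how the intervals were obtained, so the bootstrap is an assumption, and the labels, predictions, and function names are hypothetical. AUC is omitted because it needs predicted scores rather than hard labels.

```python
import random

def metrics(y_true, y_pred):
    """Accuracy, precision, recall, and F1 for binary labels (1 = readmitted)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {
        "accuracy": (tp + tn) / len(y_true),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }

def bootstrap_ci(y_true, y_pred, metric, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for one metric (assumed method, not from the paper)."""
    rng = random.Random(seed)
    n = len(y_true)
    values = []
    for _ in range(n_boot):
        # Resample test cases with replacement and recompute the metric.
        idx = [rng.randrange(n) for _ in range(n)]
        resampled = metrics([y_true[i] for i in idx], [y_pred[i] for i in idx])
        values.append(resampled[metric])
    values.sort()
    lower = values[int((alpha / 2) * n_boot)]
    upper = values[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper

# Hypothetical test-set labels and model predictions.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 1, 0, 0, 0, 1, 1, 0]

point = metrics(y_true, y_pred)
low, high = bootstrap_ci(y_true, y_pred, "accuracy")
```

The same two functions can be applied to the predictions from each algorithm and each encoding, so that every encoding is compared on identical test cases.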

Results

Models that used modularity encoding outperformed models that used binary encoding of ungrouped codes. Across all algorithms, accuracy ranged from 0.736 to 0.78 with modularity encoding, compared with 0.727 to 0.779 with binary encoding. AUC, recall, and precision also improved across almost all algorithms. Compared with the other grouping approaches, modularity encoding generally showed slightly higher AUC (0.813 to 0.837) and precision (0.752 to 0.782).

Conclusions

Modularity encoding enhances the performance of ML models in pharmacy research by effectively reducing dimensionality while retaining necessary information. Across the three algorithms used, models with modularity encoding performed better than or comparably to models with other encoding approaches. Modularity encoding has further advantages: it can be used for both hierarchical and non-hierarchical HCSs, it is clinically relevant, and it can enhance the clinical interpretation of ML models. A Python package has been developed to facilitate the use of the approach in future research.

Source journal metrics: CiteScore 1.60 · Self-citation rate 0.00% · Articles published 0 · Review time 103 days