Tackling the small imbalanced horizontal dataset regressions by Stability Selection and SMOGN: a case study of ventilation-free days prediction in the pediatric intensive care unit and the importance of PRISM

IF 4.1 | Zone 2 (Medicine) | JCR Q2 (Computer Science, Information Systems) | International Journal of Medical Informatics | Pub Date: 2025-04-01 | Epub Date: 2025-01-25 | DOI: 10.1016/j.ijmedinf.2025.105809
Milad Rad, Alireza Rafiei, Jocelyn Grunwell, Rishikesan Kamaleswaran
International Journal of Medical Informatics, Volume 196, Article 105809. Journal Article. https://www.sciencedirect.com/science/article/pii/S1386505625000267
Citations: 0

Abstract

Objective

Regression on small, imbalanced, horizontal datasets is an important problem in bioinformatics because rare but vital data points strongly affect model performance. Most clinical studies suffer from an imbalanced distribution, which impairs the learning ability of regression and classification models. Combined with a small number of samples, this imbalance further reduces prediction performance. Improving the trainability of models on small imbalanced datasets would therefore greatly strengthen current prediction models that rely on a small set of valuable, expensive samples.

Materials and methods

Stability Selection was used to overcome the high-dimensionality problem, which arises when the sample size is small relative to the number of features. The method was applied to improve the performance of the Synthetic Minority Over-Sampling Technique for Regression with Gaussian Noise (SMOGN), an imbalance-removal algorithm. To test the new pipeline, a small imbalanced cohort of pediatric ICU patients was used to predict the number of Ventilator-Free Days (VFD) a patient admitted for respiratory illness may experience over a 28-day admission period.
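As an illustration of the feature-selection step, the following is a minimal sketch of Stability Selection, assuming a Lasso base selector fitted on random half-subsamples. It is not the authors' implementation: the function names, the penalty `alpha`, the number of subsamples, and the selection threshold are all hypothetical choices made for this example.

```python
import numpy as np


def lasso_cd(X, y, alpha, n_iter=100):
    """Coordinate-descent Lasso: minimizes (1/2n)*||y - Xw||^2 + alpha*||w||_1."""
    n, p = X.shape
    w = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n
    resid = y.astype(float)  # equals y - X @ w because w starts at zero
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue  # constant column carries no information
            # Covariance of feature j with the residual that excludes feature j
            rho = X[:, j] @ resid / n + col_sq[j] * w[j]
            # Soft-thresholding update
            w_new = np.sign(rho) * max(abs(rho) - alpha, 0.0) / col_sq[j]
            resid += X[:, j] * (w[j] - w_new)
            w[j] = w_new
    return w


def stability_selection(X, y, alpha=0.5, n_subsamples=50, threshold=0.6, seed=0):
    """Return per-feature selection frequencies and indices of stable features.

    All defaults are illustrative assumptions, not values from the paper.
    """
    rng = np.random.default_rng(seed)
    n, p = X.shape
    counts = np.zeros(p)
    for _ in range(n_subsamples):
        # Draw half of the samples without replacement
        idx = rng.choice(n, size=n // 2, replace=False)
        sub = X[idx]
        sd = sub.std(axis=0)
        sd[sd == 0] = 1.0
        Xs = (sub - sub.mean(axis=0)) / sd   # standardize within the subsample
        ys = y[idx] - y[idx].mean()          # center the target
        w = lasso_cd(Xs, ys, alpha)
        counts += (w != 0)
    freq = counts / n_subsamples
    return freq, np.flatnonzero(freq >= threshold)
```

Features that are selected in a large fraction of subsamples are kept, so the resulting set is less sensitive to any single draw of a small cohort. In practice the penalty and threshold would need tuning for the dataset at hand.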

Results

By applying Stability Selection before SMOGN, our model overcame label imbalance and predicted almost all the non-surviving patients in the test dataset. Our study also highlighted the importance of the Pediatric Risk of Mortality (PRISM) as a powerful VFD predictor when combined with other clinical features.

Conclusion

This paper shows how a hybrid strategy of Stability Selection, SMOGN, and regression can improve outcomes on highly imbalanced datasets and reduce the probability of highly expensive false-negative detections in severe acute respiratory disease syndrome cases. The proposed modeling pipeline can reduce the overall VFD regression error and can also be extended to other regressable features. We also showed the importance of PRISM as a strong VFD predictor.
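To make the imbalance-removal idea concrete, here is a deliberately simplified sketch in the spirit of SMOGN: synthetic rare cases are created either by interpolating between nearby rare cases or by adding Gaussian noise when the neighbor is far away. It is an assumption-laden illustration, not the SMOGN algorithm itself (which also uses a relevance function and undersamples common cases) and not the authors' code. The function name, `rare_mask`, `k`, `noise_scale`, and the "safe distance" rule are hypothetical.

```python
import numpy as np


def oversample_rare(X, y, rare_mask, n_new, k=3, noise_scale=0.05, seed=0):
    """Append n_new synthetic samples generated from the rare cases only."""
    rng = np.random.default_rng(seed)
    # Stack features and target so both are synthesized together
    Z = np.column_stack([X, y])[rare_mask]
    if len(Z) < 2:
        raise ValueError("need at least two rare cases")
    k = min(k, len(Z) - 1)
    # Pairwise Euclidean distances among rare cases (features only)
    d = np.linalg.norm(Z[:, None, :-1] - Z[None, :, :-1], axis=2)
    np.fill_diagonal(d, np.inf)
    # Illustrative rule: neighbors closer than half the median distance are "safe"
    safe = 0.5 * np.median(d[np.isfinite(d)])
    sd = Z.std(axis=0)
    new = np.empty((n_new, Z.shape[1]))
    for i in range(n_new):
        s = rng.integers(len(Z))            # seed case
        t = rng.choice(np.argsort(d[s])[:k])  # one of its k nearest rare neighbors
        if d[s, t] < safe:
            # Close neighbor: interpolate between the two cases
            new[i] = Z[s] + rng.random() * (Z[t] - Z[s])
        else:
            # Distant neighbor: perturb the seed with Gaussian noise instead
            new[i] = Z[s] + rng.normal(0.0, noise_scale * sd)
    X_out = np.vstack([X, new[:, :-1]])
    y_out = np.concatenate([y, new[:, -1]])
    return X_out, y_out
```

In a pipeline like the one described, such a step would be applied to the training split only, after restricting `X` to the selected features, so that the test set contains no synthetic samples.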
Source journal
International Journal of Medical Informatics (Medicine; Computer Science: Information Systems)
CiteScore: 8.90
Self-citation rate: 4.10%
Articles published: 217
Review time: 42 days
Journal description: International Journal of Medical Informatics provides an international medium for dissemination of original results and interpretative reviews concerning the field of medical informatics. The Journal emphasizes the evaluation of systems in healthcare settings. The scope of the journal covers: information systems, including national or international registration systems, hospital information systems, departmental and/or physician's office systems, document handling systems, electronic medical record systems, standardization, systems integration, etc.; computer-aided medical decision support systems using heuristic, algorithmic and/or statistical methods as exemplified in decision theory, protocol development, artificial intelligence, etc.; educational computer-based programs pertaining to medical informatics or medicine in general; organizational, economic, social, clinical impact, ethical and cost-benefit aspects of IT applications in health care.