Text Mining on Hospital Stay Durations and Management of Sickle Cell Disease Patients

Mohammed Gollapalli, Latifa Alabdullatif, Farah Alsuwayeh, Moodhi Aljouali, Alhanoof Alhunief, Zaina Batook
Published in: 2022 14th International Conference on Computational Intelligence and Communication Networks (CICN)
Publication date: 2022-12-04
DOI: 10.1109/CICN56167.2022.10008265 (https://doi.org/10.1109/CICN56167.2022.10008265)
Citation count: 2

Abstract

Sickle cell disease (SCD) is a genetic blood disorder in which red blood cells clump together, preventing blood and oxygen from reaching all parts of the body. SCD is very common in Sub-Saharan Africa, the Mediterranean basin, and the eastern regions of Saudi Arabia owing to the high prevalence of consanguineous marriage. Repeated vascular occlusion causes multiple organ damage in many SCD patients, so they are admitted frequently, and doctors and nurses record a large volume of medical notes during each clinical trial. In this study, 12 years of de-identified SCD patient data (2018–2020) were obtained officially from the hospital and used in experiments on SCD patient medical notes. We used a text mining framework to analyze and predict the length of stay (LoS) of SCD patients with three machine learning (ML) models: XGBoost, Decision Tree, and KNN. The most frequently occurring words were extracted from 62,847 SCD medical screening records using text mining. Feature models were then created to investigate how increasing or decreasing the number of terms affects model performance. XGBoost produced the best results, with 94.3% accuracy, compared with 93.5% for Decision Tree and 90.7% for KNN. The findings suggest that predicting the LoS of SCD patients is highly feasible, allowing for better utilization of medical personnel and resources.
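The abstract does not give implementation details, so the following is only a minimal sketch of the general idea: extract the most frequent terms from notes, build feature vectors from the top N terms, and check how accuracy changes as N varies. Everything here is an assumption made for illustration: the toy notes, the "short"/"long" LoS classes, the function names, and the leave-one-out evaluation. It uses a small pure-Python KNN classifier only (no XGBoost or Decision Tree) so that it runs without third-party libraries, and it does not reproduce the study's data or results.

```python
from collections import Counter
import re

# Hypothetical toy notes with made-up LoS classes; NOT data from the study.
NOTES = [
    ("patient admitted with severe pain crisis and fever", "long"),
    ("severe chest pain crisis oxygen given", "long"),
    ("fever and severe pain transfusion required", "long"),
    ("routine follow up mild pain discharged", "short"),
    ("mild pain hydration given discharged", "short"),
    ("routine check mild symptoms discharged", "short"),
]


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def top_terms(texts, n_terms):
    """Return the n_terms most frequent terms; ties are broken alphabetically."""
    counts = Counter(tok for text in texts for tok in tokenize(text))
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [term for term, _ in ranked[:n_terms]]


def vectorize(text, vocab):
    """Count how often each vocabulary term occurs in the text."""
    counts = Counter(tokenize(text))
    return [counts[term] for term in vocab]


def knn_predict(train_vecs, train_labels, vec, k=3):
    """Majority vote among the k nearest training vectors (squared Euclidean)."""
    dists = sorted(
        (sum((a - b) ** 2 for a, b in zip(tv, vec)), i)
        for i, tv in enumerate(train_vecs)
    )
    votes = Counter(train_labels[i] for _, i in dists[:k])
    return votes.most_common(1)[0][0]


def loo_accuracy(notes, n_terms, k=3):
    """Leave-one-out accuracy when only the top n_terms terms are features."""
    correct = 0
    for i, (text, label) in enumerate(notes):
        train = notes[:i] + notes[i + 1:]
        # The vocabulary is rebuilt from the training notes only.
        vocab = top_terms([t for t, _ in train], n_terms)
        train_vecs = [vectorize(t, vocab) for t, _ in train]
        train_labels = [lab for _, lab in train]
        if knn_predict(train_vecs, train_labels, vectorize(text, vocab), k) == label:
            correct += 1
    return correct / len(notes)


if __name__ == "__main__":
    # Vary the number of terms, mirroring the idea of several "feature models".
    for n_terms in (2, 4, 8):
        print(n_terms, loo_accuracy(NOTES, n_terms))
```

In practice, a library vectorizer and the classifiers named in the abstract would replace the hand-written pieces; the loop over `n_terms` is the part that corresponds to comparing feature models of different sizes.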