Predicting Mortality in Patients with Stroke Using Data Mining Techniques

Acta Informatica Pragensia · IF 0.8 · JCR Q4 (Computer Science, Interdisciplinary Applications) · Pub Date: 2021-11-21 · DOI: 10.18267/j.aip.163
Zahra Hadianfard, Hadi Lotfnezhad Afshar, S. Nazarbaghi, B. Rahimi, T. Timpka
Citations: 1

Abstract

Mortality due to stroke is increasing, and accurate prediction of stroke-related death is important for healthcare. Data mining methods offer a way to predict this mortality risk. The aim of this study was to apply popular data mining algorithms to predict the survival of stroke patients and to extract decision rules. Data on stroke patients (n=4149) were collected from paper medical records. Missing data were handled using multiple imputation, and the target variable was balanced using over-sampling, under-sampling and the Synthetic Minority Over-sampling Technique (SMOTE). Support vector machine (SVM), decision tree and logistic regression (LR) algorithms were used to predict survival, and the Repeated Incremental Pruning to Produce Error Reduction (RIPPER) algorithm was used to extract decision rules from the main dataset. LR outperformed the other algorithms in accuracy (76.96%), sensitivity (79.06%) and kappa (33.34), although its specificity (65.35%) and AUC (0.77) were lower than those of the other algorithms. LR, as the best-performing algorithm on the main dataset, was then tested on an independent dataset of 234 records. On this external validation dataset its performance improved in accuracy (79.91%), sensitivity (83.94%), kappa (39.26) and AUC (0.8), but not in specificity (60.98%). The constructed model scored well in predicting the survival of stroke patients, and useful rules were extracted for clinical use.
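The abstract names SMOTE as one of the class-balancing methods but does not describe how it was applied. As a rough illustration of the idea only, the sketch below creates synthetic minority-class points by interpolating between a minority sample and one of its nearest minority-class neighbours. The function name `smote`, the parameter `k`, and the toy data are all assumptions made for this example; this is a simplified version of the technique, not the study's implementation.

```python
import math
import random


def smote(minority, n_synthetic, k=3, seed=0):
    """Simplified SMOTE: interpolate between minority-class neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_synthetic):
        base = rng.choice(minority)
        # k nearest minority-class neighbours of the chosen point (itself excluded)
        others = [p for p in minority if p is not base]
        neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
        neighbour = rng.choice(neighbours)
        # The new point lies at a random position on the segment base -> neighbour
        gap = rng.random()
        synthetic.append(tuple(b + gap * (n - b) for b, n in zip(base, neighbour)))
    return synthetic


# Toy two-feature minority class (hypothetical values, not from the study)
minority = [(1.0, 2.0), (1.5, 1.8), (2.0, 2.2), (1.2, 2.5)]
new_points = smote(minority, n_synthetic=6)
print(len(new_points))
```

Because every synthetic point sits on a line segment between two real minority samples, the new points stay inside the region the minority class already occupies, which is what distinguishes SMOTE from plain over-sampling by duplication.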
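The reported accuracy, sensitivity, specificity and kappa can all be derived from a single confusion matrix. The abstract does not give one, so the counts below are inferred: they are a set of integers for the 234-record external validation dataset that reproduces the reported percentages, assuming the positive class is the larger one (193 records) and that kappa is reported on a 0 to 100 scale. Which outcome the authors treated as positive is not stated, and the function name is chosen for this example.

```python
def classification_metrics(tp, fn, tn, fp):
    """Accuracy, sensitivity, specificity and Cohen's kappa from confusion-matrix counts."""
    total = tp + fn + tn + fp
    accuracy = (tp + tn) / total
    sensitivity = tp / (tp + fn)  # true positive rate
    specificity = tn / (tn + fp)  # true negative rate
    # Agreement expected by chance, from the row and column totals
    expected = ((tp + fn) * (tp + fp) + (tn + fp) * (tn + fn)) / total**2
    kappa = (accuracy - expected) / (1 - expected)
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "kappa": kappa,
    }


# Inferred counts (assumption): 162 + 31 + 25 + 16 = 234 records
metrics = classification_metrics(tp=162, fn=31, tn=25, fp=16)
for name, value in metrics.items():
    print(f"{name}: {100 * value:.2f}")
# accuracy: 79.91
# sensitivity: 83.94
# specificity: 60.98
# kappa: 39.26
```

The kappa value is much lower than the accuracy because the classes are imbalanced: a classifier that always predicted the majority class would already be right for 193 of 234 records, and kappa discounts that chance-level agreement.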
Source journal: Acta Informatica Pragensia (Social Sciences: Library and Information Sciences)
CiteScore: 1.70
Self-citation rate: 0.00%
Articles published: 26
Review time: 12 weeks