Sepsis Prediction in Intensive Care Unit Using Ensemble of XGboost Models
M. Zabihi, S. Kiranyaz, M. Gabbouj
2019 Computing in Cardiology (CinC), pp. 1-4, published 2019-09-01
DOI: 10.23919/CinC49843.2019.9005564
Citations: 36
Abstract
Sepsis is caused by a dysregulated host response to infection and is potentially responsible for 6 million deaths annually. It is a highly dynamic syndrome, so early prediction of sepsis plays a key role in reducing its high associated mortality. Early prediction is nevertheless challenging, because there is no specific and accurate test or scoring system for it. In this paper, we present a systematic approach to sepsis prediction. We also propose a new set of features that model missingness in clinical data. The proposed pipeline comprises three major components: feature extraction, feature selection, and classification. In total, 407 features are extracted from the clinical data. Five different feature sets are then selected using a wrapper feature selection algorithm based on XGBoost, and the selected features are derived from both the valid and the missing clinical data. Finally, an ensemble of five XGBoost models is used for sepsis prediction. The proposed algorithm officially ranked third in the PhysioNet/Computing in Cardiology Challenge 2019, with an overall utility score of 0.339 on the unseen test dataset (our team name: Separatrix).
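The abstract does not specify which missingness features are used or how the outputs of the five models are combined. The following is a minimal sketch under explicitly assumed choices, and it is not the authors' implementation. For one hourly clinical variable, it computes a few causal missingness features: the fraction of missing hours, the number of measurements, the hours since the last measurement, and the last observed value. It then averages the sepsis probabilities of several models and thresholds the mean. The function names `missingness_features` and `ensemble_predict`, the 0.5 threshold, and the simple averaging rule are all hypothetical. The stand-in lambdas take the place of trained XGBoost classifiers, each of which would, in the described method, use its own selected feature subset.

```python
from typing import Callable, Dict, List, Optional, Sequence, Tuple

Features = Dict[str, Optional[float]]
# Assumption: a trained model maps a feature dict to an estimated P(sepsis).
Model = Callable[[Features], float]


def missingness_features(series: Sequence[Optional[float]], t: int) -> Features:
    """Hypothetical missingness features for one clinical variable at hour t.

    series[i] is the value recorded at hour i, or None if it was not measured.
    Only hours 0..t are used, so the features never look into the future.
    """
    if not 0 <= t < len(series):
        raise ValueError("t must index an hour inside the series")
    window = series[: t + 1]
    observed = [i for i, v in enumerate(window) if v is not None]
    last = observed[-1] if observed else None
    return {
        # fraction of hours so far with no valid measurement
        "missing_ratio": 1.0 - len(observed) / len(window),
        # number of valid measurements so far
        "n_measurements": float(len(observed)),
        # hours since the last valid measurement (None if never measured)
        "hours_since_last": float(t - last) if last is not None else None,
        # most recent valid value carried forward (None if never measured)
        "last_value": window[last] if last is not None else None,
    }


def ensemble_predict(models: List[Model], features: Features,
                     threshold: float = 0.5) -> Tuple[float, int]:
    """Average the probabilities of several models and threshold the mean.

    Simple averaging is an assumed combination rule, not one stated in the paper.
    """
    if not models:
        raise ValueError("need at least one model")
    probs = [model(features) for model in models]
    mean_prob = sum(probs) / len(probs)
    return mean_prob, int(mean_prob >= threshold)


if __name__ == "__main__":
    # Heart rate over six hours; None marks hours with no measurement.
    heart_rate = [None, 88.0, None, None, 95.0, None]
    feats = missingness_features(heart_rate, t=5)
    print(feats)

    # Five stand-in "models"; in practice each would be a trained classifier
    # (e.g. an XGBoost model) applied to its own selected feature subset.
    models = [lambda f, w=w: min(1.0, w * f["missing_ratio"])
              for w in (0.3, 0.6, 0.9, 1.2, 1.5)]
    print(ensemble_predict(models, feats))
```

The features are computed only from hours up to `t`. This mirrors the hourly, real-time setting of the challenge, where a prediction at hour `t` cannot use later measurements. Averaging is only one way to fuse an ensemble; weighted averaging or voting would fit the same interface.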