Tree-Based Models Using Random Grid Search Optimization for Disease Classification Based on Environmental Factors: A Case Study on Asthma Hospitalizations
Authors: P. Nanthakumaran, L. Liyanage
Venue: 2021 IEEE 2nd International Conference on Pattern Recognition and Machine Learning (PRML)
Publication date: 2021-07-16
DOI: 10.1109/PRML52754.2021.9520720 (https://doi.org/10.1109/PRML52754.2021.9520720)
Abstract
Understanding how exposure to environmental factors aggravates the global disease burden can help mitigate it. Disease burden is generally predicted with generalized linear models and generalized additive models, whereas tree-based models are underused. The objective of this paper is to evaluate the performance of four tree-based models (decision tree, random forest, gradient boosted trees, and stochastic gradient boosted trees) in predicting asthma attacks from short-term exposure to environmental factors, and to examine which environmental factors trigger asthma attacks. A sample of patients from different parts of Victoria during 2013-2015 was considered. During the study period, the study area had reasonably good air quality and a relatively humid environment. The tree-based models were tuned using random grid search optimization with bootstrapping to address over-fitting. All models considered performed well in predicting asthma attacks in terms of area under the receiver operating characteristic curve (ROC AUC > 0.82). The gradient boosted tree models (accuracy = 76%; recall = 63%; F2-score = 64%) showed better overall prediction, whereas the decision tree (accuracy = 71%; recall = 75%; F2-score = 71%) outperformed the other models in identifying positive cases. The tree-based models revealed that O3 exposure consistently influences asthma. Further, the decision tree revealed that O3 exposure < 13 ppb, or high O3 exposure (>= 13 ppb) combined with SO2 exposure < 0.5 ppb and maximum wind speed > 5.4 km/h, influenced asthma. In addition, relative humidity and exposure to CO were detected by the other tree-based models as relevant predictors of asthma attacks.
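
The abstract names the tuning procedure (random grid search with bootstrapping) and the F2-score but does not describe the implementation. The following is a minimal sketch of one common way to combine them, not the authors' code. It makes several assumptions that are not in the source: candidates are scored on the out-of-bag rows of each bootstrap resample, the mean F2-score is the selection metric, `fit_stump` is a hypothetical one-split rule standing in for a real tree-based model, and the data are synthetic.

```python
import itertools
import random


def fbeta(y_true, y_pred, beta=2.0):
    """F-beta score for binary labels; beta=2 weights recall above precision."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    b2 = beta * beta
    denom = (1 + b2) * tp + b2 * fn + fp
    return (1 + b2) * tp / denom if denom else 0.0


def bootstrap_split(n, rng):
    """Draw n row indices with replacement; rows never drawn are out-of-bag."""
    train = [rng.randrange(n) for _ in range(n)]
    chosen = set(train)
    oob = [i for i in range(n) if i not in chosen]
    return train, oob


def random_grid_search(grid, fit, X, y, n_candidates=5, n_boot=10, seed=0):
    """Score a random subset of the grid by mean out-of-bag F2 (assumed metric)."""
    rng = random.Random(seed)
    keys = sorted(grid)
    combos = [dict(zip(keys, vals))
              for vals in itertools.product(*(grid[k] for k in keys))]
    candidates = rng.sample(combos, min(n_candidates, len(combos)))
    results = []
    for params in candidates:
        scores = []
        for _ in range(n_boot):
            train, oob = bootstrap_split(len(X), rng)
            if not oob:  # nothing held out in this resample
                continue
            predict = fit(params, [X[i] for i in train], [y[i] for i in train])
            preds = [predict(X[i]) for i in oob]
            scores.append(fbeta([y[i] for i in oob], preds))
        mean = sum(scores) / len(scores) if scores else 0.0
        results.append((mean, params))
    best_score, best_params = max(results, key=lambda r: r[0])
    return best_params, best_score, results


def fit_stump(params, X_train, y_train):
    """Hypothetical stand-in for a tree: one split; the positive side is learned."""
    f, thr = params["feature"], params["threshold"]
    below = [t for x, t in zip(X_train, y_train) if x[f] < thr]
    above = [t for x, t in zip(X_train, y_train) if x[f] >= thr]
    rate_below = sum(below) / len(below) if below else 0.0
    rate_above = sum(above) / len(above) if above else 0.0
    positive_below = rate_below > rate_above
    return lambda x: int((x[f] < thr) == positive_below)


# Synthetic data for illustration only; not the study's data.
data_rng = random.Random(42)
X = [(data_rng.random(), data_rng.random()) for _ in range(200)]
y = [int(x[0] < 0.4) for x in X]

grid = {"feature": [0, 1], "threshold": [0.2, 0.4, 0.6]}
best_params, best_score, results = random_grid_search(
    grid, fit_stump, X, y, n_candidates=6)
print(best_params, round(best_score, 3))
```

Evaluating each candidate on rows left out of its bootstrap resample gives a performance estimate that does not reuse the training rows, which is one way bootstrapping can guard against over-fitting during tuning. In practice, `fit` would wrap a decision tree, random forest, or gradient boosting implementation, and the grid would hold that model's hyperparameters.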