Modelling unbalanced catastrophic health expenditure data by using machine-learning methods
Songul Cinaroglu
Intelligent Systems in Accounting, Finance and Management, vol. 27, no. 4, pp. 168-181. Published 2020-12-01.
DOI: 10.1002/isaf.1483
https://onlinelibrary.wiley.com/doi/10.1002/isaf.1483
Citations: 4
Abstract
This study aims to compare the performances of logistic regression and random forest classifiers in a balanced oversampling procedure for the prediction of households that will face catastrophic out-of-pocket (OOP) health expenditure. Data were derived from the nationally representative household budget survey collected by the Turkish Statistical Institute for the year 2012. A total of 9,987 households returned valid surveys. The data set was highly imbalanced, and the percentage of households facing catastrophic OOP health expenditure was 0.14. Balanced oversampling was performed, and 30 artificial data sets were generated with sizes of 5% and 98% of the original data size. The balanced oversampled data set provided accurate predictions, and random forest exhibited superior performance in identifying households facing catastrophic OOP health expenditure (area under the receiver operating characteristic curve, AUC = 0.8765; classification accuracy, CA = 0.7936; sensitivity = 0.7765; specificity = 0.8552; F1 = 0.7797).
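The balancing step and the reported evaluation measures can be illustrated with a minimal, standard-library sketch. It is not the author's implementation: the abstract does not describe how the oversampled data sets were constructed, so the sketch assumes simple random duplication of minority-class rows, and the function names `balanced_oversample`, `binary_metrics`, and `auc` are hypothetical. The toy labels are illustrative only and are not drawn from the survey data.

```python
import random
from collections import Counter


def balanced_oversample(rows, labels, seed=0):
    """Assumed scheme: duplicate random minority rows until classes are equal in size."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_rows, out_labels = list(rows), list(labels)
    for cls, n in counts.items():
        pool = [r for r, y in zip(rows, labels) if y == cls]
        for _ in range(target - n):
            out_rows.append(rng.choice(pool))
            out_labels.append(cls)
    return out_rows, out_labels


def binary_metrics(y_true, y_pred):
    """Classification accuracy, sensitivity, specificity and F1 for 0/1 labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    denom = precision + sensitivity
    f1 = 2 * precision * sensitivity / denom if denom else 0.0
    return {
        "accuracy": (tp + tn) / len(pairs),
        "sensitivity": sensitivity,
        "specificity": specificity,
        "f1": f1,
    }


def auc(y_true, scores):
    """AUC as the share of (positive, negative) pairs ranked correctly; ties count half."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Toy data: 8 majority-class (0) and 2 minority-class (1) households.
    rows, labels = list(range(10)), [0] * 8 + [1] * 2
    _, balanced_labels = balanced_oversample(rows, labels)
    print(Counter(balanced_labels))
    print(binary_metrics([1, 1, 0, 0], [1, 0, 0, 0]))
    print(auc([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1]))
```

In practice, oversampling of this kind is usually applied to the training portion only, so that duplicated rows cannot appear in both the training and the evaluation data.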
About the journal:
Intelligent Systems in Accounting, Finance and Management is a quarterly international journal which publishes original, high-quality material dealing with all aspects of intelligent systems as they relate to the fields of accounting, economics, finance, marketing and management. In addition, the journal is also concerned with related emerging technologies, including big data, business intelligence, social media and other technologies. It encourages the development of novel technologies, and the embedding of new and existing technologies into applications of real, practical value. Therefore, implementation issues are of as much concern as development issues. The journal is designed to appeal to academics in the intelligent systems, emerging technologies and business fields, as well as to advanced practitioners who wish to improve the effectiveness, efficiency, or economy of their working practices. A special feature of the journal is the use of two groups of reviewers: those who specialize in intelligent systems work, and those who specialize in application areas. Reviewers are asked to address issues of originality and actual or potential impact on research, teaching, or practice in the accounting, finance, or management fields. Authors working on conceptual developments or on laboratory-based explorations of data sets therefore need to address the issue of potential impact at some level in submissions to the journal.