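Relative Predictive Performance of Treatments of Ordinal Outcome Variables across Machine Learning Algorithms and Class Distributions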

Journal of Behavioral Data Science. Pub Date: 2022-12-16. DOI: 10.35566/jbds/v2n2/suzuki

Honoka Suzuki, Oscar Gonzalez





Relative Predictive Performance of Treatments of Ordinal Outcome Variables across Machine Learning Algorithms and Class Distributions

Abstract

Ordinal variables, such as those measured on a five-point Likert scale, are ubiquitous in the behavioral sciences. However, machine learning methods for modeling ordinal outcome variables (i.e., ordinal classification) are not as well-developed or widely utilized, compared to classification and regression methods for modeling nominal and continuous outcomes, respectively. Consequently, ordinal outcomes are often treated “naively” as nominal or continuous outcomes in practice. This study builds upon previous literature that has examined the predictive performance of such naïve approaches of treating ordinal outcome variables compared to ordinal classification methods in machine learning. We conducted a Monte Carlo simulation study to systematically assess the relative predictive performance of an ordinal classification approach proposed by Frank and Hall (2001) against naïve approaches according to two key factors that have received limited attention in previous literature: (1) the machine learning algorithm being used to implement the approaches and (2) the class distribution of the ordinal outcome variable. The consideration of these important, practical factors expands our knowledge on the consequences of naïve treatments of ordinal outcomes, which are shown in this study to vary substantially according to these factors. Given the ubiquity of ordinal measures coupled with the growing presence of machine learning applications in the behavioral sciences, these are important considerations for building high-performing predictive models in the field.
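The abstract names the Frank and Hall (2001) approach without describing it. As a rough illustration only, the sketch below shows the decomposition as it is commonly described: an outcome with K ordered classes is recoded into K-1 binary targets of the form "y > k", one binary model is fit per target, and the estimated P(y > k) values are differenced to recover class probabilities. The function names are hypothetical, the probabilities in the usage note are made up for illustration, and clipping negative differences to zero is an assumption of this sketch rather than something taken from the paper.

```python
from typing import List, Sequence


def binary_targets(y: Sequence[int], n_classes: int) -> List[List[int]]:
    """Recode labels in 1..n_classes into K-1 binary targets, 1 if y > k else 0."""
    return [[1 if label > k else 0 for label in y] for k in range(1, n_classes)]


def class_probabilities(p_greater: Sequence[float]) -> List[float]:
    """Turn estimates of P(y > k), k = 1..K-1, into P(y = k), k = 1..K."""
    # Lowest class: P(y = 1) = 1 - P(y > 1)
    probs = [1.0 - p_greater[0]]
    # Middle classes: P(y = k) = P(y > k-1) - P(y > k)
    for prev, curr in zip(p_greater, p_greater[1:]):
        # Separately fitted binary models can disagree, so a difference may be
        # negative; clipping at zero is an assumption made for this sketch.
        probs.append(max(prev - curr, 0.0))
    # Highest class: P(y = K) = P(y > K-1)
    probs.append(p_greater[-1])
    return probs


def predict_class(p_greater: Sequence[float]) -> int:
    """Return the 1-based class with the largest derived probability."""
    probs = class_probabilities(p_greater)
    return probs.index(max(probs)) + 1
```

For a five-point item, `binary_targets(y, 5)` yields four target vectors, one per threshold. Given hypothetical model outputs `[0.9, 0.7, 0.2, 0.1]` for P(y > 1) through P(y > 4), `predict_class` selects the class whose probability difference is largest. By contrast, the naïve treatments mentioned in the abstract would either fit a single multiclass classifier that ignores the ordering or fit a regression model on the numeric labels.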

