Investigating lay evaluations of models

IF 2.5 · CAS Tier 3 (Psychology) · Q2 (Psychology, Experimental) · Thinking & Reasoning · Pub Date: 2021-11-09 · DOI: 10.1080/13546783.2021.1999327
P. Kane, S. Broomell
{"title":"Investigating lay evaluations of models","authors":"P. Kane, S. Broomell","doi":"10.1080/13546783.2021.1999327","DOIUrl":null,"url":null,"abstract":"Abstract Many important decisions depend on unknown states of the world. Society is increasingly relying on statistical predictive models to make decisions in these cases. While predictive models are useful, previous research has documented that (a) individual decision makers distrust models and (b) people’s predictions are often worse than those of models. These findings indicate a lack of awareness of how to evaluate predictions generally. This includes concepts like the loss function used to aggregate errors or whether error is training error or generalisation error. To address this gap, we present three studies testing how lay people visually evaluate the predictive accuracy of models. We found that (a) participant judgements of prediction errors were more similar to absolute error than squared error (Study 1), (b) we did not detect a difference in participant reactions to training error versus generalisation error (Study 2), and (c) participants rated complex models as more accurate when comparing two models, but rated simple models as more accurate when shown single models in isolation (Study 3). When communicating about models, researchers should be aware that the public’s visual evaluation of models may disagree with their method of measuring errors and that many may fail to recognise overfitting.","PeriodicalId":47270,"journal":{"name":"Thinking & Reasoning","volume":"67 1","pages":"569 - 604"},"PeriodicalIF":2.5000,"publicationDate":"2021-11-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Thinking & Reasoning","FirstCategoryId":"102","ListUrlMain":"https://doi.org/10.1080/13546783.2021.1999327","RegionNum":3,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"PSYCHOLOGY, EXPERIMENTAL","Score":null,"Total":0}
Citations: 0

Abstract

Many important decisions depend on unknown states of the world. Society is increasingly relying on statistical predictive models to make decisions in these cases. While predictive models are useful, previous research has documented that (a) individual decision makers distrust models and (b) people’s predictions are often worse than those of models. These findings indicate a lack of awareness of how to evaluate predictions generally. This includes concepts like the loss function used to aggregate errors or whether error is training error or generalisation error. To address this gap, we present three studies testing how lay people visually evaluate the predictive accuracy of models. We found that (a) participant judgements of prediction errors were more similar to absolute error than squared error (Study 1), (b) we did not detect a difference in participant reactions to training error versus generalisation error (Study 2), and (c) participants rated complex models as more accurate when comparing two models, but rated simple models as more accurate when shown single models in isolation (Study 3). When communicating about models, researchers should be aware that the public’s visual evaluation of models may disagree with their method of measuring errors and that many may fail to recognise overfitting.
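The abstract turns on two distinctions: how errors are aggregated (absolute vs. squared loss) and which errors are measured (training vs. generalisation error). The sketch below is not from the paper; it is a minimal Python illustration of both ideas on synthetic data, with all numbers and the degree-1 vs. degree-9 comparison chosen purely for demonstration.

```python
# A minimal sketch (not from the paper) of the two distinctions in the
# abstract: (1) the choice of loss function changes which predictions look
# more accurate, and (2) training error can understate generalisation error
# for an overfit model. All data here are synthetic.
import numpy as np

rng = np.random.default_rng(0)

# (1) Absolute vs. squared error: two residual patterns with the same MAE
# can have very different MSEs, because squaring penalises large misses more.
errors_a = np.array([1.0, 1.0, 1.0, 1.0])   # four moderate misses
errors_b = np.array([0.0, 0.0, 0.0, 4.0])   # one large miss
for name, e in [("A", errors_a), ("B", errors_b)]:
    print(f"model {name}: MAE={np.mean(np.abs(e)):.2f}  MSE={np.mean(e**2):.2f}")
# Both have MAE = 1.00, but B's MSE (4.00) is four times A's (1.00).

# (2) Training vs. generalisation error: fit a simple (degree-1) and a
# complex (degree-9) polynomial to noisy data, then score both on held-out data.
true_f = lambda x: 2 * x                      # true underlying relationship
x_train = rng.uniform(0, 1, 15)
x_test = rng.uniform(0, 1, 200)
y_train = true_f(x_train) + rng.normal(0, 0.3, x_train.size)
y_test = true_f(x_test) + rng.normal(0, 0.3, x_test.size)

for degree in (1, 9):
    coefs = np.polyfit(x_train, y_train, degree)
    train_mae = np.mean(np.abs(np.polyval(coefs, x_train) - y_train))
    test_mae = np.mean(np.abs(np.polyval(coefs, x_test) - y_test))
    print(f"degree {degree}: training MAE={train_mae:.2f}  test MAE={test_mae:.2f}")
# The degree-9 fit typically shows lower training error but higher
# generalisation error than the degree-1 fit -- the overfitting pattern the
# abstract says lay viewers may fail to recognise.
```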
Source journal
Thinking & Reasoning (Psychology, Experimental)
CiteScore: 6.50
Self-citation rate: 11.50%
Articles published: 25
Latest articles from this journal
The skeptical import of motivated reasoning: a closer look at the evidence
When word frequency meets word order: factors determining multiply-constrained creative association
Mindset effects on the regulation of thinking time in problem-solving
Elementary probabilistic operations: a framework for probabilistic reasoning
Testing the underlying structure of unfounded beliefs about COVID-19 around the world