Measuring the Reliability of a Gamified Stroop Task: Quantitative Experiment

IF 3.8 | Zone 2 (Medicine) | Q1, Health Care Sciences & Services | JMIR Serious Games | Pub Date: 2024-04-10 | DOI: 10.2196/50315

Katelyn Wiley, Phaedra Berger, M. A. Friehs, R. Mandryk

Citations: 0

Abstract

Background: Few gamified cognitive tasks are subjected to rigorous examination of psychometric properties, despite their use in experimental and clinical settings. Even small manipulations to cognitive tasks require extensive research to understand their effects.

Objective: This study aims to investigate how game elements can affect the reliability of scores on a Stroop task. We specifically investigated performance consistency within and across sessions.

Methods: We created 2 versions of the Stroop task, with and without game elements, and then tested each task with participants at 2 time points. The gamified task used points and feedback as game elements. In this paper, we report on the reliability of the gamified Stroop task in terms of internal consistency and test-retest reliability, compared with the control task. We used a permutation approach to evaluate internal consistency. For test-retest reliability, we calculated the Pearson correlation and intraclass correlation coefficients between the 2 time points. We also descriptively compared the reliability of scores on a trial-by-trial basis, considering the different trial types.

Results: At the first time point, the Stroop effect was reduced in the game condition, indicating an increase in performance. Participants in the game condition had faster reaction times (P=.005) and lower error rates (P=.04) than those in the basic task condition. Furthermore, the game condition led to higher measures of internal consistency at both time points for reaction times and error rates, which indicates a more consistent response pattern. For reaction time in the basic task condition, Spearman-Brown r=0.78 (95% CI 0.64-0.89) at time 1 and r=0.64 (95% CI 0.40-0.81) at time 2. For reaction time in the game condition, Spearman-Brown r=0.83 (95% CI 0.71-0.91) at time 1 and r=0.76 (95% CI 0.60-0.88) at time 2. Similarly, for error rates in the basic task condition, Spearman-Brown r=0.76 (95% CI 0.62-0.87) at time 1 and r=0.74 (95% CI 0.58-0.86) at time 2. For error rates in the game condition, Spearman-Brown r=0.76 (95% CI 0.62-0.87) at time 1 and r=0.74 (95% CI 0.58-0.86) at time 2. Test-retest reliability analysis revealed a distinctive performance pattern depending on the trial type, which may reflect motivational differences between task versions. In short, especially in the incongruent trials where cognitive conflict occurs, performance in the game condition reaches peak consistency after 100 trials, whereas performance consistency drops after 50 trials for the basic version and only catches up to the game after 250 trials.

Conclusions: Even subtle gamification can impact task performance, albeit not only in terms of a direct difference in performance between conditions. People playing the game reach peak performance sooner, and their performance is more consistent within and across sessions. We advocate for a closer examination of the impact of game elements on performance.
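
The abstract states only that internal consistency was evaluated with a permutation approach and reported as Spearman-Brown coefficients. The following Python sketch illustrates one common way such an estimate can be computed; it is not the authors' analysis code. The random split-half scheme, the use of the Stroop effect (mean incongruent minus mean congruent reaction time) as the score, the number of permutations, the data layout, and every function name are assumptions made for illustration, and the data are simulated.

```python
import random
from statistics import mean


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman_brown(r):
    """Step a half-length correlation up to the full test length."""
    return 2 * r / (1 + r)


def stroop_effect(congruent, incongruent):
    """Score assumed here: mean incongruent RT minus mean congruent RT."""
    return mean(incongruent) - mean(congruent)


def split_half_reliability(data, n_permutations=1000, seed=0):
    """Average Spearman-Brown corrected split-half correlation.

    data: one (congruent_rts, incongruent_rts) pair per participant
    (a hypothetical layout, not taken from the paper).
    """
    rng = random.Random(seed)
    estimates = []
    for _ in range(n_permutations):
        half_a, half_b = [], []
        for congruent, incongruent in data:
            # Shuffle copies so the caller's trial order is left untouched.
            c, i = list(congruent), list(incongruent)
            rng.shuffle(c)
            rng.shuffle(i)
            mc, mi = len(c) // 2, len(i) // 2
            half_a.append(stroop_effect(c[:mc], i[:mi]))
            half_b.append(stroop_effect(c[mc:], i[mi:]))
        estimates.append(spearman_brown(pearson(half_a, half_b)))
    return mean(estimates)


def simulate(n_participants=30, n_trials=60, seed=1):
    """Simulated reaction times in ms; all values are made up."""
    rng = random.Random(seed)
    data = []
    for _ in range(n_participants):
        effect = rng.gauss(80, 40)  # participant's true Stroop effect
        congruent = [rng.gauss(600, 50) for _ in range(n_trials)]
        incongruent = [rng.gauss(600 + effect, 50) for _ in range(n_trials)]
        data.append((congruent, incongruent))
    return data


if __name__ == "__main__":
    estimate = split_half_reliability(simulate(), n_permutations=200)
    print(f"Permutation split-half reliability: {estimate:.2f}")
```

Averaging over many random splits avoids depending on one arbitrary split, such as odd versus even trials. The confidence intervals reported in the abstract would need an additional step, for example bootstrapping over participants, which this sketch does not attempt.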

Source journal

JMIR Serious Games (Medicine-Rehabilitation)

CiteScore: 7.30
Self-citation rate: 10.00%
Articles published: not listed
Review time: 12 weeks

Journal description: JMIR Serious Games (JSG, ISSN 2291-9279) is a sister journal of the Journal of Medical Internet Research (JMIR), one of the most cited journals in health informatics (Impact Factor 2016: 5.175). JSG has a projected impact factor (2016) of 3.32. JSG is a multidisciplinary journal devoted to computer, web, and mobile applications that incorporate elements of gaming to solve serious problems such as health education and promotion, teaching and education, or social change. The journal also considers commentary and research in the fields of video game violence and video game addiction.