SEMA3: A free smartphone platform for daily life surveys.
Pub Date: 2024-10-01. Epub Date: 2024-06-24. DOI: 10.3758/s13428-024-02445-w
Sarah T O'Brien, Nerisa Dozo, Jordan D X Hinton, Ella K Moeck, Rio Susanto, Glenn T Jayaputera, Richard O Sinnott, Duy Vu, Mario Alvarez-Jimenez, John Gleeson, Peter Koval
Traditionally, behavioral, social, and health science researchers have relied on global/retrospective survey methods administered cross-sectionally (i.e., on a single occasion) or longitudinally (i.e., on several occasions separated by weeks, months, or years). More recently, social and health scientists have added daily life survey methods (also known as intensive longitudinal methods or ambulatory assessment) to their toolkit. These methods (e.g., daily diaries, experience sampling, ecological momentary assessment) involve dense repeated assessments in everyday settings. To facilitate research using daily life survey methods, we present SEMA3 (http://www.SEMA3.com), a platform for designing and administering intensive longitudinal daily life surveys via Android and iOS smartphones. SEMA3 fills an important gap by providing researchers with a free, intuitive, and flexible platform with basic and advanced functionality. In this article, we describe SEMA3's development history and system architecture, provide an overview of how to design a study using SEMA3 and outline its key features, and discuss the platform's limitations and propose directions for future development of SEMA3.
{"title":"SEMA<sup>3</sup>: A free smartphone platform for daily life surveys.","authors":"Sarah T O'Brien, Nerisa Dozo, Jordan D X Hinton, Ella K Moeck, Rio Susanto, Glenn T Jayaputera, Richard O Sinnott, Duy Vu, Mario Alvarez-Jimenez, John Gleeson, Peter Koval","doi":"10.3758/s13428-024-02445-w","DOIUrl":"10.3758/s13428-024-02445-w","url":null,"abstract":"<p><p>Traditionally, behavioral, social, and health science researchers have relied on global/retrospective survey methods administered cross-sectionally (i.e., on a single occasion) or longitudinally (i.e., on several occasions separated by weeks, months, or years). More recently, social and health scientists have added daily life survey methods (also known as intensive longitudinal methods or ambulatory assessment) to their toolkit. These methods (e.g., daily diaries, experience sampling, ecological momentary assessment) involve dense repeated assessments in everyday settings. To facilitate research using daily life survey methods, we present SEMA<sup>3</sup> ( http://www.SEMA3.com ), a platform for designing and administering intensive longitudinal daily life surveys via Android and iOS smartphones. SEMA<sup>3</sup> fills an important gap by providing researchers with a free, intuitive, and flexible platform with basic and advanced functionality. In this article, we describe SEMA<sup>3</sup>'s development history and system architecture, provide an overview of how to design a study using SEMA<sup>3</sup> and outline its key features, and discuss the platform's limitations and propose directions for future development of SEMA<sup>3</sup>.</p>","PeriodicalId":8717,"journal":{"name":"Behavior Research Methods","volume":null,"pages":null},"PeriodicalIF":4.6,"publicationDate":"2024-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11362263/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141445376","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A jsPsych touchscreen extension for behavioral research on touch-enabled interfaces.
Pub Date: 2024-10-01. Epub Date: 2024-07-12. DOI: 10.3758/s13428-024-02454-9
Younes Strittmatter, Markus W H Spitzer, Nadja Ging-Jehli, Sebastian Musslick
Online experiments are increasingly gaining traction in the behavioral sciences. Despite this, behavioral researchers have largely continued to use keyboards as the primary input devices for such online studies, overlooking the ubiquity of touchscreens in everyday use. This paper presents an open-source touchscreen extension for jsPsych, a JavaScript framework designed for conducting online experiments. We additionally evaluated the touchscreen extension, assessing whether typical behavioral findings from two distinct perceptual decision-making tasks (the random-dot kinematogram and the Stroop task) can similarly be observed when the tasks are administered via touchscreen rather than keyboard. Our findings indicate similar performance metrics for each paradigm between the touchscreen and keyboard versions of the experiments. Specifically, we observe similar psychometric curves in the random-dot kinematogram across the touchscreen and keyboard versions. Similarly, in the Stroop task, we detect significant task, congruency, and sequential congruency effects in both experiment versions. We conclude that our open-source touchscreen extension serves as a promising tool for data collection in online behavioral experiments on forced-choice tasks.
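As a rough illustration of the psychometric-curve comparison described above, the sketch below fits a logistic function to accuracy-by-coherence data from a random-dot kinematogram, once per input device. All data, parameter values, and the specific logistic form are hypothetical assumptions, not taken from the paper, and the sketch is written in Python rather than the jsPsych (JavaScript) stack the extension itself uses.

```python
# Sketch: compare psychometric curves from a random-dot kinematogram run on
# touchscreen vs. keyboard. Hypothetical data; the logistic fit mirrors the
# standard approach of modeling accuracy as a function of motion coherence.
import numpy as np
from scipy.optimize import curve_fit

def logistic(coherence, alpha, beta):
    """Psychometric function: P(correct) rises from 0.5 (guessing) to 1."""
    return 0.5 + 0.5 / (1.0 + np.exp(-(coherence - alpha) / beta))

coherence = np.array([0.05, 0.1, 0.2, 0.4, 0.8])       # proportion coherent dots
p_keyboard = np.array([0.55, 0.62, 0.78, 0.93, 0.99])  # hypothetical accuracies
p_touch    = np.array([0.54, 0.60, 0.79, 0.94, 0.98])

(a_kb, b_kb), _ = curve_fit(logistic, coherence, p_keyboard, p0=[0.2, 0.1])
(a_ts, b_ts), _ = curve_fit(logistic, coherence, p_touch,    p0=[0.2, 0.1])

# Similar thresholds (alpha) and slopes (beta) across devices would support
# the paper's conclusion that touch input yields comparable performance.
print(f"keyboard: threshold={a_kb:.3f}, slope={b_kb:.3f}")
print(f"touch:    threshold={a_ts:.3f}, slope={b_ts:.3f}")
```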
{"title":"A jsPsych touchscreen extension for behavioral research on touch-enabled interfaces.","authors":"Younes Strittmatter, Markus W H Spitzer, Nadja Ging-Jehli, Sebastian Musslick","doi":"10.3758/s13428-024-02454-9","DOIUrl":"10.3758/s13428-024-02454-9","url":null,"abstract":"<p><p>Online experiments are increasingly gaining traction in the behavioral sciences. Despite this, behavioral researchers have largely continued to use keyboards as the primary input devices for such online studies, overlooking the ubiquity of touchscreens in everyday use. This paper presents an open-source touchscreen extension for jsPsych, a JavaScript framework designed for conducting online experiments. We additionally evaluated the touchscreen extension assessing whether typical behavioral findings from two distinct perceptual decision-making tasks - the random-dot kinematogram and the Stroop task - can similarly be observed when administered via touchscreen devices compared to keyboard devices. Our findings indicate similar performance metrics for each paradigm between the touchscreen and keyboard versions of the experiments. Specifically, we observe similar psychometric curves in the random-dot kinematogram across the touchscreen and keyboard versions. Similarly, in the Stroop task, we detect significant task, congruency, and sequential congruency effects in both experiment versions. We conclude that our open-source touchscreen extension serves as a promising tool for data collection in online behavioral experiments on forced-choice tasks.</p>","PeriodicalId":8717,"journal":{"name":"Behavior Research Methods","volume":null,"pages":null},"PeriodicalIF":4.6,"publicationDate":"2024-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141589532","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
The Ambiguous Cue Task: Measurement reliability of an experimental paradigm for the assessment of interpretation bias and associations with mental health.
Pub Date: 2024-10-01. Epub Date: 2024-07-12. DOI: 10.3758/s13428-024-02451-y
Diana J N Armbruster-Genç, Rebecca A Rammensee, Stefanie M Jungmann, Philine Drake, Michèle Wessa, Ulrike Basten
Interpretation biases in the processing of ambiguous affective information are assumed to play an important role in the onset and maintenance of emotional disorders. Reports of low reliability for experimental measures of cognitive biases have called into question previous findings on the association of these measures with markers of mental health and demonstrated the need to systematically evaluate the measurement reliability of cognitive bias measures. We evaluated reliability and correlations with self-report measures of mental health for interpretation bias scores derived from the Ambiguous Cue Task (ACT), an experimental paradigm for the assessment of approach-avoidance behavior towards ambiguous affective stimuli. For a non-clinical sample, the measurement of interpretation bias with the ACT showed high internal consistency (r_SB = .91-.96, N = 354) and acceptable two-week test-retest correlations (r_Pearson = .61-.65, n = 109). Correlations between ACT interpretation bias scores and mental health-related self-report measures of personality and well-being were generally small (r ≤ |.11|) and statistically nonsignificant when correcting for multiple comparisons. These findings suggest that in non-clinical populations, individual differences in the interpretation of ambiguous affective information, as assessed with the ACT, do not show a clear association with self-report markers of mental health. However, in allowing for a highly reliable measurement of interpretation bias, the ACT provides a valuable tool both for studies in non-clinical populations, where potentially small effect sizes call for larger samples, and for work with clinical populations, for which greater effects can be expected.
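For readers unfamiliar with the reliability indices reported here, the following sketch shows how a Spearman-Brown-corrected split-half correlation (the r_SB above) can be computed from trial-level data. The simulated responses are hypothetical stand-ins for ACT trials, not the study's data.

```python
# Sketch: split-half internal consistency with Spearman-Brown correction,
# computed on hypothetical per-trial approach/avoid decisions (1 = approach)
# driven by a stable person-level tendency.
import numpy as np

rng = np.random.default_rng(1)
n_persons, n_trials = 200, 40
tendency = rng.normal(0, 1, n_persons)
trials = (rng.normal(tendency[:, None], 2.0, (n_persons, n_trials)) > 0).astype(float)

# Odd-even split, then correct the half-test correlation to full length.
odd, even = trials[:, 0::2].mean(axis=1), trials[:, 1::2].mean(axis=1)
r_half = np.corrcoef(odd, even)[0, 1]
r_sb = 2 * r_half / (1 + r_half)          # Spearman-Brown formula
print(f"split-half r = {r_half:.2f}, Spearman-Brown r_SB = {r_sb:.2f}")
```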
{"title":"The Ambiguous Cue Task: Measurement reliability of an experimental paradigm for the assessment of interpretation bias and associations with mental health.","authors":"Diana J N Armbruster-Genç, Rebecca A Rammensee, Stefanie M Jungmann, Philine Drake, Michèle Wessa, Ulrike Basten","doi":"10.3758/s13428-024-02451-y","DOIUrl":"10.3758/s13428-024-02451-y","url":null,"abstract":"<p><p>Interpretation biases in the processing of ambiguous affective information are assumed to play an important role in the onset and maintenance of emotional disorders. Reports of low reliability for experimental measures of cognitive biases have called into question previous findings on the association of these measures with markers of mental health and demonstrated the need to systematically evaluate measurement reliability for measures of cognitive biases. We evaluated reliability and correlations with self-report measures of mental health for interpretation bias scores derived from the Ambiguous Cue Task (ACT), an experimental paradigm for the assessment of approach-avoidance behavior towards ambiguous affective stimuli. For a non-clinical sample, the measurement of an interpretation bias with the ACT showed high internal consistency (r<sub>SB</sub> = .91 - .96, N = 354) and acceptable 2-week test-retest correlations (r<sub>Pearson</sub> = .61 - .65, n = 109). Correlations between the ACT interpretation bias scores and mental health-related self-report measures of personality and well-being were generally small (r ≤ |.11|) and statistically not significant when correcting for multiple comparisons. These findings suggest that in non-clinical populations, individual differences in the interpretation of ambiguous affective information as assessed with the ACT do not show a clear association with self-report markers of mental health. However, in allowing for a highly reliable measurement of interpretation bias, the ACT provides a valuable tool for studies considering potentially small effect sizes in non-clinical populations by studying bigger samples as well as for work on clinical populations, for which potentially greater effects can be expected.</p>","PeriodicalId":8717,"journal":{"name":"Behavior Research Methods","volume":null,"pages":null},"PeriodicalIF":4.6,"publicationDate":"2024-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11362423/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141589533","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Many nonnormalities, one simulation: Do different data generation algorithms affect study results?
Pub Date: 2024-10-01. Epub Date: 2024-02-22. DOI: 10.3758/s13428-024-02364-w
Amanda J Fairchild, Yunhang Yin, Amanda N Baraldi, Oscar L Olvera Astivia, Dexin Shi
Monte Carlo simulation studies are among the primary scientific outputs contributed by methodologists, guiding the application of various statistical tools in practice. Although methodological researchers routinely extend simulation study findings through follow-up work, few studies are ever replicated. Simulation studies are, however, susceptible to factors that can contribute to replicability failures. This paper sought to conduct a meta-scientific study by replicating one highly cited simulation study (Curran et al., Psychological Methods, 1, 16-29, 1996) that investigated the robustness of normal theory maximum likelihood (ML)-based chi-square fit statistics under multivariate nonnormality. We further examined the generalizability of the original study findings across different nonnormal data generation algorithms. Our replication results were generally consistent with the original findings, but we discerned several differences. Our generalizability results were more mixed: only two results observed under the original data generation algorithm held completely across the other algorithms examined. One of the most striking findings was that results associated with the independent generator (IG) data generation algorithm vastly differed from those of the other procedures examined and suggested that ML was robust to nonnormality for the particular factor model used in the simulation. These findings point to the reality that extant methodological recommendations may not be universally valid in contexts where multiple data generation algorithms exist for a given data characteristic. We recommend that researchers consider multiple approaches to generating a specific data or model characteristic (when more than one is available) to optimize the generalizability of simulation results.
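To make the notion of competing data generation algorithms concrete, here is a minimal sketch of the independent generator (IG) idea for a one-factor model: the common factor and the uniquenesses are drawn from independent nonnormal distributions and mixed through the factor loadings. The distributions, loadings, and sample size are illustrative assumptions, not the settings of Curran et al. (1996).

```python
# Sketch of the independent generator (IG) approach: draw the factor and
# uniquenesses from independent nonnormal distributions (here, standardized
# chi-square variates), then combine them through the factor model.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

def std_chi2(df, size):
    """Standardized chi-square variate: mean 0, SD 1, skew sqrt(8/df)."""
    x = rng.chisquare(df, size)
    return (x - df) / np.sqrt(2 * df)

n, loadings = 100_000, np.array([0.7, 0.7, 0.7])
factor = std_chi2(3, n)                              # nonnormal common factor
errors = std_chi2(3, (n, loadings.size))             # nonnormal uniquenesses
X = factor[:, None] * loadings + errors * np.sqrt(1 - loadings**2)

# The marginals are nonnormal, but the joint nonnormality induced by IG can
# differ from algorithms (e.g., Vale-Maurelli) that transform correlated normals.
print("marginal skew:", stats.skew(X, axis=0).round(2))
print("marginal excess kurtosis:", stats.kurtosis(X, axis=0).round(2))
```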
Dealing with missing data in multi-informant studies: A comparison of approaches.
Pub Date: 2024-10-01. Epub Date: 2024-02-28. DOI: 10.3758/s13428-024-02367-7
Po-Yi Chen, Fan Jia, Wei Wu, Min-Heng Wang, Tzi-Yang Chao
Multi-informant studies are popular in social and behavioral science. However, their data analyses are challenging because data from different informants carry both shared and unique information and are often incomplete. Using Monte Carlo simulation, the current study compares three approaches that can be used to analyze incomplete multi-informant data when there is a distinction between reference and nonreference informants: a two-method measurement model for planned missing data (2MM-PMD); treating nonreference informants' reports as auxiliary variables with the full-information maximum likelihood method or multiple imputation; and listwise deletion. The results suggest that 2MM-PMD, when correctly specified and data are missing at random, has the best overall performance among the examined approaches regarding point estimates, type I error rates, and statistical power. It is also more robust to data that are not missing at random.
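The value of auxiliary information under a missing-at-random (MAR) mechanism, which underlies the comparison above, can be illustrated with a small simulation: when missingness in the reference informant's report depends on the nonreference informant's report, listwise deletion is biased while an approach that exploits the auxiliary report is not. The sketch below uses simple regression imputation as a stand-in for FIML or multiple imputation; all quantities are hypothetical.

```python
# Sketch: MAR missingness in a reference report X, driven by a nonreference
# report W of the same trait. Listwise deletion is biased; imputation that
# uses W as an auxiliary variable recovers the true mean.
import numpy as np

rng = np.random.default_rng(42)
n = 5_000
trait = rng.normal(0, 1, n)
X = trait + rng.normal(0, 0.5, n)      # reference informant's report
W = trait + rng.normal(0, 0.8, n)      # nonreference informant's report

# MAR mechanism: X is more often missing when W is low.
missing = rng.random(n) < 1 / (1 + np.exp(2 * W))
X_obs = np.where(missing, np.nan, X)

listwise_mean = np.nanmean(X_obs)      # biased upward under this mechanism

# Regression imputation using W as the auxiliary variable.
obs = ~missing
b1, b0 = np.polyfit(W[obs], X_obs[obs], 1)
X_imp = np.where(missing, b0 + b1 * W, X_obs)
print(f"true mean ~ 0, listwise: {listwise_mean:.3f}, imputed: {X_imp.mean():.3f}")
```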
The discrepancy in timing between synchronous signals and visual stimulation should not be underestimated.
Pub Date: 2024-10-01. Epub Date: 2024-03-14. DOI: 10.3758/s13428-024-02382-8
Biao Chen, Junjie Bu, Xu Jiang, Ping Wang, Yan Xie, Zhuoyun Wang, Zhen Liang, Shengzhao Zhang
Response latency is a critical parameter in studying human behavior, representing the time interval between stimulus onset and the response. However, timing differences between devices can introduce errors. Serial port synchronization signals can mitigate this, but limited information is available regarding their accuracy. Optical signals offer another option, but differences in the positioning of optical signals and visual stimuli can introduce errors, and there have been few reports on reducing them. This study investigates methods for reducing these timing errors. We used Psychtoolbox to generate visual stimuli and serial port synchronization signals and explored their accuracy. We then proposed a calibration formula to minimize the error between optical signals and visual stimuli. The findings are as follows: First, the serial port synchronization signal is presented before the visual stimulus, with a smaller lead time observed at higher refresh rates. Second, the lead time increases as the stimulus position deviates rightward and downward. Under Linux and with IOPort(), serial port synchronization signals exhibited greater accuracy. Given their overall poor accuracy and the multiple factors that influence serial port synchronization signals, we recommend using optical signals for time synchronization. The results indicate that, under the darkening process, the mean time error is -0.23 to 0.08 ms. This calibration formula helps measure response latency accurately. This study provides valuable insights for optimizing experimental design and improving the accuracy of response latency measurements. Although it only involves visual stimuli, its methods and results can still serve as a reference.
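One plausible reading of the reported position effects is display raster scanout, in which pixels are drawn top-to-bottom and left-to-right within each frame. The sketch below computes the expected scanout delay under that assumption; it is a generic approximation, not the calibration formula proposed in the paper.

```python
# Sketch of a position-dependent timing correction, assuming the lead-time
# pattern reflects top-to-bottom, left-to-right raster scanout. This is a
# generic approximation, not the paper's calibration formula.

def scanout_delay_ms(x_frac: float, y_frac: float, refresh_hz: float) -> float:
    """Approximate delay until the pixel at (x_frac, y_frac) is drawn,
    where fractions run from 0 (top/left) to 1 (bottom/right)."""
    frame_ms = 1000.0 / refresh_hz
    # Rows dominate; horizontal position adds at most one row's worth of time
    # (assuming a 1080-row display here, purely for illustration).
    return frame_ms * y_frac + (frame_ms * x_frac) / 1080

# A stimulus at screen center on a 60-Hz display is drawn ~8.3 ms after the
# frame starts; at 144 Hz the offset shrinks, matching the reported pattern.
print(scanout_delay_ms(0.5, 0.5, 60.0))   # ~8.34 ms
print(scanout_delay_ms(0.5, 0.5, 144.0))  # ~3.47 ms
```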
{"title":"The discrepancy in timing between synchronous signals and visual stimulation should not be underestimated.","authors":"Biao Chen, Junjie Bu, Xu Jiang, Ping Wang, Yan Xie, Zhuoyun Wang, Zhen Liang, Shengzhao Zhang","doi":"10.3758/s13428-024-02382-8","DOIUrl":"10.3758/s13428-024-02382-8","url":null,"abstract":"<p><p>Response latency is a critical parameter in studying human behavior, representing the time interval between the onset of stimulus and the response. However, different time between devices can introduce errors. Serial port synchronization signal can mitigate this, but limited information is available regarding their accuracy. Optical signals offer another option, but the difference in the positioning of optical signals and visual stimuli can introduce errors, and there have been limited reports of error reduction. This study aims to investigate methods for reducing the time errors. We used the Psychtoolbox to generate visual stimuli and serial port synchronization signals to explore their accuracy. Subsequently, we proposed a calibration formula to minimize the error between optical signals and visual stimuli. The findings are as follows: Firstly, the serial port synchronization signal presenting precedes visual stimulation, with a smaller lead time observed at higher refresh rates. Secondly, the lead time increases as the stimulus position deviates to the right and downwards. In Linux and IOPort(), serial port synchronization signals exhibited greater accuracy. Considering the poor accuracy and the multiple influencing factors associated with serial port synchronization signals, it is recommended to use optical signals to complete time synchronization. The results indicate that under the darkening process, the time error is - 0.23 ~ 0.08 ms (mean). This calibration formula can help measure the response latency accurately. This study provides valuable insights for optimizing experimental design and improving the accuracy of response latency. Although it only involves visual stimuli, the methods and results of this study can still serve as a reference.</p>","PeriodicalId":8717,"journal":{"name":"Behavior Research Methods","volume":null,"pages":null},"PeriodicalIF":4.6,"publicationDate":"2024-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140130648","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A new sample-size planning approach for person-specific VAR(1) studies: Predictive accuracy analysis.
Pub Date: 2024-10-01. Epub Date: 2024-05-08. DOI: 10.3758/s13428-024-02413-4
Jordan Revol, Ginette Lafit, Eva Ceulemans
Researchers increasingly study short-term dynamic processes that evolve within single individuals using N = 1 studies. The processes of interest are typically captured by fitting a VAR(1) model to the resulting data. A crucial question is how to perform sample-size planning and thus decide on the number of measurement occasions that are needed. The most popular approach is to perform a power analysis, which focuses on detecting the effects of interest. We argue that performing sample-size planning based on out-of-sample predictive accuracy yields additional important information regarding potential overfitting of the model. Predictive accuracy quantifies how well the estimated VAR(1) model will allow predicting unseen data from the same individual. We propose a new simulation-based sample-size planning method called predictive accuracy analysis (PAA), and an associated Shiny app. This approach makes use of a novel predictive accuracy metric that accounts for the multivariate nature of the prediction problem. We showcase how the values of the different VAR(1) model parameters impact power and predictive accuracy-based sample-size recommendations using simulated data sets and real data applications. The range of recommended sample sizes is smaller for predictive accuracy analysis than for power analysis.
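As a minimal sketch of the quantity PAA builds on, the code below simulates a bivariate VAR(1) process, fits the transition matrix by OLS on a training segment, and scores one-step-ahead predictions on held-out data. The per-variable accuracy metric is a simple stand-in for the paper's multivariate predictive accuracy measure, and all parameter values are illustrative.

```python
# Sketch: out-of-sample one-step-ahead predictive accuracy of a person-specific
# VAR(1) model, with hypothetical parameter values.
import numpy as np

rng = np.random.default_rng(7)
A = np.array([[0.4, 0.1], [0.0, 0.3]])          # true transition matrix
T = 120
y = np.zeros((T, 2))
for t in range(1, T):
    y[t] = A @ y[t - 1] + rng.normal(0, 1, 2)   # VAR(1) with N(0,1) innovations

train, test = y[:90], y[89:]                    # keep one overlapping lag
X_tr, Y_tr = train[:-1], train[1:]
A_hat, *_ = np.linalg.lstsq(X_tr, Y_tr, rcond=None)   # OLS fit (no intercept)

pred = test[:-1] @ A_hat                        # one-step-ahead predictions
resid = test[1:] - pred
acc = 1 - resid.var(axis=0) / test[1:].var(axis=0)    # per-variable accuracy
print("estimated A:\n", A_hat.T.round(2))
print("out-of-sample predictive accuracy:", acc.round(2))
```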
{"title":"A new sample-size planning approach for person-specific VAR(1) studies: Predictive accuracy analysis.","authors":"Jordan Revol, Ginette Lafit, Eva Ceulemans","doi":"10.3758/s13428-024-02413-4","DOIUrl":"10.3758/s13428-024-02413-4","url":null,"abstract":"<p><p>Researchers increasingly study short-term dynamic processes that evolve within single individuals using N = 1 studies. The processes of interest are typically captured by fitting a VAR(1) model to the resulting data. A crucial question is how to perform sample-size planning and thus decide on the number of measurement occasions that are needed. The most popular approach is to perform a power analysis, which focuses on detecting the effects of interest. We argue that performing sample-size planning based on out-of-sample predictive accuracy yields additional important information regarding potential overfitting of the model. Predictive accuracy quantifies how well the estimated VAR(1) model will allow predicting unseen data from the same individual. We propose a new simulation-based sample-size planning method called predictive accuracy analysis (PAA), and an associated Shiny app. This approach makes use of a novel predictive accuracy metric that accounts for the multivariate nature of the prediction problem. We showcase how the values of the different VAR(1) model parameters impact power and predictive accuracy-based sample-size recommendations using simulated data sets and real data applications. The range of recommended sample sizes is smaller for predictive accuracy analysis than for power analysis.</p>","PeriodicalId":8717,"journal":{"name":"Behavior Research Methods","volume":null,"pages":null},"PeriodicalIF":4.6,"publicationDate":"2024-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140875721","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Consistency within change: Evaluating the psychometric properties of a widely used predictive-inference task.
Pub Date: 2024-10-01. Epub Date: 2024-06-06. DOI: 10.3758/s13428-024-02427-y
Alisa M Loosen, Tricia X F Seow, Tobias U Hauser
Rapid adaptation to sudden changes in the environment is a hallmark of flexible human behaviour. Many computational, neuroimaging, and even clinical investigations studying this cognitive process have relied on a behavioural paradigm known as the predictive-inference task. However, the psychometric quality of this task has never been examined, leaving unanswered whether it is indeed suited to capture behavioural variation on a within- and between-subject level. Using a large-scale test-retest design (T1: N = 330; T2: N = 219), we assessed the internal (internal consistency) and temporal (test-retest reliability) stability of the task's most used measures. We show that the main measures capturing flexible belief and behavioural adaptation yield good internal consistency and overall satisfactory test-retest reliability. However, some more complex markers of flexible behaviour show lower psychometric quality. Our findings have implications for the large corpus of previous studies using this task and provide clear guidance as to which measures should and should not be used in future studies.
{"title":"Consistency within change: Evaluating the psychometric properties of a widely used predictive-inference task.","authors":"Alisa M Loosen, Tricia X F Seow, Tobias U Hauser","doi":"10.3758/s13428-024-02427-y","DOIUrl":"10.3758/s13428-024-02427-y","url":null,"abstract":"<p><p>Rapid adaptation to sudden changes in the environment is a hallmark of flexible human behaviour. Many computational, neuroimaging, and even clinical investigations studying this cognitive process have relied on a behavioural paradigm known as the predictive-inference task. However, the psychometric quality of this task has never been examined, leaving unanswered whether it is indeed suited to capture behavioural variation on a within- and between-subject level. Using a large-scale test-retest design (T1: N = 330; T2: N = 219), we assessed the internal (internal consistency) and temporal (test-retest reliability) stability of the task's most used measures. We show that the main measures capturing flexible belief and behavioural adaptation yield good internal consistency and overall satisfying test-retest reliability. However, some more complex markers of flexible behaviour show lower psychometric quality. Our findings have implications for the large corpus of previous studies using this task and provide clear guidance as to which measures should and should not be used in future studies.</p>","PeriodicalId":8717,"journal":{"name":"Behavior Research Methods","volume":null,"pages":null},"PeriodicalIF":4.6,"publicationDate":"2024-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11362202/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141282840","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Geofencing in location-based behavioral research: Methodology, challenges, and implementation.
Pub Date: 2024-10-01. Epub Date: 2023-08-25. DOI: 10.3758/s13428-023-02213-2
Yury Shevchenko, Ulf-Dietrich Reips
This manuscript presents a novel geofencing method in behavioral research. Geofencing, built on geolocation technology, places virtual fences around specific locations. Every time a participant crosses the virtual border around the geofenced area, an event can be triggered on a smartphone, e.g., the participant may be asked to complete a survey. The geofencing method can alleviate the problems of constant location tracking, such as recording sensitive geolocation information and battery drain. In scenarios where locations for geofencing are determined by participants (e.g., home, workplace), no location data need to be transferred to the researcher, so this method can ensure privacy and anonymity. Given the widespread use of smartphones and mobile Internet, geofencing has become a feasible tool for studying human behavior and cognition outside of the laboratory. The method can help advance theoretical and applied psychological science at a new frontier of context-aware research. At the same time, there is a lack of guidance on how and when geofencing can be applied in research. This manuscript aims to fill the gap and ease the adoption of the geofencing method. We describe the current challenges and implementations in geofencing and present three empirical studies in which we evaluated the geofencing method using the Samply application, a tool for mobile experience sampling research. The studies show that sensitivity and precision of geofencing were affected by the type of event, location radius, environment, operating system, and user behavior. Potential implications and recommendations for behavioral research are discussed.
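The core computation behind a geofence trigger can be sketched as a distance check: an event fires when the device's distance to the fence center crosses the radius between two location samples. The coordinates, radius, and function names below are illustrative, not part of the Samply implementation.

```python
# Sketch of the basic geofencing check: detect when consecutive location
# samples enter or exit a circular fence around a center point.
import math

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance between two (lat, lon) points in meters."""
    r = 6_371_000  # mean Earth radius, m
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp, dl = math.radians(lat2 - lat1), math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

def crossed_fence(prev_pos, cur_pos, center, radius_m):
    """True if the path prev -> cur enters or exits the geofenced circle."""
    was_in = haversine_m(*prev_pos, *center) <= radius_m
    is_in = haversine_m(*cur_pos, *center) <= radius_m
    return was_in != is_in

home = (47.6789, 9.1732)  # hypothetical participant-defined fence center
print(crossed_fence((47.6795, 9.1740), (47.6900, 9.2000), home, 200))  # True
```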
{"title":"Geofencing in location-based behavioral research: Methodology, challenges, and implementation.","authors":"Yury Shevchenko, Ulf-Dietrich Reips","doi":"10.3758/s13428-023-02213-2","DOIUrl":"10.3758/s13428-023-02213-2","url":null,"abstract":"<p><p>This manuscript presents a novel geofencing method in behavioral research. Geofencing, built upon geolocation technology, constitutes virtual fences around specific locations. Every time a participant crosses the virtual border around the geofenced area, an event can be triggered on a smartphone, e.g., the participant may be asked to complete a survey. The geofencing method can alleviate the problems of constant location tracking, such as recording sensitive geolocation information and battery drain. In scenarios where locations for geofencing are determined by participants (e.g., home, workplace), no location data need to be transferred to the researcher, so this method can ensure privacy and anonymity. Given the widespread use of smartphones and mobile Internet, geofencing has become a feasible tool in studying human behavior and cognition outside of the laboratory. The method can help advance theoretical and applied psychological science at a new frontier of context-aware research. At the same time, there is a lack of guidance on how and when geofencing can be applied in research. This manuscript aims to fill the gap and ease the adoption of the geofencing method. We describe the current challenges and implementations in geofencing and present three empirical studies in which we evaluated the geofencing method using the Samply application, a tool for mobile experience sampling research. The studies show that sensitivity and precision of geofencing were affected by the type of event, location radius, environment, operating system, and user behavior. Potential implications and recommendations for behavioral research are discussed.</p>","PeriodicalId":8717,"journal":{"name":"Behavior Research Methods","volume":null,"pages":null},"PeriodicalIF":4.6,"publicationDate":"2024-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11362315/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"10428016","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Validating the IDRIS and IDRIA: Two infrequency/frequency scales for detecting careless and insufficient effort survey responders.
Pub Date: 2024-10-01. Epub Date: 2024-07-08. DOI: 10.3758/s13428-024-02452-x
Cameron S Kay
To detect careless and insufficient effort (C/IE) survey responders, researchers can use infrequency items - items that almost no one agrees with (e.g., "When a friend greets me, I generally try to say nothing back") - and frequency items - items that almost everyone agrees with (e.g., "I try to listen when someone I care about is telling me something"). Here, we provide initial validation for two sets of these items: the 14-item Invalid Responding Inventory for Statements (IDRIS) and the 6-item Invalid Responding Inventory for Adjectives (IDRIA). Across six studies (N1 = 536; N2 = 701; N3 = 500; N4 = 499; N5 = 629; N6 = 562), we found consistent evidence that the IDRIS is capable of detecting C/IE responding among statement-based scales (e.g., the HEXACO-PI-R) and the IDRIA is capable of detecting C/IE responding among both adjective-based scales (e.g., the Lex-20) and adjective-derived scales (e.g., the BFI-2). These findings were robust across different analytic approaches (e.g., Pearson correlations; Spearman rank-order correlations), different indices of C/IE responding (e.g., person-total correlations; semantic synonyms; horizontal cursor variability), and different sample types (e.g., US undergraduate students; Nigerian survey panel participants). Taken together, these results provide promising evidence for the utility of the IDRIS and IDRIA in detecting C/IE responding.
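As an illustration of how infrequency items flag C/IE responders, the sketch below scores hypothetical respondents on a 14-item infrequency scale and flags those whose mean endorsement exceeds a cutoff. The response model, cutoff, and resulting rates are illustrative assumptions, not the IDRIS validation results.

```python
# Sketch of flagging careless/insufficient-effort (C/IE) responders with an
# infrequency scale: agreeing with items almost no one endorses raises the score.
import numpy as np

rng = np.random.default_rng(3)
n_resp, n_items = 300, 14
# Attentive responders rarely agree with infrequency items (1-5 scale);
# careless responders answer uniformly at random.
careless = rng.random(n_resp) < 0.1
attentive_resp = rng.choice([1, 2], (n_resp, n_items), p=[0.8, 0.2])
random_resp = rng.integers(1, 6, (n_resp, n_items))
responses = np.where(careless[:, None], random_resp, attentive_resp)

scores = responses.mean(axis=1)
flagged = scores > 2.5            # illustrative cutoff, not the IDRIS threshold
hit_rate = (flagged & careless).sum() / careless.sum()
false_alarms = (flagged & ~careless).mean()
print(f"hit rate: {hit_rate:.2f}, false-alarm rate: {false_alarms:.3f}")
```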
{"title":"Validating the IDRIS and IDRIA: Two infrequency/frequency scales for detecting careless and insufficient effort survey responders.","authors":"Cameron S Kay","doi":"10.3758/s13428-024-02452-x","DOIUrl":"10.3758/s13428-024-02452-x","url":null,"abstract":"<p><p>To detect careless and insufficient effort (C/IE) survey responders, researchers can use infrequency items - items that almost no one agrees with (e.g., \"When a friend greets me, I generally try to say nothing back\") - and frequency items - items that almost everyone agrees with (e.g., \"I try to listen when someone I care about is telling me something\"). Here, we provide initial validation for two sets of these items: the 14-item Invalid Responding Inventory for Statements (IDRIS) and the 6-item Invalid Responding Inventory for Adjectives (IDRIA). Across six studies (N<sub>1</sub> = 536; N<sub>2</sub> = 701; N<sub>3</sub> = 500; N<sub>4</sub> = 499; N<sub>5</sub> = 629, N<sub>6</sub> = 562), we found consistent evidence that the IDRIS is capable of detecting C/IE responding among statement-based scales (e.g., the HEXACO-PI-R) and the IDRIA is capable of detecting C/IE responding among both adjective-based scales (e.g., the Lex-20) and adjective-derived scales (e.g., the BFI-2). These findings were robust across different analytic approaches (e.g., Pearson correlations; Spearman rank-order correlations), different indices of C/IE responding (e.g., person-total correlations; semantic synonyms; horizontal cursor variability), and different sample types (e.g., US undergraduate students; Nigerian survey panel participants). Taken together, these results provide promising evidence for the utility of the IDRIS and IDRIA in detecting C/IE responding.</p>","PeriodicalId":8717,"journal":{"name":"Behavior Research Methods","volume":null,"pages":null},"PeriodicalIF":4.6,"publicationDate":"2024-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141557942","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}