Best Practices in Supervised Machine Learning: A Tutorial for Psychologists
Pub Date: 2023-07-01 | DOI: 10.1177/25152459231162559
F. Pargent, Ramona Schoedel, Clemens Stachl
Supervised machine learning (ML) is becoming an influential analytical method in psychology and other social sciences. However, theoretical ML concepts and predictive-modeling techniques are not yet widely taught in psychology programs. This tutorial is intended to provide an intuitive but thorough primer and introduction to supervised ML for psychologists in four consecutive modules. After introducing the basic terminology and mindset of supervised ML, in Module 1, we cover how to use resampling methods to evaluate the performance of ML models (bias-variance trade-off, performance measures, k-fold cross-validation). In Module 2, we introduce the nonlinear random forest, a type of ML model that is particularly user-friendly and well suited to predicting psychological outcomes. Module 3 is about performing empirical benchmark experiments (comparing the performance of several ML models on multiple data sets). Finally, in Module 4, we discuss the interpretation of ML models, including permutation variable importance measures, effect plots (partial-dependence plots, individual conditional-expectation profiles), and the concept of model fairness. Throughout the tutorial, intuitive descriptions of theoretical concepts are provided, with as few mathematical formulas as possible, and followed by code examples using the mlr3 and companion packages in R. Key practical-analysis steps are demonstrated on the publicly available PhoneStudy data set (N = 624), which includes more than 1,800 variables from smartphone sensing to predict Big Five personality trait scores. The article contains a checklist to be used as a reminder of important elements when performing, reporting, or reviewing ML analyses in psychology. Additional examples and more advanced concepts are demonstrated in online materials (https://osf.io/9273g/).
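To give a flavor of the workflow the tutorial teaches (Module 1's resampling step), here is a minimal mlr3 sketch of 5-fold cross-validation of a random forest. The built-in mtcars task, the ranger learner, and the RMSE measure are illustrative stand-ins, not the authors' PhoneStudy analysis code.

library(mlr3)          # core framework: tasks, learners, resampling
library(mlr3learners)  # adds the ranger random-forest learner

task       <- tsk("mtcars")          # built-in regression task (stand-in for PhoneStudy)
learner    <- lrn("regr.ranger")     # random forest via the ranger package
resampling <- rsmp("cv", folds = 5)  # 5-fold cross-validation

rr <- resample(task, learner, resampling)
rr$aggregate(msr("regr.rmse"))       # cross-validated root-mean-squared error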
{"title":"Best Practices in Supervised Machine Learning: A Tutorial for Psychologists","authors":"F. Pargent, Ramona Schoedel, Clemens Stachl","doi":"10.1177/25152459231162559","DOIUrl":"https://doi.org/10.1177/25152459231162559","url":null,"abstract":"Supervised machine learning (ML) is becoming an influential analytical method in psychology and other social sciences. However, theoretical ML concepts and predictive-modeling techniques are not yet widely taught in psychology programs. This tutorial is intended to provide an intuitive but thorough primer and introduction to supervised ML for psychologists in four consecutive modules. After introducing the basic terminology and mindset of supervised ML, in Module 1, we cover how to use resampling methods to evaluate the performance of ML models (bias-variance trade-off, performance measures, k-fold cross-validation). In Module 2, we introduce the nonlinear random forest, a type of ML model that is particularly user-friendly and well suited to predicting psychological outcomes. Module 3 is about performing empirical benchmark experiments (comparing the performance of several ML models on multiple data sets). Finally, in Module 4, we discuss the interpretation of ML models, including permutation variable importance measures, effect plots (partial-dependence plots, individual conditional-expectation profiles), and the concept of model fairness. Throughout the tutorial, intuitive descriptions of theoretical concepts are provided, with as few mathematical formulas as possible, and followed by code examples using the mlr3 and companion packages in R. Key practical-analysis steps are demonstrated on the publicly available PhoneStudy data set (N = 624), which includes more than 1,800 variables from smartphone sensing to predict Big Five personality trait scores. The article contains a checklist to be used as a reminder of important elements when performing, reporting, or reviewing ML analyses in psychology. Additional examples and more advanced concepts are demonstrated in online materials (https://osf.io/9273g/).","PeriodicalId":55645,"journal":{"name":"Advances in Methods and Practices in Psychological Science","volume":" ","pages":""},"PeriodicalIF":13.6,"publicationDate":"2023-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"41742704","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Multidimensional Signals and Analytic Flexibility: Estimating Degrees of Freedom in Human-Speech Analyses
Pub Date: 2023-07-01 | DOI: 10.1177/25152459231162567
Stefano Coretta, Joseph V. Casillas, S. Roessig, M. Franke, Byron Ahn, Ali H. Al-Hoorie, Jalal Al-Tamimi, Najd E. Alotaibi, Mohammed AlShakhori, Ruth Altmiller, Pablo Arantes, Angeliki A. Athanasopoulou, M. Baese-Berk, George Bailey, Cheman Baira A Sangma, Eleonora J. Beier, Gabriela M. Benavides, Nicole Benker, Emelia P. BensonMeyer, Nina R. Benway, G. Berry, Liwen Bing, Christina Bjorndahl, Mariska A. Bolyanatz, A. Braver, V. Brown, Alicia M. Brown, A. Brugos, E. Buchanan, Tanna Butlin, Andrés Buxó-Lugo, Coline Caillol, F. Cangemi, C. Carignan, S. Carraturo, Tiphaine Caudrelier, Eleanor Chodroff, Michelle Cohn, Johanna Cronenberg, O. Crouzet, Erica L. Dagar, Charlotte Dawson, Carissa A. Diantoro, Marie Dokovova, Shiloh Drake, Fengting Du, Margaux Dubuis, Florent Duême, M. Durward, Ander Egurtzegi, M. Elsherif, J. Esser, Emmanuel Ferragne, F. Ferreira, Lauren K. Fink, Sara Finley, Kurtis Foster, P. Foulkes, Rosa Franzke, Gabriel Frazer-McKee, R. Fromont, Christina García, Jason Geller, Camille L Grasso,
Recent empirical studies have highlighted the large degree of analytic flexibility in data analysis that can lead to substantially different conclusions based on the same data set. Thus, researchers have expressed their concerns that these researcher degrees of freedom might facilitate bias and can lead to claims that do not stand the test of time. Even greater flexibility is to be expected in fields in which the primary data lend themselves to a variety of possible operationalizations. The multidimensional, temporally extended nature of speech constitutes an ideal testing ground for assessing the variability in analytic approaches, which derives not only from aspects of statistical modeling but also from decisions regarding the quantification of the measured behavior. In this study, we gave the same speech-production data set to 46 teams of researchers and asked them to answer the same research question, resulting in substantial variability in reported effect sizes and their interpretation. Using Bayesian meta-analytic tools, we further found little to no evidence that the observed variability can be explained by analysts’ prior beliefs, expertise, or the perceived quality of their analyses. In light of this idiosyncratic variability, we recommend that researchers more transparently share details of their analysis, strengthen the link between theoretical construct and quantitative system, and calibrate their (un)certainty in their conclusions.
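The Bayesian meta-analytic step can be sketched as a random-effects model over team-level estimates. The brms code below uses a simulated stand-in data set (one row per team with an effect estimate yi, its standard error sei, and a self-rated expertise score); it illustrates the general specification, not the authors' actual model.

library(brms)  # Bayesian multilevel models via Stan

set.seed(1)
teams <- data.frame(
  team      = factor(1:46),
  yi        = rnorm(46, 0.2, 0.3),       # simulated team effect estimates
  sei       = runif(46, 0.05, 0.25),     # simulated standard errors
  expertise = sample(1:7, 46, TRUE)      # simulated self-rated expertise
)

fit <- brm(
  yi | se(sei) ~ 1 + expertise + (1 | team),  # random-effects meta-regression
  data = teams
)
summary(fit)  # between-team heterogeneity appears as the SD of the team intercepts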
{"title":"Multidimensional Signals and Analytic Flexibility: Estimating Degrees of Freedom in Human-Speech Analyses","authors":"Stefano Coretta, Joseph V. Casillas, S. Roessig, M. Franke, Byron Ahn, Ali H. Al-Hoorie, Jalal Al-Tamimi, Najd E. Alotaibi, Mohammed AlShakhori, Ruth Altmiller, Pablo Arantes, Angeliki A. Athanasopoulou, M. Baese-Berk, George Bailey, Cheman Baira A Sangma, Eleonora J. Beier, Gabriela M. Benavides, Nicole Benker, Emelia P. BensonMeyer, Nina R. Benway, G. Berry, Liwen Bing, Christina Bjorndahl, Mariska A. Bolyanatz, A. Braver, V. Brown, Alicia M. Brown, A. Brugos, E. Buchanan, Tanna Butlin, Andrés Buxó-Lugo, Coline Caillol, F. Cangemi, C. Carignan, S. Carraturo, Tiphaine Caudrelier, Eleanor Chodroff, Michelle Cohn, Johanna Cronenberg, O. Crouzet, Erica L. Dagar, Charlotte Dawson, Carissa A. Diantoro, Marie Dokovova, Shiloh Drake, Fengting Du, Margaux Dubuis, Florent Duême, M. Durward, Ander Egurtzegi, M. Elsherif, J. Esser, Emmanuel Ferragne, F. Ferreira, Lauren K. Fink, Sara Finley, Kurtis Foster, P. Foulkes, Rosa Franzke, Gabriel Frazer-McKee, R. Fromont, Christina García, Jason Geller, Camille L Grasso, ","doi":"10.1177/25152459231162567","DOIUrl":"https://doi.org/10.1177/25152459231162567","url":null,"abstract":"Recent empirical studies have highlighted the large degree of analytic flexibility in data analysis that can lead to substantially different conclusions based on the same data set. Thus, researchers have expressed their concerns that these researcher degrees of freedom might facilitate bias and can lead to claims that do not stand the test of time. Even greater flexibility is to be expected in fields in which the primary data lend themselves to a variety of possible operationalizations. The multidimensional, temporally extended nature of speech constitutes an ideal testing ground for assessing the variability in analytic approaches, which derives not only from aspects of statistical modeling but also from decisions regarding the quantification of the measured behavior. In this study, we gave the same speech-production data set to 46 teams of researchers and asked them to answer the same research question, resulting in substantial variability in reported effect sizes and their interpretation. Using Bayesian meta-analytic tools, we further found little to no evidence that the observed variability can be explained by analysts’ prior beliefs, expertise, or the perceived quality of their analyses. In light of this idiosyncratic variability, we recommend that researchers more transparently share details of their analysis, strengthen the link between theoretical construct and quantitative system, and calibrate their (un)certainty in their conclusions.","PeriodicalId":55645,"journal":{"name":"Advances in Methods and Practices in Psychological Science","volume":" ","pages":""},"PeriodicalIF":13.6,"publicationDate":"2023-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"48478010","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
The Appropriateness of Outlier Exclusion Approaches Depends on the Expected Contamination: Commentary on André (2022)
Pub Date: 2023-07-01 | DOI: 10.1177/25152459231186577
D. Villanova
In a recent article, André (2022) addressed the decision to exclude outliers using a threshold across conditions or within conditions and offered a clear recommendation to avoid within-conditions exclusions because of the possibility for large false-positive inflation. In this commentary, I note that André’s simulations did not include the situation for which within-conditions exclusion has previously been recommended—when across-conditions exclusion would exacerbate selection bias. Examining test performance in this situation confirms the recommendation for within-conditions exclusion in such a circumstance. Critically, the suitability of exclusion criteria must be considered in relationship to assumptions about data-generating mechanisms.
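The distinction at issue can be made concrete in a few lines of base R: the same 3-SD rule applied once to the pooled sample (across conditions) versus separately within each condition. The simulated data and cutoff are illustrative only.

set.seed(1)
d <- data.frame(
  condition = rep(c("control", "treatment"), each = 200),
  rt = c(rnorm(200, mean = 500, sd = 100), rnorm(200, mean = 550, sd = 100))
)

# Across-conditions exclusion: one threshold from the pooled distribution
z_pooled    <- as.numeric(scale(d$rt))
keep_across <- abs(z_pooled) < 3

# Within-conditions exclusion: thresholds computed separately per condition
z_within    <- ave(d$rt, d$condition, FUN = function(x) as.numeric(scale(x)))
keep_within <- abs(z_within) < 3

table(keep_across, keep_within)  # the two rules need not exclude the same cases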
{"title":"The Appropriateness of Outlier Exclusion Approaches Depends on the Expected Contamination: Commentary on André (2022)","authors":"D. Villanova","doi":"10.1177/25152459231186577","DOIUrl":"https://doi.org/10.1177/25152459231186577","url":null,"abstract":"In a recent article, André (2022) addressed the decision to exclude outliers using a threshold across conditions or within conditions and offered a clear recommendation to avoid within-conditions exclusions because of the possibility for large false-positive inflation. In this commentary, I note that André’s simulations did not include the situation for which within-conditions exclusion has previously been recommended—when across-conditions exclusion would exacerbate selection bias. Examining test performance in this situation confirms the recommendation for within-conditions exclusion in such a circumstance. Critically, the suitability of exclusion criteria must be considered in relationship to assumptions about data-generating mechanisms.","PeriodicalId":55645,"journal":{"name":"Advances in Methods and Practices in Psychological Science","volume":" ","pages":""},"PeriodicalIF":13.6,"publicationDate":"2023-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"47146534","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Tutorial: Power Analyses for Interaction Effects in Cross-Sectional Regressions
Pub Date: 2023-07-01 | DOI: 10.1177/25152459231187531
David A. A. Baranger, Megan C. Finsaas, Brandon L. Goldstein, Colin E. Vize, Donald R. Lynam, Thomas M. Olino
Interaction analyses (also termed “moderation” analyses or “moderated multiple regression”) are a form of linear regression analysis designed to test whether the association between two variables changes when conditioned on a third variable. It can be challenging to perform a power analysis for interactions with existing software, particularly when variables are correlated and continuous. Moreover, although power is affected by main effects, their correlation, and variable reliability, it can be unclear how to incorporate these effects into a power analysis. The R package InteractionPoweR and associated Shiny apps allow researchers with minimal or no programming experience to perform analytic and simulation-based power analyses for interactions. At minimum, these analyses require the Pearson’s correlation between variables and sample size, and additional parameters, including reliability and the number of discrete levels that a variable takes (e.g., binary or Likert scale), can optionally be specified. In this tutorial, we demonstrate how to perform power analyses using our package and give examples of how power can be affected by main effects, correlations between main effects, reliability, and variable distributions. We also include a brief discussion of how researchers may select an appropriate interaction effect size when performing a power analysis.
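As a generic illustration of the simulation-based logic (not the InteractionPoweR API itself), the base R sketch below estimates power for an interaction between two correlated continuous predictors; all coefficients, the predictor correlation, and the sample size are assumed values.

set.seed(42)
n_sims <- 2000
n      <- 400
b1  <- 0.30; b2 <- 0.30; b3 <- 0.10  # assumed main-effect and interaction coefficients
r12 <- 0.20                          # assumed correlation between the predictors

power <- mean(replicate(n_sims, {
  x1 <- rnorm(n)
  x2 <- r12 * x1 + sqrt(1 - r12^2) * rnorm(n)   # x2 correlated r12 with x1
  y  <- b1 * x1 + b2 * x2 + b3 * x1 * x2 + rnorm(n)
  summary(lm(y ~ x1 * x2))$coefficients["x1:x2", "Pr(>|t|)"] < .05
}))
power  # proportion of simulated data sets in which the interaction is detected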
{"title":"Tutorial: Power Analyses for Interaction Effects in Cross-Sectional Regressions","authors":"David A. A. Baranger, Megan C. Finsaas, Brandon L. Goldstein, Colin E. Vize, Donald R. Lynam, Thomas M. Olino","doi":"10.1177/25152459231187531","DOIUrl":"https://doi.org/10.1177/25152459231187531","url":null,"abstract":"Interaction analyses (also termed “moderation” analyses or “moderated multiple regression”) are a form of linear regression analysis designed to test whether the association between two variables changes when conditioned on a third variable. It can be challenging to perform a power analysis for interactions with existing software, particularly when variables are correlated and continuous. Moreover, although power is affected by main effects, their correlation, and variable reliability, it can be unclear how to incorporate these effects into a power analysis. The R package InteractionPoweR and associated Shiny apps allow researchers with minimal or no programming experience to perform analytic and simulation-based power analyses for interactions. At minimum, these analyses require the Pearson’s correlation between variables and sample size, and additional parameters, including reliability and the number of discrete levels that a variable takes (e.g., binary or Likert scale), can optionally be specified. In this tutorial, we demonstrate how to perform power analyses using our package and give examples of how power can be affected by main effects, correlations between main effects, reliability, and variable distributions. We also include a brief discussion of how researchers may select an appropriate interaction effect size when performing a power analysis.","PeriodicalId":55645,"journal":{"name":"Advances in Methods and Practices in Psychological Science","volume":"206 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"135811941","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Psychology Is a Property of Persons, Not Averages or Distributions: Confronting the Group-to-Person Generalizability Problem in Experimental Psychology
Pub Date: 2023-07-01 | DOI: 10.1177/25152459231186615
Ryan M. McManus, L. Young, Joseph Sweetman
When experimental psychologists make a claim (e.g., “Participants judged X as morally worse than Y”), how many participants are represented? Such claims are often based exclusively on group-level analyses; here, psychologists often fail to report or perhaps even investigate how many participants judged X as morally worse than Y. More troubling, group-level analyses do not necessarily generalize to the person level: “the group-to-person generalizability problem.” We first argue for the necessity of designing experiments that allow investigation of whether claims represent most participants. Second, we report findings that in a survey of researchers (and laypeople), most interpret claims based on group-level effects as being intended to represent most participants in a study. Most believe this ought to be the case if a claim is used to support a general, person-level psychological theory. Third, building on prior approaches, we document claims in the experimental-psychology literature, derived from sets of typical group-level analyses, that describe only a (sometimes tiny) minority of participants. Fourth, we reason through an example from our own research to illustrate this group-to-person generalizability problem. In addition, we demonstrate how claims from sets of simulated group-level effects can emerge without a single participant’s responses matching these patterns. Fifth, we conduct four experiments that rule out several methodology-based noise explanations of the problem. Finally, we propose a set of simple and flexible options to help researchers confront the group-to-person generalizability problem in their own work.
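A toy simulation illustrates the core point: with a small average effect and ordinary between-person heterogeneity, a group-level test can be decisive while a sizable share of individual participants show the opposite pattern. The parameter values below are arbitrary and not taken from the article.

set.seed(7)
n_subj <- 200
true_d <- 0.1                                            # small average within-person effect (X minus Y)
person_effect <- rnorm(n_subj, mean = true_d, sd = 0.4)  # person-level heterogeneity

t.test(person_effect)     # group-level claim: "X is judged worse than Y"
mean(person_effect > 0)   # share of participants whose own effect matches that direction,
                          # often well below "most" even when the group test is significant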
{"title":"Psychology Is a Property of Persons, Not Averages or Distributions: Confronting the Group-to-Person Generalizability Problem in Experimental Psychology","authors":"Ryan M. McManus, L. Young, Joseph Sweetman","doi":"10.1177/25152459231186615","DOIUrl":"https://doi.org/10.1177/25152459231186615","url":null,"abstract":"When experimental psychologists make a claim (e.g., “Participants judged X as morally worse than Y”), how many participants are represented? Such claims are often based exclusively on group-level analyses; here, psychologists often fail to report or perhaps even investigate how many participants judged X as morally worse than Y. More troubling, group-level analyses do not necessarily generalize to the person level: “the group-to-person generalizability problem.” We first argue for the necessity of designing experiments that allow investigation of whether claims represent most participants. Second, we report findings that in a survey of researchers (and laypeople), most interpret claims based on group-level effects as being intended to represent most participants in a study. Most believe this ought to be the case if a claim is used to support a general, person-level psychological theory. Third, building on prior approaches, we document claims in the experimental-psychology literature, derived from sets of typical group-level analyses, that describe only a (sometimes tiny) minority of participants. Fourth, we reason through an example from our own research to illustrate this group-to-person generalizability problem. In addition, we demonstrate how claims from sets of simulated group-level effects can emerge without a single participant’s responses matching these patterns. Fifth, we conduct four experiments that rule out several methodology-based noise explanations of the problem. Finally, we propose a set of simple and flexible options to help researchers confront the group-to-person generalizability problem in their own work.","PeriodicalId":55645,"journal":{"name":"Advances in Methods and Practices in Psychological Science","volume":" ","pages":""},"PeriodicalIF":13.6,"publicationDate":"2023-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"45191954","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Modeling Cluster-Level Constructs Measured by Individual Responses: Configuring a Shared Approach
Pub Date: 2023-07-01 | DOI: 10.1177/25152459231182319
S. Jak, Terrence D. Jorgensen, Debby ten Hove, Barbara Nevicka
When multiple items are used to measure cluster-level constructs with individual-level responses, multilevel confirmatory factor models are useful. How to model constructs across levels is still an active area of research in which competing methods are available to capture what can be interpreted as a valid representation of cluster-level phenomena. Moreover, the terminology used for the cluster-level constructs in such models varies across researchers. We therefore provide an overview of the terminology and modeling approaches used for cluster-level constructs measured through individual responses. We classify the constructs based on whether (a) the target of measurement is at the cluster level or at the individual level and (b) the construct requires a measurement model. Next, we discuss various two-level factor models that have been proposed for multilevel constructs that require a measurement model, and we show that the so-called doubly latent model with cross-level invariance of factor loadings is appropriate for all types of constructs that require a measurement model. We provide two empirical illustrations: one using data from students on stimulating teaching and one using data from organizational teams on team conflict.
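One common way to specify a doubly latent model with cross-level loading invariance is lavaan's multilevel SEM syntax. The sketch below uses simulated items y1 to y3 rated by individuals nested in a hypothetical cluster variable team; equal labels across the two levels impose the cross-level invariance constraint. It illustrates the constraint structure only, not the article's empirical analyses.

library(lavaan)

# Simulated two-level data: a shared cluster-level construct plus individual deviations
set.seed(3)
n_team <- 100; n_per <- 10; N <- n_team * n_per
team   <- rep(seq_len(n_team), each = n_per)
eta_b  <- rnorm(n_team)[team]   # cluster-level construct score, repeated within teams
eta_w  <- rnorm(N)              # individual-level deviations
make_item <- function(lambda) {
  lambda * (eta_b + eta_w) + rnorm(n_team, sd = 0.4)[team] + rnorm(N, sd = 0.6)
}
d <- data.frame(team = team,
                y1 = make_item(0.8),
                y2 = make_item(0.7),
                y3 = make_item(0.6))

model <- '
  level: 1
    f_within  =~ L1*y1 + L2*y2 + L3*y3   # within-cluster factor
  level: 2
    f_between =~ L1*y1 + L2*y2 + L3*y3   # same labels = cross-level invariance
'

fit <- sem(model, data = d, cluster = "team")
summary(fit, standardized = TRUE)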
{"title":"Modeling Cluster-Level Constructs Measured by Individual Responses: Configuring a Shared Approach","authors":"S. Jak, Terrence D. Jorgensen, Debby ten Hove, Barbara Nevicka","doi":"10.1177/25152459231182319","DOIUrl":"https://doi.org/10.1177/25152459231182319","url":null,"abstract":"When multiple items are used to measure cluster-level constructs with individual-level responses, multilevel confirmatory factor models are useful. How to model constructs across levels is still an active area of research in which competing methods are available to capture what can be interpreted as a valid representation of cluster-level phenomena. Moreover, the terminology used for the cluster-level constructs in such models varies across researchers. We therefore provide an overview of used terminology and modeling approaches for cluster-level constructs measured through individual responses. We classify the constructs based on whether (a) the target of measurement is at the cluster level or at the individual level and (b) the construct requires a measurement model. Next, we discuss various two-level factor models that have been proposed for multilevel constructs that require a measurement model, and we show that the so-called doubly latent model with cross-level invariance of factor loadings is appropriate for all types of constructs that require a measurement model. We provide two illustrations using empirical data from students and organizational teams on stimulating teaching and on conflict in organizational teams, respectively.","PeriodicalId":55645,"journal":{"name":"Advances in Methods and Practices in Psychological Science","volume":" ","pages":""},"PeriodicalIF":13.6,"publicationDate":"2023-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"43411974","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
How Do Science Journalists Evaluate Psychology Research?
Pub Date: 2023-07-01 | DOI: 10.1177/25152459231183912
J. Bottesini, Christie Aschwanden, M. Rhemtulla, S. Vazire
What information do science journalists use when evaluating psychology findings? We examined this in a preregistered, controlled experiment by manipulating four factors in descriptions of fictitious behavioral-psychology studies: (a) the study’s sample size, (b) the representativeness of the study’s sample, (c) the p value associated with the finding, and (d) institutional prestige of the researcher who conducted the study. We investigated the effects of these manipulations on 181 real journalists’ perceptions of each study’s trustworthiness and newsworthiness. Sample size was the only factor that had a robust influence on journalists’ ratings of how trustworthy and newsworthy a finding was; larger sample sizes led to an increase of about two-thirds of 1 point on a 7-point scale. University prestige had no effect in this controlled setting, and the effects of sample representativeness and of p values were inconclusive, but any effects in this setting are likely quite small. Exploratory analyses suggest that other types of prestige might be more important (i.e., journal prestige) and that study design (experimental vs. correlational) may also affect trustworthiness and newsworthiness.
{"title":"How Do Science Journalists Evaluate Psychology Research?","authors":"J. Bottesini, Christie Aschwanden, M. Rhemtulla, S. Vazire","doi":"10.1177/25152459231183912","DOIUrl":"https://doi.org/10.1177/25152459231183912","url":null,"abstract":"What information do science journalists use when evaluating psychology findings? We examined this in a preregistered, controlled experiment by manipulating four factors in descriptions of fictitious behavioral-psychology studies: (a) the study’s sample size, (b) the representativeness of the study’s sample, (c) the p value associated with the finding, and (d) institutional prestige of the researcher who conducted the study. We investigated the effects of these manipulations on 181 real journalists’ perceptions of each study’s trustworthiness and newsworthiness. Sample size was the only factor that had a robust influence on journalists’ ratings of how trustworthy and newsworthy a finding was; larger sample sizes led to an increase of about two-thirds of 1 point on a 7-point scale. University prestige had no effect in this controlled setting, and the effects of sample representativeness and of p values were inconclusive, but any effects in this setting are likely quite small. Exploratory analyses suggest that other types of prestige might be more important (i.e., journal prestige) and that study design (experimental vs. correlational) may also affect trustworthiness and newsworthiness.","PeriodicalId":55645,"journal":{"name":"Advances in Methods and Practices in Psychological Science","volume":" ","pages":""},"PeriodicalIF":13.6,"publicationDate":"2023-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"45965296","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Does Your Smartphone “Know” Your Social Life? A Methodological Comparison of Day Reconstruction, Experience Sampling, and Mobile Sensing
Pub Date: 2023-07-01 | DOI: 10.1177/25152459231178738
Yannick Roos, Michael D. Krämer, D. Richter, Ramona Schoedel, C. Wrzus
Mobile sensing is a promising method that allows researchers to directly observe human social behavior in daily life using people’s mobile phones. To date, limited knowledge exists on how well mobile sensing can assess the quantity and quality of social interactions. We therefore examined the agreement among experience sampling, day reconstruction, and mobile sensing in the assessment of multiple aspects of daily social interactions (i.e., face-to-face interactions, calls, and text messages) and the possible unique access to social interactions that each method has. Over 2 days, 320 smartphone users (51% female, age range = 18–80, M = 39.53 years) answered up to 20 experience-sampling questionnaires about their social behavior and reconstructed their days in a daily diary. Meanwhile, face-to-face and smartphone-mediated social interactions were assessed with mobile sensing. The results showed some agreement between measurements of face-to-face interactions and high agreement between measurements of smartphone-mediated interactions. Still, a large number of social interactions were captured by only one of the methods, and the quality of social interactions is still difficult to capture with mobile sensing. We discuss limitations and the unique benefits of day reconstruction, experience sampling, and mobile sensing for assessing social behavior in daily life.
{"title":"Does Your Smartphone “Know” Your Social Life? A Methodological Comparison of Day Reconstruction, Experience Sampling, and Mobile Sensing","authors":"Yannick Roos, Michael D. Krämer, D. Richter, Ramona Schoedel, C. Wrzus","doi":"10.1177/25152459231178738","DOIUrl":"https://doi.org/10.1177/25152459231178738","url":null,"abstract":"Mobile sensing is a promising method that allows researchers to directly observe human social behavior in daily life using people’s mobile phones. To date, limited knowledge exists on how well mobile sensing can assess the quantity and quality of social interactions. We therefore examined the agreement among experience sampling, day reconstruction, and mobile sensing in the assessment of multiple aspects of daily social interactions (i.e., face-to-face interactions, calls, and text messages) and the possible unique access to social interactions that each method has. Over 2 days, 320 smartphone users (51% female, age range = 18–80, M = 39.53 years) answered up to 20 experience-sampling questionnaires about their social behavior and reconstructed their days in a daily diary. Meanwhile, face-to-face and smartphone-mediated social interactions were assessed with mobile sensing. The results showed some agreement between measurements of face-to-face interactions and high agreement between measurements of smartphone-mediated interactions. Still, a large number of social interactions were captured by only one of the methods, and the quality of social interactions is still difficult to capture with mobile sensing. We discuss limitations and the unique benefits of day reconstruction, experience sampling, and mobile sensing for assessing social behavior in daily life.","PeriodicalId":55645,"journal":{"name":"Advances in Methods and Practices in Psychological Science","volume":" ","pages":""},"PeriodicalIF":13.6,"publicationDate":"2023-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"44651155","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
How Many Participants Do I Need to Test an Interaction? Conducting an Appropriate Power Analysis and Achieving Sufficient Power to Detect an Interaction
Pub Date: 2023-07-01 | DOI: 10.1177/25152459231178728
Nicolas Sommet, David L. Weissman, Nicolas Cheutin, Andrew J. Elliot
Power analysis for first-order interactions poses two challenges: (a) Conducting an appropriate power analysis is difficult because the typical expected effect size of an interaction depends on its shape, and (b) achieving sufficient power is difficult because interactions are often modest in size. This article consists of three parts. In the first part, we address the first challenge. We first use a fictional study to explain the difference between power analyses for interactions and main effects. Then, we introduce an intuitive taxonomy of 12 types of interactions based on the shape of the interaction (reversed, fully attenuated, partially attenuated) and the size of the simple slopes (median, smaller, larger), and we offer mathematically derived sample-size recommendations to detect each interaction with a power of .80/.90/.95 (for two-tailed tests in between-participants designs). In the second part, we address the second challenge. We first describe a preregistered metastudy (159 studies from recent articles in influential psychology journals) showing that the median power to detect interactions of a typical size is .18. Then, we use simulations (≈900,000,000 data sets) to generate power curves for the 12 types of interactions and test three approaches to increase power without increasing sample size: (a) preregistering one-tailed tests (+21% gain), (b) using a mixed design (+75% gain), and (c) preregistering contrast analysis for a fully attenuated interaction (+62% gain). In the third part, we introduce INT×Power ( www.intxpower.com ), a web application that enables users to draw their interaction and determine the sample size needed to reach the power of their choice with the option of using/combining these approaches.
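As a quick analytic counterpart to simulation-based power curves, the incremental R-squared of an interaction term can be converted to Cohen's f2 and passed to pwr::pwr.f2.test. The delta R-squared and baseline R-squared below are assumed illustrative values, and the last line converts the returned error degrees of freedom into a required N for a model with three predictors (x1, x2, and their product); this is a generic sketch, not the INT×Power application.

library(pwr)  # analytic power functions for common tests

delta_r2 <- 0.010   # assumed incremental variance explained by the interaction term
base_r2  <- 0.200   # assumed variance explained by the two main effects
f2 <- delta_r2 / (1 - base_r2 - delta_r2)   # Cohen's f2 for the added term

res <- pwr.f2.test(u = 1, f2 = f2, sig.level = .05, power = .80)
ceiling(res$v) + 3 + 1   # required N = error df + number of predictors + 1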
{"title":"How Many Participants Do I Need to Test an Interaction? Conducting an Appropriate Power Analysis and Achieving Sufficient Power to Detect an Interaction","authors":"Nicolas Sommet, David L. Weissman, Nicolas Cheutin, Andrew J. Elliot","doi":"10.1177/25152459231178728","DOIUrl":"https://doi.org/10.1177/25152459231178728","url":null,"abstract":"Power analysis for first-order interactions poses two challenges: (a) Conducting an appropriate power analysis is difficult because the typical expected effect size of an interaction depends on its shape, and (b) achieving sufficient power is difficult because interactions are often modest in size. This article consists of three parts. In the first part, we address the first challenge. We first use a fictional study to explain the difference between power analyses for interactions and main effects. Then, we introduce an intuitive taxonomy of 12 types of interactions based on the shape of the interaction (reversed, fully attenuated, partially attenuated) and the size of the simple slopes (median, smaller, larger), and we offer mathematically derived sample-size recommendations to detect each interaction with a power of .80/.90/.95 (for two-tailed tests in between-participants designs). In the second part, we address the second challenge. We first describe a preregistered metastudy (159 studies from recent articles in influential psychology journals) showing that the median power to detect interactions of a typical size is .18. Then, we use simulations (≈900,000,000 data sets) to generate power curves for the 12 types of interactions and test three approaches to increase power without increasing sample size: (a) preregistering one-tailed tests (+21% gain), (b) using a mixed design (+75% gain), and (c) preregistering contrast analysis for a fully attenuated interaction (+62% gain). In the third part, we introduce INT×Power ( www.intxpower.com ), a web application that enables users to draw their interaction and determine the sample size needed to reach the power of their choice with the option of using/combining these approaches.","PeriodicalId":55645,"journal":{"name":"Advances in Methods and Practices in Psychological Science","volume":"234 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"136260479","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A Tutorial on Causal Inference in Longitudinal Data With Time-Varying Confounding Using G-Estimation
Pub Date: 2023-07-01 | DOI: 10.1177/25152459231174029
W. W. Loh, Dongning Ren
In psychological research, longitudinal study designs are often used to examine the effects of a naturally observed predictor (i.e., treatment) on an outcome over time. But causal inference from longitudinal data in the presence of time-varying confounding is notoriously challenging. In this tutorial, we introduce g-estimation, a well-established estimation strategy from the causal-inference literature. G-estimation is a powerful analytic tool designed to handle time-varying confounding variables affected by treatment. We offer step-by-step guidance on implementing the g-estimation method using standard parametric regression functions familiar to psychological researchers and commonly available in statistical software. To facilitate hands-on usage, we provide software code at each step using the open-source statistical software R. All the R code presented in this tutorial is publicly available online.
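The full method is best learned from the tutorial, but the flavor of a regression-based, sequential ("blip down") g-estimation workflow can be sketched for two time points in base R. The variable names (a1 and a2 for treatments, l2 for a confounder affected by a1, y for the outcome) and the simulated data are hypothetical, and the sketch relies on linearity and no-unmeasured-confounding assumptions rather than reproducing the authors' exact estimator.

# Hypothetical simulated data with time-varying confounding
set.seed(11)
n  <- 1000
a1 <- rbinom(n, 1, 0.5)                       # earlier treatment
l2 <- 0.6 * a1 + rnorm(n)                     # confounder affected by a1
a2 <- rbinom(n, 1, plogis(-0.5 + 0.8 * l2))   # later treatment depends on l2
y  <- 0.4 * a1 + 0.5 * a2 + 0.3 * l2 + rnorm(n)
d  <- data.frame(a1, a2, l2, y)

# Step 1: effect of the later treatment a2, adjusting for the full history
m2 <- lm(y ~ a2 + a1 + l2, data = d)

# Step 2: "blip down" the outcome by removing the estimated a2 effect
d$y_blip <- d$y - coef(m2)["a2"] * d$a2

# Step 3: regress the blipped-down outcome on the earlier treatment a1,
# without conditioning on the post-treatment confounder l2
m1 <- lm(y_blip ~ a1, data = d)
coef(m1)["a1"]   # approx. 0.58 = 0.4 + 0.6 * 0.3 (direct path plus the path through l2)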
{"title":"A Tutorial on Causal Inference in Longitudinal Data With Time-Varying Confounding Using G-Estimation","authors":"W. W. Loh, Dongning Ren","doi":"10.1177/25152459231174029","DOIUrl":"https://doi.org/10.1177/25152459231174029","url":null,"abstract":"In psychological research, longitudinal study designs are often used to examine the effects of a naturally observed predictor (i.e., treatment) on an outcome over time. But causal inference of longitudinal data in the presence of time-varying confounding is notoriously challenging. In this tutorial, we introduce g-estimation, a well-established estimation strategy from the causal inference literature. G-estimation is a powerful analytic tool designed to handle time-varying confounding variables affected by treatment. We offer step-by-step guidance on implementing the g-estimation method using standard parametric regression functions familiar to psychological researchers and commonly available in statistical software. To facilitate hands-on usage, we provide software code at each step using the open-source statistical software R. All the R code presented in this tutorial are publicly available online.","PeriodicalId":55645,"journal":{"name":"Advances in Methods and Practices in Psychological Science","volume":" ","pages":""},"PeriodicalIF":13.6,"publicationDate":"2023-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"46760911","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}