Surveying the Upper Echelons: An Update to Cycyota and Harrison (2006) on Top Manager Response Rates and Recommendations for the Future
Pub Date: 2025-01-13 | DOI: 10.1177/10944281241310574
Cameron J. Borgholthaus, Alaric Bourgoin, Peter D. Harms, Joshua V. White, Tyler N. A. Fezzey
Nearly 2 decades ago, Cycyota and Harrison (2006) documented a concerning trend of declining executive survey response rates and projected a continued decrease in the future. Their seminal work has significantly influenced the methodologies of upper echelons survey research. Our study examines the manner in which Cycyota and Harrison’s paper has impacted the existing upper echelons literature and replicates their study by analyzing peer-reviewed studies published post-2006. We reveal that executive response rates have largely stabilized since Cycyota and Harrison’s initial findings. Furthermore, we expand upon their research by identifying specific geographical contexts and contact methodologies associated with higher (and lower) response rates. Finally, we lend insight into the evolving landscape of executive survey research and offer practical implications for future methodological endeavors in the upper echelons.
{"title":"Surveying the Upper Echelons: An Update to Cycyota and Harrison (2006) on Top Manager Response Rates and Recommendations for the Future","authors":"Cameron J. Borgholthaus, Alaric Bourgoin, Peter D. Harms, Joshua V. White, Tyler N. A. Fezzey","doi":"10.1177/10944281241310574","DOIUrl":"https://doi.org/10.1177/10944281241310574","url":null,"abstract":"Nearly 2 decades ago, Cycyota and Harrison (2006) documented a concerning trend of declining executive survey response rates and projected a continued decrease in the future. Their seminal work has significantly influenced the methodologies of upper echelons survey research. Our study examines the manner in which Cycyota and Harrison’s paper has impacted the existing upper echelons literature and replicates their study by analyzing peer-reviewed studies published post-2006. We reveal that executive response rates have largely stabilized since Cycyota and Harrison’s initial findings. Furthermore, we expand upon their research by identifying specific geographical contexts and contact methodologies associated with higher (and lower) response rates. Finally, we lend insight into the evolving landscape of executive survey research and offer practical implications for future methodological endeavors in the upper echelons.","PeriodicalId":19689,"journal":{"name":"Organizational Research Methods","volume":"9 1","pages":""},"PeriodicalIF":9.5,"publicationDate":"2025-01-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142968357","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"管理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Manipulation in Organizational Research: On Executing and Interpreting Designs from Treatments to Primes
Pub Date: 2024-12-17 | DOI: 10.1177/10944281241300952
Kira F. Schabram, Christopher G. Myers, Ashley E. Hardin
While other applied sciences systematically distinguish between manipulation designs, organizational research does not. Herein, we disentangle distinct applications that differ in how the manipulation is deployed, analyzed, and interpreted in support of hypotheses. First, we define two archetypes: treatments, experimental designs that expose participants to different levels/types of a manipulation of theoretical interest, and primes, manipulations that are not of theoretical interest but generate variance in a state that is. We position these and creative derivations (e.g., interventions and invariant prompts) as specialized tools in our methodological kit. Second, we review 450 manipulations published in leading organizational journals to identify each type's prevalence and application in our field. From this we derive our guiding thesis that while treatments offer unique advantages (foremost establishing causality), they are not always possible, nor the best fit for a research question; in these cases, a non-causal but accurate test of theory, such as a prime design, may prove superior to a causal but inaccurate test. We conclude by outlining best practices for selection, execution, and evaluation by researchers, reviewers, and readers.
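As a toy illustration of how the two archetypes differ at the analysis stage, the following hedged R sketch (simulated data; variable names are hypothetical and not from the article) contrasts a treatment design, where the manipulated condition itself is the predictor of theoretical interest, with a prime design, where the manipulation only generates variance in a measured state that then serves as the predictor.

```r
set.seed(42)
n <- 200
condition <- rep(0:1, each = n / 2)            # manipulated factor (e.g., recall task)

# Treatment design: the manipulation IS the construct of theoretical interest,
# so the outcome is regressed directly on condition to estimate a causal effect.
outcome_t <- 0.5 * condition + rnorm(n)
summary(lm(outcome_t ~ condition))

# Prime design: the manipulation is only used to generate variance in a state
# (e.g., momentary gratitude); theory concerns the state, not the prime itself.
state     <- 0.6 * condition + rnorm(n)        # measured state / manipulation check
outcome_p <- 0.4 * state + rnorm(n)
summary(lm(outcome_p ~ state))                 # non-causal but construct-focused test
```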
{"title":"Manipulation in Organizational Research: On Executing and Interpreting Designs from Treatments to Primes","authors":"Kira F. Schabram, Christopher G. Myers, Ashley E. Hardin","doi":"10.1177/10944281241300952","DOIUrl":"https://doi.org/10.1177/10944281241300952","url":null,"abstract":"While other applied sciences systematically distinguish between manipulation designs, organizational research does not. Herein, we disentangle distinct applications that differ in how the manipulation is deployed, analyzed, and interpreted in support of hypotheses. First, we define two archetypes: treatments, experimental designs that expose participants to different levels/types of a manipulation of theoretical interest, and primes, manipulations that are not of theoretical interest but generate variance in a state that is. We position these and creative derivations (e.g., interventions and invariant prompts) as specialized tools in our methodological kit. Second, we review 450 manipulations published in leading organizational journals to identify each type's prevalence and application in our field. From this we derive our guiding thesis that while treatments offer unique advantages (foremost establishing causality), they are not always possible, nor the best fit for a research question; in these cases, a non-causal but accurate test of theory, such as a prime design, may prove superior to a causal but inaccurate test. We conclude by outlining best practices for selection, execution, and evaluation by researchers, reviewers, and readers.","PeriodicalId":19689,"journal":{"name":"Organizational Research Methods","volume":"86 1","pages":""},"PeriodicalIF":9.5,"publicationDate":"2024-12-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142841975","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"管理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Measuring Personality When Stakes Are High: Are Graded Paired Comparisons a More Reliable Alternative to Traditional Forced-Choice Methods?
Pub Date: 2024-12-13 | DOI: 10.1177/10944281241279790
Harriet Lingel, Paul-Christian Bürkner, Klaus G. Melchers, Niklas Schulte
In graded paired comparisons (GPCs), two items are compared using a multipoint rating scale. GPCs are expected to reduce faking compared with Likert-type scales and to produce more reliable, less ipsative trait scores than traditional binary forced-choice formats. To investigate the statistical properties of GPCs, we simulated 960 conditions in which we varied six independent factors and additionally implemented conditions with algorithmically optimized item combinations. Using Thurstonian IRT models, good reliabilities and low ipsativity of trait score estimates were achieved for questionnaires with 50% unequally keyed item pairs or equally keyed item pairs with an optimized combination of loadings. However, in conditions with 20% unequally keyed item pairs and equally keyed conditions without optimization, reliabilities were lower with evidence of ipsativity. Overall, more response categories led to higher reliabilities and nearly fully normative trait scores. In an empirical example, we demonstrate the identified mechanisms under both honest and faking conditions and study the effects of social desirability matching on reliability. In sum, our studies inform about the psychometric properties of GPCs under different conditions and make specific recommendations for improving these properties.
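To make the response format concrete, here is a minimal, self-contained R sketch (ours, not the authors' simulation code) of how a graded paired comparison can be generated under a Thurstonian-style model: each item's latent utility loads on a trait, and the respondent rates the difference between the two items in a pair on a multipoint scale. Packages such as thurstonianIRT implement full estimation of such models; this only illustrates the data-generating logic.

```r
set.seed(1)
n_persons <- 500
theta  <- matrix(rnorm(n_persons * 2), ncol = 2)   # two latent traits
lambda <- c(0.8, 0.7)                              # loadings: item 1 on trait 1, item 2 on trait 2

# Latent utilities of the two items in the pair (Thurstonian formulation)
t1 <- lambda[1] * theta[, 1] + rnorm(n_persons, sd = sqrt(1 - lambda[1]^2))
t2 <- lambda[2] * theta[, 2] + rnorm(n_persons, sd = sqrt(1 - lambda[2]^2))

# Graded paired comparison: the utility difference is mapped onto a 5-point scale
# via fixed thresholds, rather than a binary "prefer item 1 vs. item 2" choice.
thresholds   <- c(-1.5, -0.5, 0.5, 1.5)
gpc_response <- cut(t1 - t2, breaks = c(-Inf, thresholds, Inf), labels = FALSE)
table(gpc_response)
```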
{"title":"Measuring Personality When Stakes Are High: Are Graded Paired Comparisons a More Reliable Alternative to Traditional Forced-Choice Methods?","authors":"Harriet Lingel, Paul-Christian Bürkner, Klaus G. Melchers, Niklas Schulte","doi":"10.1177/10944281241279790","DOIUrl":"https://doi.org/10.1177/10944281241279790","url":null,"abstract":"In graded paired comparisons (GPCs), two items are compared using a multipoint rating scale. GPCs are expected to reduce faking compared with Likert-type scales and to produce more reliable, less ipsative trait scores than traditional binary forced-choice formats. To investigate the statistical properties of GPCs, we simulated 960 conditions in which we varied six independent factors and additionally implemented conditions with algorithmically optimized item combinations. Using Thurstonian IRT models, good reliabilities and low ipsativity of trait score estimates were achieved for questionnaires with 50% unequally keyed item pairs or equally keyed item pairs with an optimized combination of loadings. However, in conditions with 20% unequally keyed item pairs and equally keyed conditions without optimization, reliabilities were lower with evidence of ipsativity. Overall, more response categories led to higher reliabilities and nearly fully normative trait scores. In an empirical example, we demonstrate the identified mechanisms under both honest and faking conditions and study the effects of social desirability matching on reliability. In sum, our studies inform about the psychometric properties of GPCs under different conditions and make specific recommendations for improving these properties.","PeriodicalId":19689,"journal":{"name":"Organizational Research Methods","volume":"29 1","pages":""},"PeriodicalIF":9.5,"publicationDate":"2024-12-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142820668","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"管理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
The Internet Never Forgets: A Four-Step Scraping Tutorial, Codebase, and Database for Longitudinal Organizational Website Data
Pub Date: 2024-11-04 | DOI: 10.1177/10944281241284941
Richard F.J. Haans, Marc J. Mertens
Websites represent a crucial avenue for organizations to reach customers, attract talent, and disseminate information to stakeholders. Despite their importance, strikingly little work in the domain of organization and management research has tapped into this source of longitudinal big data. In this paper, we highlight the unique nature and profound potential of longitudinal website data and present a novel open-source codebase and database that make these data accessible. Specifically, our codebase offers a general-purpose setup, building on four central steps to scrape historical websites using the Wayback Machine. Our open-access CompuCrawl database was built using this four-step approach. It contains websites of North American firms in the Compustat database between 1996 and 2020—covering 11,277 firms with 86,303 firm/year observations and 1,617,675 webpages. We describe the coverage of our database and illustrate its use by applying word-embedding models to reveal the evolving meaning of the concept of “sustainability” over time. Finally, we outline several avenues for future research enabled by our step-by-step longitudinal web scraping approach and our CompuCrawl database.
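The abstract does not reproduce the authors' code, but the general approach of collecting historical snapshots through the Wayback Machine can be sketched in a few lines of R using the public CDX API. The example domain (example.com) and the use of jsonlite are our assumptions for illustration, not part of CompuCrawl.

```r
library(jsonlite)

# Step 1: query the Wayback Machine CDX API for one archived snapshot per year
# of a (hypothetical example) firm website between 1996 and 2020.
cdx_url <- paste0(
  "http://web.archive.org/cdx/search/cdx?url=example.com",
  "&from=1996&to=2020&filter=statuscode:200",
  "&collapse=timestamp:4&output=json"
)
snapshots <- fromJSON(cdx_url)                       # matrix of strings; row 1 = field names
colnames(snapshots) <- snapshots[1, ]
snapshots <- as.data.frame(snapshots[-1, , drop = FALSE], stringsAsFactors = FALSE)

# Step 2: reconstruct archive URLs and download the raw HTML of each snapshot.
archive_urls <- paste0("http://web.archive.org/web/",
                       snapshots$timestamp, "/", snapshots$original)
pages <- lapply(archive_urls, function(u)
  tryCatch(readLines(u, warn = FALSE), error = function(e) NA))
```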
{"title":"The Internet Never Forgets: A Four-Step Scraping Tutorial, Codebase, and Database for Longitudinal Organizational Website Data","authors":"Richard F.J. Haans, Marc J. Mertens","doi":"10.1177/10944281241284941","DOIUrl":"https://doi.org/10.1177/10944281241284941","url":null,"abstract":"Websites represent a crucial avenue for organizations to reach customers, attract talent, and disseminate information to stakeholders. Despite their importance, strikingly little work in the domain of organization and management research has tapped into this source of longitudinal big data. In this paper, we highlight the unique nature and profound potential of longitudinal website data and present novel open-source code- and databases that make these data accessible. Specifically, our codebase offers a general-purpose setup, building on four central steps to scrape historical websites using the Wayback Machine. Our open-access CompuCrawl database was built using this four-step approach. It contains websites of North American firms in the Compustat database between 1996 and 2020—covering 11,277 firms with 86,303 firm/year observations and 1,617,675 webpages. We describe the coverage of our database and illustrate its use by applying word-embedding models to reveal the evolving meaning of the concept of “sustainability” over time. Finally, we outline several avenues for future research enabled by our step-by-step longitudinal web scraping approach and our CompuCrawl database.","PeriodicalId":19689,"journal":{"name":"Organizational Research Methods","volume":"140 1","pages":""},"PeriodicalIF":9.5,"publicationDate":"2024-11-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142580273","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"管理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
One Size Does Not Fit All: Unraveling Item Response Process Heterogeneity Using the Mixture Dominance-Unfolding Model (MixDUM)
Pub Date: 2024-09-14 | DOI: 10.1177/10944281241271323
Bo Zhang, R. Philip Chalmers, Lingyue Li, Tianjun Sun, Louis Tay
When modeling responses to items measuring non-cognitive constructs that require introspection (e.g., personality, attitude), most studies have assumed that respondents follow the same item response process—either a dominance or an unfolding one. Nevertheless, the results are not unequivocal, as some preliminary evidence suggests that some people use an unfolding response process, whereas others use a dominance response process. To enhance item response modeling, it is critical to develop measurement models that can accommodate heterogeneity in the item response processes. Therefore, we proposed the Mixture Dominance-Unfolding Model (MixDUM) to formally identify this potential population heterogeneity. Monte Carlo simulations showed that MixDUM possessed reasonably good statistical properties. Moreover, ignoring item response process heterogeneity was detrimental to item parameter estimation and led to less accurate selection outcomes. An empirical study was conducted in which respondents completed focal personality scales under either an honest condition or a simulated job application condition, to demonstrate the utility of MixDUM. The findings indicated (1) that MixDUM provided the best fit across scales, (2) that approximately 55–60% of respondents utilized an unfolding response process, (3) that respondents exhibited moderate consistency in their use of response processes across scales, (4) that narcissism consistently negatively predicted the use of an unfolding response process, and (5) that the criterion-related validity of focal personality scores varied across latent classes for certain criteria. To encourage its use, we provided a tutorial on the implementation of MixDUM in the R package mirt.
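For readers unfamiliar with the distinction, the two response processes imply different item response functions. The hedged R sketch below (ours, not the MixDUM code; the unfolding curve is a simplified squared-distance form rather than a full GGUM) contrasts a dominance curve, where endorsement probability rises monotonically with the latent trait, with an unfolding curve, where endorsement peaks when the person's trait level is close to the item's location. Full estimation of MixDUM itself is covered in the authors' tutorial for the mirt package.

```r
theta <- seq(-3, 3, by = 0.1)                  # latent trait continuum
a <- 1.5; b <- 0.5                             # discrimination and item location

# Dominance process (2PL-type): P(endorse) rises monotonically in theta.
p_dominance <- 1 / (1 + exp(-a * (theta - b)))

# Unfolding process (ideal-point-type): P(endorse) peaks at theta == b and
# falls off as the person-item distance grows in either direction.
p_unfolding <- exp(-a * (theta - b)^2)

plot(theta, p_dominance, type = "l", ylim = c(0, 1),
     xlab = "Latent trait", ylab = "P(endorse)")
lines(theta, p_unfolding, lty = 2)
legend("topleft", legend = c("dominance", "unfolding"), lty = 1:2)
```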
{"title":"One Size Does Not Fit All: Unraveling Item Response Process Heterogeneity Using the Mixture Dominance-Unfolding Model (MixDUM)","authors":"Bo Zhang, R. Philip Chalmers, Lingyue Li, Tianjun Sun, Louis Tay","doi":"10.1177/10944281241271323","DOIUrl":"https://doi.org/10.1177/10944281241271323","url":null,"abstract":"When modeling responses to items measuring non-cognitive constructs that require introspection (e.g., personality, attitude), most studies have assumed that respondents follow the same item response process—either a dominance or an unfolding one. Nevertheless, the results are not equivocal, as some preliminary evidence suggests that some people use an unfolding response process, whereas others use a dominance response process. To enhance item response modeling, it is critical to develop measurement models that can accommodate heterogeneity in the item response processes. Therefore, we proposed the Mixture Dominance-Unfolding Model (MixDUM) to formally identify this potential population heterogeneity. Monte Carlo simulations showed that MixDUM possessed reasonably good statistical properties. Moreover, ignoring item response process heterogeneity was detrimental to item parameter estimation and led to less accurate selection outcomes. An empirical study was conducted in which respondents completed focal personality scales under either an honest condition or a simulated job application condition, to demonstrate the utility of MixDUM. The findings indicated (1) that MixDUM provided the best fit across scales, (2) that approximately 55–60% of respondents utilized an unfolding response process, (3) that respondents exhibited moderate consistency in their use of response processes across scales, (4) that narcissism consistently negatively predicted the use of an unfolding response process, and (5) that the criterion-related validity of focal personality scores varied across latent classes for certain criteria. To encourage its use, we provided a tutorial on the implementation of MixDUM in the R package mirt.","PeriodicalId":19689,"journal":{"name":"Organizational Research Methods","volume":"27 1","pages":""},"PeriodicalIF":9.5,"publicationDate":"2024-09-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142233372","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"管理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Taking It Easy: Off-the-Shelf Versus Fine-Tuned Supervised Modeling of Performance Appraisal Text
Pub Date: 2024-08-28 | DOI: 10.1177/10944281241271249
Andrew B. Speer, James Perrotta, Tobias L. Kordsmeyer
When assessing text, supervised natural language processing (NLP) models have traditionally been used to measure targeted constructs in the organizational sciences. However, these models require significant resources to develop. Emerging “off-the-shelf” large language models (LLMs) offer a way to evaluate organizational constructs without building customized models. However, it is unclear whether off-the-shelf LLMs accurately score organizational constructs and what evidence is necessary to infer validity. In this study, we compared the validity of supervised NLP models to off-the-shelf LLMs (ChatGPT-3.5 and ChatGPT-4). Across six organizational datasets and thousands of comments, we found that scores produced by supervised NLP models were more reliable than those produced by human coders. However, even though they were not specifically developed for this purpose, off-the-shelf LLMs produced psychometric properties similar to those of the supervised models, albeit slightly less favorable. We connect these findings to broader validation considerations and present a decision chart to guide researchers and practitioners on how they can use off-the-shelf LLMs to score targeted constructs, including guidance on how psychometric evidence can be “transported” to new contexts.
{"title":"Taking It Easy: Off-the-Shelf Versus Fine-Tuned Supervised Modeling of Performance Appraisal Text","authors":"Andrew B. Speer, James Perrotta, Tobias L. Kordsmeyer","doi":"10.1177/10944281241271249","DOIUrl":"https://doi.org/10.1177/10944281241271249","url":null,"abstract":"When assessing text, supervised natural language processing (NLP) models have traditionally been used to measure targeted constructs in the organizational sciences. However, these models require significant resources to develop. Emerging “off-the-shelf” large language models (LLM) offer a way to evaluate organizational constructs without building customized models. However, it is unclear whether off-the-shelf LLMs accurately score organizational constructs and what evidence is necessary to infer validity. In this study, we compared the validity of supervised NLP models to off-the-shelf LLM models (ChatGPT-3.5 and ChatGPT-4). Across six organizational datasets and thousands of comments, we found that supervised NLP produced scores were more reliable than human coders. However, and even though not specifically developed for this purpose, we found that off-the-shelf LLMs produce similar psychometric properties as supervised models, though with slightly less favorable psychometric properties. We connect these findings to broader validation considerations and present a decision chart to guide researchers and practitioners on how they can use off-the-shelf LLM models to score targeted constructs, including guidance on how psychometric evidence can be “transported” to new contexts.","PeriodicalId":19689,"journal":{"name":"Organizational Research Methods","volume":"98 1","pages":""},"PeriodicalIF":9.5,"publicationDate":"2024-08-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142089968","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"管理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Hello World! Building Computational Models to Represent Social and Organizational Theory
Pub Date: 2024-07-25 | DOI: 10.1177/10944281241261913
James A. Grand, Michael T. Braun, Goran Kuljanin
Computational modeling holds significant promise as a tool for improving how theory is developed, expressed, and used to inform empirical research and evaluation efforts. However, the knowledge and skillsets needed to build computational models are rarely developed in the training received by social and organizational scientists. The purpose of this manuscript is to provide an accessible introduction to and reference for building computational models to represent theory. We first discuss important principles and recommendations for “thinking about” theory and developing explanatory accounts in ways that facilitate translating their core assumptions, specifications, and ideas into a computational model. Next, we address some frequently asked questions related to building computational models that introduce several fundamental tasks/concepts involved in building models to represent theory and demonstrate how they can be implemented in the R programming language to produce executable model code. The accompanying supplemental materials describe additional considerations relevant to building and using computational models, provide multiple examples of complete computational model code written in R, and include an interactive application offering guided practice on key model-building tasks/concepts in R.
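As a flavor of what "executable model code" can look like, here is a minimal, hypothetical computational model in R (ours, not taken from the article or its supplements): a simple discrepancy-reduction account of goal pursuit in which effort each week is proportional to the gap between a goal and current performance.

```r
simulate_goal_pursuit <- function(weeks = 20, goal = 100, start = 20,
                                  gain = 0.3, noise_sd = 2) {
  performance <- numeric(weeks)
  performance[1] <- start
  for (t in 2:weeks) {
    discrepancy    <- goal - performance[t - 1]    # theoretical core: perceived gap
    effort         <- gain * discrepancy           # effort proportional to the gap
    performance[t] <- performance[t - 1] + effort + rnorm(1, sd = noise_sd)
  }
  performance
}

set.seed(7)
plot(simulate_goal_pursuit(), type = "b",
     xlab = "Week", ylab = "Performance",
     main = "Discrepancy-reduction model of goal pursuit")
```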
The Effects of the Training Sample Size, Ground Truth Reliability, and NLP Method on Language-Based Automatic Interview Scores’ Psychometric Properties
Pub Date: 2024-07-25 | DOI: 10.1177/10944281241264027
Louis Hickman, Josh Liff, Caleb Rottman, Charles Calderwood
While machine learning (ML) can validly score psychological constructs from behavior, several conditions often change across studies, making it difficult to understand why the psychometric properties of ML models differ across studies. We address this gap in the context of automatically scored interviews. Across multiple datasets, for interview- or question-level scoring of self-reported, tested, and interviewer-rated constructs, we manipulate the training sample size and natural language processing (NLP) method while observing differences in ground truth reliability. We examine how these factors influence the ML model scores’ test–retest reliability and convergence, and we develop multilevel models for estimating the convergent-related validity of ML model scores in similar interviews. When the ground truth is interviewer ratings, hundreds of observations are adequate for research purposes, while larger samples are recommended for practitioners to support generalizability across populations and time. However, self-reports and tested constructs require larger training samples. Particularly when the ground truth is interviewer ratings, NLP embedding methods improve upon count-based methods. Given mixed findings regarding ground truth reliability, we discuss future research possibilities on factors that affect supervised ML models’ psychometric properties.
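A small simulation conveys why both factors matter. The R sketch below (our illustration, with arbitrary parameter values and stand-in features rather than real NLP output) trains a simple regression scorer on n labeled interviews whose "ground truth" ratings have a chosen reliability, and then checks how strongly the model's scores converge with those ratings in a holdout set.

```r
set.seed(123)

simulate_convergence <- function(n_train, gt_reliability, n_features = 50,
                                 n_test = 500) {
  n <- n_train + n_test
  true_score <- rnorm(n)                                     # latent construct
  # Ground truth ratings with a specified reliability (e.g., interviewer ratings)
  ground_truth <- sqrt(gt_reliability) * true_score +
                  sqrt(1 - gt_reliability) * rnorm(n)
  # Features weakly related to the construct (stand-ins for NLP-derived predictors)
  X <- matrix(rnorm(n * n_features), n, n_features)
  X[, 1:5] <- X[, 1:5] + true_score                          # a few informative features
  train <- seq_len(n_train); test <- setdiff(seq_len(n), train)

  fit   <- lm(ground_truth[train] ~ X[train, ])
  preds <- cbind(1, X[test, ]) %*% coef(fit)
  cor(preds, ground_truth[test])                             # convergence in holdout
}

# Convergence improves with more training data and more reliable ground truth.
sapply(c(100, 500, 2000), simulate_convergence, gt_reliability = 0.6)
sapply(c(100, 500, 2000), simulate_convergence, gt_reliability = 0.9)
```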
Enhancing Causal Pursuits in Organizational Science: Targeting the Effect of Treatment on the Treated in Research on Vulnerable Populations
Pub Date: 2024-05-02 | DOI: 10.1177/10944281241246772
Wen Wei Loh, Dongning Ren
Understanding the experiences of vulnerable workers is an important scientific pursuit. For example, research interest is often in quantifying the impacts of adverse exposures such as discrimination, exclusion, harassment, or job insecurity, among others. However, routine approaches have only focused on the average treatment effect, which encapsulates the impact of an exposure (e.g., discrimination) applied to the entire study population—including those who were not exposed. In this paper, we propose using a more refined causal quantity uniquely suited to address such causal queries: The effect of treatment on the treated (ETT) from the causal inference literature. We explain why the ETT is a more pertinent causal estimand for investigating the experiences of vulnerable workers by highlighting three appealing features: Better interpretability, greater accuracy, and enhanced robustness to violations of empirically untestable causal assumptions. We further describe how to estimate the ETT by introducing and comparing two estimators. Both estimators are conferred with a so-called doubly robust property. We hope the current proposal empowers organizational scholars in their crucial endeavors dedicated to understanding the vulnerable workforce.
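The ETT is the contrast E[Y(1) − Y(0) | A = 1]: what exposed workers experienced versus what they would have experienced absent the exposure. To make the doubly robust idea concrete, here is a hedged base-R sketch (simulated data and model choices are ours, not the authors' estimators) that combines a propensity score model with an outcome model fitted among the unexposed; the resulting estimate is consistent if either model is correctly specified.

```r
set.seed(2024)
n <- 2000
x <- rnorm(n)                                    # baseline covariate (e.g., tenure)
a <- rbinom(n, 1, plogis(-0.5 + 0.8 * x))        # exposure (e.g., discrimination)
y <- 1 + 0.5 * x + 1.0 * a + rnorm(n)            # outcome (e.g., strain); true ETT = 1

# Propensity score model and outcome model among the unexposed
ps_fit  <- glm(a ~ x, family = binomial)
out_fit <- lm(y ~ x, subset = (a == 0))
e_hat   <- fitted(ps_fit)                                   # P(A = 1 | X)
mu0_hat <- predict(out_fit, newdata = data.frame(x = x))    # E[Y | X, A = 0]

# Doubly robust ETT: treated residuals minus odds-weighted control residuals
resid0 <- y - mu0_hat
ett_dr <- sum(a * resid0 - (1 - a) * (e_hat / (1 - e_hat)) * resid0) / sum(a)
ett_dr
```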
Analyzing Social Interaction in Organizations: A Roadmap for Reflexive Choice
Pub Date: 2024-04-22 | DOI: 10.1177/10944281241245444
Linda Jakob Sadeh, Avital Baikovich, Tammar B. Zilber
This article proposes a framework for reflexive choice in qualitative research, centering on social interaction. Interaction, fundamental to social and organizational life, has been studied extensively. Yet, researchers can get lost in the plethora of methodological tools, hampering reflexive choice. Our proposed framework consists of four dimensions of interaction (content, communication patterns, emotions, and roles), intersecting with five levels of analysis (individual, dyadic, group, organizational, and sociocultural), as well as three overarching analytic principles (following the dynamic, consequential, and contextual nature of interaction). For each intersection between dimension and level, we specify analytical questions, empirical markers, and references to exemplary works. The framework functions both as a compass, indicating potential directions for research design and data collection methods, and as a roadmap, illuminating pathways at the analysis stage. Our contributions are twofold: First, our framework fleshes out the broad spectrum of available methods for analyzing interaction, providing pragmatic tools for the researcher to reflexively choose from. Second, we highlight the broader relevance of maps, such as our own, for enhancing reflexive methodological choices.
{"title":"Analyzing Social Interaction in Organizations: A Roadmap for Reflexive Choice","authors":"Linda Jakob Sadeh, Avital Baikovich, Tammar B. Zilber","doi":"10.1177/10944281241245444","DOIUrl":"https://doi.org/10.1177/10944281241245444","url":null,"abstract":"This article proposes a framework for reflexive choice in qualitative research, centering on social interaction. Interaction, fundamental to social and organizational life, has been studied extensively. Yet, researchers can get lost in the plethora of methodological tools, hampering reflexive choice. Our proposed framework consists of four dimensions of interaction (content, communication patterns, emotions, and roles), intersecting with five levels of analysis (individual, dyadic, group, organizational, and sociocultural), as well as three overarching analytic principles (following the dynamic, consequential, and contextual nature of interaction). For each intersection between dimension and level, we specify analytical questions, empirical markers, and references to exemplary works. The framework functions both as a compass, indicating potential directions for research design and data collection methods, and as a roadmap, illuminating pathways at the analysis stage. Our contributions are twofold: First, our framework fleshes out the broad spectrum of available methods for analyzing interaction, providing pragmatic tools for the researcher to reflexively choose from. Second, we highlight the broader relevance of maps, such as our own, for enhancing reflexive methodological choices.","PeriodicalId":19689,"journal":{"name":"Organizational Research Methods","volume":"9 1","pages":""},"PeriodicalIF":9.5,"publicationDate":"2024-04-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140637754","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"管理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}