Unpacking the Validity of Open-Ended Personality Assessments Using Fine-Tuned Large Language Models
Andrew B. Speer, Angie Y. Delacruz, Takudzwa A. Chawota, James Perrotta, Cort W. Rudolph
Organizational Research Methods | Pub Date : 2026-03-11 | DOI: 10.1177/10944281251413746
Alternative approaches to personality measurement, such as open-ended narrative-based assessments, have potential advantages for organizational research and practice. In this research, we investigate factors that affect the valid application of natural language processing (NLP) for scoring open-ended personality assessments and when, how, and why such assessments capture personality-related variance. Using a large sample of responses to open-ended assessments, convergence between NLP scores and self-report target scores increased as the degree of customization and the sophistication of the underlying model increased, with the worst psychometric performance occurring for zero-shot large language model (LLM) scores and the best for fine-tuned LLM scores. However, all scoring methods exhibited evidence of validity. Additionally, when trained to predict direct evaluations of the narrative responses, correlations with target scores were large (M = .83). NLP scores also exhibited discriminant and criterion-related validity evidence. However, validity was contingent upon the methodological rigor employed in developing writing prompts. Prompts designed to elicit trait-relevant information outperformed generic prompts, and this occurred because trait-specific prompts increased the amount of trait-relevant information (i.e., narrative units), which was associated with enhanced convergence with target scores.
I Gotta Feeling: Advancing Sentiment Analysis in Organizational Science
Imran Kadolkar, Divya V Doshi, Scott Tonidandel, Jose M Cortina
Pub Date : 2026-03-02 | DOI: 10.1177/10944281251408073
Sentiment analysis (SA) has grown considerably in organizational science research over the past two decades, particularly in the last few years. While enthusiasm for integrating advanced natural language processing algorithms is encouraging, authors are not fully reaping the benefits of such tools. Our systematic review of SA application in the organizational sciences suggests that authors struggle to appreciate all of the decisions that are inherent to SA, the choices that are available at each decision point, and the consequences of each choice. To address this gap, we use a working example to illustrate four critical decision points authors confront when conducting SA, and the subsequent impact different choices can have on one's conclusions. Decision points include selecting the SA method, computing a sentiment score, preprocessing the data, and using an appropriate level of analysis. We conclude with a framework outlining five dimensions (e.g., accuracy, interpretability, computational cost) to guide the selection of an SA approach based on study goals and needs, along with seven recommendations to authors wishing to apply SA.
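Two of the decision points the review names, how a sentiment score is computed and the level of analysis at which scores are aggregated, can be made concrete with a minimal sketch. The lexicon, weights, and example text below are entirely hypothetical; real applications would use a validated lexicon or model.

```python
# Toy lexicon-based sentiment scorer. The lexicon and its weights are
# illustrative assumptions, not a validated resource.
LEXICON = {"great": 1.0, "good": 0.5, "poor": -0.5, "terrible": -1.0}

def score_sentence(sentence: str) -> float:
    """Mean sentiment of lexicon words in the sentence (0.0 if none match)."""
    hits = [LEXICON[w] for w in sentence.lower().split() if w in LEXICON]
    return sum(hits) / len(hits) if hits else 0.0

def score_document(text: str, level: str = "document") -> float:
    """Aggregate sentiment at either the sentence or the document level."""
    sentences = [s for s in text.split(".") if s.strip()]
    if level == "sentence":
        # Sentence-level aggregation: average of per-sentence scores.
        return sum(score_sentence(s) for s in sentences) / len(sentences)
    # Document-level aggregation: pool all words into one score.
    return score_sentence(" ".join(sentences))

review = "The pay is terrible. The team is great and the manager is great."
```

With this example the two aggregation choices disagree: sentence-level averaging yields 0.0 (one strongly negative sentence offsets one positive one), while document-level pooling yields about +0.33 (two positive hits outweigh one negative), illustrating why the level-of-analysis choice can change one's conclusions.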
Application of Prototype Analysis to Organizational Research: A Critical Methodological Review
Sandra Kiffin-Petersen, Sharon Purchase, Doina Olaru
Pub Date : 2025-12-23 | DOI: 10.1177/10944281251399210
Prototypes—internalized knowledge structures of the most typical or characteristic features of a concept—are important because they influence cognitive processing. Yet prototype analysis, the method used to examine prototypes, appears relatively underutilized in organizational research. To introduce prototype analysis to a wider audience of organizational scholars, we conducted a critical methodological literature review following a six-step procedure. Seventy-three prototype analyses published in 35 journals were categorized and their content analyzed. A prototype analysis typically includes a sequence of independent studies conducted over two stages, recently referred to as the standard procedure. Our review makes several contributions, including development of a taxonomy of prototype analysis applications, clarification of the standard procedure of a prototype analysis and possible variations, and suggestions for organizational research. Benefits of undertaking a prototype analysis include improved understanding of abstract workplace concepts that are difficult to measure directly, the ability to compare cross-cultural prototypes, and an approach for investigating the issue of construct redundancy. We conclude with best-practice recommendations, implications for organizational scholarship, methodological limitations, and future research suggestions.
Understanding Relative Differences with Magnitude-Based Hypotheses: A Methodological Conceptualization and Data Illustration
Dane P. Blevins, David J. Skandera, Roberto Ragozzino
Pub Date : 2025-10-07 | DOI: 10.1177/10944281251377139
Our paper provides a conceptualization of magnitude-based hypotheses (MBHs). We define an MBH as a specific type of hypothesis that tests for relative differences in the independent impact (i.e., effect size difference) of at least two explanatory variables on a given outcome. We reviewed 1,715 articles across eight leading management journals and found that nearly 10% (165) of articles feature an MBH, employing 41 distinct methodological approaches to test them. However, approximately 40% of these papers show missteps in the post-estimation process required to evaluate MBHs. To address this issue, we offer a conceptual framework, an empirical illustration using Bayesian analysis and frequentist statistics, and a decision-tree guideline that outlines key steps for evaluating MBHs. Overall, we contribute a framework for applying MBHs, demonstrating how they can shift theoretical inquiry from binary questions of whether an effect exists, to more comparative questions about how much a construct matters, compared to what, and under which conditions.
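The post-estimation step the abstract highlights, formally comparing two coefficients rather than eyeballing them, can be sketched as a frequentist Wald-style contrast on simulated data. This is an illustrative sketch under assumed data, not the authors' decision-tree procedure; the effect sizes (0.8 vs. 0.3) are arbitrary.

```python
import numpy as np

# Simulate an outcome driven by two predictors with different true effects.
rng = np.random.default_rng(0)
n = 500
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
y = 0.8 * x1 + 0.3 * x2 + rng.normal(size=n)

# Ordinary least squares via the normal equations.
X = np.column_stack([np.ones(n), x1, x2])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ beta
sigma2 = resid @ resid / (n - X.shape[1])
cov = sigma2 * np.linalg.inv(X.T @ X)  # covariance of the coefficient estimates

# Wald-style test of the MBH H0: beta1 - beta2 = 0, via contrast c = [0, 1, -1].
c = np.array([0.0, 1.0, -1.0])
diff = c @ beta            # estimated effect-size difference
se = np.sqrt(c @ cov @ c)  # standard error of that difference
z = diff / se              # test statistic for the relative difference
```

The key point is that the standard error of the *difference* comes from the full coefficient covariance matrix; comparing two separately reported p-values, a misstep the review documents, does not test the magnitude-based hypothesis.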
Generative Artificial Intelligence in Qualitative Data Analysis: Analyzing—Or Just Chatting?
Duc Cuong Nguyen, Catherine Welch
Pub Date : 2025-09-30 | DOI: 10.1177/10944281251377154
Researchers, engineers, and entrepreneurs are enthusiastically exploring and promoting ways to apply generative artificial intelligence (GenAI) tools to qualitative data analysis. From promises of automated coding and thematic analysis to functioning as a virtual research assistant that supports researchers in diverse interpretive and analytical tasks, the potential applications of GenAI in qualitative research appear vast. In this paper, we take a step back, ask what sort of technological artifact GenAI is, and evaluate whether it is appropriate for qualitative data analysis. We provide an accessible, technologically informed analysis of GenAI, specifically large language models (LLMs), and put to the test the claimed transformative potential of using GenAI in qualitative data analysis. Our evaluation illustrates significant shortcomings that, if the technology is adopted uncritically by management researchers, will introduce unacceptable epistemic risks. We explore these epistemic risks and emphasize that the essence of qualitative data analysis lies in the interpretation of meaning, an inherently human capability.
Unleashing the Creative Potential of Research Tensions: Toward a Paradox Approach to Methods
Stephanie Schrage, Constantine Andriopoulos, Marianne W. Lewis, Wendy K. Smith
Pub Date : 2025-07-08 | DOI: 10.1177/10944281251346804
Research is a paradoxical process. Scholars confront conflicting yet interwoven pressures, considering methodologies that engage complexity and simplicity, induction and deduction, novelty and continuity, and more. Paradox theory offers insights that embrace such tensions, providing empirical examples that harness creative friction to foster more novel and useful, rigorous, and relevant research. Leveraging this lens, we open a conversation on research tensions, developing the foundations of a Paradox Approach to Methods applicable to organization studies more broadly. To do so, we first identify tensions raised at six methodological decision points: research scope, construct definition, underlying assumptions, data collection, data analysis, and interpretation. Second, we build on paradox theory to identify navigating practices: accepting, differentiating, integrating, and knotting. By doing so, we contribute to organizational research broadly by embracing methods of tensions to advance scholarly insight.
The Journey of Forced Choice Measurement Over 80 Years: Past, Present, and Future
Philseok Lee, Mina Son, Steven Zhou, Sean Joo, Zihao Jia, Virginia Cheng
Pub Date : 2025-07-07 | DOI: 10.1177/10944281251350687
Over the past two decades, forced-choice (FC) measures have received considerable attention from researchers and practitioners in industrial and organizational psychology. Despite the growing body of research on FC measures, there has not yet been a comprehensive review synthesizing the diverse lines of research. This article bridges this gap by presenting a systematic review of post-2000 literature on FC measures, addressing ten critical questions: 1) validity evidence, 2) faking resistance, 3) FC IRT models, 4) FC test design, 5) FC measure development, 6) test-taker reactions and response processes, 7) measurement and predictive bias, 8) reliability, 9) computerized adaptive testing, and 10) random responding. The review adopts a historical perspective, tracing the development of FC measures and highlighting key empirical findings, methodological advances, current trends, and future directions. By synthesizing a substantial body of evidence across multiple research streams, this article serves as a valuable resource, providing insights into the psychometric properties, theoretical underpinnings, and practical applications of FC measures in organizational contexts such as personnel selection, development, and assessment.
Using Markov Chains to Detect Careless Responding in Survey Research
Torsten Biemann, Irmela F. Koch-Bayram, Madleen Meier-Barthold, Herman Aguinis
Pub Date : 2025-06-24 | DOI: 10.1177/10944281251334778
Careless responses by survey participants threaten data quality and lead to misleading substantive conclusions that result in theory and practice derailments. Prior research developed valuable precautionary and post-hoc approaches to detect certain types of careless responding. However, existing approaches fail to detect certain repeated response patterns, such as diagonal-lining and alternating responses. Moreover, some existing approaches risk falsely flagging careful response patterns. To address these challenges, we developed a methodological advancement based on first-order Markov chains called Lazy Respondents (Laz.R) that relies on predicting careless responses based on prior responses. We analyzed two large datasets and conducted an experimental study to compare careless responding indices to Laz.R and provide evidence that its use improves validity. To facilitate the use of Laz.R, we describe a procedure for establishing sample-specific cutoff values for careless respondents using the "kneedle algorithm" and make an R Shiny application available to produce all calculations. We expect that using Laz.R in combination with other approaches will help mitigate the threat of careless responses and improve the accuracy of substantive conclusions in future research.
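The core idea, that a first-order Markov chain makes patterned answer strings highly predictable from the previous response, can be illustrated with a minimal within-person predictability index. This sketch is not the authors' Laz.R procedure (which estimates sample-specific cutoffs and is distributed as an R Shiny application); it only shows why straight-lining, alternating, and diagonal-lining responses all stand out under a Markov lens.

```python
from collections import Counter, defaultdict

def predictability(responses):
    """Mean probability of each observed transition under a first-order
    Markov chain fit to the respondent's own answer string. Values near 1.0
    indicate repetitive patterns (straight-lining, alternating,
    diagonal-lining); careful, varied responding scores lower."""
    # Count transitions prev -> next within this respondent's answers.
    trans = defaultdict(Counter)
    for prev, nxt in zip(responses, responses[1:]):
        trans[prev][nxt] += 1
    # Probability of each observed transition given its predecessor.
    probs = [trans[prev][nxt] / sum(trans[prev].values())
             for prev, nxt in zip(responses, responses[1:])]
    return sum(probs) / len(probs)
```

Straight-lining (`[3]*10`), alternating (`[1, 5]*5`), and diagonal-lining (`[1, 2, 3, 4, 5]*2`) all score exactly 1.0 because every response is fully determined by the one before it, whereas a varied answer string scores well below 1.0, which is what lets a Markov-based index catch patterns that long-string or even-odd consistency checks miss.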
Reliability Evidence for AI-Based Scores in Organizational Contexts: Applying Lessons Learned From Psychometrics
Andrew B. Speer, Frederick L. Oswald, Dan J. Putka
Pub Date : 2025-06-24 | DOI: 10.1177/10944281251346404
Machine learning and artificial intelligence (AI) are increasingly used within organizational research and practice to generate scores representing constructs (e.g., social effectiveness) or behaviors/events (e.g., turnover probability). Ensuring the reliability of AI scores is critical in these contexts, and yet reliability estimates are reported in inconsistent ways, if at all. The current article critically examines reliability estimation for AI scores. We describe different uses of AI scores and how this informs the data and model needed for estimating reliability. Additionally, we distinguish between reliability and validity evidence within this context. We also highlight how the parallel test assumption is required when relying on correlations between AI scores and established measures as an index of reliability, and yet this assumption is frequently violated. We then provide methods that are appropriate for reliability estimation for AI scores that are sensitive to the generalizations one aims to make. In conclusion, we assert that AI reliability estimation is a challenging task that requires a thorough understanding of the issues presented, but a task that is essential to responsible AI work in organizational contexts.
A Machine Learning Toolkit for Selecting Studies and Topics in Systematic Literature Reviews
Andrea Simonetti, Michele Tumminello, Pasquale Massimo Picone, Anna Minà
Pub Date : 2025-05-26 | DOI: 10.1177/10944281251341571
Scholars conduct systematic literature reviews to summarize knowledge and identify gaps in understanding. Machine learning can assist researchers in carrying out these studies. This paper introduces a machine learning toolkit that employs Network Analysis and Natural Language Processing methods to extract textual features and categorize academic papers. The toolkit comprises two algorithms that enable researchers to: (a) select relevant studies for a given theme; and (b) identify the main topics within that theme. We demonstrate the effectiveness of our toolkit by analyzing three streams of literature: cobranding, coopetition, and the psychological resilience of entrepreneurs. By comparing the results obtained through our toolkit with previously published literature reviews, we highlight its advantages in enhancing transparency, coherence, and comprehensiveness in literature reviews. We also provide quantitative evidence about the toolkit's efficacy in addressing the challenges inherent in conducting a literature review, as compared with state-of-the-art Natural Language Processing methods. Finally, we discuss the critical role of researchers in implementing and overseeing a literature review aided by our toolkit.
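The study-selection step, ranking candidate papers by textual similarity to a theme, can be sketched with plain TF-IDF vectors and cosine similarity. This is a bag-of-words stand-in for illustration only, not the paper's network-analysis-based algorithms; the three mini-abstracts are invented, echoing the coopetition and entrepreneur-resilience streams the authors analyze.

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """TF-IDF vectors (as term -> weight dicts) for a small corpus."""
    tokenized = [doc.lower().split() for doc in docs]
    df = Counter(term for toks in tokenized for term in set(toks))
    n = len(docs)
    vecs = []
    for toks in tokenized:
        tf = Counter(toks)
        vecs.append({t: tf[t] * math.log(n / df[t]) for t in tf})
    return vecs

def cosine(u, v):
    """Cosine similarity between two sparse term-weight dicts."""
    dot = sum(u[t] * v.get(t, 0.0) for t in u)
    nu = math.sqrt(sum(x * x for x in u.values()))
    nv = math.sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

# Invented mini-abstracts: the first acts as the seed for the theme.
abstracts = [
    "coopetition strategy and cooperation between rival firms",   # seed
    "coopetition alliances between rival firms and strategy",     # relevant
    "psychological resilience of entrepreneurs under stress",     # off-theme
]
vecs = tfidf_vectors(abstracts)
sims = [cosine(vecs[0], v) for v in vecs[1:]]  # rank candidates vs. seed
```

The on-theme candidate scores well above the off-theme one (which shares no vocabulary with the seed and scores 0.0); a real pipeline would add stemming, stop-word removal, and, as in the paper, network-based structure over these similarities.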