Taking It Easy: Off-the-Shelf Versus Fine-Tuned Supervised Modeling of Performance Appraisal Text

Organizational Research Methods (impact factor 8.9; JCR Q1, Management; CAS Zone 2, Management). Published 2024-08-28. DOI: 10.1177/10944281241271249

Andrew B. Speer, James Perrotta, Tobias L. Kordsmeyer

Abstract

In the organizational sciences, supervised natural language processing (NLP) models have traditionally been used to measure targeted constructs in text. However, these models require significant resources to develop. Emerging “off-the-shelf” large language models (LLMs) offer a way to evaluate organizational constructs without building customized models. Yet it is unclear whether off-the-shelf LLMs score organizational constructs accurately and what evidence is needed to infer validity. In this study, we compared the validity of supervised NLP models with that of off-the-shelf LLMs (ChatGPT-3.5 and ChatGPT-4). Across six organizational datasets and thousands of comments, we found that scores produced by supervised NLP models were more reliable than those of human coders. Notably, even though they were not developed for this purpose, off-the-shelf LLMs produced scores with psychometric properties similar to those of the supervised models, if slightly less favorable. We connect these findings to broader validation considerations and present a decision chart to guide researchers and practitioners in using off-the-shelf LLMs to score targeted constructs, including guidance on how psychometric evidence can be “transported” to new contexts.

Source journal: Organizational Research Methods
CiteScore: 23.20
Self-citation rate: 3.20%

Journal description: Organizational Research Methods (ORM) was founded to introduce pertinent methodological advances to researchers in the organizational sciences. The objective of ORM is to promote the application of current and emerging methodologies to advance both theory and research practice. Articles are expected to be comprehensible to readers whose methodological and statistical training is consistent with that provided in contemporary organizational sciences doctoral programs. The text should be presented accessibly: for instance, highly technical content should be placed in appendices, and authors are encouraged to include example data and computer code when relevant. Additionally, authors should explicitly outline how their contribution has the potential to advance organizational theory and research practice.