Issues in Contrasting Conservative and Considered Estimation: A Reply to Bobko et al. (2024)
Paul R. Sackett, Christopher M. Berry, Filip Lievens, Charlene Zhang
International Journal of Selection and Assessment, 33(3), 2025. DOI: 10.1111/ijsa.70016
Bobko et al. (2024) responded to Sackett et al.'s (2022) compilation of meta-analytic evidence on the validity of a wide variety of measures used as predictors of overall job performance, offering a set of alternative methodological choices that they term “considered estimation” to counter the Sackett et al. approach of “conservative estimation.” Here we offer a rebuttal to Bobko et al. A primary concern is that Bobko et al. apply the label “conservative estimation” to the full range of methodological choices made by Sackett et al. In response, we clarify the narrow and specific meaning of “conservative estimation” and note that the bulk of Bobko et al.'s concerns are independent of the principle of conservative estimation. We also respond to Bobko et al.'s two key concerns, namely, (a) comparing validity estimates when one is corrected for range restriction and the other is not and (b) comparing validity estimates for predictors reflecting psychological constructs with those reflecting measurement methods, and we briefly address a range of other critiques offered by Bobko et al.
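As context for the range restriction issue raised in this abstract, the following is a minimal illustration of why corrected and uncorrected validity estimates are not directly comparable, assuming the common direct range restriction (Thorndike Case II) correction; the specific corrections applied in the Sackett et al. (2022) and Bobko et al. (2024) exchanges may differ.

```latex
% Illustrative direct range restriction (Thorndike Case II) correction.
% r    = observed (restricted) validity
% u    = s_x / S_X, ratio of restricted to unrestricted predictor SDs
% rho  = estimated operational (unrestricted) validity
\[
  \hat{\rho} \;=\; \frac{r / u}{\sqrt{\,1 + r^{2}\!\left(\dfrac{1}{u^{2}} - 1\right)}}
\]
% Example: r = .30 and u = .70 give \hat{\rho} \approx .41, so comparing a
% corrected estimate for one predictor with an uncorrected estimate for
% another can misstate their relative validity.
```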
{"title":"Issues in Contrasting Conservative and Considered Estimation: A Reply to Bobko et al. (2024)","authors":"Paul R. Sackett, Christopher M. Berry, Filip Lievens, Charlene Zhang","doi":"10.1111/ijsa.70016","DOIUrl":"https://doi.org/10.1111/ijsa.70016","url":null,"abstract":"<p>Bobko et al. (2024) responded to Sackett et al.'s (2022) compilation of meta-analytic evidence for the validity of a wide variety of measures used as predictors of overall job performance, offering a set of alternative methodological choices which they term “considered estimation” to counter the Sackett et al. approach of “conservative estimation.” Here we offer a rebuttal to Bobko et al. A primary concern is that Bobko et al. apply the label “conservative estimation” to the full range of methodological choices made by Sackett et al. Yet, we clarify the narrow and specific meaning of “conservative estimation,” and note that that the bulk of Bobko et al.'s concerns are independent of the principle of conservative estimation. We also respond to Bobko et al.'s two key concerns, namely, comparing validity estimates when one is corrected for range restriction and one is not and comparing validity estimates for predictors reflecting psychological constructs and those reflecting measurement methods, and also briefly address a range of other critiques offered by Bobko et al.</p>","PeriodicalId":51465,"journal":{"name":"International Journal of Selection and Assessment","volume":"33 3","pages":""},"PeriodicalIF":2.6,"publicationDate":"2025-06-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1111/ijsa.70016","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144514573","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"管理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Can ChatGPT Outperform Humans in Faking a Personality Assessment While Avoiding Detection?
Chet Robie, Jane Phillips, Joshua S. Bourdage, Neil D. Christiansen, Patrick D. Dunlop, Stephen D. Risavy, Andrew B. Speer
International Journal of Selection and Assessment, 33(3), 2025. DOI: 10.1111/ijsa.70015
Large language models (LLMs), such as ChatGPT, have reshaped opportunities and challenges across various fields, including human resources (HR). Concerns have arisen about the potential for personality assessment manipulation using LLMs, posing a risk to the validity of these tools. This threat is a reality: recent research suggests that many candidates are using AI to complete pre-hire assessments. This study addresses this problem by examining whether ChatGPT can outperform humans in faking personality assessments while avoiding detection. To explore this, two experiments were conducted focusing on assessing job-relevant traits, with and without coaching, and with two methods of identifying faking, specifically using an impression management (IM) measure and an overclaiming questionnaire (OCQ). For each study, we used responses from 100 working adults recruited via the Prolific platform, which were compared to 100 replications from ChatGPT. The results revealed that while ChatGPT showed some ability to manipulate assessments, without coaching it did not consistently outperform humans. Coaching had a minimal impact on reducing IM scores for either humans or ChatGPT, but reduced OCQ bias scores for ChatGPT. These findings highlight the limitations of current faking detection measures and emphasize the need for further research to refine methods for ensuring the integrity of personality assessments in HR, particularly as artificial intelligence becomes more available to candidates.
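For readers unfamiliar with the overclaiming approach mentioned in this abstract, the sketch below shows one simple way an OCQ-style bias index could be computed, treating endorsement of nonexistent “foil” items as overclaiming; the item ratings, threshold, and index definition are illustrative assumptions, not the scoring used in the study.

```python
import numpy as np

def ocq_bias_index(ratings, is_foil, threshold=3):
    """Illustrative overclaiming bias index (not the study's scoring).

    ratings   : familiarity ratings (e.g., 1-5) for all OCQ items
    is_foil   : boolean flags marking nonexistent 'foil' items
    threshold : rating at or above which an item counts as 'claimed'
    Returns the proportion of foils claimed, i.e., professed familiarity
    with items that do not exist.
    """
    ratings = np.asarray(ratings, dtype=float)
    is_foil = np.asarray(is_foil, dtype=bool)
    return float(np.mean(ratings[is_foil] >= threshold))

# Hypothetical example: 8 items, the last 3 are foils.
ratings = [5, 4, 2, 5, 3, 4, 5, 2]
is_foil = [False, False, False, False, False, True, True, True]
print(ocq_bias_index(ratings, is_foil))  # ~0.67 -> frequent overclaiming
```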
{"title":"Can ChatGPT Outperform Humans in Faking a Personality Assessment While Avoiding Detection?","authors":"Chet Robie, Jane Phillips, Joshua S. Bourdage, Neil D. Christiansen, Patrick D. Dunlop, Stephen D. Risavy, Andrew B. Speer","doi":"10.1111/ijsa.70015","DOIUrl":"https://doi.org/10.1111/ijsa.70015","url":null,"abstract":"<p>Large language models (LLMs), such as ChatGPT, have reshaped opportunities and challenges across various fields, including human resources (HR). Concerns have arisen about the potential for personality assessment manipulation using LLMs, posing a risk to the validity of these tools. This threat is a reality: recent research suggests that many candidates are using AI to complete pre-hire assessments. This study addresses this problem by examining whether ChatGPT can outperform humans in faking personality assessments while avoiding detection. To explore this, two experiments were conducted focusing on assessing job-relevant traits, with and without coaching, and with two methods of identifying faking, specifically using an impression management (IM) measure and an overclaiming questionnaire (OCQ). For each study, we used responses from 100 working adults recruited via the Prolific platform, which were compared to 100 replications from ChatGPT. The results revealed that while ChatGPT showed some ability to manipulate assessments, without coaching it did not consistently outperform humans. Coaching had a minimal impact on reducing IM scores for either humans or ChatGPT, but reduced OCQ bias scores for ChatGPT. These findings highlight the limitations of current faking detection measures and emphasize the need for further research to refine methods for ensuring the integrity of personality assessments in HR, particularly as artificial intelligence becomes more available to candidates.</p>","PeriodicalId":51465,"journal":{"name":"International Journal of Selection and Assessment","volume":"33 3","pages":""},"PeriodicalIF":2.6,"publicationDate":"2025-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1111/ijsa.70015","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144185908","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"管理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Can Interviewees Fake Out AI? Comparing the Susceptibility and Mechanisms of Faking Across Self-Reports, Human Interview Ratings, and AI Interview Ratings
Louis Hickman, Josh Liff, Colin Willis, Emily Kim
International Journal of Selection and Assessment, 33(2), 2025. DOI: 10.1111/ijsa.70014
Artificial intelligence (AI) is increasingly used to score employment interviews in the early stages of the hiring process, but AI algorithms may be particularly prone to interviewee faking. Our study compared the extent to which people can improve their scores on self-report scales, structured and less structured human interview ratings, and AI interview ratings. Further, we replicate and extend prior research by examining how interviewee abilities and impression management tactics influence score inflation across scoring methods. Participants (N = 152) completed simulated, asynchronous interviews in honest and applicant-like conditions in a within-subjects design. The AI algorithms in the study were trained to replicate question-level structured interview ratings. Participants' scores increased most on self-reports (overall Cohen's d = 0.62) and least on AI interview ratings (overall Cohen's d = 0.14), although AI score increases were similar to those observed for human interview ratings (overall Cohen's d = 0.22). On average, across conditions, AI interview ratings converged more strongly with structured human ratings based on behaviorally anchored rating scales than with less structured human ratings. Verbal ability only predicted score improvement on self-reports, while increased use of honest defensive impression management tactics predicted improvement in AI and less structured human interview scores. Ability to identify criteria did not predict score improvement. Overall, these AI interview scores behaved similarly to structured human ratings. We discuss future possibilities for investigating faking in AI interviews, given that interviewees may try to “game” the system when aware that they are being evaluated by AI.
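To make the within-subjects comparison in this abstract concrete, here is a minimal sketch of how score inflation between honest and applicant-like conditions could be summarized with a standardized mean difference per scoring method; the data, the use of the honest-condition SD as the standardizer, and the method labels are illustrative assumptions rather than the study's exact analysis.

```python
import numpy as np

def score_inflation_d(honest, applicant):
    """Standardized mean score increase from the honest to the
    applicant-like condition for the same people (within-subjects),
    using the honest-condition SD as the standardizer. Illustrative
    only; the study's exact effect-size computation may differ."""
    honest = np.asarray(honest, dtype=float)
    applicant = np.asarray(applicant, dtype=float)
    return float((applicant.mean() - honest.mean()) / honest.std(ddof=1))

# Hypothetical ratings for the same five interviewees under both conditions.
methods = {
    "self_report":  ([3.0, 3.4, 2.8, 3.6, 3.2], [3.8, 4.1, 3.5, 4.3, 3.9]),
    "ai_interview": ([3.1, 3.5, 2.9, 3.7, 3.3], [3.2, 3.6, 3.0, 3.9, 3.4]),
}
for name, (honest, applicant) in methods.items():
    print(name, round(score_inflation_d(honest, applicant), 2))
```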
{"title":"Can Interviewees Fake Out AI? Comparing the Susceptibility and Mechanisms of Faking Across Self-Reports, Human Interview Ratings, and AI Interview Ratings","authors":"Louis Hickman, Josh Liff, Colin Willis, Emily Kim","doi":"10.1111/ijsa.70014","DOIUrl":"https://doi.org/10.1111/ijsa.70014","url":null,"abstract":"<p>Artificial intelligence (AI) is increasingly used to score employment interviews in the early stages of the hiring process, but AI algorithms may be particularly prone to interviewee faking. Our study compared the extent to which people can improve their scores on self-report scales, structured and less structured human interview ratings, and AI interview ratings. Further, we replicate and extend prior research by examining how interviewee abilities and impression management tactics influence score inflation across scoring methods. Participants (<i>N</i> = 152) completed simulated, asynchronous interviews in honest and applicant-like conditions in a within-subjects design. The AI algorithms in the study were trained to replicate question-level structured interview ratings. Participants' scores increased most on self-reports (overall Cohen's <i>d</i> = 0.62) and least on AI interview ratings (overall Cohen's <i>d</i> = 0.14), although AI score increases were similar to those observed for human interview ratings (overall Cohen's <i>d</i> = 0.22). On average, across conditions, AI interview ratings converged more strongly with structured human ratings based on behaviorally anchored rating scales than with less structured human ratings. Verbal ability only predicted score improvement on self-reports, while increased use of honest defensive impression management tactics predicted improvement in AI and less structured human interview scores. Ability to identify criteria did not predict score improvement. Overall, these AI interview scores behaved similarly to structured human ratings. We discuss future possibilities for investigating faking in AI interviews, given that interviewees may try to “game” the system when aware that they are being evaluated by AI.</p>","PeriodicalId":51465,"journal":{"name":"International Journal of Selection and Assessment","volume":"33 2","pages":""},"PeriodicalIF":2.6,"publicationDate":"2025-05-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1111/ijsa.70014","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143909386","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"管理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A Registered Report to Disentangle the Effects of Frame of Reference and Faking in the Personnel-Selection Scenario Paradigm
Jessica Röhner, Mia Degro, Ronald R. Holden, Astrid Schütz
International Journal of Selection and Assessment, 33(2), 2025. DOI: 10.1111/ijsa.70012
In laboratory faking research, participants are often instructed to respond honestly (generic instructions [GIs], control condition) or to fake (personnel-selection scenario [PSS], faking condition). Considering the research on instruction-level contextualization, a PSS might not only motivate participants to fake but might also promote the adoption of a work frame of reference (FOR). Thus, differences in responses between faking and control conditions could partly result from FOR effects. (Full) item-level contextualization can also be used to promote the adoption of a work FOR, and the adoption through this route is stronger than through instruction manipulation. We combined the two approaches to disentangle FOR and faking, conducted a 4-wave longitudinal study with a 2 (instructions: GIs vs. PSS) × 2 (full item-level work contextualization absent vs. present) repeated-measures design (N = 309), and compared the effects of these conditions on three HEXACO-PI-R scales (Conscientiousness, Emotionality, Honesty-Humility). Irrespective of the investigated personality trait, the ANOVAs revealed significant main effects. As expected, compared with GIs, the PSS increased the adoption of a work FOR, and the effects were smaller than the effects of full item-level work contextualization present (vs. absent). Also, as expected, the PSS (vs. GIs) and full item-level work contextualization present (vs. absent) changed participants' scale mean scores. However, importantly, there were no interaction effects. Exploratory mediation analyses indicated direct rather than indirect (mediator: adoption of a work FOR) effects of instructions on participants' scale mean scores. In conclusion, the internal validity of faking research is not threatened by confounding FOR effects.
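For readers who want to see the shape of the 2 × 2 repeated-measures analysis described in this abstract, the following is a minimal sketch using statsmodels' AnovaRM on simulated data; the variable names, simulated effect sizes, and single outcome scale are illustrative assumptions, not the authors' data or full analysis (which also included mediation models).

```python
import numpy as np
import pandas as pd
from statsmodels.stats.anova import AnovaRM

rng = np.random.default_rng(0)
n = 30  # hypothetical participants; the study had N = 309

# Each participant contributes a scale mean score in all four cells of the
# 2 (instructions: GI vs. PSS) x 2 (item-level contextualization: absent
# vs. present) repeated-measures design.
rows = []
for pid in range(n):
    base = rng.normal(3.3, 0.4)
    for instr in ("GI", "PSS"):
        for context in ("absent", "present"):
            score = base
            score += 0.4 if instr == "PSS" else 0.0        # faking-related shift
            score += 0.2 if context == "present" else 0.0  # FOR-related shift
            rows.append({"pid": pid, "instructions": instr,
                         "contextualization": context,
                         "score": score + rng.normal(0, 0.3)})
df = pd.DataFrame(rows)

# Main effects and the instructions x contextualization interaction.
res = AnovaRM(df, depvar="score", subject="pid",
              within=["instructions", "contextualization"]).fit()
print(res.anova_table)
```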
{"title":"A Registered Report to Disentangle the Effects of Frame of Reference and Faking in the Personnel-Selection Scenario Paradigm","authors":"Jessica Röhner, Mia Degro, Ronald. R. Holden, Astrid Schütz","doi":"10.1111/ijsa.70012","DOIUrl":"https://doi.org/10.1111/ijsa.70012","url":null,"abstract":"<p>In laboratory faking research, participants are often instructed to respond honestly (generic instructions [GIs], control condition) or to fake (personnel-selection scenario [PSS], faking condition). Considering the research on instruction-level contextualization, a PSS might not only motivate participants to fake but might also promote the adoption of a work frame of reference (FOR). Thus, differences in responses between faking and control conditions could partly result from FOR effects. (Full) item-level contextualization can also be used to promote the adoption of a work FOR, and the adoption through this route is stronger than through instruction manipulation. We combined the two approaches to disentangle FOR and faking, conducted a 4-wave longitudinal study with a 2 (instructions: GIs vs. PSS) × 2 (full item-level work contextualization absent vs. present) repeated-measures design (<i>N</i> = 309), and compared the effects of these conditions on three HEXACO-PI-R scales (Conscientiousness, Emotionality, Honesty-Humility). Irrespective of the investigated personality trait, the ANOVAs revealed significant main effects. As expected, compared with GIs, the PSS increased the adoption of a work FOR, and the effects were smaller than the effects of full item-level work contextualization present (vs. absent). Also, as expected, the PSS (vs. GIs) and full item-level work contextualization present (vs. absent) changed participants' scale mean scores. However, importantly, there were no interaction effects. Exploratory mediation analyses indicated direct rather than indirect (mediator: adoption of a work FOR) effects of instructions on participants' scale mean scores. In conclusion, the internal validity of faking research is <i>not threatened</i> by confounding FOR effects.</p>","PeriodicalId":51465,"journal":{"name":"International Journal of Selection and Assessment","volume":"33 2","pages":""},"PeriodicalIF":2.6,"publicationDate":"2025-05-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1111/ijsa.70012","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143901092","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"管理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}