Authors: Paweł Gmyrek, Christoph Lutz, Gemma Newlands
Journal: British Journal of Industrial Relations
Publication date: 2024-08-14
DOI: 10.1111/bjir.12840 (https://doi.org/10.1111/bjir.12840)
A technological construction of society: Comparing GPT‐4 and human respondents for occupational evaluation in the UK
Despite initial research on the biases and perceptions of large language models (LLMs), we lack evidence on how LLMs evaluate occupations, especially in comparison to human evaluators. In this paper, we present a systematic comparison of occupational evaluations by GPT‐4 with those from an in‐depth, high‐quality and recent survey of human respondents in the UK. Covering the full ISCO‐08 occupational landscape, with 580 occupations and two distinct metrics (prestige and social value), our findings indicate that GPT‐4 and human scores are highly correlated across all ISCO‐08 major groups. At the same time, GPT‐4 substantially under‐ or overestimates the occupational prestige and social value of many occupations, particularly emerging digital occupations and stigmatized or illicit ones. Our analyses show both the potential and the risks of using LLM‐generated data for sociological and occupational research. We also discuss the policy implications of our findings for the integration of LLM tools into the world of work.
About the journal:
BJIR (British Journal of Industrial Relations) is an influential and authoritative journal which is essential reading for all academics and practitioners interested in work and employment relations. It is the highest ranked European journal in the Industrial Relations & Labour category of the Social Sciences Citation Index. BJIR aims to present the latest research on developments in employment and work from across the globe that appeals to an international readership. Contributions are drawn from all of the main social science disciplines, deal with a broad range of employment topics and express a range of viewpoints.