Tiancheng Hu, Yara Kyrychenko, Steve Rathje, Nigel Collier, Sander van der Linden, Jon Roozenbeek
{"title":"生成语言模型表现出社会身份偏见。","authors":"Tiancheng Hu, Yara Kyrychenko, Steve Rathje, Nigel Collier, Sander van der Linden, Jon Roozenbeek","doi":"10.1038/s43588-024-00741-1","DOIUrl":null,"url":null,"abstract":"Social identity biases, particularly the tendency to favor one’s own group (ingroup solidarity) and derogate other groups (outgroup hostility), are deeply rooted in human psychology and social behavior. However, it is unknown if such biases are also present in artificial intelligence systems. Here we show that large language models (LLMs) exhibit patterns of social identity bias, similarly to humans. By administering sentence completion prompts to 77 different LLMs (for instance, ‘We are…’), we demonstrate that nearly all base models and some instruction-tuned and preference-tuned models display clear ingroup favoritism and outgroup derogation. These biases manifest both in controlled experimental settings and in naturalistic human–LLM conversations. However, we find that careful curation of training data and specialized fine-tuning can substantially reduce bias levels. These findings have important implications for developing more equitable artificial intelligence systems and highlight the urgent need to understand how human–LLM interactions might reinforce existing social biases. Researchers show that large language models exhibit social identity biases similar to humans, having favoritism toward ingroups and hostility toward outgroups. These biases persist across models, training data and real-world human–LLM conversations.","PeriodicalId":74246,"journal":{"name":"Nature computational science","volume":"5 1","pages":"65-75"},"PeriodicalIF":12.0000,"publicationDate":"2024-12-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11774750/pdf/","citationCount":"0","resultStr":"{\"title\":\"Generative language models exhibit social identity biases\",\"authors\":\"Tiancheng Hu, Yara Kyrychenko, Steve Rathje, Nigel Collier, Sander van der Linden, Jon Roozenbeek\",\"doi\":\"10.1038/s43588-024-00741-1\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Social identity biases, particularly the tendency to favor one’s own group (ingroup solidarity) and derogate other groups (outgroup hostility), are deeply rooted in human psychology and social behavior. However, it is unknown if such biases are also present in artificial intelligence systems. Here we show that large language models (LLMs) exhibit patterns of social identity bias, similarly to humans. By administering sentence completion prompts to 77 different LLMs (for instance, ‘We are…’), we demonstrate that nearly all base models and some instruction-tuned and preference-tuned models display clear ingroup favoritism and outgroup derogation. These biases manifest both in controlled experimental settings and in naturalistic human–LLM conversations. However, we find that careful curation of training data and specialized fine-tuning can substantially reduce bias levels. These findings have important implications for developing more equitable artificial intelligence systems and highlight the urgent need to understand how human–LLM interactions might reinforce existing social biases. Researchers show that large language models exhibit social identity biases similar to humans, having favoritism toward ingroups and hostility toward outgroups. These biases persist across models, training data and real-world human–LLM conversations.\",\"PeriodicalId\":74246,\"journal\":{\"name\":\"Nature computational science\",\"volume\":\"5 1\",\"pages\":\"65-75\"},\"PeriodicalIF\":12.0000,\"publicationDate\":\"2024-12-12\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11774750/pdf/\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Nature computational science\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://www.nature.com/articles/s43588-024-00741-1\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q1\",\"JCRName\":\"COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Nature computational science","FirstCategoryId":"1085","ListUrlMain":"https://www.nature.com/articles/s43588-024-00741-1","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS","Score":null,"Total":0}
Generative language models exhibit social identity biases
Social identity biases, particularly the tendency to favor one’s own group (ingroup solidarity) and derogate other groups (outgroup hostility), are deeply rooted in human psychology and social behavior. However, it is unknown if such biases are also present in artificial intelligence systems. Here we show that large language models (LLMs) exhibit patterns of social identity bias, similarly to humans. By administering sentence completion prompts to 77 different LLMs (for instance, ‘We are…’), we demonstrate that nearly all base models and some instruction-tuned and preference-tuned models display clear ingroup favoritism and outgroup derogation. These biases manifest both in controlled experimental settings and in naturalistic human–LLM conversations. However, we find that careful curation of training data and specialized fine-tuning can substantially reduce bias levels. These findings have important implications for developing more equitable artificial intelligence systems and highlight the urgent need to understand how human–LLM interactions might reinforce existing social biases. Researchers show that large language models exhibit social identity biases similar to humans, having favoritism toward ingroups and hostility toward outgroups. These biases persist across models, training data and real-world human–LLM conversations.