Generative language models exhibit social identity biases

Nature Computational Science 5(1), 65–75 · Published: 2024-12-12 · DOI: 10.1038/s43588-024-00741-1
https://www.nature.com/articles/s43588-024-00741-1 · Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11774750/pdf/
Tiancheng Hu, Yara Kyrychenko, Steve Rathje, Nigel Collier, Sander van der Linden, Jon Roozenbeek

Abstract

Social identity biases, particularly the tendency to favor one’s own group (ingroup solidarity) and derogate other groups (outgroup hostility), are deeply rooted in human psychology and social behavior. However, it is unknown if such biases are also present in artificial intelligence systems. Here we show that large language models (LLMs) exhibit patterns of social identity bias, similarly to humans. By administering sentence completion prompts to 77 different LLMs (for instance, ‘We are…’), we demonstrate that nearly all base models and some instruction-tuned and preference-tuned models display clear ingroup favoritism and outgroup derogation. These biases manifest both in controlled experimental settings and in naturalistic human–LLM conversations. However, we find that careful curation of training data and specialized fine-tuning can substantially reduce bias levels. These findings have important implications for developing more equitable artificial intelligence systems and highlight the urgent need to understand how human–LLM interactions might reinforce existing social biases.

Researchers show that large language models exhibit social identity biases similar to humans, having favoritism toward ingroups and hostility toward outgroups. These biases persist across models, training data and real-world human–LLM conversations.
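The abstract describes the probe only at a high level (sentence-completion prompts such as ‘We are…’ administered to 77 LLMs). As a rough illustration of how such a probe could be run on a single open-weight model, the sketch below samples continuations of "We are" and "They are" and scores them with an off-the-shelf sentiment classifier as a crude proxy for ingroup solidarity versus outgroup hostility. The model (gpt2), the two prompts, the sample size and the sentiment classifier are all illustrative assumptions; the paper's own prompt set and bias measures are described in the full text.

```python
# Minimal sketch, NOT the authors' pipeline: probe one open-weight model with
# "We are" / "They are" sentence-completion prompts and compare how often the
# continuations are scored positive by a generic sentiment classifier.
# Assumptions: gpt2 as the base model, 20 samples per prompt, and the default
# SST-2 sentiment model as a stand-in for the paper's own coding of
# ingroup solidarity / outgroup hostility.
from transformers import pipeline

generator = pipeline("text-generation", model="gpt2")   # stand-in base model
classifier = pipeline("sentiment-analysis")             # default SST-2 classifier

prompts = {"ingroup": "We are", "outgroup": "They are"}
n_samples = 20

for group, prompt in prompts.items():
    outputs = generator(
        prompt,
        max_new_tokens=20,
        num_return_sequences=n_samples,
        do_sample=True,              # sampling is needed for distinct sequences
        pad_token_id=50256,          # GPT-2 has no pad token; reuse its EOS id
    )
    # Score only the generated continuation, not the prompt itself.
    continuations = [o["generated_text"][len(prompt):].strip() for o in outputs]
    labels = classifier(continuations)
    positive_rate = sum(l["label"] == "POSITIVE" for l in labels) / n_samples
    print(f"{group:8s} ({prompt!r}): {positive_rate:.0%} positive continuations")
```

A higher positive rate for the "We are" prompt than for the "They are" prompt would be loosely consistent with the ingroup-favoritism pattern the paper reports, but any real conclusion requires the paper's larger prompt set, many more samples and its validated bias measures.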

