The silence of the LLMs: Cross-lingual analysis of guardrail-related political bias and false information prevalence in ChatGPT, Google Bard (Gemini), and Bing Chat
{"title":"The silence of the LLMs: Cross-lingual analysis of guardrail-related political bias and false information prevalence in ChatGPT, Google Bard (Gemini), and Bing Chat","authors":"Aleksandra Urman , Mykola Makhortykh","doi":"10.1016/j.tele.2024.102211","DOIUrl":null,"url":null,"abstract":"<div><div>This article presents a comparative analysis of political bias in the outputs of three Large Language Model (LLM)-based chatbots – ChatGPT (GPT3.5, GPT4, GPT4o), Bing Chat, and Bard/Gemini – in response to political queries concerning the authoritarian regime in Russia. We investigate whether safeguards implemented in these chatbots contribute to the censorship of information that is viewed as harmful by the regime, in particular information about Vladimir Putin and the Russian war against Ukraine, and whether these safeguards enable the generation of false claims, in particular in relation to the regime’s internal and external opponents. To detect whether LLM safeguards reiterate political bias, the article compares the outputs of prompts focusing on Putin’s regime and the ones dealing with the Russian opposition and the US and Ukrainian politicians. It also examines whether the degree of bias varies depending on the language of the prompt and compares outputs concerning political personalities and issues across three languages: Russian, Ukrainian, and English. The results reveal significant disparities in how individual chatbots withhold politics-related information or produce false claims in relation to it. Notably, Bard consistently refused to respond to queries about Vladimir Putin in Russian, even when the relevant information was accessible via Google Search, and generally followed the censorship guidelines that, according to Yandex-related data leaks, were issued by the Russian authorities. A subsequent evaluation of Gemini showed that the chatbot restricts political information beyond what was officially confirmed by Google. In terms of false claims, we find substantial variation across languages with Ukrainian and Russian prompts generating false information more often and Bard being more prone to produce false claims in relation to Russian regime opponents (e.g., Navalny or Zelenskyy) than other chatbots. We also found that while GPT4 and GPT4o generate less factually incorrect information, both models still make mistakes, with their prevalence being higher in Russian and Ukrainian than in English. This research aims to stimulate further dialogue and research on developing safeguards against the misuse of LLMs outside of democratic environments.</div></div>","PeriodicalId":48257,"journal":{"name":"Telematics and Informatics","volume":"96 ","pages":"Article 102211"},"PeriodicalIF":7.6000,"publicationDate":"2024-11-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Telematics and Informatics","FirstCategoryId":"91","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S0736585324001151","RegionNum":2,"RegionCategory":"管理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"INFORMATION SCIENCE & LIBRARY SCIENCE","Score":null,"Total":0}
Abstract
This article presents a comparative analysis of political bias in the outputs of three Large Language Model (LLM)-based chatbots – ChatGPT (GPT-3.5, GPT-4, GPT-4o), Bing Chat, and Bard/Gemini – in response to political queries concerning the authoritarian regime in Russia. We investigate whether the safeguards implemented in these chatbots contribute to the censorship of information the regime views as harmful, in particular information about Vladimir Putin and the Russian war against Ukraine, and whether these safeguards enable the generation of false claims, particularly in relation to the regime's internal and external opponents. To detect whether LLM safeguards reproduce political bias, the article compares the outputs of prompts focusing on Putin's regime with those dealing with the Russian opposition and with US and Ukrainian politicians. It also examines whether the degree of bias varies with the language of the prompt, comparing outputs concerning political personalities and issues across three languages: Russian, Ukrainian, and English. The results reveal significant disparities in how individual chatbots withhold politics-related information or produce false claims about it. Notably, Bard consistently refused to respond to queries about Vladimir Putin in Russian, even when the relevant information was accessible via Google Search, and generally followed the censorship guidelines that, according to Yandex-related data leaks, were issued by the Russian authorities. A subsequent evaluation of Gemini showed that the chatbot restricts political information beyond what Google has officially confirmed. In terms of false claims, we find substantial variation across languages, with Ukrainian and Russian prompts generating false information more often and with Bard being more prone than other chatbots to produce false claims about Russian regime opponents (e.g., Navalny or Zelenskyy). We also find that while GPT-4 and GPT-4o generate less factually incorrect information, both models still make mistakes, and these are more prevalent in Russian and Ukrainian than in English. This research aims to stimulate further dialogue and research on developing safeguards against the misuse of LLMs outside of democratic environments.
About the journal
Telematics and Informatics is an interdisciplinary journal that publishes cutting-edge theoretical and methodological research exploring the social, economic, geographic, political, and cultural impacts of digital technologies. It covers various application areas, such as smart cities, sensors, information fusion, digital society, IoT, cyber-physical technologies, privacy, knowledge management, distributed work, emergency response, mobile communications, health informatics, social media's psychosocial effects, ICT for sustainable development, blockchain, e-commerce, and e-government.