{"title":"Understanding and evaluating harms of AI-generated image captions in political images","authors":"Habiba Sarhan, Simon Hegelich","doi":"10.3389/fpos.2023.1245684","DOIUrl":null,"url":null,"abstract":"The use of AI-generated image captions has been increasing. Scholars of disability studies have long studied accessibility and AI issues concerning technology bias, focusing on image captions and tags. However, less attention has been paid to the individuals and social groups depicted in images and captioned using AI. Further research is needed to understand the underlying representational harms that could affect these social groups. This paper investigates the potential representational harms to social groups depicted in images. There is a high risk of harming certain social groups, either by stereotypical descriptions or erasing their identities from the caption, which could affect the understandings, beliefs, and attitudes that people hold about these specific groups. For the purpose of this article, 1,000 images with human-annotated captions were collected from news agencies “politics” sections. Microsoft's Azure Cloud Services was used to generate AI-generated captions with the December 2021 public version. The pattern observed from the politically salient images gathered and their captions highlight the tendency of the model used to generate more generic descriptions, which may potentially harm misrepresented social groups. Consequently, a balance between those harms needs to be struck, which is intertwined with the trade-off between generating generic vs. specific descriptions. The decision to generate generic descriptions, being extra cautious not to use stereotypes, erases and demeans excluded and already underrepresented social groups, while the decision to generate specific descriptions stereotypes social groups as well as reifies them. The appropriate trade-off is, therefore, crucial, especially when examining politically salient images.","PeriodicalId":34431,"journal":{"name":"Frontiers in Political Science","volume":null,"pages":null},"PeriodicalIF":2.3000,"publicationDate":"2023-09-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"1","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Frontiers in Political Science","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.3389/fpos.2023.1245684","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"INTERNATIONAL RELATIONS","Score":null,"Total":0}
引用次数: 1
Abstract
The use of AI-generated image captions has been increasing. Scholars of disability studies have long studied accessibility and AI issues concerning technology bias, focusing on image captions and tags. However, less attention has been paid to the individuals and social groups depicted in images and captioned using AI. Further research is needed to understand the underlying representational harms that could affect these social groups. This paper investigates the potential representational harms to social groups depicted in images. There is a high risk of harming certain social groups, either by stereotypical descriptions or erasing their identities from the caption, which could affect the understandings, beliefs, and attitudes that people hold about these specific groups. For the purpose of this article, 1,000 images with human-annotated captions were collected from news agencies “politics” sections. Microsoft's Azure Cloud Services was used to generate AI-generated captions with the December 2021 public version. The pattern observed from the politically salient images gathered and their captions highlight the tendency of the model used to generate more generic descriptions, which may potentially harm misrepresented social groups. Consequently, a balance between those harms needs to be struck, which is intertwined with the trade-off between generating generic vs. specific descriptions. The decision to generate generic descriptions, being extra cautious not to use stereotypes, erases and demeans excluded and already underrepresented social groups, while the decision to generate specific descriptions stereotypes social groups as well as reifies them. The appropriate trade-off is, therefore, crucial, especially when examining politically salient images.