Multimodal computation or interpretation? Automatic vs. critical understanding of text-image relations in racist memes in English
Chiara Polli, Maria Grazia Sindoni
Discourse Context & Media, published 2024-02-01. DOI: 10.1016/j.dcm.2024.100755
This paper discusses the epistemological differences between computational and sociosemiotic uses of the label ‘multimodal’ by addressing the challenges of automatically detecting hate speech in racist memes, considered as germane families of multimodal artifacts. Assuming that text-image interplay, as is the case with memes, may be extremely complex for AI-driven models to disentangle, the paper adopts a sociosemiotic multimodal critical approach to these challenges. As a case study, we select two English-language datasets: 1) the Hateful Memes Challenge (HMC) Dataset, built by the Facebook AI Research group in 2020, and 2) the Text-Image Cluster (TIC) Dataset, which includes manually collected user-generated (UG) hateful memes. By discussing different combinations of non-hateful/hateful texts and non-hateful/hateful images, we show how humour, intertextuality, and anomalous juxtapositions of texts and images, as well as contextual cultural knowledge, may make AI-based automatic interpretation incorrect, biased or misleading. In our conclusions, we argue the case for developing computational models that incorporate insights from sociosemiotics and multimodal critical discourse analysis.