Huaicheng Fang, Fuqing Zhu, Jizhong Han, Songlin Hu
DOI: 10.1109/SmartWorld-UIC-ATC-ScalCom-DigitalTwin-PriComp-Metaverse56740.2022.00221
Publication date: 2022-12-01
Multimodal Hateful Memes Detection via Image Caption Supervision
A large amount of hateful speech exists on the Internet in the form of text and images uploaded by social media users. Recently, the multimodal hateful speech detection task has attracted growing research attention, producing some representative work on detecting such negative samples. For this multimodal task, the ability to understand multimodal semantic information is particularly crucial. However, existing models understand the semantics of the image modality less well than those of the text modality, owing to the complex appearance of each image. Therefore, this paper uses the text modality, which the model already understands well, to improve its understanding of image semantics. Specifically, this paper proposes an image caption supervision (ICS) auxiliary method for multimodal hateful speech detection, in which the image caption supervises the feature learning of images so that their semantic information is better understood. On the Facebook Hateful Memes dataset, the proposed ICS method outperforms some state-of-the-art unimodal and multimodal baselines, demonstrating the effectiveness of ICS.
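The abstract does not specify how the caption signal enters training, so the following is only an illustrative sketch of one common way to add auxiliary supervision: a classification loss plus a weighted term that shrinks as the image feature agrees with a caption feature. The function names, the cosine-based auxiliary term, and the `aux_weight` parameter are all assumptions made for illustration, not the authors' formulation.

```python
import math


def binary_cross_entropy(prob, label):
    """Standard binary cross-entropy for one example; prob is P(hateful)."""
    eps = 1e-12
    prob = min(max(prob, eps), 1.0 - eps)  # clamp to avoid log(0)
    return -(label * math.log(prob) + (1 - label) * math.log(1.0 - prob))


def cosine_similarity(u, v):
    """Cosine similarity of two equal-length, nonzero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def ics_style_loss(prob, label, image_feat, caption_feat, aux_weight=0.5):
    """Hypothetical joint objective (an assumption, not the paper's loss):
    classification loss plus an auxiliary term that is zero when the image
    feature points in the same direction as the caption feature."""
    cls_loss = binary_cross_entropy(prob, label)
    aux_loss = 1.0 - cosine_similarity(image_feat, caption_feat)
    return cls_loss + aux_weight * aux_loss


# Same prediction, two different degrees of image/caption agreement.
aligned = ics_style_loss(0.9, 1, [1.0, 0.0], [1.0, 0.0])
misaligned = ics_style_loss(0.9, 1, [1.0, 0.0], [0.0, 1.0])
```

In this sketch the two calls differ only in the auxiliary term, so `misaligned` exceeds `aligned` by exactly `aux_weight`. In a real system the features would come from learned image and text encoders and the weight would be tuned on validation data.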
About the journal:
The area of scalable computing has matured and reached a point where new issues and trends require a professional forum. SCPE will provide this avenue by publishing original refereed papers that address the present as well as the future of parallel and distributed computing. The journal will focus on algorithm development, implementation and execution on real-world parallel architectures, and application of parallel and distributed computing to the solution of real-life problems.