{"title":"MIMIC:印地语-英语代码混合语言多模态互联网内容中的厌女症识别","authors":"Aakash Singh, Deepawali Sharma, Vivek Kumar Singh","doi":"10.1145/3656169","DOIUrl":null,"url":null,"abstract":"<p>Over the years, social media has emerged as one of the most popular platforms where people express their views and share thoughts about various aspects. The social media content now includes a variety of components such as text, images, videos etc. One type of interest is memes, which often combine text and images. It is relevant to mention here that, social media being an unregulated platform, sometimes also has instances of discriminatory, offensive and hateful content being posted. Such content adversely affects the online well-being of the users. Therefore, it is very important to develop computational models to automatically detect such content so that appropriate corrective action can be taken. Accordingly, there have been research efforts on automatic detection of such content focused mainly on the texts. However, the fusion of multimodal data (as in memes) creates various challenges in developing computational models that can handle such data, more so in the case of low-resource languages. Among such challenges, the lack of suitable datasets for developing computational models for handling memes in low-resource languages is a major problem. This work attempts to bridge the research gap by providing a large-sized curated dataset comprising 5,054 memes in Hindi-English code-mixed language, which are manually annotated by three independent annotators. It comprises two subtasks: (i) Subtask-1 (Binary classification involving tagging a meme as misogynous or non-misogynous), and (ii) Subtask-2 (multi-label classification of memes into different categories). The data quality is evaluated by computing Krippendorff's alpha. Different computational models are then applied on the data in three settings: text-only, image-only, and multimodal models using fusion techniques. The results show that the proposed multimodal method using the fusion technique may be the preferred choice for the identification of misogyny in multimodal Internet content and that the dataset is suitable for advancing research and development in the area.</p>","PeriodicalId":54312,"journal":{"name":"ACM Transactions on Asian and Low-Resource Language Information Processing","volume":null,"pages":null},"PeriodicalIF":1.8000,"publicationDate":"2024-04-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"MIMIC: Misogyny Identification in Multimodal Internet Content in Hindi-English Code-Mixed Language\",\"authors\":\"Aakash Singh, Deepawali Sharma, Vivek Kumar Singh\",\"doi\":\"10.1145/3656169\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<p>Over the years, social media has emerged as one of the most popular platforms where people express their views and share thoughts about various aspects. The social media content now includes a variety of components such as text, images, videos etc. One type of interest is memes, which often combine text and images. It is relevant to mention here that, social media being an unregulated platform, sometimes also has instances of discriminatory, offensive and hateful content being posted. Such content adversely affects the online well-being of the users. Therefore, it is very important to develop computational models to automatically detect such content so that appropriate corrective action can be taken. Accordingly, there have been research efforts on automatic detection of such content focused mainly on the texts. However, the fusion of multimodal data (as in memes) creates various challenges in developing computational models that can handle such data, more so in the case of low-resource languages. Among such challenges, the lack of suitable datasets for developing computational models for handling memes in low-resource languages is a major problem. This work attempts to bridge the research gap by providing a large-sized curated dataset comprising 5,054 memes in Hindi-English code-mixed language, which are manually annotated by three independent annotators. It comprises two subtasks: (i) Subtask-1 (Binary classification involving tagging a meme as misogynous or non-misogynous), and (ii) Subtask-2 (multi-label classification of memes into different categories). The data quality is evaluated by computing Krippendorff's alpha. Different computational models are then applied on the data in three settings: text-only, image-only, and multimodal models using fusion techniques. The results show that the proposed multimodal method using the fusion technique may be the preferred choice for the identification of misogyny in multimodal Internet content and that the dataset is suitable for advancing research and development in the area.</p>\",\"PeriodicalId\":54312,\"journal\":{\"name\":\"ACM Transactions on Asian and Low-Resource Language Information Processing\",\"volume\":null,\"pages\":null},\"PeriodicalIF\":1.8000,\"publicationDate\":\"2024-04-04\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"ACM Transactions on Asian and Low-Resource Language Information Processing\",\"FirstCategoryId\":\"94\",\"ListUrlMain\":\"https://doi.org/10.1145/3656169\",\"RegionNum\":4,\"RegionCategory\":\"计算机科学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q3\",\"JCRName\":\"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"ACM Transactions on Asian and Low-Resource Language Information Processing","FirstCategoryId":"94","ListUrlMain":"https://doi.org/10.1145/3656169","RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q3","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}
引用次数: 0
摘要
多年来,社交媒体已成为人们表达观点和分享各方面想法的最流行平台之一。现在,社交媒体的内容包括文字、图片、视频等多种形式。人们感兴趣的一种类型是 "备忘录",它通常将文字和图片结合在一起。值得一提的是,社交媒体作为一个不受监管的平台,有时也会出现发布歧视性、攻击性和仇恨性内容的情况。这些内容会对用户的在线福祉产生不利影响。因此,开发自动检测此类内容的计算模型非常重要,以便采取适当的纠正措施。因此,自动检测此类内容的研究工作主要集中在文本方面。然而,多模态数据的融合(如备忘录中的数据)给开发可处理此类数据的计算模型带来了各种挑战,对于低资源语言来说更是如此。在这些挑战中,缺乏合适的数据集来开发处理低资源语言中memes的计算模型是一个主要问题。这项工作试图通过提供一个由 5,054 个印地语-英语混合语代码组成的大型数据集来弥补这一研究空白,这些数据集由三个独立的注释者手动注释。它由两个子任务组成:(i) 子任务-1(二元分类,涉及将备忘录标记为厌恶或非厌恶)和 (ii) 子任务-2(将备忘录分为不同类别的多标签分类)。数据质量通过计算克里彭多夫α进行评估。然后在三种情况下对数据应用不同的计算模型:纯文本模型、纯图像模型和使用融合技术的多模态模型。结果表明,所提出的使用融合技术的多模态方法可能是识别多模态互联网内容中厌女症的首选,而且该数据集适合用于推进该领域的研究和开发。
MIMIC: Misogyny Identification in Multimodal Internet Content in Hindi-English Code-Mixed Language
Over the years, social media has emerged as one of the most popular platforms where people express their views and share thoughts about various aspects. The social media content now includes a variety of components such as text, images, videos etc. One type of interest is memes, which often combine text and images. It is relevant to mention here that, social media being an unregulated platform, sometimes also has instances of discriminatory, offensive and hateful content being posted. Such content adversely affects the online well-being of the users. Therefore, it is very important to develop computational models to automatically detect such content so that appropriate corrective action can be taken. Accordingly, there have been research efforts on automatic detection of such content focused mainly on the texts. However, the fusion of multimodal data (as in memes) creates various challenges in developing computational models that can handle such data, more so in the case of low-resource languages. Among such challenges, the lack of suitable datasets for developing computational models for handling memes in low-resource languages is a major problem. This work attempts to bridge the research gap by providing a large-sized curated dataset comprising 5,054 memes in Hindi-English code-mixed language, which are manually annotated by three independent annotators. It comprises two subtasks: (i) Subtask-1 (Binary classification involving tagging a meme as misogynous or non-misogynous), and (ii) Subtask-2 (multi-label classification of memes into different categories). The data quality is evaluated by computing Krippendorff's alpha. Different computational models are then applied on the data in three settings: text-only, image-only, and multimodal models using fusion techniques. The results show that the proposed multimodal method using the fusion technique may be the preferred choice for the identification of misogyny in multimodal Internet content and that the dataset is suitable for advancing research and development in the area.
期刊介绍:
The ACM Transactions on Asian and Low-Resource Language Information Processing (TALLIP) publishes high quality original archival papers and technical notes in the areas of computation and processing of information in Asian languages, low-resource languages of Africa, Australasia, Oceania and the Americas, as well as related disciplines. The subject areas covered by TALLIP include, but are not limited to:
-Computational Linguistics: including computational phonology, computational morphology, computational syntax (e.g. parsing), computational semantics, computational pragmatics, etc.
-Linguistic Resources: including computational lexicography, terminology, electronic dictionaries, cross-lingual dictionaries, electronic thesauri, etc.
-Hardware and software algorithms and tools for Asian or low-resource language processing, e.g., handwritten character recognition.
-Information Understanding: including text understanding, speech understanding, character recognition, discourse processing, dialogue systems, etc.
-Machine Translation involving Asian or low-resource languages.
-Information Retrieval: including natural language processing (NLP) for concept-based indexing, natural language query interfaces, semantic relevance judgments, etc.
-Information Extraction and Filtering: including automatic abstraction, user profiling, etc.
-Speech processing: including text-to-speech synthesis and automatic speech recognition.
-Multimedia Asian Information Processing: including speech, image, video, image/text translation, etc.
-Cross-lingual information processing involving Asian or low-resource languages.
-Papers that deal in theory, systems design, evaluation and applications in the aforesaid subjects are appropriate for TALLIP. Emphasis will be placed on the originality and the practical significance of the reported research.