Transformer-based Multimodal Contextual Co-encoding for Humour Detection
Boya Deng, Jiayin Tian, Hao Li
2022 International Conference on Culture-Oriented Science and Technology (CoST), published 2022-08-01
DOI: 10.1109/CoST57098.2022.00067 (https://doi.org/10.1109/CoST57098.2022.00067)
Citations: 0
Abstract
Humor, a distinctive form of human expression that differs from other emotions, plays an important role in human communication. Previous work on humor detection has mostly been limited to the textual modality alone. From the perspective of human humor perception, text, intonation, mannerisms, and body language can all convey humor. From the perspective of joke structure, any combination of textual, acoustic, and visual modalities at various positions in the context can produce unexpected humor. Humor detection should therefore consider information across multiple modalities and contexts simultaneously. This paper proposes a humor detection model based on the transformer and contextual co-encoding, called Transformer-based Multimodal Contextual Co-encoding (TMCC). The model first uses transformer-based multi-head attention to capture potential information across modalities and contexts. It then uses a convolutional autoencoder to further fuse the overall feature matrix and reduce its dimensionality. Finally, a simple multilayer perceptron predicts the humor labels. Comparison with common humor detection baselines shows that the model achieves some performance improvement. A series of ablation studies demonstrates the effectiveness of each part of the model.
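The three stages described in the abstract (attention across modalities and contexts, fusion with dimensionality reduction, and a multilayer perceptron) can be illustrated with a minimal sketch. This is not the authors' implementation. It makes several assumptions that are not in the abstract: every modality is already embedded in vectors of the same dimension, the attention omits learned query, key, value, and output projections, and the convolutional autoencoder is replaced by plain mean pooling as a stand-in. All function names, the weights, and the toy inputs are hypothetical.

```python
import math
import random

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def attention(queries, keys, values):
    """Scaled dot-product attention over lists of vectors."""
    scale = math.sqrt(len(keys[0]))
    out = []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, k)) / scale for k in keys]
        weights = softmax(scores)
        out.append([sum(w * v[j] for w, v in zip(weights, values))
                    for j in range(len(values[0]))])
    return out

def multi_head_self_attention(tokens, num_heads):
    """Slice each vector into heads, attend within each head, concatenate.
    Assumption: learned Q/K/V and output projections are omitted."""
    dim = len(tokens[0])
    assert dim % num_heads == 0, "dim must be divisible by num_heads"
    width = dim // num_heads
    fused = [[] for _ in tokens]
    for h in range(num_heads):
        part = [t[h * width:(h + 1) * width] for t in tokens]
        for row, head_out in zip(fused, attention(part, part, part)):
            row.extend(head_out)
    return fused

def mean_pool(tokens):
    """Stand-in for the convolutional autoencoder: average over tokens."""
    n = len(tokens)
    return [sum(t[j] for t in tokens) / n for j in range(len(tokens[0]))]

def mlp_predict(x, w1, b1, w2, b2):
    """One hidden ReLU layer followed by a sigmoid output."""
    hidden = [max(0.0, sum(wi * xi for wi, xi in zip(row, x)) + b)
              for row, b in zip(w1, b1)]
    logit = sum(w * h for w, h in zip(w2, hidden)) + b2
    return 1.0 / (1.0 + math.exp(-logit))

def predict_humor(utterances, params, num_heads=2):
    """utterances: dicts with 'text', 'acoustic', 'visual' vectors
    (context sentences first, punchline last), all of equal dimension."""
    # One joint sequence, so attention can link any modality of any
    # utterance to any other.
    tokens = [u[m] for u in utterances for m in ("text", "acoustic", "visual")]
    fused = multi_head_self_attention(tokens, num_heads)
    return mlp_predict(mean_pool(fused), *params)

# Toy data: two context sentences and one punchline, random features.
rng = random.Random(0)
dim, hidden = 4, 3
utterances = [{m: [rng.uniform(-1, 1) for _ in range(dim)]
               for m in ("text", "acoustic", "visual")} for _ in range(3)]
params = ([[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(hidden)],
          [0.0] * hidden,
          [rng.uniform(-1, 1) for _ in range(hidden)],
          0.0)
print(predict_humor(utterances, params))  # untrained score in (0, 1)
```

Placing every modality of every utterance in one sequence is what lets a single attention pass relate, for example, the punchline's intonation to a context sentence's text. In a trained model the pooling step would be a learned encoder and the weights would come from optimization rather than a random generator.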