{"title":"Multimodal sentiment analysis with BERT-ResNet50","authors":"Senchang Zhang, Yue He, Lei Li, Yaowen Dou","doi":"10.1117/12.2679113","journal":{"name":"International Conference on Algorithms, Microchips and Network Applications","volume":"53 1","pages":"0"},"publicationDate":"2023-05-08","publicationTypes":"Journal Article","isOpenAccess":false,"citationCount":"0","platform":"Semanticscholar","ListUrlMain":"https://doi.org/10.1117/12.2679113"}
Current multimodal sentiment analysis models suffer from low prediction accuracy because of information differences between modalities and insufficient fusion between them. To address this, this paper designs a multimodal sentiment analysis model based on BERT-ResNet50. The model uses BERT and ResNet50 to extract text and image features, respectively, fuses the multimodal information through a Transformer encoder layer, and finally classifies the fused representation with a Softmax layer. The dataset used is the public Twitter sarcasm dataset. In experiments, the proposed BERT-ResNet50 model outperforms the comparison models in accuracy, recall, and F1 score, reaching an accuracy of 74.05%. Ablation experiments show that the model is more accurate on multimodal input than on single-modal input.
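The pipeline described in the abstract (per-modality feature extraction, Transformer-encoder fusion, Softmax classification) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. Random arrays stand in for the BERT and ResNet50 outputs, and the class name `FusionSketch`, the single-head single-layer encoder, the mean pooling, the model width of 64, the 16 text tokens, the 49 image regions, and the two output classes are all assumptions made for illustration. The feature sizes 768 and 2048 are those of BERT-base and the final ResNet50 stage; the abstract does not say which BERT variant was used.

```python
import numpy as np

rng = np.random.default_rng(0)


def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def layer_norm(x, eps=1e-5):
    # Layer normalization without learned scale/shift, for brevity.
    mu = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)


class FusionSketch:
    """Hypothetical stand-in for the fusion stage: one single-head encoder layer."""

    def __init__(self, text_dim=768, image_dim=2048, d_model=64, n_classes=2):
        def init(*shape):
            return rng.normal(0.0, 0.02, shape)

        self.d_model = d_model
        self.w_text = init(text_dim, d_model)    # projects text features
        self.w_image = init(image_dim, d_model)  # projects image features
        self.w_q = init(d_model, d_model)
        self.w_k = init(d_model, d_model)
        self.w_v = init(d_model, d_model)
        self.w_o = init(d_model, d_model)
        self.w_ff1 = init(d_model, 4 * d_model)
        self.w_ff2 = init(4 * d_model, d_model)
        self.w_cls = init(d_model, n_classes)

    def forward(self, text_feats, image_feats):
        # text_feats: (batch, n_text, text_dim); image_feats: (batch, n_img, image_dim)
        # Project both modalities to a shared width and join them into one sequence.
        tokens = np.concatenate(
            [text_feats @ self.w_text, image_feats @ self.w_image], axis=1
        )
        # Self-attention over the joint sequence lets text and image tokens interact.
        q, k, v = tokens @ self.w_q, tokens @ self.w_k, tokens @ self.w_v
        attn = softmax(q @ k.transpose(0, 2, 1) / np.sqrt(self.d_model))
        x = layer_norm(tokens + (attn @ v) @ self.w_o)
        # Position-wise feed-forward block with a residual connection.
        x = layer_norm(x + np.maximum(x @ self.w_ff1, 0.0) @ self.w_ff2)
        # Mean-pool the fused tokens (an assumption) and classify with Softmax.
        return softmax(x.mean(axis=1) @ self.w_cls)


# Random placeholders for extracted features (batch of 4 tweets).
text_feats = rng.normal(size=(4, 16, 768))    # stand-in for BERT token outputs
image_feats = rng.normal(size=(4, 49, 2048))  # stand-in for a 7x7 ResNet50 feature map
probs = FusionSketch().forward(text_feats, image_feats)
print(probs.shape)  # (4, 2)
```

Because the weights are random and untrained, the output probabilities carry no meaning; the sketch only shows how the two feature streams can share one encoder layer before classification.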