{"title":"Study on Variable Step Size Blind Equalization Algorithm Based on CMA","authors":"Mingyu Yang, Dongming Xu","doi":"10.1145/3573942.3573997","DOIUrl":"https://doi.org/10.1145/3573942.3573997","url":null,"abstract":"Inter-symbol interference caused by channel distortion seriously degrades communication quality, and this problem is often addressed by equalization technology. The principle of the traditional fixed-step blind equalization constant modulus algorithm (CMA) is introduced, and simulation analysis shows that a fixed step size cannot achieve fast convergence and a small steady-state error at the same time. To resolve this trade-off, a variable step size blind equalization algorithm based on CMA is proposed: the step-size update of CMA is improved, the principle of the improved algorithm is described, and the influence of the step size on algorithm performance is analyzed. Finally, simulation experiments show that the improved algorithm speeds up convergence while keeping a small steady-state error.","PeriodicalId":103293,"journal":{"name":"Proceedings of the 2022 5th International Conference on Artificial Intelligence and Pattern Recognition","volume":"9 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-09-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115643709","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
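As an illustrative sketch of the fixed-step CMA this record describes, with a simple variable-step heuristic added: the `tanh`-based step rule, the tap count, and the estimate of the constant modulus `R2` below are assumptions for demonstration, not the paper's actual update rule.

```python
import numpy as np

def cma_equalize(x, num_taps=11, mu_max=0.005, mu_min=0.0005):
    """Blind CMA equalizer whose step size shrinks as the modulus error falls.

    x: complex received samples. Returns (taps, per-sample modulus error).
    """
    # Constant modulus estimated from the received signal (an approximation).
    R2 = np.mean(np.abs(x) ** 4) / np.mean(np.abs(x) ** 2)
    w = np.zeros(num_taps, dtype=complex)
    w[num_taps // 2] = 1.0                       # center-tap initialization
    errs = []
    for k in range(len(x) - num_taps):
        xk = x[k:k + num_taps][::-1]             # filter input (newest first)
        y = np.dot(w, xk)                        # equalizer output
        e = y * (np.abs(y) ** 2 - R2)            # CMA error term
        # Assumed variable-step rule: large error -> large step, small error -> small step.
        mu = mu_min + (mu_max - mu_min) * np.tanh(np.abs(e))
        w = w - mu * e * np.conj(xk)             # stochastic-gradient tap update
        errs.append(np.abs(np.abs(y) ** 2 - R2))
    return w, np.array(errs)
```

Running this on QPSK symbols passed through a mild two-tap channel shows the modulus error decaying over time, which is the behavior a variable-step CMA aims to accelerate.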
{"title":"SN-YOLO: Improved YOLOv5 with Softer-NMS and SIOU for Object Detection","authors":"Wanyu Deng, Zhen Wang","doi":"10.1145/3573942.3574029","DOIUrl":"https://doi.org/10.1145/3573942.3574029","url":null,"abstract":"As a lightweight object detection network, YOLOv5 is popular in industry for its speed and small model size, but its detection accuracy is not very high. To address this problem, we propose SN-YOLO, an improved model based on YOLOv5. First, we introduce Softer-NMS as the post-processing method of the model, which makes the predicted boxes more accurate. Second, we improve the loss function of the original algorithm by introducing the SIOU loss to optimize the model and raise its accuracy. Finally, to strengthen the feature extraction ability of the backbone, we insert the CBAM (Convolutional Block Attention Module) into the network. We validate the model on the PASCAL VOC 2007 and 2012 datasets. The experimental results show that SN-YOLO improves on the original model in all respects, verifying the effectiveness of the algorithm.","PeriodicalId":103293,"journal":{"name":"Proceedings of the 2022 5th International Conference on Artificial Intelligence and Pattern Recognition","volume":"14 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-09-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134236159","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
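The record adopts Softer-NMS as post-processing. As a hedged illustration of the score-decay idea behind it, here is classic soft-NMS with a Gaussian penalty (Softer-NMS additionally refines box coordinates by variance voting, which is omitted here); the parameter values are illustrative defaults, not the paper's settings.

```python
import numpy as np

def iou(box, boxes):
    """IoU of one [x1, y1, x2, y2] box against an (N, 4) array of boxes."""
    x1 = np.maximum(box[0], boxes[:, 0]); y1 = np.maximum(box[1], boxes[:, 1])
    x2 = np.minimum(box[2], boxes[:, 2]); y2 = np.minimum(box[3], boxes[:, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    a = (box[2] - box[0]) * (box[3] - box[1])
    b = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    return inter / (a + b - inter)

def soft_nms(boxes, scores, sigma=0.5, score_thresh=0.001):
    """Gaussian soft-NMS: overlapping boxes are down-weighted, not removed.

    Returns (indices in pick order, rescored confidence array).
    """
    scores = scores.astype(float).copy()
    keep = []
    idxs = np.arange(len(scores))
    while len(idxs) > 0:
        m = idxs[np.argmax(scores[idxs])]        # current highest-scoring box
        keep.append(int(m))
        idxs = idxs[idxs != m]
        if len(idxs) == 0:
            break
        ious = iou(boxes[m], boxes[idxs])
        scores[idxs] *= np.exp(-(ious ** 2) / sigma)   # Gaussian score decay
        idxs = idxs[scores[idxs] > score_thresh]       # drop near-zero scores
    return keep, scores
```

A distant box keeps its score untouched, while a heavily overlapping one is decayed rather than discarded outright, which is what makes the final boxes "softer" than hard NMS.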
{"title":"Joint Multi-Scale Residual and Motion Feature Learning for Action Recognition","authors":"Linfeng Yang, Zhixiang Zhu, Chenwu Wang, Pei Wang, Shaobo Hei","doi":"10.1145/3573942.3574082","DOIUrl":"https://doi.org/10.1145/3573942.3574082","url":null,"abstract":"For action recognition, two-stream networks consisting of RGB and optical flow streams have been widely used and show high recognition accuracy. However, optical flow computation is time-consuming, requires a large amount of storage space, and makes recognition inefficient. To alleviate this problem, we propose an Adaptive Multi-Scale Residual (AMSR) module and a Long Short Term Motion Squeeze (LSMS) module, which are inserted into a 2D convolutional neural network to improve the accuracy of action recognition and strike a balance between accuracy and speed. The AMSR module adaptively fuses multi-scale feature maps to fully utilize the semantic information provided by deep feature maps and the detailed information provided by shallow feature maps. The LSMS module is a learnable lightweight motion feature extractor that learns long-term motion features of adjacent and non-adjacent frames, replacing traditional optical flow and improving the accuracy of action recognition. Experimental results on the UCF-101 and HMDB-51 datasets demonstrate that the proposed method achieves competitive performance compared to state-of-the-art methods with only a small increase in parameters and computational cost.","PeriodicalId":103293,"journal":{"name":"Proceedings of the 2022 5th International Conference on Artificial Intelligence and Pattern Recognition","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-09-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134520985","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Novel Face Forgery Detection Method Based on Augmented Dual-Stream Networks","authors":"Yumei Liu, Yong Zhang, Weiran Liu","doi":"10.1145/3573942.3574030","DOIUrl":"https://doi.org/10.1145/3573942.3574030","url":null,"abstract":"Deep-learning-based face forgery methods are becoming more mature and abundant, and existing detection techniques have limitations and applicability issues that make it difficult to detect such behaviour effectively. In this paper, we propose an enhanced dual-stream FC_2_stream network model that detects forged regions in manipulated face images through end-to-end training. The RGB stream extracts features from the RGB image to find forged traces; the noise stream uses the filtering layer of the SRM (Steganalysis Rich Model) to extract noise features and find inconsistencies between the noise in the real and forged regions of a fake face. The features of the two streams are then fused with a bilinear pooling layer to predict the forged region, and image authenticity is finally judged by whether the blending boundary of a forged image is revealed. Experiments conducted on four benchmark datasets show that our model remains effective against forgeries generated by unknown face manipulation methods, demonstrating its superior generalisation capability.","PeriodicalId":103293,"journal":{"name":"Proceedings of the 2022 5th International Conference on Artificial Intelligence and Pattern Recognition","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-09-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130377352","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
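The noise stream above relies on SRM high-pass filtering. As a minimal sketch, here is one of the standard SRM residual kernels (the 5x5 "KV" filter commonly used in such noise streams) applied as a valid 2-D correlation; the paper's noise stream uses a filtering layer with several such kernels, so this is only one representative filter.

```python
import numpy as np

# The 5x5 "KV" second-order SRM high-pass kernel (weights sum to zero).
KV = np.array([[-1,  2,  -2,  2, -1],
               [ 2, -6,   8, -6,  2],
               [-2,  8, -12,  8, -2],
               [ 2, -6,   8, -6,  2],
               [-1,  2,  -2,  2, -1]]) / 12.0

def srm_residual(img):
    """Valid 2-D correlation of a grayscale image with the KV kernel.

    Smooth content is cancelled; only high-frequency noise residue remains.
    """
    h, w = img.shape
    out = np.zeros((h - 4, w - 4))
    for i in range(h - 4):
        for j in range(w - 4):
            out[i, j] = np.sum(img[i:i + 5, j:j + 5] * KV)
    return out
```

Because the kernel's weights sum to zero, a constant region produces a zero residual: the filter responds only to local noise statistics, which is exactly what makes the noise in spliced regions stand out from the authentic background.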
{"title":"Radar micro moving gesture recognition method based on multi-scale fusion deep network","authors":"Zhiqiang Bao, Tiantian Liu","doi":"10.1145/3573942.3574076","DOIUrl":"https://doi.org/10.1145/3573942.3574076","url":null,"abstract":"To solve the problem that micro-moving gesture features are not obvious and are difficult to identify, a micro-moving gesture recognition method based on a multi-scale fusion deep network for millimeter wave radar is proposed in this paper. The method is mainly composed of a 2D convolution module, a multi-scale fusion module and an attention mechanism module. The multi-scale fusion module is composed of three residual blocks of different scales, which obtain receptive fields of different sizes and thus multi-scale features. Meanwhile, residual blocks of different scales are fused to increase the diversity of the network and better extract deep features from the data. A Squeeze-and-Excitation (SE) attention mechanism module is added to suppress channel features that carry little information, which improves recognition accuracy while reducing the number of parameters and the amount of computation. The experimental results show that this method is simple to implement and requires no complex data preprocessing; the network converges quickly and achieves effective recognition of micro-moving gestures.","PeriodicalId":103293,"journal":{"name":"Proceedings of the 2022 5th International Conference on Artificial Intelligence and Pattern Recognition","volume":"105 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-09-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134167521","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
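The SE attention module the record mentions can be sketched in a few lines: global average pooling squeezes each channel to a scalar, two fully connected layers with a sigmoid produce per-channel gates, and the gates re-weight the feature map. The weight shapes below are illustrative; the paper's layer sizes are not given in the abstract.

```python
import numpy as np

def se_block(feat, w1, w2):
    """Squeeze-and-Excitation gating on a (C, H, W) feature map.

    w1: (C // r, C) reduction weights; w2: (C, C // r) expansion weights.
    """
    z = feat.mean(axis=(1, 2))                   # squeeze: global avg pool -> (C,)
    h = np.maximum(w1 @ z, 0)                    # excitation: FC + ReLU -> (C // r,)
    s = 1.0 / (1.0 + np.exp(-(w2 @ h)))          # FC + sigmoid gates in (0, 1) -> (C,)
    return feat * s[:, None, None]               # channel-wise re-weighting
```

Channels whose gate is near zero are suppressed, which is how the module damps channels that carry little information.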
{"title":"Research on Recognition Model of Crop Diseases and Insect Pests Based on Convolutional Neural Network","authors":"Pi Qiao, Zilu Wang","doi":"10.1145/3573942.3574087","DOIUrl":"https://doi.org/10.1145/3573942.3574087","url":null,"abstract":"Most traditional detection methods for crop diseases and insect pests are operated manually in the field, relying on the experience and skill of the staff, and are therefore time-consuming and inefficient. With the development of deep learning, deep neural network models applied to crop disease and pest detection can effectively solve these problems; however, current research mostly focuses on identifying the diseases and pests of a single crop and does not extend the analysis to multiple crops. Therefore, this paper proposes a crop disease and pest recognition model based on a convolutional neural network. First, in the bilinear network model, the ResNet50 network replaces the original VGG-D and VGG-M backbones as the feature extractor. Second, a connect module is added so that features from different levels of the two extractors are combined by outer products, connecting the feature vectors across levels. Finally, experiments are conducted on the AI Challenger 2018 crop pest and disease dataset. The experimental results show that the average recognition rate of the improved B-CNN-ResNet50-connect network model reaches 89.62%.","PeriodicalId":103293,"journal":{"name":"Proceedings of the 2022 5th International Conference on Artificial Intelligence and Pattern Recognition","volume":"44 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-09-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133422484","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
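The outer-product fusion at the heart of a bilinear CNN (B-CNN) can be sketched as follows: two feature maps, flattened over spatial positions, are combined by a pooled outer product, then passed through the standard signed square root and L2 normalization. This is the generic B-CNN pooling step, not the paper's specific connect module.

```python
import numpy as np

def bilinear_pool(fa, fb):
    """Bilinear pooling of two feature maps.

    fa: (Ca, N), fb: (Cb, N) features flattened over N spatial positions.
    Returns a (Ca * Cb,) normalized bilinear descriptor.
    """
    phi = (fa @ fb.T) / fa.shape[1]              # outer product averaged over positions
    phi = phi.flatten()
    phi = np.sign(phi) * np.sqrt(np.abs(phi))    # signed square-root normalization
    norm = np.linalg.norm(phi)
    return phi / norm if norm > 0 else phi
```

The resulting descriptor captures pairwise channel interactions between the two extractors, which is what lets a bilinear model pick up the subtle texture differences between disease classes.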
{"title":"Image Encryption Algorithm Based on Latin Squares and Adaptive Z-Diffusion","authors":"Yangguang Lou, Shu-cui Xie, Jianzhong Zhang","doi":"10.1145/3573942.3574062","DOIUrl":"https://doi.org/10.1145/3573942.3574062","url":null,"abstract":"This paper proposes a chaotic encryption algorithm based on Latin squares and adaptive Z-diffusion. First, to remedy the defects of the traditional Sine system, a two-dimensional enhanced Sine chaotic system (2D-ESCS) is designed. Bifurcation diagrams, Lyapunov exponents and NIST tests show that the 2D-ESCS has a continuous and large chaotic range. Second, Latin squares are generated from the pseudorandom sequences produced by the 2D-ESCS and used to scramble the image. Third, adaptive Z-diffusion depends on the locations of the pixels: the cipher image is calculated from different combinations of pseudorandom numbers, plain-image pixel values and intermediate cipher-image pixel values. Finally, simulation experiments and security analysis show that the proposed algorithm offers a high security level against various cryptanalytic attacks as well as high execution efficiency.","PeriodicalId":103293,"journal":{"name":"Proceedings of the 2022 5th International Conference on Artificial Intelligence and Pattern Recognition","volume":"241 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-09-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133683634","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
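The Latin-square scrambling step can be sketched with a standard construction: rank a chaotic pseudorandom sequence to get a permutation of 0..N-1, then take its cyclic shifts as the rows of an NxN Latin square and permute each image row by the corresponding square row. The paper's exact generation and scrambling rules are not given in the abstract, so this is a generic illustration.

```python
import numpy as np

def latin_square(seq):
    """NxN Latin square from a length-N pseudorandom sequence.

    argsort turns the sequence into a permutation; cyclic shifts of a
    permutation give a square where every row and column is a permutation.
    """
    n = len(seq)
    perm = np.argsort(seq)
    return np.stack([np.roll(perm, -i) for i in range(n)])

def scramble(img, L):
    """Permute row i of a square image by row i of the Latin square L."""
    return np.stack([row[L[i]] for i, row in enumerate(img)])
```

Scrambling only reorders pixels (no values change), so it is trivially invertible with the same Latin square, which is what the decryption side relies on.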
{"title":"Industrial Internet Network Slice Prediction Algorithm Based on Multidimensional and Deep Neural Networks","authors":"Jihong Zhao, Gao-Jing Peng","doi":"10.1145/3573942.3573989","DOIUrl":"https://doi.org/10.1145/3573942.3573989","url":null,"abstract":"In the industrial Internet environment, the introduction of network slicing allows a large number of devices with different service requirements (QoS) to share the same physical resources. Aiming at the problem of adaptability between massive terminal devices and networks in heterogeneous industrial scenarios, this paper proposes a network slice prediction algorithm based on a multidimensional deep neural network (MDNN), built on the multi-dimensional network resource requirements of different terminal devices in specific industrial scenarios. The algorithm predicts the network resources a device will require at the next moment from its historical network requirements and historical slice selections, and selects the appropriate network slice for the device according to the prediction. The simulation results show that the prediction accuracy of the proposed algorithm reaches 98.70%, which greatly improves the adaptability between devices and the network.","PeriodicalId":103293,"journal":{"name":"Proceedings of the 2022 5th International Conference on Artificial Intelligence and Pattern Recognition","volume":"14 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-09-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133822980","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
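The prediction pipeline described here, history in, next-step demand out, then slice selection, can be sketched as two small helpers. The window length and the nearest-profile selection rule are assumptions for illustration; the paper's MDNN architecture replaces the prediction step.

```python
import numpy as np

def make_windows(history, w):
    """Sliding-window supervision: predict demand at time t from the w
    previous demand vectors. history: (T, D) resource-demand series.
    Returns X of shape (T - w, w * D) and targets y of shape (T - w, D)."""
    X = np.stack([history[i:i + w].ravel() for i in range(len(history) - w)])
    y = history[np.arange(w, len(history))]
    return X, y

def select_slice(pred, slice_profiles):
    """Pick the slice whose resource profile is closest (L2) to the
    predicted multi-dimensional demand. slice_profiles: (S, D)."""
    d = np.linalg.norm(slice_profiles - pred, axis=1)
    return int(np.argmin(d))
```

In the full system, a trained network maps each row of X to a demand prediction, and `select_slice` assigns the device to the best-matching slice at the next moment.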
{"title":"A Novel Spatiotemporal Attention Convolutional Neural Network for Video Crowd Counting","authors":"Shangjie Zhang, Yuelei Xiao","doi":"10.1145/3573942.3574069","DOIUrl":"https://doi.org/10.1145/3573942.3574069","url":null,"abstract":"Most existing crowd counting methods remain image-based even when video datasets are available, ignoring valuable temporal information. Thus, a novel spatiotemporal attention convolutional neural network is proposed to solve the video-based crowd counting problem. Firstly, the first ten layers of VGG-16 are used as the backbone network to extract features, and a single ConvLSTM layer captures the temporal correlation of adjacent frames. Then, stacked dilated convolutional layers enlarge the receptive field without increasing the computational load. Finally, a convolutional block attention module is introduced for adaptive refinement of the feature maps; its ability to emphasize or suppress information in the channel and spatial dimensions aids information propagation. Experimental results on two reference datasets (Mall and WorldExpo'10) show that the proposed method further improves the accuracy of crowd counting and is superior to other existing crowd counting methods.","PeriodicalId":103293,"journal":{"name":"Proceedings of the 2022 5th International Conference on Artificial Intelligence and Pattern Recognition","volume":"68 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-09-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115537956","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
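The claim that stacked dilated convolutions enlarge the receptive field without extra computation is easy to make concrete: with stride 1, each layer adds (k - 1) * d pixels of receptive field while its cost stays that of a k-tap kernel. A small calculator (layer sizes here are illustrative, not the paper's configuration):

```python
def receptive_field(kernel_sizes, dilations):
    """Receptive field of a stack of stride-1 (dilated) conv layers.

    Each layer with kernel size k and dilation d adds (k - 1) * d.
    """
    rf = 1
    for k, d in zip(kernel_sizes, dilations):
        rf += (k - 1) * d
    return rf
```

Three 3x3 layers with dilations 1, 2, 4 reach a receptive field of 15, versus 7 for the same three layers undilated, at identical cost per layer.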
{"title":"A Method with Universal Transformer for Multimodal Sentiment Analysis","authors":"Hao Ai, Ying Liu, Jie Fang, Sheikh Faisal Rashid","doi":"10.1145/3573942.3573968","DOIUrl":"https://doi.org/10.1145/3573942.3573968","url":null,"abstract":"Multimodal sentiment analysis refers to using computers to analyze and identify the emotions people express, based on extracted multimodal sentiment features, and it plays a significant role in human-computer interaction and financial market prediction. Most existing approaches model contextual information; while this effectively captures the contextual connections within each modality, the correlations between modalities are often overlooked, even though they are also critical to the final recognition result. Therefore, this paper proposes a multimodal sentiment analysis approach based on the universal transformer, a framework that uses the universal transformer to model the connections between multiple modalities while employing effective feature extraction methods to capture the contextual connections of individual modalities. We evaluated our proposed method on two benchmark datasets for multimodal sentiment analysis, CMU-MOSI and CMU-MOSEI, and the results outperformed other methods of the same type.","PeriodicalId":103293,"journal":{"name":"Proceedings of the 2022 5th International Conference on Artificial Intelligence and Pattern Recognition","volume":"4 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-09-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121151314","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}