A Symmetric Dual-Attention Generative Adversarial Network with Channel and Spatial Features Fusion
Jiaming Zhang, Xinfeng Zhang, Bo Zhang, Maoshen Jia, Yuqing Liang, Yitian Zhang
DOI: 10.1145/3581807.3581904

Many existing generative adversarial networks (GANs) lack effective semantic modeling, leading to unnatural local details and blurring in generated images. In this work, building on DivCo, we propose a Symmetric Dual-Attention Generative Adversarial Network (DivCo-SDAGAN) with channel and spatial feature fusion, in which a Dual-Attention Module (DAM) strengthens the feature representation ability of the network so that it synthesizes photo-realistic images with more natural local details. The Channel Weighted Aggregation Module (CWAM) and the Spatial Attention Module (SAM) of the DAM are designed to capture semantic information along the channel and spatial dimensions, respectively, and they can be easily integrated into other GAN-based models. Extensive experiments show that the proposed DivCo-SDAGAN produces more diverse images from the same input, achieving more satisfactory results than existing methods.
Machine Learning for Prediction of Blood Transfusion Rates in Primary Total Knee Arthroplasty
Zain Sayeed, Daniel R. Cavazos, Tannor Court, Chaoyang Chen, Bryan E. Little, Hussein F. Darwiche
DOI: 10.1145/3581807.3581894

Acute blood loss anemia requiring allogeneic blood transfusion, with its inherent risks, remains a postoperative complication of total knee arthroplasty (TKA). This study aimed to use machine learning models to predict blood transfusion following primary TKA and to identify contributing factors. A total of 1328 patients who underwent primary TKA at our institute were evaluated using data extracted from the MARQCI database to identify patient demographics and surgical variables that may be associated with blood transfusion. A multilayer perceptron neural network (MPNN) was used to predict transfusion rates and to rank the importance of factors associated with blood transfusion following TKA. Statistical analyses, including bivariate correlation analysis, the chi-square test, and the t-test, were performed for demographic analysis and to determine the correlation between blood transfusion and other variables. The results showed that important factors associated with transfusion rates include pre- and post-operative hemoglobin level, ASA score, tranexamic acid usage, age, and BMI, among others. The MPNN achieved excellent discrimination (AUC = 0.997). This study demonstrates that an MPNN for predicting patient-specific blood transfusion rates following TKA is a novel application of machine learning with the potential to improve pre-operative planning and treatment outcomes.
Coordinate Attention-enabled Ship Object Detection with Electro-optical Image
Hongbin Xu, Xiantao Jiang, Tao Yin, Qi Cen, Zhijian Zhang, Tian Song, F. Yu
DOI: 10.1145/3581807.3581815

Shipping safety is one of the factors restricting the development of navigation. In particular, near-shore routes are prone to unknown risks because of the presence of multiple ship types, high ship density, occlusion between ships, and other factors. This paper presents a method for detecting medium-range ships, which can improve safety for ships. The method is based on the You Only Look Once version 5 network (YOLOv5). To improve accuracy, a coordinate attention module is integrated into the detection network. The main research content and experimental work are as follows. First, the YOLOv5 network and the spatial attention mechanism are analyzed. Then, detection experiments are carried out based on YOLOv5 and the Singapore Maritime Dataset (SMD). Next, the coordinate attention module is used to improve the network. Finally, by tuning the training parameters and refining the attention mechanism, the detection network reaches an mAP of 73% on the test set, confirming the feasibility of object detection with the coordinate attention-enhanced YOLOv5 algorithm.
Activity classification of the elderly based on lightweight convolutional neural networks
Hanzhang Ding, Wenzhang Zhu
DOI: 10.1145/3581807.3581834

Accurate action classification for the elderly on lightweight convolutional neural networks benefits resource-limited embedded and mobile devices in the healthcare industry. This study proposes a lightweight convolutional neural network model called mD-MobileNet. Micro-Doppler feature spectrograms from 106 elderly people were used as the dataset. Transfer learning was used to train the proposed model, and three lightweight convolutional neural networks (MobileNetV3-Small, ShuffleNetV2, and EfficientNet-B0) were compared using the same training method. All of these models were able to classify the various actions correctly; by comparison, mD-MobileNet gave the best classification results, with a top-1 accuracy of 96.1% and a macro F1 score of 96.30. By comparing the results with Grad-CAM visualizations and analyzing them in conjunction with its network structure, it was determined that mD-MobileNet has the best local perception, with the fewest model parameters and the highest accuracy among the compared models.
An Effective Sentiment Analysis Model for Tobacco Consumption
Yanru Hao, Tianchi Yang, Chuan Shi, Rui Wang, Ding Xiao
DOI: 10.1145/3581807.3581880

Analysis of product reviews has drawn much attention due to its wide application. Most sentiment analysis research focuses on entertainment and catering because of the limitations of existing public datasets. To improve the comprehensiveness of data in the field of sentiment analysis, we present a new large-scale multi-sentiment tobacco dataset built by distilling effective consumer-experience information from massive online reviews of tobacco consumption. The release of this dataset should push forward research in the tobacco field. With the goal of advancing and facilitating research on the overall sentiment of sentences with multiple aspects, we propose a simple yet effective model, EHCRNN, which combines the strengths of recent NLP advances. Experiments on our new dataset and the public NLPCC 2014 task dataset show that the proposed model significantly outperforms state-of-the-art baseline methods.
Traffic Flow Forecasting Research Based on Delay Reconstruction and GRU-SVR
Yuhang Lei, Jingsheng Lei, Weifei Wang
DOI: 10.1145/3581807.3581901

To improve the accuracy of traffic flow prediction, a short-term forecasting method based on delay reconstruction and an integrated GRU-SVR model with a stacking strategy is proposed to address the nonlinearity, complexity, and time dependence of traffic flow. Phase-space reconstruction parameters are derived from the chaotic nature of the source traffic so that the sequences are mapped into high-dimensional vectors, and the integrated GRU-SVR model is then optimized for prediction using iGridSearch CV. The GRU alleviates the long-term dependence problem among the data and makes full use of temporal context in both directions to capture causal structure, while the SVR parameters are tuned by an improved grid search that reaches the global optimum with high time efficiency; this global optimum helps ensure the generalizability of the integrated model. The results show that the RMSE, MAPE, and R2 score of the integrated algorithm are better than those of the other three models, demonstrating that the method can effectively improve prediction accuracy and has better generalization ability.
Parallelization of Data Science Tasks, an Experimental Overview
Oscar Castro, P. Bruneau, Jean-Sébastien Sottet, Dario Torregrossa
DOI: 10.1145/3581807.3581878

The practice of data science and machine learning often involves training many kinds of models, whether for inferring some target variable or for extracting structured knowledge from data. Training procedures generally require lengthy and intensive computation, so a natural step for data scientists is to accelerate these procedures, typically through parallelization across multiple CPU cores and GPU devices. In this paper, we focus on Python libraries commonly used by machine learning practitioners and propose a case-based experimental approach to survey mainstream tools for software acceleration. For each use case, we highlight and quantify the optimizations from the baseline implementation to the optimized version. Finally, we draw a taxonomy of the tools and techniques involved in our experiments and identify common pitfalls, with a view to providing actionable guidelines for data scientists and developers of code-optimization tools.
Recognition of Human Walking Motion Using a Wearable Camera
Zi-yang Liu, Tomoyuki Kurosaki, J. Tan
DOI: 10.1145/3581807.3581809

In recent years, computer vision technology has attracted more attention than ever and is being applied in a wide range of fields. Among its applications, automatic recognition of human motion is particularly important, since it enables automatic detection of suspicious persons and automatic monitoring of elderly people. Research on human motion recognition using computer vision techniques has therefore been actively conducted in Japan and overseas. However, most conventional research on human motion recognition employs video of a person taken by an external fixed camera; there has been no research on recognizing human motion from video of the surrounding scenery provided by a wearable camera. This paper proposes a method that recognizes a human motion by estimating the posture change of a wearable camera attached to a walking person from the motion of the scenery in its video, and by analyzing the change of the wearer's trunk derived from that posture change. In the method, AKAZE is applied to the images to detect feature points and establish their correspondence. The 5-point algorithm is used to estimate the epipolar geometry constraint and an essential matrix, which yields the relative camera motion. The change in relative camera motion is then used to analyze the shape of the wearer's trunk. The resulting walking-motion features are finally fed into an SVM to identify the motion. In the experiment, five types of walking motions were captured by a wearable camera from five subjects, and the accuracy of human motion recognition was 80%. More precise feature point extraction, more exact motion estimation, and consideration of a wider variety of human walking motions are needed to improve the proposed technique.
An Efficient Lightweight Spatio-temporal Attention Module for Action Recognition
Zhonghua Sun, Meng Dai, Ziwen Yi, Tianyi Wang, Jinchao Feng, Kebin Jia
DOI: 10.1145/3581807.3581810

Effective feature learning is one of the prime components of a human action recognition algorithm. A three-dimensional convolutional neural network (3D CNN) can directly extract spatio-temporal features, but it is insufficient for capturing the most discriminative parts of an action video: redundant spatial regions within and between temporal frames weaken the descriptive ability of the 3D CNN model. To address this problem, we propose a lightweight spatio-temporal attention module (ST-AM), composed of a spatial attention module (SAM) and a temporal attention module (TAM). SAM and TAM effectively encode the semantically relevant spatial areas and suppress redundant temporal frames, reducing misclassification. The proposed SAM and TAM have complementary effects and can be easily embedded into existing 3D CNN action recognition models. Experiments on the UCF-101 and HMDB-51 datasets show that the ST-AM-embedded model achieves impressive performance on the action recognition task.
NAF: Nest Axial Attention Fusion Network for Infrared and Visible Images
Jiaxi Lu, Bicao Li, Zhoufeng Liu, Zhuhong Shao, Chunlei Li, Zong-Hui Wang
DOI: 10.1145/3581807.3581849

In recent years, deep learning has been widely used in the field of infrared and visible image fusion. However, existing deep learning-based methods tend to lose details and give little consideration to long-range dependence. To address this, we propose a novel encoder-decoder fusion model based on nest connections and axial attention, named NAF. The network extracts as much multi-scale information as possible and retains more long-range dependencies thanks to the axial attention in each convolutional block. The method comprises three parts: an encoder consisting of convolutional blocks, a fusion strategy based on spatial and channel attention, and a decoder that processes the fused features. Specifically, the source images are first fed into the encoder to extract multi-scale depth features. Then, the fusion strategy merges the depth features of each scale produced by the encoder. Finally, a decoder based on nested convolutional blocks reconstructs the fused image. Experimental results on public datasets demonstrate that the proposed method achieves better fusion performance than other state-of-the-art methods in both subjective and objective evaluation.