A Symmetric Dual-Attention Generative Adversarial Network with Channel and Spatial Features Fusion
Jiaming Zhang, Xinfeng Zhang, Bo Zhang, Maoshen Jia, Yuqing Liang, Yitian Zhang
DOI: 10.1145/3581807.3581904

Many existing generative adversarial networks (GANs) lack effective semantic modeling, leading to unnatural local details and blurring in generated images. In this work, building on DivCo, we propose a Symmetric Dual-Attention Generative Adversarial Network (DivCo-SDAGAN) with channel and spatial feature fusion, in which a Dual-Attention Module (DAM) strengthens the feature representation ability of the network so that it synthesizes photo-realistic images with more natural local details. The Channel Weighted Aggregation Module (CWAM) and the Spatial Attention Module (SAM) of the DAM are designed to capture semantic information along the channel and spatial dimensions, respectively, and they can be easily integrated into other GAN-based models. Extensive experiments show that the proposed DivCo-SDAGAN produces more diverse images from the same input, achieving more satisfactory results than existing methods.
Machine Learning for Prediction of Blood Transfusion Rates in Primary Total Knee Arthroplasty
Zain Sayeed, Daniel R. Cavazos, Tannor Court, Chaoyang Chen, Bryan E. Little, Hussein F. Darwiche
DOI: 10.1145/3581807.3581894

Acute blood loss anemia requiring allogeneic blood transfusion, with its inherent risks, remains a postoperative complication of total knee arthroplasty (TKA). This study aimed to use machine learning models to predict blood transfusion following primary TKA and to identify contributing factors. A total of 1328 patients who underwent primary TKA at our institute were evaluated using data extracted from the MARQCI database to identify patient demographics and surgical variables that may be associated with blood transfusion. A multilayer perceptron neural network (MPNN) was used to predict transfusion rates and to rank the importance of factors associated with blood transfusion following TKA. Statistical analyses, including bivariate correlation analysis, the chi-square test, and the t-test, were performed for demographic analysis and to determine the correlation between blood transfusion and other variables. The results showed that important factors associated with transfusion rates include pre- and post-operative hemoglobin level, ASA score, tranexamic acid usage, age, and BMI, among others. The MPNN achieved excellent discrimination (AUC = 0.997). This study demonstrates that an MPNN for predicting patient-specific blood transfusion rates following TKA is a novel application of machine learning with the potential to improve pre-operative planning and treatment outcomes.
Coordinate Attention-enabled Ship Object Detection with Electro-optical Image
Hongbin Xu, Xiantao Jiang, Tao Yin, Qi Cen, Zhijian Zhang, Tian Song, F. Yu
DOI: 10.1145/3581807.3581815

Shipping safety is one of the factors restricting the development of navigation. In particular, near-shore routes are prone to unknown risks because of the presence of multiple ship types, high ship density, occlusion between ships, and other factors. This paper presents a method for detecting medium-range ships, which can improve safety for ships. The method is based on the You Only Look Once version 5 network (YOLOv5). To improve accuracy, a coordinate attention module is integrated into the detection network. The main research content and experimental work are as follows. First, the YOLOv5 network and the spatial attention mechanism are analyzed. Then, detection experiments are carried out based on YOLOv5 and the Singapore Maritime Dataset (SMD). Next, the coordinate attention module is used to improve the network. Finally, by tuning the training parameters and refining the attention mechanism, the detection network reaches an mAP of 73% on the test set, confirming the feasibility of object detection with the coordinate attention-enhanced YOLOv5 algorithm.
Activity classification of the elderly based on lightweight convolutional neural networks
Hanzhang Ding, Wenzhang Zhu
DOI: 10.1145/3581807.3581834

Accurate action classification for the elderly on lightweight convolutional neural networks benefits resource-limited embedded and mobile devices in the healthcare industry. This study proposes a lightweight convolutional neural network model called mD-MobileNet. Micro-Doppler feature spectrograms from 106 elderly people were used as the dataset. Transfer learning was used to train the proposed model, and three lightweight convolutional neural networks (MobileNetV3-Small, ShuffleNetV2, and EfficientNet-B0) were compared using the same training method. All of these models were able to classify the various actions correctly; by comparison, mD-MobileNet gave the best classification results, with a top-1 accuracy of 96.1% and a macro F1 score of 96.30. By comparing the results with Grad-CAM visualizations and analyzing them in conjunction with its network structure, it was determined that mD-MobileNet has the best local perception, with the fewest model parameters and the highest accuracy among the compared models.
An Effective Sentiment Analysis Model for Tobacco Consumption
Yanru Hao, Tianchi Yang, Chuan Shi, Rui Wang, Ding Xiao
DOI: 10.1145/3581807.3581880

Analysis of product reviews has drawn much attention due to its wide application. Most sentiment analysis research focuses on entertainment and catering because of the limitations of existing public datasets. To improve the comprehensiveness of data in the field of sentiment analysis, we present a new large-scale multi-sentiment tobacco dataset built by distilling effective consumer-experience information from massive online reviews of tobacco consumption. The release of this dataset should push forward research in the tobacco field. With the goal of advancing and facilitating research on the overall sentiment of sentences with multiple aspects, we propose a simple yet effective model, EHCRNN, which combines the strengths of recent NLP advances. Experiments on our new dataset and the public NLPCC 2014 task dataset show that the proposed model significantly outperforms state-of-the-art baseline methods.
Traffic Flow Forecasting Research Based on Delay Reconstruction and GRU-SVR
Yuhang Lei, Jingsheng Lei, Weifei Wang
DOI: 10.1145/3581807.3581901

To improve the accuracy of traffic flow prediction, a short-term forecasting method based on delay reconstruction and an integrated GRU-SVR model with a stacking strategy is proposed to address the nonlinearity, complexity, and time dependence of traffic flow. Phase-space reconstruction parameters are derived from the chaotic nature of the source traffic so that the sequences are mapped into high-dimensional vectors, and the integrated GRU-SVR model is then optimized for prediction using iGridSearch CV. The GRU alleviates the long-term dependence problem among the data and makes full use of temporal context in both directions to capture causal structure, while the SVR parameters are tuned by an improved grid search that reaches the global optimum with high time efficiency; this global optimum helps ensure the generalizability of the integrated model. The results show that the RMSE, MAPE, and R2 score of the integrated algorithm are better than those of the other three models, demonstrating that the method can effectively improve prediction accuracy and has better generalization ability.
Parallelization of Data Science Tasks, an Experimental Overview
Oscar Castro, P. Bruneau, Jean-Sébastien Sottet, Dario Torregrossa
DOI: 10.1145/3581807.3581878

The practice of data science and machine learning often involves training many kinds of models, whether for inferring some target variable or for extracting structured knowledge from data. Training procedures generally require lengthy and intensive computation, so a natural step for data scientists is to accelerate these procedures, typically through parallelization across multiple CPU cores and GPU devices. In this paper, we focus on Python libraries commonly used by machine learning practitioners and propose a case-based experimental approach to survey mainstream tools for software acceleration. For each use case, we highlight and quantify the optimizations from the baseline implementation to the optimized version. Finally, we draw a taxonomy of the tools and techniques involved in our experiments and identify common pitfalls, with a view to providing actionable guidelines for data scientists and developers of code-optimization tools.
Recognition of Human Walking Motion Using a Wearable Camera
Zi-yang Liu, Tomoyuki Kurosaki, J. Tan
DOI: 10.1145/3581807.3581809

In recent years, computer vision technology has attracted more attention than ever and is being applied in a wide range of fields. Among its applications, automatic recognition of human motion is particularly important, since it enables automatic detection of suspicious persons and automatic monitoring of elderly people. Research on human motion recognition using computer vision techniques has therefore been actively conducted in Japan and overseas. However, most conventional research on human motion recognition employs video of a person taken by an external fixed camera; there has been no research on recognizing human motion from video of the surrounding scenery provided by a wearable camera. This paper proposes a method that recognizes a human motion by estimating the posture change of a wearable camera attached to a walking person from the motion of the scenery in its video, and by analyzing the change of the wearer's trunk derived from that posture change. In the method, AKAZE is applied to the images to detect feature points and establish their correspondence. The 5-point algorithm is used to estimate the epipolar geometry constraint and an essential matrix, which yields the relative camera motion. The change in relative camera motion is then used to analyze the shape of the wearer's trunk. The resulting walking-motion features are finally fed into an SVM to identify the motion. In the experiment, five types of walking motions were captured by a wearable camera from five subjects, and the accuracy of human motion recognition was 80%. More precise feature point extraction, more exact motion estimation, and consideration of a wider variety of human walking motions are needed to improve the proposed technique.
An Efficient Lightweight Spatio-temporal Attention Module for Action Recognition
Zhonghua Sun, Meng Dai, Ziwen Yi, Tianyi Wang, Jinchao Feng, Kebin Jia
DOI: 10.1145/3581807.3581810

Effective feature learning is one of the prime components of a human action recognition algorithm. A three-dimensional convolutional neural network (3D CNN) can directly extract spatio-temporal features, but it is insufficient for capturing the most discriminative parts of an action video: redundant spatial regions within and between temporal frames weaken the descriptive ability of the 3D CNN model. To address this problem, we propose a lightweight spatio-temporal attention module (ST-AM), composed of a spatial attention module (SAM) and a temporal attention module (TAM). SAM and TAM effectively encode the semantically relevant spatial areas and suppress redundant temporal frames, reducing misclassification. The proposed SAM and TAM have complementary effects and can be easily embedded into existing 3D CNN action recognition models. Experiments on the UCF-101 and HMDB-51 datasets show that the ST-AM-embedded model achieves impressive performance on the action recognition task.
NAF: Nest Axial Attention Fusion Network for Infrared and Visible Images
Jiaxi Lu, Bicao Li, Zhoufeng Liu, Zhuhong Shao, Chunlei Li, Zong-Hui Wang
DOI: 10.1145/3581807.3581849

In recent years, deep learning has been widely used in the field of infrared and visible image fusion. However, existing deep learning-based methods tend to lose details and give little consideration to long-range dependence. To address this, we propose a novel encoder-decoder fusion model based on nest connections and axial attention, named NAF. The network extracts as much multi-scale information as possible and retains more long-range dependencies thanks to the axial attention in each convolutional block. The method comprises three parts: an encoder consisting of convolutional blocks, a fusion strategy based on spatial and channel attention, and a decoder that processes the fused features. Specifically, the source images are first fed into the encoder to extract multi-scale depth features. Then, the fusion strategy merges the depth features of each scale produced by the encoder. Finally, a decoder based on nested convolutional blocks reconstructs the fused image. Experimental results on public datasets demonstrate that the proposed method achieves better fusion performance than other state-of-the-art methods in both subjective and objective evaluation.