Pub Date: 2024-08-19 · DOI: 10.1007/s40747-024-01571-4
Zanxi Ruan, Yingmei Wei, Yanming Guo, Yuxiang Xie
Most previous few-shot action recognition methods process the temporal and spatial features of a video separately, which limits how comprehensively features are extracted. In this paper, we propose a novel hybrid attentive prototypical network (HAPN) framework for few-shot action recognition. Distinguished by its joint processing of temporal and spatial information, HAPN handles these dimensions together from feature extraction through to the attention module, which strengthens its action recognition performance. The framework uses an R(2+1)D backbone network to extract integrated temporal and spatial features, giving a comprehensive representation of video content. In addition, it introduces a novel Residual Tri-dimensional Attention (ResTriDA) mechanism designed to enhance feature information across the temporal, spatial, and channel dimensions. ResTriDA dynamically strengthens crucial aspects of video features: it amplifies channel-wise features that are significant for distinguishing actions, accentuates spatial details that capture the essence of an action within each frame, and emphasizes temporal dynamics that capture movement over time. We further propose a prototypical attentive matching (PAM) module, built on metric learning, to address the overfitting common in few-shot tasks. We evaluate HAPN on three classical few-shot action recognition datasets: Kinetics-100, UCF101, and HMDB51. The results indicate that our framework significantly outperforms state-of-the-art methods. Notably, in the 1-shot task, accuracy increased by 9.8% on UCF101, 3.9% on HMDB51, and 12.4% on Kinetics-100. These gains confirm the robustness and effectiveness of our approach in leveraging limited data for precise action recognition.
Title: Hybrid attentive prototypical network for few-shot action recognition
Journal: Complex & Intelligent Systems
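The prototype-based matching that PAM builds on can be illustrated with the classic prototypical-network rule: average the support embeddings of each class into a prototype, then assign a query to the nearest prototype. The sketch below is a minimal plain-Python version under stated assumptions; it is not the authors' PAM, it omits the attention weighting and the R(2+1)D backbone, and the function names and 2-D toy embeddings are hypothetical.

```python
import math
from collections import defaultdict


def class_prototypes(support):
    """Average the support embeddings of each class into one prototype vector.

    support: list of (label, embedding) pairs; every embedding is a list of
    floats of the same length (e.g. pooled clip features from a backbone).
    """
    sums = {}
    counts = defaultdict(int)
    for label, emb in support:
        if label not in sums:
            sums[label] = [0.0] * len(emb)
        sums[label] = [s + e for s, e in zip(sums[label], emb)]
        counts[label] += 1
    return {label: [s / counts[label] for s in vec] for label, vec in sums.items()}


def squared_distance(a, b):
    """Squared Euclidean distance, the metric of classic prototypical networks."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def classify(query, prototypes):
    """Return (predicted label, class probabilities) from a softmax over
    negative distances between the query embedding and each prototype."""
    labels = sorted(prototypes)
    logits = [-squared_distance(query, prototypes[lab]) for lab in labels]
    top = max(logits)  # subtract the max logit for numerical stability
    exps = [math.exp(z - top) for z in logits]
    total = sum(exps)
    probs = {lab: e / total for lab, e in zip(labels, exps)}
    return max(probs, key=probs.get), probs


# Hypothetical 2-way episode with 2-D embeddings ("jump" has two shots).
support = [("jump", [1.0, 0.0]), ("jump", [0.8, 0.2]), ("wave", [0.0, 1.0])]
prototypes = class_prototypes(support)
label, probs = classify([0.7, 0.1], prototypes)
print(label)  # jump
```

An attentive matching module would differ mainly in how the support features are weighted before they are pooled into a prototype; the plain mean here is the simplest baseline.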
Pub Date: 2024-08-19 · DOI: 10.1007/s40747-024-01576-z
Yaozhe Zhou, Yujun Lu, Liye Lv
To address the low solution efficiency, poor path quality, and limited search completeness of the Rapidly-exploring Random Tree (RRT) algorithm in narrow-passage environments, this paper proposes a Grid-based Variable Probability Rapidly-exploring Random Tree algorithm (GVP-RRT). The algorithm first preprocesses the map by gridization to extract the features of different path regions. It then grows the tree randomly with a probability density that varies according to these region features, combining grid-based, probability-based, and guidance strategies to raise the probability of growth in narrow passages and thereby improve the completeness of the algorithm. Finally, the planned route is re-optimized based on the triangle inequality principle. Simulation results demonstrate that, in complex narrow passages, GVP-RRT increases the planning success rate by 11.5–69.5% compared with the other algorithms tested, reduces the average planning time by more than 50%, and yields a shorter average path length.
Title: GVP-RRT: a grid based variable probability Rapidly-exploring Random Tree algorithm for AGV path planning
Journal: Complex & Intelligent Systems
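The final re-optimization step relies on the triangle inequality: if a waypoint can see a later waypoint directly, the straight segment is never longer than the detour through the intermediate waypoints. A minimal sketch on an occupancy grid follows, assuming (row, col) waypoints, a sampled collision check, and a greedy farthest-visible rule. The names are hypothetical, and the grid features and variable-probability sampling of GVP-RRT are not reproduced.

```python
import math


def segment_is_free(grid, a, b, step=0.1):
    """Coarse collision check: sample points along the segment a->b and test
    the grid cell each sample falls in. grid[r][c] == 1 marks an obstacle and
    points are (row, col) cell centres."""
    (r0, c0), (r1, c1) = a, b
    n = max(1, int(math.dist(a, b) / step))
    for i in range(n + 1):
        t = i / n
        row = math.floor(r0 + t * (r1 - r0) + 0.5)  # round half up to a cell index
        col = math.floor(c0 + t * (c1 - c0) + 0.5)
        if grid[row][col] == 1:
            return False
    return True


def shortcut(path, grid):
    """Greedy re-optimization: from each kept waypoint, jump to the farthest
    later waypoint that is directly reachable. By the triangle inequality
    |AC| <= |AB| + |BC|, so each skip can only shorten the route.
    Consecutive waypoints of the input path are assumed collision-free."""
    if len(path) < 3:
        return list(path)
    result = [path[0]]
    i = 0
    while i < len(path) - 1:
        j = len(path) - 1
        while j > i + 1 and not segment_is_free(grid, path[i], path[j]):
            j -= 1
        result.append(path[j])
        i = j
    return result


def path_length(path):
    """Total Euclidean length of a polyline."""
    return sum(math.dist(p, q) for p, q in zip(path, path[1:]))


grid = [[0] * 5 for _ in range(5)]
grid[2][2] = 1  # a single obstacle in the middle of the map
path = [(0, 0), (1, 0), (2, 0), (3, 0), (4, 0), (4, 1), (4, 2), (4, 3), (4, 4)]
short = shortcut(path, grid)
print(short)  # [(0, 0), (4, 2), (4, 4)]
```

Sampling the segment at a fixed step can miss thin corner contacts; an exact cell traversal (e.g. a supercover line) would be the stricter choice in a real planner.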
Pub Date: 2024-08-17 · DOI: 10.1007/s40747-024-01593-y
Yangyue Feng, Xiaokang Yang, Yong Li, Lijuan Zhang, Yan Lv, Jinfang Jin
Point cloud keypoint detection algorithms such as USIP, which first downsample and then fine-tune the sampled points, cannot effectively detect the defective part of a single-view defective point cloud, so they fail to output keypoints for that part. This paper therefore proposes TSKPD, a twin-structure keypoint detection algorithm based on the idea of contrastive learning. It uses two single-view defective point clouds to synthesize a relatively more complete set of keypoints for learning, which encourages the network to learn the features of the complete point cloud. This effectively improves the robustness of point cloud keypoint detection and enables keypoint detection on single-view defective point clouds. Test results on the ModelNet40 and ShapeNet datasets show that the coverage rate of TSKPD on the missing part of single-view defective point clouds is 12.62 higher than that of the best existing algorithm.
Title: TSKPD: twin structure key point detection in point cloud
Journal: Complex & Intelligent Systems
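The abstract does not define its coverage rate, so the sketch below assumes one plausible definition: the fraction of reference keypoints in the missing region that have a detected keypoint within a given radius. It also shows, through a hypothetical merge helper, why combining keypoints from two single views (already aligned in one frame) can cover a region that one view misses. Neither function is TSKPD itself, and the toy coordinates are invented for illustration.

```python
import math


def coverage_rate(reference, detected, radius):
    """ASSUMED metric: fraction of reference keypoints (e.g. those lying in the
    missing region) with at least one detected keypoint within `radius`."""
    if not reference:
        return 0.0
    covered = sum(1 for ref in reference
                  if any(math.dist(ref, det) <= radius for det in detected))
    return covered / len(reference)


def merge_keypoints(view_a, view_b, radius):
    """Hypothetical union of two keypoint sets expressed in one frame: a point
    from view_b is added only if no kept point lies within `radius` of it."""
    merged = list(view_a)
    for p in view_b:
        if all(math.dist(p, q) > radius for q in merged):
            merged.append(p)
    return merged


# Toy data: two reference keypoints in a region that view_a only partly sees.
reference = [(0.0, 0.0, 1.0), (0.0, 0.0, 2.0)]
view_a = [(0.0, 0.0, 0.0), (0.0, 0.0, 1.02)]
view_b = [(0.0, 0.0, 1.0), (0.0, 0.0, 2.03)]
print(coverage_rate(reference, view_a, radius=0.05))  # 0.5
merged = merge_keypoints(view_a, view_b, radius=0.05)
print(coverage_rate(reference, merged, radius=0.05))  # 1.0
```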
In recent years, applying Neural Combinatorial Optimization (NCO) techniques to Combinatorial Optimization (CO) has emerged as a popular and promising research direction. There are currently two main types of NCO: Constructive Neural Combinatorial Optimization (CNCO) and Perturbative Neural Combinatorial Optimization (PNCO). CNCO generally trains an encoder-decoder model via supervised learning to construct solutions from scratch. Its construction process is fast, but because it relies on a one-shot mapping it cannot keep optimizing a solution, which limits its potential applications. PNCO, in contrast, generally trains neural network models via deep reinforcement learning (DRL) to intelligently select appropriate human-designed heuristics that improve existing solutions. It can achieve high-quality solutions, but at a high computational cost. To leverage the strengths of both approaches, we propose a two-stage hybrid framework in which CNCO forms the first stage and PNCO the second. Specifically, in the first stage, we use the attention model to generate preliminary solutions for given CO instances. In the second stage, we employ DRL to intelligently select and combine appropriate algorithmic components from an improvement pool, a perturbation pool, and a prediction pool to continuously optimize the obtained solutions. Experimental results on synthetic and real Capacitated Vehicle Routing Problems (CVRPs) and Traveling Salesman Problems (TSPs) demonstrate the effectiveness of the proposed hybrid framework with the assistance of automated algorithm design.
Title: A hybrid neural combinatorial optimization framework assisted by automated algorithm design
Authors: Liang Ma, Xingxing Hao, Wei Zhou, Qianbao He, Ruibang Zhang, Li Chen
Pub Date: 2024-08-17 · DOI: 10.1007/s40747-024-01600-2
Journal: Complex & Intelligent Systems
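The two-stage data flow (construct a solution once, then keep improving it) can be shown on a small TSP. In this sketch both learned components are swapped for classical stand-ins: a nearest-neighbour constructor replaces the attention model of stage 1, and 2-opt replaces the DRL-selected operators drawn from the improvement, perturbation, and prediction pools of stage 2. The function names are hypothetical.

```python
import math


def tour_length(tour, pts):
    """Length of the closed tour that visits pts in the given index order."""
    return sum(math.dist(pts[tour[i]], pts[tour[(i + 1) % len(tour)]])
               for i in range(len(tour)))


def construct_nearest_neighbour(pts, start=0):
    """Stage 1 stand-in: build a complete tour in a single constructive pass."""
    unvisited = set(range(len(pts))) - {start}
    tour = [start]
    while unvisited:
        last = pts[tour[-1]]
        # sorted() makes tie-breaking deterministic
        nxt = min(sorted(unvisited), key=lambda j: math.dist(last, pts[j]))
        tour.append(nxt)
        unvisited.remove(nxt)
    return tour


def improve_two_opt(tour, pts):
    """Stage 2 stand-in: reverse a segment whenever that shortens the tour,
    and repeat until no reversal helps (a local optimum for 2-opt)."""
    best = list(tour)
    best_len = tour_length(best, pts)
    improved = True
    while improved:
        improved = False
        for i in range(1, len(best) - 1):
            for k in range(i + 1, len(best)):
                candidate = best[:i] + best[i:k + 1][::-1] + best[k + 1:]
                cand_len = tour_length(candidate, pts)
                if cand_len < best_len - 1e-12:
                    best, best_len = candidate, cand_len
                    improved = True
    return best


pts = [(0, 0), (3, 1), (1, 2), (4, 4), (0, 3), (2, 0), (5, 2)]
initial = construct_nearest_neighbour(pts)
final = improve_two_opt(initial, pts)
print(tour_length(initial, pts), tour_length(final, pts))
```

Stage 2 never makes the stage 1 tour worse, which is the property the hybrid design relies on: a cheap constructor supplies a reasonable start and the improver spends extra computation only where it pays off.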
Pub Date: 2024-08-14 · DOI: 10.1007/s40747-024-01591-0
Zhiyi Meng, Ke Yu, Rui Qiu
To address the long blood transportation times common in contemporary urban settings, we proposed a location-routing optimization problem tailored to blood distribution within complex road networks. The problem jointly considers the selection of sites for both stations and blood centers and the planning of delivery routes for the unmanned aerial vehicles (UAVs) that carry out blood transportation. First, a model was formulated to minimize the overall cost, including transportation expenses, site-related costs, and other costs of the blood transportation vehicles coordinated with UAVs. Subsequently, a two-stage hybrid heuristic algorithm was designed around the distinctive characteristics of the problem. An enhanced k-means algorithm generates clustering schemes, and the centroid method is then used to select locations for delivery sites. A genetic algorithm enhanced with adaptive operators addresses the large-scale, NP-hard route planning problem in complex urban road networks. The results indicated that, compared with the traditional vehicle-based blood delivery model, adopting drone-assisted delivery reduced the total blood transportation cost by 12.65% and the overall delivery time by 37.5%. Finally, case and sensitivity analyses were conducted to investigate how the number of blood transportation vehicles, the number of UAVs, driver wages, and the unit costs of blood transportation vehicles affect the location-routing problem.
Title: Location-routing optimization of UAV collaborative blood delivery vehicle distribution on complex roads
Journal: Complex & Intelligent Systems
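The clustering-then-centroid siting step can be sketched with plain Lloyd's k-means on 2-D coordinates: demand points are grouped, and each group's centroid serves as a candidate delivery site. This assumes straight-line distances and seeds the centroids with the first k points; the paper's enhancement to k-means, its road-network distances, and the adaptive genetic algorithm used for routing are not reproduced, and the coordinates are hypothetical.

```python
import math


def kmeans(points, k, iters=100):
    """Plain Lloyd's k-means on 2-D points; the first k points seed the centroids.

    Returns (centroids, assignment), where assignment[i] is the cluster of points[i]
    and each centroid is a candidate site for that cluster.
    """
    centroids = [tuple(p) for p in points[:k]]
    assignment = [0] * len(points)
    for _ in range(iters):
        # Assignment step: attach each demand point to its nearest centroid.
        new_assignment = [
            min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [p for p, a in zip(points, new_assignment) if a == c]
            if members:
                new_centroids.append((sum(x for x, _ in members) / len(members),
                                      sum(y for _, y in members) / len(members)))
            else:
                new_centroids.append(centroids[c])  # keep an empty cluster's old site
        if new_assignment == assignment and new_centroids == centroids:
            break  # converged: nothing moved
        assignment, centroids = new_assignment, new_centroids
    return centroids, assignment


# Hypothetical demand locations forming two neighbourhoods.
demand = [(0, 0), (10, 10), (0, 1), (1, 0), (10, 11), (11, 10)]
sites, clusters = kmeans(demand, k=2)
print(clusters)  # [0, 1, 0, 0, 1, 1]
```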
Pub Date: 2024-08-14 · DOI: 10.1007/s40747-024-01584-z
Chuanbo Wen, Xianbin Wu, Zidong Wang, Weibo Liu, Junjie Yang
The safe and reliable operation of the pitch system is essential for the stable and efficient operation of a wind turbine (WT). Pitch fault data collected by supervisory control and data acquisition (SCADA) systems often contain a wide variety of variables, and the resulting redundant features reduce the accuracy of the final diagnosis, making it difficult to meet requirements. In addition, feature extraction with deep Convolutional Neural Network (CNN) models tends to capture only local features while ignoring global information. To address these issues, this article proposes a global average correlation coefficient that measures the correlation among multiple variables in SCADA data. By comprehensively considering the correlations among these variables, redundant features are effectively eliminated, which improves the accuracy of fault diagnosis. Furthermore, a new local amplification fusion architecture network (LAFA-Net) based on multi-head attention (MHA) is introduced. Within it, an efficient local feature extraction module is introduced to sharpen the model’s perception of detailed features while preserving global context information. LAFA-Net combines the advantages of CNN and MHA, efficiently extracting and fusing valuable local and global features from the filtered data. Experiments on real pitch fault data demonstrate that the global average correlation coefficient effectively screens out redundant features that degrade fault diagnosis results, thereby improving diagnosis efficiency and accuracy. The LAFA-Net model accurately diagnoses multiple types of pitch faults and shows better classification performance and accuracy than several advanced models, along with faster convergence.
Title: A novel local feature fusion architecture for wind turbine pitch fault diagnosis with redundant feature screening
Journal: Complex & Intelligent Systems
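The abstract does not give the formula for the global average correlation coefficient. The sketch below assumes one reading: score each variable by its mean absolute Pearson correlation with all other variables, then repeatedly drop the highest-scoring variable (recomputing the scores) until every remaining score is at or below a threshold. The variable names, data, and threshold are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length sample lists (0.0 if one is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / math.sqrt(vx * vy)


def global_average_correlation(columns, names):
    """ASSUMED definition: mean |r| between each variable and every other variable."""
    return {
        i: sum(abs(pearson(columns[i], columns[j])) for j in names if j != i) / (len(names) - 1)
        for i in names
    }


def screen_redundant(columns, threshold):
    """Drop the variable with the highest score, recompute, and stop once
    every remaining score is at or below `threshold`."""
    kept = list(columns)
    while len(kept) > 1:
        scores = global_average_correlation(columns, kept)
        worst = max(kept, key=lambda name: scores[name])
        if scores[worst] <= threshold:
            break
        kept.remove(worst)
    return kept


# Hypothetical SCADA-like variables: "b" duplicates "a" up to scale, "c" is unrelated.
columns = {
    "a": [1, 2, 3, 4, 5, 6],
    "b": [2, 4, 6, 8, 10, 12],
    "c": [1, -1, 0, 0, -1, 1],
}
print(screen_redundant(columns, threshold=0.3))  # ['b', 'c']
```

Removing one variable at a time and recomputing keeps one member of each correlated group instead of discarding the whole group, which a single-pass threshold on the initial scores would do.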
Pub Date: 2024-08-14 · DOI: 10.1007/s40747-024-01596-9
Xiaoyan Li, Shenghua Xu, Hengxu Jin, Zhuolu Wang, Yu Ma, Xuan He
With the continuous accumulation of massive amounts of mobile data, point-of-interest (POI) recommendation has become a vital task for location-based social networks. It is difficult for deep neural networks or matrix factorization (MF) alone to effectively learn user–POI interaction functions. Moreover, the user–POI interaction matrix is sparse, and the heterogeneous characteristics of auxiliary information are underused. We therefore propose DNMF-AM, a POI recommendation method that integrates attention-aware meta-paths into deep neural matrix factorization. First, we construct a multi-relational heterogeneous information network of “user–POI–geographic region–POI category.” Multiple weighted homogeneous information networks derived from meta-paths are used to obtain node embedding vectors for the different relationships. Attention networks then aggregate the node vectors across these relationships, and the result serves as auxiliary information that mitigates data sparsity. Subsequently, the internal embedding vectors of users and POIs are extracted through feature embedding based on the user–POI interaction matrix. Second, these vectors are integrated with the embedding vectors obtained from the attention networks. Third, deep neural matrix factorization learns both linear and nonlinear user–POI interactions to mitigate the implicit feedback problem, using generalized matrix factorization together with a deep neural network built on a convolution-constrained multi-head self-attention mechanism. Extensive experiments on two real-world datasets demonstrate that DNMF-AM outperforms the best baseline, NeuMF-CAA, by 4.24% in HR@10 and 5.04% in NDCG@10.
Title: POI recommendation by deep neural matrix factorization integrated attention-aware meta-paths
Journal: Complex & Intelligent Systems
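HR@10 and NDCG@10 are commonly computed under a leave-one-out protocol: each user has one held-out POI, and the model's ranked list is checked for it. The sketch assumes that protocol with a single relevant item per user, in which case the ideal DCG is 1. Whether the paper ranks all POIs or a sampled candidate set is not stated in the abstract, and the POI ids below are hypothetical.

```python
import math


def hit_ratio_at_k(ranked, held_out, k=10):
    """1.0 if the held-out POI appears in the top-k of the ranked list, else 0.0."""
    return 1.0 if held_out in ranked[:k] else 0.0


def ndcg_at_k(ranked, held_out, k=10):
    """Discounted gain of the single held-out POI; the ideal DCG is 1 here."""
    top = ranked[:k]
    if held_out not in top:
        return 0.0
    position = top.index(held_out)  # 0-based rank
    return 1.0 / math.log2(position + 2)


def evaluate(users, k=10):
    """users: list of (ranked POI ids, held-out POI id), one pair per user.
    Returns the mean HR@k and mean NDCG@k over users."""
    n = len(users)
    hr = sum(hit_ratio_at_k(r, h, k) for r, h in users) / n
    ndcg = sum(ndcg_at_k(r, h, k) for r, h in users) / n
    return hr, ndcg


users = [
    (["p3", "p7", "p1"], "p3"),  # hit at rank 1 -> NDCG 1.0
    (["p2", "p5", "p9"], "p5"),  # hit at rank 2 -> NDCG 1/log2(3)
    (["p4", "p6", "p8"], "p0"),  # miss -> 0.0
]
hr, ndcg = evaluate(users, k=10)
print(hr, ndcg)
```

HR only asks whether the held-out POI made the cut, while NDCG also rewards placing it near the top, which is why the two metrics can improve by different amounts.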
Pub Date: 2024-08-14 · DOI: 10.1007/s40747-024-01579-w
Guanfeng Yu, Lei Zhang, Siyuan Shen, Zhengjun Zhai
Vision-inertial navigation offers a promising way for aircraft to estimate ego-motion accurately in environments without Global Navigation Satellite System (GNSS) signals. However, existing approaches adapt poorly to fixed-wing aircraft, whose high maneuverability and scarce visual features lead to low accuracy and poor real-time performance. This paper introduces a novel vision-inertial heterogeneous data fusion method that aims to improve both the navigation accuracy and the computational efficiency of fixed-wing aircraft landing navigation. The visual front end of the system extracts multi-scale infrared runway features and computes a geo-referenced runway image as the observation. A lightweight end-to-end neural network recognizes the infrared runway features efficiently and robustly from blurry infrared images, and the geo-referenced runway is generated by projecting the runway’s prior geographic information using the prior pose. The fusion back end is a Covariance Feedback Control based Cubature Kalman Filter (CFC-CKF) framework, which tightly couples visual observations and inertial measurements for drift-free pose estimation and suppresses the effect of inaccurate kinematic noise statistics. Finally, real flight experiments demonstrate that the algorithm estimates the pose at 100 Hz and meets the navigation accuracy requirements for high-speed landing of fixed-wing aircraft.
Title: Real-time vision-inertial landing navigation for fixed-wing aircraft with CFC-CKF
Journal: Complex & Intelligent Systems
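The cubature Kalman filter at the core of CFC-CKF propagates a Gaussian estimate through a possibly nonlinear model using 2n deterministic cubature points. The sketch below shows only the standard third-degree cubature prediction step; the covariance feedback control, the visual observation model, and the aircraft kinematics are not reproduced, and the constant-velocity model is a hypothetical example.

```python
import math


def cholesky(P):
    """Lower-triangular S with P = S S^T (P must be symmetric positive definite)."""
    n = len(P)
    S = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(S[i][k] * S[j][k] for k in range(j))
            if i == j:
                S[i][j] = math.sqrt(P[i][i] - s)
            else:
                S[i][j] = (P[i][j] - s) / S[j][j]
    return S


def cubature_points(mean, P):
    """Third-degree spherical-radial rule: 2n points mean +/- sqrt(n) * (column of S),
    each carrying weight 1 / (2n)."""
    n = len(mean)
    S = cholesky(P)
    scale = math.sqrt(n)
    points = []
    for sign in (1.0, -1.0):
        for col in range(n):
            points.append([mean[r] + sign * scale * S[r][col] for r in range(n)])
    return points


def cubature_predict(mean, P, f, Q=None):
    """Propagate N(mean, P) through f via the cubature points; add noise Q if given."""
    prop = [f(p) for p in cubature_points(mean, P)]
    w = 1.0 / len(prop)  # equal weights 1 / (2n)
    d = len(prop[0])
    m = [w * sum(p[r] for p in prop) for r in range(d)]
    cov = [[w * sum((p[r] - m[r]) * (p[c] - m[c]) for p in prop) for c in range(d)]
           for r in range(d)]
    if Q is not None:
        cov = [[cov[r][c] + Q[r][c] for c in range(d)] for r in range(d)]
    return m, cov


DT = 0.1  # hypothetical time step in seconds


def constant_velocity(x):
    """Hypothetical 1-D model: state = [position, velocity]."""
    return [x[0] + DT * x[1], x[1]]


pred_mean, pred_cov = cubature_predict([0.0, 1.0], [[2.0, 0.5], [0.5, 1.0]], constant_velocity)
print(pred_mean, pred_cov)
```

For a linear model such as this one the cubature rule is exact, so the prediction matches the ordinary Kalman result F x and F P F^T, here [0.1, 1.0] and [[2.11, 0.6], [0.6, 1.0]] up to floating-point rounding; the cubature points only pay off once the model is nonlinear.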
Pub Date : 2024-08-14 DOI: 10.1007/s40747-024-01594-x
Zhengyang Yu, Xiaojuan Chen, Chang Qu
Recognizing micro-expressions (MEs), which are subtle and transient forms of human emotional expression, is critical for accurately judging human feelings. However, recognizing MEs is challenging because of their transience and low intensity. This study develops a lightweight shallow dual-group symmetric attention network (SDGSA) to address the limitations of existing methods in capturing the subtle features of MEs. The network takes optical flow features as input, extracts ME features through a shallow network, and performs finer feature segmentation along the channel dimension through a dual-group strategy. The goal is to focus on different types of facial information without disrupting facial symmetry. Moreover, this study implements a spatial symmetry attention module that extracts facial symmetry features to further emphasize the symmetric information of the left and right sides of the face. Additionally, we introduce a channel blending technique to optimize information fusion between different channel features. Extensive experiments on the mainstream ME datasets SMIC, CASME II, SAMM, and 3DB-combined demonstrate that SDGSA outperforms current state-of-the-art methods on the reported metrics. Ablation experiments show that the proposed dual-group symmetric attention module outperforms classical attention modules, including the convolutional block attention module, squeeze-and-excitation, efficient channel attention, spatial group-wise enhancement, and multi-head self-attention. Importantly, SDGSA maintains excellent performance with only 0.278 million parameters. The code and model are publicly available at https://github.com/YZY980123/SDGSA.
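The abstract does not give SDGSA's layer definitions. The NumPy sketch below is an illustration only of how the three described ideas could fit together:

- a dual-group split along the channel dimension;
- a spatial attention map forced to be left-right symmetric by averaging it with its horizontal mirror, assuming the face is aligned so that its midline runs down the center of the width axis;
- ShuffleNet-style channel shuffle, used here as a hypothetical stand-in for the unspecified "channel blending" step.

The fixed scalar weights `w_avg` and `w_max` replace learned convolutions. Those weights, like all the function names, are assumptions rather than the authors' code.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def symmetric_spatial_attention(feat, w_avg=1.0, w_max=1.0):
    """feat: (C, H, W). Scale feat by a left-right symmetric (H, W) attention map (hypothetical)."""
    avg = feat.mean(axis=0)                     # channel-average map, (H, W)
    mx = feat.max(axis=0)                       # channel-max map, (H, W)
    logits = w_avg * avg + w_max * mx           # stand-in for a learned conv over [avg, max]
    logits = 0.5 * (logits + logits[:, ::-1])   # average with mirror image across the width axis
    attn = sigmoid(logits)                      # symmetric because sigmoid is elementwise
    return feat * attn[None, :, :], attn

def channel_shuffle(feat, groups):
    """Interleave channels across groups (ShuffleNet-style), a stand-in for channel blending."""
    c, h, w = feat.shape
    assert c % groups == 0
    return feat.reshape(groups, c // groups, h, w).transpose(1, 0, 2, 3).reshape(c, h, w)

def dual_group_symmetric_block(feat):
    """Split channels into two groups, attend each symmetrically, then blend the groups."""
    c = feat.shape[0]
    assert c % 2 == 0
    out1, _ = symmetric_spatial_attention(feat[: c // 2])
    out2, _ = symmetric_spatial_attention(feat[c // 2 :])
    return channel_shuffle(np.concatenate([out1, out2], axis=0), groups=2)
```

The shuffle lets each output group draw channels from both attended groups. With four channels split into two groups, channel order `[0, 1, 2, 3]` becomes `[0, 2, 1, 3]`.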
Title : SDGSA: a lightweight shallow dual-group symmetric attention network for micro-expression recognition (Complex & Intelligent Systems, 2024-08-14)
Pub Date : 2024-08-14 DOI: 10.1007/s40747-024-01602-0
Shaoguang Zhang, Jianguang Lu, Xianghong Tang
In molecular biology, graph representation learning is crucial for molecular structure analysis. However, a lack of spatial structure information makes it difficult to recognise functional groups and distinguish isomers. To address these problems, we design a novel graph representation learning method based on a spatial structure information extraction Transformer (SSET). The SSET model comprises the Edge Feature Fusion Subgraph Spatial Structure Extractor (ETSE) module and the Positional Information Encoding Graph Transformer (PEGT) module. The ETSE module extracts spatial structure information by fusing edge features and generating the most-value subgraph (Mv-subgraph). The PEGT module encodes positional information within the graph transformer, addressing the problem that nodes with identical features are otherwise indistinguishable. In addition, by operating on subgraphs, the SSET model alleviates the burden of high computational complexity. Experiments on real-world datasets show that the SSET model, built on the graph transformer, considerably improves graph representation learning.
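The abstract says PEGT encodes positional information so that nodes with identical features can be told apart, but it does not specify the encoding. The NumPy sketch below uses Laplacian eigenvector positional encoding, a common choice in graph transformers. It is a swapped-in technique for illustration, not the authors' PEGT, and the function name is hypothetical. Each node receives the entries of the k smallest non-trivial eigenvectors of the symmetric normalized Laplacian, which can then be concatenated with its features.

```python
import numpy as np

def laplacian_pe(adj, k):
    """Return the k smallest non-trivial eigenvectors of L = I - D^-1/2 A D^-1/2 as node PEs."""
    deg = adj.sum(axis=1)
    # Guard isolated nodes (degree 0) against division by zero.
    d_inv_sqrt = np.where(deg > 0, 1.0 / np.sqrt(np.maximum(deg, 1e-12)), 0.0)
    lap = np.eye(adj.shape[0]) - d_inv_sqrt[:, None] * adj * d_inv_sqrt[None, :]
    _, eigvecs = np.linalg.eigh(lap)            # eigenvalues returned in ascending order
    return eigvecs[:, 1 : k + 1]                # drop the trivial eigenvector (eigenvalue 0)

# Example: a 4-node path graph whose nodes all carry identical features.
adj = np.array([[0, 1, 0, 0],
                [1, 0, 1, 0],
                [0, 1, 0, 1],
                [0, 0, 1, 0]], dtype=float)
feats = np.ones((4, 3))                         # identical node features
node_inputs = np.concatenate([feats, laplacian_pe(adj, k=2)], axis=1)
```

Without the encoding, every row of `feats` is identical. After concatenation, the rows of `node_inputs` differ. Eigenvectors are defined only up to sign, so training pipelines often flip their signs at random.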
Title : Molecular subgraph representation learning based on spatial structure transformer (Complex & Intelligent Systems, 2024-08-14)