
Journal of Visual Communication and Image Representation: Latest Publications

Enhancement-suppression driven lightweight fine-grained micro-expression recognition
IF 2.6 | CAS Tier 4, Computer Science | Q2 COMPUTER SCIENCE, INFORMATION SYSTEMS | Pub Date: 2024-12-24 | DOI: 10.1016/j.jvcir.2024.104383
Xinmiao Ding, Yuanyuan Li, Yulin Wu, Wen Guo
Micro-expressions are short-lived, authentic emotional expressions used in several fields such as deception detection, criminal analysis, and medical diagnosis. Although deep learning-based approaches have achieved outstanding performance in micro-expression recognition, the recognition performance of lightweight networks for terminal applications is still unsatisfactory. This is mainly because existing models either focus excessively on a single region or lack comprehensiveness in identifying various regions, resulting in insufficient extraction of fine-grained features. To address this problem, this paper proposes a lightweight micro-expression recognition framework, the Lightweight Fine-Grained Network (LFGNet). The proposed network adopts EdgeNeXt as the backbone network to effectively combine local and global features; as a result, it greatly reduces the complexity of the model while capturing micro-expression actions. To further enhance the feature extraction ability of the model, an Enhancement-Suppression Module (ESM) is developed, in which a Feature Suppression Module (FSM) forces the model to extract other potential features at deeper layers. Finally, a multi-scale Feature Fusion Module (FFM) is proposed to weight the fusion of the learned features at different granularity scales, improving the robustness of the model. Experimental results on four datasets demonstrate that the proposed method outperforms existing methods in terms of recognition accuracy and model complexity.
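A minimal sketch of one plausible reading of the feature-suppression idea mentioned in the abstract: the most strongly activated spatial locations are masked so that deeper layers are pushed to mine other, less salient regions. This is an illustration, not the authors' FSM; the suppression ratio and the masking rule are assumptions.

```python
import torch
import torch.nn as nn

class FeatureSuppression(nn.Module):
    """Illustrative feature suppression: zero out the top-activated locations."""
    def __init__(self, suppress_ratio: float = 0.1):
        super().__init__()
        self.suppress_ratio = suppress_ratio  # fraction of locations to mask (assumed)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (B, C, H, W) feature map from the backbone
        b, c, h, w = x.shape
        saliency = x.abs().mean(dim=1).flatten(1)      # (B, H*W) activation strength
        k = max(1, int(self.suppress_ratio * h * w))
        topk = saliency.topk(k, dim=1).indices         # most salient positions
        mask = torch.ones_like(saliency)
        mask.scatter_(1, topk, 0.0)                    # suppress the salient ones
        return x * mask.view(b, 1, h, w)

feat = torch.randn(2, 64, 28, 28)
suppressed = FeatureSuppression(0.1)(feat)  # deeper layers now see the masked features
```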
{"title":"Enhancement-suppression driven lightweight fine-grained micro-expression recognition","authors":"Xinmiao Ding ,&nbsp;Yuanyuan Li ,&nbsp;Yulin Wu ,&nbsp;Wen Guo","doi":"10.1016/j.jvcir.2024.104383","DOIUrl":"10.1016/j.jvcir.2024.104383","url":null,"abstract":"<div><div>Micro-expressions are short-lived and authentic emotional expressions used in several fields such as deception detection, criminal analysis, and medical diagnosis. Although deep learning-based approaches have achieved outstanding performance in micro-expression recognition, the recognition performance of lightweight networks for terminal applications is still unsatisfactory. This is mainly because existing models either excessively focus on a single region or lack comprehensiveness in identifying various regions, resulting in insufficient extraction of fine-grained features. To address this problem, this paper proposes a lightweight micro-expression recognition framework –Lightweight Fine-Grained Network (LFGNet). The proposed network adopts EdgeNeXt as the backbone network to effectively combine local and global features, as a result, it greatly reduces the complexity of the model while capturing micro-expression actions. To further enhance the feature extraction ability of the model, the Enhancement-Suppression Module (ESM) is developed where the Feature Suppression Module(FSM) is used to force the model to extract other potential features at deeper layers. Finally, a multi-scale Feature Fusion Module (FFM) is proposed to weigh the fusion of the learned features at different granularity scales for improving the robustness of the model. Experimental results, obtained from four datasets, demonstrate that the proposed method outperforms already existing methods in terms of recognition accuracy and model complexity.</div></div>","PeriodicalId":54755,"journal":{"name":"Journal of Visual Communication and Image Representation","volume":"107 ","pages":"Article 104383"},"PeriodicalIF":2.6,"publicationDate":"2024-12-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143174736","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Register assisted aggregation for visual place recognition
IF 2.6 | CAS Tier 4, Computer Science | Q2 COMPUTER SCIENCE, INFORMATION SYSTEMS | Pub Date: 2024-12-24 | DOI: 10.1016/j.jvcir.2024.104384
Xuan Yu, Zhenyong Fu
Visual Place Recognition (VPR) refers to using computer vision to recognize the location at which the current query image was captured. Significant changes in appearance caused by season, lighting, and the time span between query and database images increase the difficulty of place recognition. Previous approaches often discard not only irrelevant features (such as sky, roads, and vehicles) but also features that can enhance recognition accuracy (such as buildings and trees). To address this, we propose a novel feature aggregation method designed to preserve these critical features. Specifically, we introduce additional registers on top of the original image tokens to facilitate model training, enabling the extraction of both global and local features that contain discriminative place information. Once the attention weights are reallocated, these registers are discarded. Experimental results demonstrate that our approach effectively separates unstable features from the original image representation and achieves superior performance compared to state-of-the-art methods.
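A minimal sketch of the register-token idea described above: a few extra learnable tokens are appended to the image tokens so attention can route nuisance information into them, and they are dropped after encoding. The layer sizes and the use of a plain TransformerEncoder are illustrative assumptions, not the paper's architecture.

```python
import torch
import torch.nn as nn

class RegisterEncoder(nn.Module):
    def __init__(self, dim: int = 256, num_registers: int = 4, depth: int = 2):
        super().__init__()
        self.registers = nn.Parameter(torch.zeros(1, num_registers, dim))
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=8, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=depth)
        self.num_registers = num_registers

    def forward(self, patch_tokens: torch.Tensor) -> torch.Tensor:
        # patch_tokens: (B, N, dim) image tokens from a ViT backbone
        b = patch_tokens.size(0)
        regs = self.registers.expand(b, -1, -1)
        tokens = torch.cat([patch_tokens, regs], dim=1)   # append register tokens
        tokens = self.encoder(tokens)
        return tokens[:, : -self.num_registers]           # discard registers afterwards

tokens = torch.randn(2, 196, 256)       # e.g. 14x14 patch tokens
out = RegisterEncoder()(tokens)         # (2, 196, 256), registers removed
```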
{"title":"Register assisted aggregation for visual place recognition","authors":"Xuan Yu,&nbsp;Zhenyong Fu","doi":"10.1016/j.jvcir.2024.104384","DOIUrl":"10.1016/j.jvcir.2024.104384","url":null,"abstract":"<div><div>Visual Place Recognition (VPR) refers to use computer vision to recognize the position of the current query image. Due to the significant changes in appearance caused by season, lighting, and time spans between query and database images, these differences increase the difficulty of place recognition. Previous approaches often discard irrelevant features (such as sky, roads and vehicles) as well as features that can enhance recognition accuracy (such as buildings and trees). To address this, we propose a novel feature aggregation method designed to preserve these critical features. Specifically, we introduce additional registers on top of the original image tokens to facilitate model training, enabling the extraction of both global and local features that contain discriminative place information. Once the attention weights are reallocated, these registers will be discarded. Experimental results demonstrate that our approach effectively separates unstable features from original image representation, and achieves superior performance compared to state-of-the-art methods.</div></div>","PeriodicalId":54755,"journal":{"name":"Journal of Visual Communication and Image Representation","volume":"107 ","pages":"Article 104384"},"PeriodicalIF":2.6,"publicationDate":"2024-12-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143174738","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Devising a comprehensive synthetic underwater image dataset
IF 2.6 | CAS Tier 4, Computer Science | Q2 COMPUTER SCIENCE, INFORMATION SYSTEMS | Pub Date: 2024-12-24 | DOI: 10.1016/j.jvcir.2024.104386
Kuruma Purnima, C. Siva Kumar
The underwater environment is characterized by complex light interactions, including effects such as color loss, contrast loss, water distortion, backscatter, light attenuation, and color cast, which vary depending on water purity, depth, and other factors. While many datasets in the literature contain specific ground-truth images or image pairs, or offer only limited metric-based analysis, there is a need for a comprehensive dataset that covers a wide range of underwater effects with varying severity levels. This paper introduces a dataset consisting of 100 ground-truth images and 15,000 synthetic underwater images. Given the complexity of underwater light variations, simulating these effects is challenging. This study approximates the underwater effects using implementable combinations of color cast, blurring, low-light, and contrast reduction. In addition to generating 15,100 images, the dataset includes a comprehensive analysis with 21 focus metrics, such as the average contrast measure operator and Brenner’s gradient-based metric, as well as 7 statistical measures, including mean intensity and skewness.
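An illustrative sketch of how such synthetic underwater effects can be composed with varying severity (color cast, blur, low light, contrast reduction). The severity ranges and the blue-green cast coefficients below are assumptions chosen for demonstration, not the parameters used to build the dataset.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def degrade_underwater(img: np.ndarray, severity: float = 0.5) -> np.ndarray:
    """img: float32 RGB array in [0, 1], shape (H, W, 3). Severity in [0, 1] (assumed scale)."""
    out = img.copy()
    # 1) colour cast: attenuate red most, green slightly, keep blue
    cast = np.array([1.0 - 0.6 * severity, 1.0 - 0.1 * severity, 1.0])
    out = out * cast
    # 2) blur as a simple proxy for water distortion / scattering
    out = gaussian_filter(out, sigma=(2.0 * severity, 2.0 * severity, 0))
    # 3) low light: global brightness reduction
    out = out * (1.0 - 0.4 * severity)
    # 4) contrast reduction toward the mean intensity
    mean = out.mean()
    out = mean + (out - mean) * (1.0 - 0.5 * severity)
    return np.clip(out, 0.0, 1.0)

clean = np.random.rand(256, 256, 3).astype(np.float32)
degraded = degrade_underwater(clean, severity=0.7)
```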
{"title":"Devising a comprehensive synthetic underwater image dataset","authors":"Kuruma Purnima,&nbsp;C.Siva Kumar","doi":"10.1016/j.jvcir.2024.104386","DOIUrl":"10.1016/j.jvcir.2024.104386","url":null,"abstract":"<div><div>The underwater environment is characterized by complex light interactions, including effects such as color loss, contrast loss, water distortion, backscatter, light attenuation, and color cast, which vary depending on water purity, depth, and other factors. While many datasets in the literature contain specific ground-truth images, image pairs, or limited analysis with metrics, there is a need for a comprehensive dataset that covers a wide range of underwater effects with varying severity levels. This paper introduces a dataset consisting of 100 ground-truth images and 15,000 synthetic underwater images. Given the complexity of underwater light variations, simulating these effects is challenging. This study approximates the underwater effects using implementable combinations of color cast, blurring, low-light, and contrast reduction. In addition to generating 15,100 images, the dataset includes a comprehensive analysis with 21 focus metrics, such as the average contrast measure operator and Brenner’s gradient-based metric, as well as 7 statistical measures, including mean intensity and skewness.</div></div>","PeriodicalId":54755,"journal":{"name":"Journal of Visual Communication and Image Representation","volume":"107 ","pages":"Article 104386"},"PeriodicalIF":2.6,"publicationDate":"2024-12-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143174810","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Local flow propagation and global multi-scale dilated Transformer for video inpainting
IF 2.6 | CAS Tier 4, Computer Science | Q2 COMPUTER SCIENCE, INFORMATION SYSTEMS | Pub Date: 2024-12-23 | DOI: 10.1016/j.jvcir.2024.104380
Yuting Zuo, Jing Chen, Kaixing Wang, Qi Lin, Huanqiang Zeng
In this paper, a video inpainting framework that combines Local Flow Propagation with the Global Multi-scale Dilated Transformer, referred to as LFP-GMDT, is proposed. First, optical flow is utilized to guide the bidirectional propagation of features between adjacent frames for local inpainting. With the introduction of deformable convolutions, optical flow errors are corrected, substantially enhancing the accuracy of both local inpainting and frame alignment. Following the local inpainting stage, a multi-scale dilated Transformer module is designed for global inpainting. This module integrates multi-scale feature representations with an attention mechanism, introducing a multi-scale dilated attention mechanism that balances the modeling capabilities of local details and global structures while reducing computational complexity. Experimental results show that, compared to existing models, LFP-GMDT performs exceptionally well in detail restoration and structural integrity, particularly excelling in the recovery of edge structures, leading to an overall enhancement in visual quality.
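A minimal sketch of flow-guided feature propagation as described above: the feature map of a neighbouring frame is warped toward the current frame using optical flow so that known content can fill the masked region. This grid_sample-based warp is a standard building block; the deformable-convolution correction and the dilated-Transformer stage of LFP-GMDT are not reproduced here.

```python
import torch
import torch.nn.functional as F

def warp_by_flow(feat: torch.Tensor, flow: torch.Tensor) -> torch.Tensor:
    """feat: (B, C, H, W) neighbour-frame features; flow: (B, 2, H, W) displacements in pixels."""
    b, _, h, w = feat.shape
    ys, xs = torch.meshgrid(torch.arange(h), torch.arange(w), indexing="ij")
    base = torch.stack((xs, ys), dim=0).float().to(feat.device)   # (2, H, W) pixel grid
    coords = base.unsqueeze(0) + flow                             # sampling positions
    # normalise coordinates to [-1, 1] as required by grid_sample
    coords_x = 2.0 * coords[:, 0] / (w - 1) - 1.0
    coords_y = 2.0 * coords[:, 1] / (h - 1) - 1.0
    grid = torch.stack((coords_x, coords_y), dim=-1)              # (B, H, W, 2)
    return F.grid_sample(feat, grid, align_corners=True)

feat_prev = torch.randn(1, 32, 64, 64)
flow = torch.zeros(1, 2, 64, 64)        # zero flow -> identity warp
warped = warp_by_flow(feat_prev, flow)
```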
{"title":"Local flow propagation and global multi-scale dilated Transformer for video inpainting","authors":"Yuting Zuo ,&nbsp;Jing Chen ,&nbsp;Kaixing Wang ,&nbsp;Qi Lin ,&nbsp;Huanqiang Zeng","doi":"10.1016/j.jvcir.2024.104380","DOIUrl":"10.1016/j.jvcir.2024.104380","url":null,"abstract":"<div><div>In this paper, a video inpainting framework that combines Local Flow Propagation with the Global Multi-scale Dilated Transformer, referred to as LFP-GMDT, is proposed. First, optical flow is utilized to guide the bidirectional propagation of features between adjacent frames for local inpainting. With the introduction of deformable convolutions, optical flow errors are corrected, substantially enhancing the accuracy of both local inpainting and frame alignment. Following the local inpainting stage, a multi-scale dilated Transformer module is designed for global inpainting. This module integrates multi-scale feature representations with an attention mechanism, introducing a multi-scale dilated attention mechanism that balances the modeling capabilities of local details and global structures while reducing computational complexity. Experimental results show that, compared to existing models, LFP-GMDT performs exceptionally well in detail restoration and structural integrity, particularly excelling in the recovery of edge structures, leading to an overall enhancement in visual quality.</div></div>","PeriodicalId":54755,"journal":{"name":"Journal of Visual Communication and Image Representation","volume":"107 ","pages":"Article 104380"},"PeriodicalIF":2.6,"publicationDate":"2024-12-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143174812","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
PMDNet: A multi-stage approach to single image dehazing with contextual and spatial feature preservation
IF 2.6 | CAS Tier 4, Computer Science | Q2 COMPUTER SCIENCE, INFORMATION SYSTEMS | Pub Date: 2024-12-20 | DOI: 10.1016/j.jvcir.2024.104379
D. Pushpalatha, P. Prithvi
Hazy images suffer from degraded contrast and visibility due to atmospheric factors, affecting the accuracy of object detection in computer vision tasks. To address this, we propose a novel Progressive Multiscale Dehazing Network (PMDNet) for restoring the original quality of hazy images. Our network aims to balance high-level contextual information and spatial details effectively during the image recovery process. PMDNet employs a multi-stage architecture that gradually learns to remove haze by breaking the dehazing process into manageable steps. Starting with a U-Net encoder-decoder to capture high-level context, PMDNet integrates a subnetwork to preserve local feature details. A SAN reweights features at each stage, ensuring smooth information transfer and preventing loss through cross-connections. Extensive experiments on the RESIDE, I-HAZE, O-HAZE, D-HAZE, REAL-HAZE48, RTTS, and Forest datasets demonstrate the robustness of PMDNet, which achieves strong qualitative and quantitative results.
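A hedged sketch of a multi-stage restoration pipeline with per-stage feature reweighting, in the spirit of the description above. The gating block below simply predicts a spatial attention map from a stage's features and uses it to reweight them before they are passed on; it is a generic stand-in for PMDNet's SAN and cross-connections, not the published architecture.

```python
import torch
import torch.nn as nn

class StageGate(nn.Module):
    """Reweight a stage's features with a learned spatial attention map (illustrative)."""
    def __init__(self, channels: int = 32):
        super().__init__()
        self.attn = nn.Sequential(nn.Conv2d(channels, channels, 3, padding=1), nn.Sigmoid())

    def forward(self, feat: torch.Tensor) -> torch.Tensor:
        return feat * self.attn(feat)

class TwoStageRestorer(nn.Module):
    def __init__(self, channels: int = 32):
        super().__init__()
        self.stage1 = nn.Sequential(nn.Conv2d(3, channels, 3, padding=1), nn.ReLU())
        self.gate = StageGate(channels)
        self.stage2 = nn.Conv2d(channels, 3, 3, padding=1)

    def forward(self, hazy: torch.Tensor) -> torch.Tensor:
        f1 = self.stage1(hazy)
        f1 = self.gate(f1)                 # gated features carried to the next stage
        return hazy + self.stage2(f1)      # residual prediction of the clean image

out = TwoStageRestorer()(torch.randn(1, 3, 128, 128))
```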
{"title":"PMDNet: A multi-stage approach to single image dehazing with contextual and spatial feature preservation","authors":"D. Pushpalatha,&nbsp;P. Prithvi","doi":"10.1016/j.jvcir.2024.104379","DOIUrl":"10.1016/j.jvcir.2024.104379","url":null,"abstract":"<div><div>Hazy images suffer from degraded contrast and visibility due to atmospheric factors, affecting the accuracy of object detection in computer vision tasks. To address this, we propose a novel Progressive Multiscale Dehazing Network (PMDNet) for restoring the original quality of hazy images. Our network aims to balance high-level contextual information and spatial details effectively during the image recovery process. PMDNet employs a multi-stage architecture that gradually learns to remove haze by breaking down the dehazing process into manageable steps. Starting with a U-Net encoder-decoder to capture high-level context, PMDNet integrates a subnetwork to preserve local feature details. A SAN reweights features at each stage, ensuring smooth information transfer and preventing loss through cross-connections. Extensive experiments on datasets like RESIDE, I-HAZE, O-HAZE, D-HAZE, REAL-HAZE48, RTTS and Forest datasets, demonstrate the robustness of PMDNet, achieving strong qualitative and quantitative results.</div></div>","PeriodicalId":54755,"journal":{"name":"Journal of Visual Communication and Image Representation","volume":"107 ","pages":"Article 104379"},"PeriodicalIF":2.6,"publicationDate":"2024-12-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143174817","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
A lightweight gesture recognition network
IF 2.6 | CAS Tier 4, Computer Science | Q2 COMPUTER SCIENCE, INFORMATION SYSTEMS | Pub Date: 2024-12-20 | DOI: 10.1016/j.jvcir.2024.104362
Jinzhao Guo, Xuemei Lei, Bo Li
As one of the main human–computer interaction methods, gesture recognition faces an urgent issue: the huge parameter counts and massive computation of classification and recognition algorithms cause high costs in practical applications. To reduce cost and enhance detection efficiency, this paper proposes a lightweight gesture recognition model based on the YOLOv5s framework. Firstly, we adopt ShuffleNetV2 as the backbone network to reduce the computational load and enhance the model’s detection speed. Additionally, lightweight modules such as GSConv and VoVGSCSP are introduced into the neck network to further compress the model size while maintaining accuracy. Furthermore, the BiFPN (Bi-directional Feature Pyramid Network) structure is incorporated to enhance the network’s detection accuracy at a lower computational cost. Lastly, we introduce the Coordinate Attention (CA) mechanism to strengthen the network’s focus on key features. To investigate the rationale behind introducing the CA mechanism and the BiFPN structure, we analyze the extracted features and validate the network’s attention on different parts of the feature maps through visualization. Experimental results demonstrate that the proposed algorithm achieves an average precision of 95.2% on the HD-HaGRID dataset. Compared to the original YOLOv5s model, the proposed model reduces the parameter count by 70.6% and the model size by 69.2%. Therefore, this model is suitable for real-time gesture recognition classification and detection, demonstrating significant potential for practical applications.
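A compact sketch of the Coordinate Attention idea mentioned above: features are pooled separately along height and width, mixed through a shared 1x1 convolution, and turned into two directional attention maps that reweight the input. The reduction ratio and layer details are simplified relative to the published module and to the paper's network.

```python
import torch
import torch.nn as nn

class CoordinateAttention(nn.Module):
    def __init__(self, channels: int, reduction: int = 8):
        super().__init__()
        mid = max(8, channels // reduction)
        self.shared = nn.Sequential(nn.Conv2d(channels, mid, 1), nn.BatchNorm2d(mid), nn.ReLU())
        self.to_h = nn.Conv2d(mid, channels, 1)
        self.to_w = nn.Conv2d(mid, channels, 1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, h, w = x.shape
        pooled_h = x.mean(dim=3, keepdim=True)                          # (B, C, H, 1)
        pooled_w = x.mean(dim=2, keepdim=True).permute(0, 1, 3, 2)      # (B, C, W, 1)
        y = self.shared(torch.cat([pooled_h, pooled_w], dim=2))         # (B, mid, H+W, 1)
        y_h, y_w = torch.split(y, [h, w], dim=2)
        attn_h = torch.sigmoid(self.to_h(y_h))                          # (B, C, H, 1)
        attn_w = torch.sigmoid(self.to_w(y_w)).permute(0, 1, 3, 2)      # (B, C, 1, W)
        return x * attn_h * attn_w

out = CoordinateAttention(64)(torch.randn(2, 64, 32, 32))
```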
{"title":"A lightweight gesture recognition network","authors":"Jinzhao Guo,&nbsp;Xuemei Lei,&nbsp;Bo Li","doi":"10.1016/j.jvcir.2024.104362","DOIUrl":"10.1016/j.jvcir.2024.104362","url":null,"abstract":"<div><div>As one of the main human–computer interaction methods, gesture recognition has an urgent issue to be addressed, which huge paramaters and massive computation of the classification and recognition algorithm cause high cost in practical applications. To reduce cost and enhance the detection efficiency, a lightweight model of gesture recognition algorithms is proposed in this paper, based on the YOLOv5s framework. Firstly, we adopt ShuffleNetV2 as the backbone network to reduce the computational load and enhance the model’s detection speed. Additionally, lightweight modules such as GSConv and VoVGSCSP are introduced into the neck network to further compress the model size while maintaining accuracy. Furthermore, the BiFPN (Bi-directional Feature Pyramid Network) structure is incorporated to enhance the network’s detection accuracy at a lower computational cost. Lastly, we introduce the Coordinate Attention (CA) mechanism to enhance the network’s focus on key features. To investigate the rationale behind the introduction of the CA attention mechanism and the BiFPN network structure, we analyze the extracted features and validate the network’s attention on different parts of the feature maps through visualization. Experimental results demonstrate that the proposed algorithm achieves an average precision of 95.2% on the HD-HaGRID dataset. Compared to the original YOLOv5s model, the proposal model reduces the parameter count by 70.6% and the model size by 69.2%. Therefore, this model is suitable for real-time gesture recognition classification and detection, demonstrating significant potential for practical applications.</div></div>","PeriodicalId":54755,"journal":{"name":"Journal of Visual Communication and Image Representation","volume":"107 ","pages":"Article 104362"},"PeriodicalIF":2.6,"publicationDate":"2024-12-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143174737","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Enhancing industrial anomaly detection with Mamba-inspired feature fusion
IF 2.6 | CAS Tier 4, Computer Science | Q2 COMPUTER SCIENCE, INFORMATION SYSTEMS | Pub Date: 2024-12-19 | DOI: 10.1016/j.jvcir.2024.104368
Mingjing Pei, Xiancun Zhou, Yourui Huang, Fenghui Zhang, Mingli Pei, Yadong Yang, Shijian Zheng, Mai Xin
Image anomaly detection is crucial in industrial applications, with significant research value and practical application potential. Despite recent advancements using image segmentation techniques, challenges remain in global feature extraction, computational complexity, and pixel-level anomaly localization. A scheme is designed to address these issues. First, the Mamba concept is introduced to enhance global feature extraction while reducing computational complexity, optimizing performance in both respects. Second, an effective feature fusion module is designed to integrate low-level information into high-level features, improving segmentation accuracy by enabling more precise decoding. The proposed model was evaluated on three datasets, MVTec AD, BTAD, and AeBAD, demonstrating superior performance across different types of anomalies. Specifically, on the MVTec AD dataset, our method achieved an average AUROC of 99.1% for image-level anomalies and 98.1% for pixel-level anomalies, including a state-of-the-art (SOTA) result of 100% AUROC in the texture anomaly category. These results demonstrate the effectiveness of our method as a valuable reference for industrial image anomaly detection.
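An illustrative sketch of the kind of low-into-high feature fusion described above: low-level features (rich in spatial detail) are merged with upsampled high-level features (rich in semantics) to sharpen pixel-level localisation. This is a generic fusion block written for illustration only; it is not the paper's module and omits the Mamba-style global branch.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class FuseLowHigh(nn.Module):
    def __init__(self, low_ch: int, high_ch: int, out_ch: int):
        super().__init__()
        self.fuse = nn.Sequential(
            nn.Conv2d(low_ch + high_ch, out_ch, 3, padding=1),
            nn.BatchNorm2d(out_ch),
            nn.ReLU(),
        )

    def forward(self, low: torch.Tensor, high: torch.Tensor) -> torch.Tensor:
        # upsample deep features to the low-level resolution, then fuse
        high_up = F.interpolate(high, size=low.shape[-2:], mode="bilinear", align_corners=False)
        return self.fuse(torch.cat([low, high_up], dim=1))

low = torch.randn(1, 64, 64, 64)     # early-layer features
high = torch.randn(1, 256, 16, 16)   # deep-layer features
fused = FuseLowHigh(64, 256, 128)(low, high)   # (1, 128, 64, 64)
```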
{"title":"Enhancing industrial anomaly detection with Mamba-inspired feature fusion","authors":"Mingjing Pei ,&nbsp;Xiancun Zhou ,&nbsp;Yourui Huang ,&nbsp;Fenghui Zhang ,&nbsp;Mingli Pei ,&nbsp;Yadong Yang ,&nbsp;Shijian Zheng ,&nbsp;Mai Xin","doi":"10.1016/j.jvcir.2024.104368","DOIUrl":"10.1016/j.jvcir.2024.104368","url":null,"abstract":"<div><div>Image anomaly detection is crucial in industrial applications, with significant research value and practical application potential. Despite recent advancements using image segmentation techniques, challenges remain in global feature extraction, computational complexity, and pixel-level anomaly localization. A scheme is designed to address the issues above. First, the Mamba concept is introduced to enhance global feature extraction while reducing computational complexity. This dual benefit optimizes performance in both aspects. Second, an effective feature fusion module is designed to integrate low-level information into high-level features, improving segmentation accuracy by enabling more precise decoding. The proposed model was evaluated on three datasets, including MVTec AD, BTAD, and AeBAD, demonstrating superior performance across different types of anomalies. Specifically, on the MVTec AD dataset, our method achieved an average AUROC of 99.1% for image-level anomalies and 98.1% for pixel-level anomalies, including a state-of-the-art (SOTA) result of 100% AUROC in the texture anomaly category. These results demonstrate the effectiveness of our method as a valuable reference for industrial image anomaly detection.</div></div>","PeriodicalId":54755,"journal":{"name":"Journal of Visual Communication and Image Representation","volume":"107 ","pages":"Article 104368"},"PeriodicalIF":2.6,"publicationDate":"2024-12-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143174809","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
RCMixer: Radar-camera fusion based on vision transformer for robust object detection
IF 2.6 | CAS Tier 4, Computer Science | Q2 COMPUTER SCIENCE, INFORMATION SYSTEMS | Pub Date: 2024-12-18 | DOI: 10.1016/j.jvcir.2024.104367
Lindong Wang, Hongya Tuo, Yu Yuan, Henry Leung, Zhongliang Jing
In real-world object detection applications, cameras can be affected by poor lighting conditions, resulting in deteriorated performance. Millimeter-wave radar and cameras have complementary advantages: the radar point cloud can help detect small objects under low light. In this study, we focus on feature-level fusion and propose a novel end-to-end detection network, RCMixer. RCMixer mainly includes depth pillar expansion (DPE), a hierarchical vision transformer, and a radar spatial attention (RSA) module. DPE enhances the radar projection image according to the perspective principle and an invariance assumption for adjacent depths; the hierarchical vision transformer backbone alternates feature extraction along the spatial and channel dimensions; RSA extracts radar attention and fuses radar and camera features at a late stage. Experimental results on the nuScenes dataset show that RCMixer exceeds all comparison networks in accuracy, and its ability to detect small objects in dark conditions is better than that of the camera-only method. In addition, an ablation study demonstrates the effectiveness of our method.
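A hedged sketch of late radar-camera fusion via a radar-derived spatial attention map, in the spirit of the RSA module described above: radar features indicate where objects are likely to be, that map modulates the camera features, and the two streams are then concatenated. Channel sizes and the single-convolution attention head are illustrative assumptions, not the paper's design.

```python
import torch
import torch.nn as nn

class RadarSpatialFusion(nn.Module):
    def __init__(self, cam_ch: int = 128, radar_ch: int = 32):
        super().__init__()
        self.attn = nn.Sequential(nn.Conv2d(radar_ch, 1, 3, padding=1), nn.Sigmoid())
        self.merge = nn.Conv2d(cam_ch + radar_ch, cam_ch, 1)

    def forward(self, cam_feat: torch.Tensor, radar_feat: torch.Tensor) -> torch.Tensor:
        attn = self.attn(radar_feat)              # (B, 1, H, W) radar-derived attention
        cam_weighted = cam_feat * attn            # emphasise radar-supported regions
        return self.merge(torch.cat([cam_weighted, radar_feat], dim=1))

cam = torch.randn(1, 128, 40, 40)
radar = torch.randn(1, 32, 40, 40)   # radar projection features at the same scale
fused = RadarSpatialFusion()(cam, radar)
```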
{"title":"RCMixer: Radar-camera fusion based on vision transformer for robust object detection","authors":"Lindong Wang ,&nbsp;Hongya Tuo ,&nbsp;Yu Yuan ,&nbsp;Henry Leung ,&nbsp;Zhongliang Jing","doi":"10.1016/j.jvcir.2024.104367","DOIUrl":"10.1016/j.jvcir.2024.104367","url":null,"abstract":"<div><div>In real-world object detection applications, the camera would be affected by poor lighting conditions, resulting in a deteriorate performance. Millimeter-wave radar and camera have complementary advantages, radar point cloud can help detecting small objects under low light. In this study, we focus on feature-level fusion and propose a novel end-to-end detection network RCMixer. RCMixer mainly includes depth pillar expansion(DPE), hierarchical vision transformer and radar spatial attention (RSA) module. DPE enhances radar projection image according to perspective principle and invariance assumption of adjacent depth; The hierarchical vision transformer backbone alternates the feature extraction of spatial dimension and channel dimension; RSA extracts the radar attention, then it fuses radar and camera features at the late stage. The experiment results on nuScenes dataset show that the accuracy of RCMixer exceeds all comparison networks and its detection ability of small objects in dark light is better than the camera-only method. In addition, the ablation study demonstrates the effectiveness of our method.</div></div>","PeriodicalId":54755,"journal":{"name":"Journal of Visual Communication and Image Representation","volume":"107 ","pages":"Article 104367"},"PeriodicalIF":2.6,"publicationDate":"2024-12-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143174811","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Large-scale UAV image stitching based on global registration optimization and graph-cut method
IF 2.6 | CAS Tier 4, Computer Science | Q2 COMPUTER SCIENCE, INFORMATION SYSTEMS | Pub Date: 2024-12-18 | DOI: 10.1016/j.jvcir.2024.104354
Zhongxing Wang, Zhizhong Fu, Jin Xu
This paper presents a large-scale unmanned aerial vehicle (UAV) image stitching method based on global registration optimization and the graph-cut technique. To minimize cumulative registration errors in large-scale image stitching, we propose a two-step global registration optimization approach, which includes affine transformation optimization followed by projective transformation optimization. Evenly distributed matching points are used to formulate the objective function for registration optimization, with the optimal affine transformation serving as the initial value for projective transformation optimization. Additionally, a rigid constraint is incorporated as the regularization term for projective transformation optimization to preserve shape and prevent unnatural warping of the aligned images. After global registration, the graph-cut method is employed to blend the aligned images and generate the final mosaic. The proposed method is evaluated on five UAV-captured remote sensing image datasets. Experimental results demonstrate that our approach effectively aligns multiple images and produces high-quality, seamless mosaics.
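A small sketch of the first registration step described above: estimating an affine transform from matched point pairs by linear least squares. The paper jointly optimises transforms across all images with a rigidity regulariser and then refines a projective model; this single-pair fit only illustrates the affine stage and uses synthetic points for demonstration.

```python
import numpy as np

def fit_affine(src: np.ndarray, dst: np.ndarray) -> np.ndarray:
    """src, dst: (N, 2) matched points; returns a 2x3 affine A with dst ~= A @ [x, y, 1]."""
    n = src.shape[0]
    X = np.hstack([src, np.ones((n, 1))])          # (N, 3) homogeneous source points
    # solve X @ A.T ~= dst in the least-squares sense
    A_t, *_ = np.linalg.lstsq(X, dst, rcond=None)
    return A_t.T                                   # (2, 3)

rng = np.random.default_rng(0)
src = rng.uniform(0, 1000, size=(50, 2))
true_A = np.array([[1.02, 0.05, 12.0], [-0.03, 0.98, -7.0]])
dst = src @ true_A[:, :2].T + true_A[:, 2]
print(np.allclose(fit_affine(src, dst), true_A, atol=1e-6))   # True on noise-free matches
```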
{"title":"Large-scale UAV image stitching based on global registration optimization and graph-cut method","authors":"Zhongxing Wang ,&nbsp;Zhizhong Fu ,&nbsp;Jin Xu","doi":"10.1016/j.jvcir.2024.104354","DOIUrl":"10.1016/j.jvcir.2024.104354","url":null,"abstract":"<div><div>This paper presents a large-scale unmanned aerial vehicle (UAV) image stitching method based on global registration optimization and the graph-cut technique. To minimize cumulative registration errors in large-scale image stitching, we propose a two-step global registration optimization approach, which includes affine transformation optimization followed by projective transformation optimization. Evenly distributed matching points are used to formulate the objective function for registration optimization, with the optimal affine transformation serving as the initial value for projective transformation optimization. Additionally, a rigid constraint is incorporated as the regularization term for projective transformation optimization to preserve shape and prevent unnatural warping of the aligned images. After global registration, the graph-cut method is employed to blend the aligned images and generate the final mosaic. The proposed method is evaluated on five UAV-captured remote sensing image datasets. Experimental results demonstrate that our approach effectively aligns multiple images and produces high-quality, seamless mosaics.</div></div>","PeriodicalId":54755,"journal":{"name":"Journal of Visual Communication and Image Representation","volume":"107 ","pages":"Article 104354"},"PeriodicalIF":2.6,"publicationDate":"2024-12-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143174819","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Dynamic gesture recognition using 3D central difference separable residual LSTM coordinate attention networks
IF 2.6 | CAS Tier 4, Computer Science | Q2 COMPUTER SCIENCE, INFORMATION SYSTEMS | Pub Date: 2024-12-17 | DOI: 10.1016/j.jvcir.2024.104364
Lichuan Geng, Jie Chen, Yun Tie, Lin Qi, Chengwu Liang
Dynamic gesture recognition has generated considerable interest in the area of human–computer interaction. However, the intrinsic qualities of gestures themselves, including their flexibility and spatial scale, as well as external factors such as lighting and background, have impeded improvements in recognition accuracy. To address this, we present a novel end-to-end recognition network named 3D Central Difference Separable Residual Long Short-Term Memory (LSTM) Coordinate Attention (3D CRLCA). The network is composed of three components: (1) 3D Central Difference Separable Convolution (3D CDSC), (2) a residual module to enhance the network’s capability to distinguish between categories, and (3) an LSTM-Coordinate Attention (LSTM-CA) module to direct the network’s attention to the gesture region and its temporal and spatial characteristics. Experiments on the ChaLearn Large-scale Gesture Recognition Dataset (IsoGD) and the IPN dataset demonstrate the effectiveness of our approach, which surpasses other existing methods.
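A sketch of a 3D central difference convolution, the idea behind the 3D CDSC component named above: the output of a standard 3D convolution is combined with a central-difference term computed from the spatially summed kernel, weighted by a factor theta. The depthwise-separable factorisation used in the paper is omitted for brevity, and theta = 0.7 is an arbitrary illustrative choice.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CentralDifferenceConv3d(nn.Module):
    def __init__(self, in_ch: int, out_ch: int, theta: float = 0.7):
        super().__init__()
        self.conv = nn.Conv3d(in_ch, out_ch, kernel_size=3, padding=1, bias=False)
        self.theta = theta  # balance between vanilla and central-difference terms (assumed)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        out = self.conv(x)                                              # vanilla 3D convolution
        # central-difference term: response of the centre voxel weighted by
        # the sum of each kernel's coefficients (equivalent 1x1x1 convolution)
        kernel_sum = self.conv.weight.sum(dim=(2, 3, 4), keepdim=True)  # (out, in, 1, 1, 1)
        out_diff = F.conv3d(x, kernel_sum)
        return out - self.theta * out_diff

clip = torch.randn(1, 3, 16, 64, 64)    # (B, C, T, H, W) video clip
feat = CentralDifferenceConv3d(3, 32)(clip)
```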
{"title":"Dynamic gesture recognition using 3D central difference separable residual LSTM coordinate attention networks","authors":"Lichuan Geng ,&nbsp;Jie Chen ,&nbsp;Yun Tie ,&nbsp;Lin Qi ,&nbsp;Chengwu Liang","doi":"10.1016/j.jvcir.2024.104364","DOIUrl":"10.1016/j.jvcir.2024.104364","url":null,"abstract":"<div><div>The area of human–computer interaction has generated considerable interest in dynamic gesture recognition. However, the intrinsic qualities of the gestures themselves, including their flexibility and spatial scale, as well as external factors such as lighting and background, have impeded the improvement of recognition accuracy. To address this, we present a novel end-to-end recognition network named 3D Central Difference Separable Residual Long Short-Term Memory (LSTM) Coordinate Attention (3D CRLCA) in this paper. The network is composed of three components: (1) 3D Central Difference Separable Convolution (3D CDSC), (2) a residual module to enhance the network’s capability to distinguish between categories, and (3) an LSTM-Coordinate Attention (LSTM-CA) module to direct the network’s attention to the gesture region and its temporal and spatial characteristics. Our experiments using the ChaLearn Large-scale Gesture Recognition Dataset (IsoGD) and IPN datasets demonstrate the effectiveness of our approach, surpassing other existing methods.</div></div>","PeriodicalId":54755,"journal":{"name":"Journal of Visual Communication and Image Representation","volume":"107 ","pages":"Article 104364"},"PeriodicalIF":2.6,"publicationDate":"2024-12-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143174816","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0