Pub Date: 2025-01-02 | DOI: 10.1016/j.dsp.2024.104968
Pham Thi Thanh Huyen , Bo Quoc Bao , Nguyen Thu Phuong
This paper considers the effectiveness of short-packet communication (SPC) in a full-duplex (FD) relay system with a non-orthogonal multiple access (NOMA) scheme. We derive the block error rate (BLER) and average achievable rate (AAR) of the proposed system, taking residual self-interference (RSI) and imperfect successive interference cancellation (SIC) into consideration. We optimize the power allocation coefficient to minimize the average BLER and investigate the influence of factors such as block-length, transmission bits, RSI, and imperfect SIC on system performance. We enhance the BLER and AAR by employing maximum-ratio combining (MRC) and maximum-ratio transmission (MRT) with multiple antennas at the source and destinations. The results indicate that using multiple antennas in the considered NOMA FD relay system can meet the criteria of ultra-reliable and low-latency communications (URLLC). The results also demonstrate the superiority of NOMA over conventional orthogonal multiple access (OMA) schemes and of FD over half-duplex (HD) schemes, and the system exhibits an exceptionally low average BLER. Our analytical frameworks are validated by simulation results, providing valuable insights for optimizing the power allocation coefficients to minimize BLER in the proposed system.
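Finite-blocklength BLER results of this kind are typically built on the normal approximation, where the error probability depends on the gap between capacity and coding rate, scaled by the channel dispersion. The sketch below shows the generic AWGN form; it is not the paper's NOMA/RSI-specific expression, and the function and parameter names are illustrative.

```python
import math

def q_func(x):
    """Gaussian Q-function via the complementary error function."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))

def spc_bler(snr_linear, blocklength, info_bits):
    """Normal-approximation block error rate for short-packet AWGN
    transmission (finite-blocklength regime)."""
    cap = math.log2(1.0 + snr_linear)                       # capacity, bits/use
    disp = (snr_linear * (snr_linear + 2.0)
            / (snr_linear + 1.0) ** 2) * math.log2(math.e) ** 2  # dispersion
    return q_func((blocklength * cap - info_bits) / math.sqrt(blocklength * disp))

# Same coding rate (2 bits/use): longer blocks are more reliable.
bler_100 = spc_bler(snr_linear=10.0, blocklength=100, info_bits=200)
bler_300 = spc_bler(snr_linear=10.0, blocklength=300, info_bits=600)
```

This captures the block-length/reliability trade-off the abstract studies: at a fixed rate, increasing the blocklength drives the BLER down.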
{"title":"Full-duplex relay non-orthogonal multiple access networks with short-packet communication: Block error rate and average achievable rate analysis","authors":"Pham Thi Thanh Huyen , Bo Quoc Bao , Nguyen Thu Phuong","doi":"10.1016/j.dsp.2024.104968","DOIUrl":"10.1016/j.dsp.2024.104968","url":null,"abstract":"<div><div>This paper considers the effectiveness of short packet communication (SPC) in a full-duplex (FD) relay system with a non-orthogonal multiple access (NOMA) scheme. We use mathematical methods to derive the block error rate (BLER) and average achievable rate (AAR) of the proposed system in which residual-self interference (RSI) and imperfect successive interference cancellation (SIC) are taken into consideration. We optimize the power allocation coefficient to minimize the average BLER and investigate the influence of factors such as block-length, transmission bits, RSI, and imperfect SIC on system performance. We enhance the BLER and AAR by employing maximum-ratio combining (MRC) and maximum-ratio transmission (MRT) with multiple antennas at the source and destinations. The results indicate that using a multi-antenna for the considered NOMA FD relay system can achieve the criteria of ultra-reliable and low-latency communications (URLLC). It also demonstrates the superiority of NOMA over conventional orthogonal multiple access (OMA) schemes and FD over half duplex (HD) schemes. The system exhibits an exceptionally low average BLER. 
Our analytical frameworks are validated by simulation results, providing valuable insights for optimizing the power allocation coefficients to minimize BLER in the proposed system.</div></div>","PeriodicalId":51011,"journal":{"name":"Digital Signal Processing","volume":"159 ","pages":"Article 104968"},"PeriodicalIF":2.9,"publicationDate":"2025-01-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143144273","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2025-01-02 | DOI: 10.1016/j.dsp.2024.104967
Diyuan Xu , Yide Wang , Biyun Ma , Qingqing Zhu , Julien Sarrazin
Principal-singular-vector utilization modal analysis (PUMA) algorithms were proposed to address the insufficient robustness of the method of direction estimation (MODE) family, which is sensitive to the parity of the number of sources because of the additional assumptions and constraints on the symmetry of the root-polynomial coefficients. However, while the MODE-related algorithms do not suffer severe performance degradation when the source covariance matrix is rank deficient, the initial PUMA-related algorithms do degrade in that case: the initial PUMA is developed under a full-rank source-covariance-matrix hypothesis, which is not valid for coherent sources. In this paper, a rigorous extension of PUMA and enhanced PUMA (EPUMA) is proposed to handle the case where the source covariance matrix may be rank deficient. The modified PUMA/EPUMA (Mod-PUMA/EPUMA) can be applied rigorously in the case of multiple coherent sources. In addition, it has lower computational complexity and faster convergence than the initial PUMA/EPUMA. The effectiveness of Mod-PUMA/EPUMA is shown by experimental comparison with the initial PUMA-related and MODE-related algorithms.
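For intuition about why coherent sources collapse the source covariance rank, and how a classical remedy restores it, here is a sketch of forward spatial smoothing followed by a subspace (MUSIC-style) search. This is a standard baseline, not Mod-PUMA itself; the half-wavelength ULA, source angles, and noise level are all illustrative.

```python
import numpy as np

def steering(m, theta_deg):
    """Half-wavelength ULA steering vector."""
    return np.exp(1j * np.pi * np.arange(m) * np.sin(np.deg2rad(theta_deg)))

def smoothed_covariance(x, sub_len):
    """Forward spatial smoothing: average the covariances of overlapping
    subarrays to restore the rank that coherent sources collapse."""
    m, snapshots = x.shape
    r = np.zeros((sub_len, sub_len), dtype=complex)
    for i in range(m - sub_len + 1):
        xi = x[i:i + sub_len]
        r += xi @ xi.conj().T / snapshots
    return r / (m - sub_len + 1)

def music_doa(r, n_src, grid):
    """MUSIC pseudospectrum over `grid` (degrees); returns the top peaks."""
    _, vecs = np.linalg.eigh(r)               # eigenvalues ascending
    noise = vecs[:, :r.shape[0] - n_src]      # noise subspace
    spec = np.array([1.0 / np.linalg.norm(noise.conj().T @ steering(r.shape[0], th)) ** 2
                     for th in grid])
    peaks = [i for i in range(1, len(grid) - 1)
             if spec[i] > spec[i - 1] and spec[i] > spec[i + 1]]
    peaks.sort(key=lambda i: spec[i], reverse=True)
    return sorted(grid[i] for i in peaks[:n_src])

# Two fully coherent sources (same waveform, scaled) at -10 and 20 degrees.
rng = np.random.default_rng(0)
m, snapshots = 10, 400
s = rng.standard_normal(snapshots) + 1j * rng.standard_normal(snapshots)
x = (np.outer(steering(m, -10.0), s) + 0.8 * np.outer(steering(m, 20.0), s)
     + 0.05 * (rng.standard_normal((m, snapshots))
               + 1j * rng.standard_normal((m, snapshots))))
grid = np.arange(-90.0, 90.0, 0.25)
est = music_doa(smoothed_covariance(x, sub_len=6), n_src=2, grid=grid)
```

Without the smoothing step, the signal subspace of the raw covariance has rank one here, and the subspace search fails; this is the rank deficiency the abstract refers to.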
{"title":"A modified PUMA/EPUMA for direction-of-arrival estimation of coherent sources","authors":"Diyuan Xu , Yide Wang , Biyun Ma , Qingqing Zhu , Julien Sarrazin","doi":"10.1016/j.dsp.2024.104967","DOIUrl":"10.1016/j.dsp.2024.104967","url":null,"abstract":"<div><div>The principal-singular-vector utilization modal analysis (PUMA) related algorithms have been proposed to address the problem of insufficient robustness of the method of direction estimation (MODE) related algorithms, which are sensitive to the parity of the number of sources due to the additional assumption and constraints on the symmetry of the root polynomial coefficients. Moreover, the MODE-related algorithms do not have severe performance degradation when the source covariance matrix is rank deficient, however, the initial PUMA-related algorithms will have a degraded performance under such circumstances. The initial PUMA is developed using a full rank source covariance matrix hypothesis, which is not valid for coherent sources. In this paper, a rigorous extension of the PUMA and enhanced-PUMA (EPUMA) is proposed to handle the case where the source covariance matrix may be rank deficient. The modified PUMA/EPUMA (Mod-PUMA/EPUMA) can be applied rigorously in the case of multiple coherent sources. In addition, it has lower computational complexity and faster convergence than the initial PUMA/EPUMA. 
The effectiveness of the Mod-PUMA/EPUMA is shown by experimental comparison with the initial PUMA-related algorithms and MODE-related algorithms.</div></div>","PeriodicalId":51011,"journal":{"name":"Digital Signal Processing","volume":"158 ","pages":"Article 104967"},"PeriodicalIF":2.9,"publicationDate":"2025-01-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143129043","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2025-01-02 | DOI: 10.1016/j.dsp.2025.104979
Dezhao Zhai , Wei Chen , Baoming Miao , Fulong Liu , Siqi Han , Yinghao Ding , Ming Yu , Hang Wu
In recent years, remote sensing image classification tasks have garnered widespread attention and have been extensively studied. Most current studies focus on improving classification accuracy, leading to overly large and complex networks with high computational costs that are challenging to deploy for real-time remote sensing tasks. To address this issue, neural network pruning has emerged as an effective solution. However, existing pruning methods typically prune along a single dimension, and as the pruning ratio increases, important weights in that dimension often suffer from over-pruning, resulting in significant accuracy loss. This paper proposes a novel pruning method for remote sensing scene classification: Multidimensional Space Pruning (MSP). MSP performs stereoscopic pruning of filters along both the channel and depth dimensions, simultaneously removing redundant information across two different dimensions. This prevents excessive pruning of important weights in a single dimension, thereby significantly reducing model complexity while maintaining accuracy. At a pruning ratio of 0.4, MSP-pruned VGG-16 and ResNet-34 models on the NWPU-RESISC45 dataset show accuracy drops of only 1.05% and 0.71%, respectively, while achieving compression ratios of 92.52% and 93.19%. Similarly, on the AID dataset, the accuracy drops are merely 0.26% and 0.54%, with compression ratios reaching 96.23% and 88.56%, respectively. Experimental results on two public remote sensing image datasets demonstrate that, compared to existing methods, MSP achieves higher compression ratios while maintaining model accuracy.
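As a point of contrast with MSP's multi-dimensional approach, the single-dimension baseline it improves on — ranking conv filters by magnitude and dropping the weakest along one axis — can be sketched as follows (NumPy stand-in for a conv layer's weights; the 0.4 ratio mirrors the paper's setting, everything else is illustrative):

```python
import numpy as np

def prune_filters_l1(weights, ratio):
    """Zero out the fraction `ratio` of conv filters with the smallest
    L1 norms. `weights`: (out_channels, in_channels, kh, kw)."""
    norms = np.abs(weights).reshape(weights.shape[0], -1).sum(axis=1)
    n_drop = int(round(ratio * weights.shape[0]))
    keep = np.ones(weights.shape[0], dtype=bool)
    keep[np.argsort(norms)[:n_drop]] = False        # weakest filters go
    return weights * keep[:, None, None, None], keep

rng = np.random.default_rng(0)
w = rng.standard_normal((64, 3, 3, 3))              # a VGG-style 3x3 conv layer
pruned, kept = prune_filters_l1(w, ratio=0.4)
```

Pushing the ratio higher along this one axis is exactly where important filters start getting cut; MSP's stereoscopic pruning spreads the same budget over the channel and depth dimensions instead.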
{"title":"Multi-dimensional spatial pruning for remote sensing image scene classification","authors":"Dezhao Zhai , Wei Chen , Baoming Miao , Fulong Liu , Siqi Han , Yinghao Ding , Ming Yu , Hang Wu","doi":"10.1016/j.dsp.2025.104979","DOIUrl":"10.1016/j.dsp.2025.104979","url":null,"abstract":"<div><div>In recent years, remote sensing image classification tasks have garnered widespread attention and have been extensively studied by researchers. Most current studies focus on improving classification accuracy, leading to overly large and complex networks with high computational costs that are challenging to deploy for real-time remote sensing tasks. To address this issue, neural network pruning has emerged as an effective solution. However, existing pruning methods typically prune along a single dimension, and as the pruning ratio increases, important weights in that dimension often suffer from over-pruning, resulting in significant accuracy loss. This paper proposes a novel pruning method for remote sensing scene classification—Multidimensional Space Pruning (MSP). MSP performs stereoscopic pruning of filters along both channel and depth dimensions, simultaneously removing redundant information across two different dimensions. This prevents excessive pruning of important weights in a single dimension, thereby significantly reducing model complexity while maintaining accuracy. As a novel pruning method, MSP achieves remarkable results. At a pruning ratio of 0.4, MSP-pruned VGG-16 and ResNet-34 models on the NWPU-RESISC45 dataset show accuracy drops of only 1.05 % and 0.71 %, respectively, while achieving compression ratios of 92.52 % and 93.19 %. Similarly, on the AID dataset, the accuracy drops are merely 0.26 % and 0.54 %, with compression ratios reaching 96.23 % and 88.56 %, respectively. 
Experimental results on two public remote sensing image datasets demonstrate that compared to existing methods, MSP achieves higher compression ratios while maintaining model accuracy, showcasing superior model compression performance.</div></div>","PeriodicalId":51011,"journal":{"name":"Digital Signal Processing","volume":"158 ","pages":"Article 104979"},"PeriodicalIF":2.9,"publicationDate":"2025-01-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143129047","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2025-01-02 | DOI: 10.1016/j.dsp.2024.104965
Hailiang Ye , Xiaomei Huang , Houying Zhu , Feilong Cao
Graph neural networks (GNNs) have substantially advanced hyperspectral image (HSI) classification. However, GNN-based methods struggle to identify significant discriminative features that are highly similar across long distances and to transmit high-order neighborhood information. Consequently, this paper proposes an enhanced network based on parallel graph node diffusion (PGNDE) for HSI classification. Its core comprises a parallel multi-scale graph attention diffusion module and a node similarity contrastive loss. Specifically, the former first constructs a multi-head attention-forward propagation (AFP) module for each scale, which incorporates multi-hop contextual information into the attention calculation and diffuses information in parallel throughout the network to capture critical feature information within the HSI. It then builds an adaptive weight computation layer that collaborates with the multiple parallel AFP modules, enabling adaptive calculation of node feature weights from the various AFP modules and generating the desired node representations. Moreover, a node similarity contrastive loss is devised to encourage similarity between superpixels of the same category. Experiments on several benchmark HSI datasets validate the effectiveness of PGNDE against existing methods.
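The multi-hop attention diffusion idea can be sketched as a weighted sum of powers of the attention matrix, so that k-hop neighbors contribute with geometrically decaying weight. This is a generic, personalized-PageRank-style sketch, not the paper's exact AFP module; the decay factor and hop count are illustrative.

```python
import numpy as np

def attention_diffusion(att, hops, alpha=0.15):
    """Diffuse a row-stochastic attention matrix over multiple hops:
    out = sum_k alpha*(1-alpha)^k * att^k, then row-renormalize, so
    k-hop neighbors contribute with geometrically decaying weight."""
    out = np.zeros_like(att)
    power = np.eye(att.shape[0])
    for k in range(hops + 1):
        out += alpha * (1.0 - alpha) ** k * power
        power = power @ att
    return out / out.sum(axis=1, keepdims=True)

# Path graph 0-1-2: node 0 has no direct attention on node 2,
# but diffusion routes some weight through node 1.
att = np.array([[0.0, 1.0, 0.0],
                [0.5, 0.0, 0.5],
                [0.0, 1.0, 0.0]])
diffused = attention_diffusion(att, hops=3)
```

This is the mechanism that lets distant but similar nodes influence each other, which plain one-hop attention cannot do.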
{"title":"An enhanced network with parallel graph node diffusion and node similarity contrastive loss for hyperspectral image classification","authors":"Hailiang Ye , Xiaomei Huang , Houying Zhu , Feilong Cao","doi":"10.1016/j.dsp.2024.104965","DOIUrl":"10.1016/j.dsp.2024.104965","url":null,"abstract":"<div><div>Graph neural networks (GNNs) have substantially advanced hyperspectral image (HSI) classification. However, GNN-based methods encounter challenges in identifying significant discriminative features with high similarity across long distances and transmitting high-order neighborhood information. Consequently, this paper proposes an enhanced network based on parallel graph node diffusion (PGNDE) for HSI classification. Its core develops a parallel multi-scale graph attention diffusion module and a node similarity contrastive loss. Specifically, the former first constructs a multi-head attention-forward propagation (AFP) module for different scales, which incorporates multi-hop contextual information into attention calculation and diffuses information in parallel throughout the network to capture critical feature information within the HSI. Afterward, it builds an adaptive weight computation layer that collaborates with multiple parallel AFP modules, enabling the adaptive calculation of node feature weights from various AFP modules and generating desired node representations. Moreover, a node similarity contrastive loss is devised to facilitate the similarity between superpixels from the same category. 
Experiments with several benchmark HSI datasets validate the effectiveness of PGNDAF across existing methods.</div></div>","PeriodicalId":51011,"journal":{"name":"Digital Signal Processing","volume":"158 ","pages":"Article 104965"},"PeriodicalIF":2.9,"publicationDate":"2025-01-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143129039","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2024-12-31 | DOI: 10.1016/j.dsp.2024.104978
Huadeng Wang , Zhipeng Liu , Xipeng Pan , Kang Yu , Rushi Lan , Junlin Guan , Bingbing Li
Mitosis nuclei counting is one of the important indicators for the pathological diagnosis and histological grading of breast cancer. With the development of deep learning, several models have achieved good performance in the automatic recognition of mitosis nuclei. However, because mitosis nuclei pass through complex and diverse evolution stages, their automatic recognition remains very challenging, and the performance and generalization ability of currently proposed models need to be greatly enhanced. Meanwhile, the manual annotation of images for deep-learning-based model training requires experienced pathologists and is very time-consuming and inefficient. In this paper, we propose a two-stage mitosis segmentation and classification method, named SCMitosis. First, the segmentation network achieves high recall by incorporating the proposed depthwise separable convolution residual block and a channel-spatial attention gate, where the latter innovatively combines channel and spatial attention mechanisms and utilizes a simple GRU unit for effective feature fusion. Then, a classification network is cascaded to further improve the detection performance on mitosis nuclei. The proposed model is verified on the pixel-level-annotated ICPR 2012 dataset, annotated by professional pathologists, achieving the highest F1-score of 0.8687 compared with current state-of-the-art algorithms. Additionally, the model demonstrates superior performance on the Ganzhou Municipal Hospital (GZMH) dataset, also annotated by professional pathologists, which is first released with this paper by the authors.
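The F1-score reported above is the harmonic mean of precision and recall over matched detections; a minimal helper makes the metric concrete (the counts in the example are hypothetical, not the paper's):

```python
def f1_score(tp, fp, fn):
    """Detection F1 from raw match counts: true positives, false
    positives (spurious detections), and false negatives (misses)."""
    precision = tp / (tp + fp) if (tp + fp) > 0 else 0.0
    recall = tp / (tp + fn) if (tp + fn) > 0 else 0.0
    if precision + recall == 0.0:
        return 0.0
    return 2.0 * precision * recall / (precision + recall)
```

The two-stage design targets exactly this trade-off: the segmentation stage maximizes recall (few missed mitoses), and the cascaded classifier then removes false positives to lift precision.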
{"title":"A novel dataset and a two-stage deep learning method for breast cancer mitosis nuclei identification","authors":"Huadeng Wang , Zhipeng Liu , Xipeng Pan , Kang Yu , Rushi Lan , Junlin Guan , Bingbing Li","doi":"10.1016/j.dsp.2024.104978","DOIUrl":"10.1016/j.dsp.2024.104978","url":null,"abstract":"<div><div>Mitosis nuclei counting is one of the important indicators for the pathological diagnosis and histological grade of breast cancer. With the development of deep learning methods, there have been some models for the automatic recognition of mitosis nuclei with good performance. However, due to the complex and diverse evolution stages of mitosis nuclei, automatic recognition of mitosis nuclei is very challenging, and the performance and generalization ability of the currently proposed models need to be greatly enhanced. Meanwhile, the manual annotation of images in deep learning-based model training requires experienced pathologists, which is very time-consuming and inefficient. In this paper, we propose a two-stage mitosis segmentation and classification method, named SCMitosis. Firstly, the segmentation network achieves high recall performance by incorporating the proposed depthwise separable convolution residual block and a channel-spatial attention gate, where the latter innovatively combines both channel and spatial attention mechanisms and utilizes a simple GRU unit for effective feature fusion. Then, a classification network is cascaded to improve the detection performance of mitosis nuclei further. The proposed model is verified on the pixel-level annotated ICPR 2012 dataset, which was annotated by professional pathologists, achieving the highest F1-score of 0.8687 compared with the current state-of-the-art algorithms. 
Additionally, the model demonstrates superior performance on the Ganzhou Municipal Hospital (GZMH) dataset, also annotated by professional pathologists, which was first released with this paper by the authors.</div></div>","PeriodicalId":51011,"journal":{"name":"Digital Signal Processing","volume":"158 ","pages":"Article 104978"},"PeriodicalIF":2.9,"publicationDate":"2024-12-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143128988","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2024-12-31 | DOI: 10.1016/j.dsp.2024.104976
Jian Xiong , Jie Wu , Ming Tang , Pengwen Xiong , Yushui Huang , Hang Guo
The ViBe algorithm is a moving-target detection algorithm based on a static background. To address the issue that static foreground objects mistakenly incorporated into the background can corrupt the real background, leading to poor target detection or even missed targets, this paper proposes a two-layer model (initial model plus optimized model) that uses both forward and backward correlation to acquire long-term background information, thereby improving dynamic target detection performance. The method uses a regular ViBe as the initial model, whose background samples are initialized by a random time-based image initialization method; the initial background obtained from the initial model is used as the input to the optimized model. Then, when there is a significant difference between a static-foreground region of the initial-model background and the corresponding region of the optimized-model background, the background pixel updates for that region of the optimized model are frozen. By exploiting the difference between the update cycles of the two models, the long-term real background is obtained. Finally, the long-term background is used to detect dynamic targets and determine the target motion state. To verify the effectiveness and accuracy of the proposed method, it is validated on the VISOR and SBMnet image datasets, and an application example of behavioral anomaly monitoring is also given. The experimental results show that the Recall of the proposed method is significantly improved, with average increases of 2.18%, 26.31%, 7.94%, and 6.11% over the method of [36], Ant_Vibe, the Gc_IviBe method [38], and the traditional ViBe algorithm, respectively. Compared with traditional ViBe and [36], the average precision is improved by 11.71% and 6.64%, respectively. Additionally, the method achieves better segmentation accuracy for static foreground objects, with an average FM improvement of 4.94% over [36]. In terms of root mean square error (RMSE) and structural similarity (SSIM), the RMSE is reduced by 9.88% and the SSIM improved by 1.89% compared with the background models generated by traditional ViBe during object detection.
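For reference, the baseline ViBe mechanics that the two-layer model builds on can be sketched as per-pixel sample matching plus a stochastic, conservative update. This is the standard single-model ViBe only; the paper's contribution — freezing updates where the two backgrounds disagree — is not reproduced here, and the radius/match parameters are the commonly used defaults, assumed rather than taken from the paper.

```python
import numpy as np

def vibe_classify(samples, frame, radius=20, min_matches=2):
    """A pixel is background if at least `min_matches` of its stored
    samples lie within `radius`. samples: (n, H, W); frame: (H, W)."""
    close = np.abs(samples.astype(int) - frame.astype(int)) < radius
    return close.sum(axis=0) >= min_matches          # True = background

def vibe_update(samples, frame, is_bg, rate=16, rng=None):
    """Conservative stochastic update: each background pixel replaces one
    random sample with probability 1/rate; foreground pixels are frozen."""
    if rng is None:
        rng = np.random.default_rng(0)
    h, w = frame.shape
    lucky = (rng.integers(0, rate, size=(h, w)) == 0) & is_bg
    idx = rng.integers(0, samples.shape[0], size=(h, w))
    for i in range(samples.shape[0]):
        sel = lucky & (idx == i)
        samples[i][sel] = frame[sel]
    return samples

# Toy model: 20 samples per pixel, all equal to 100.
samples = np.full((20, 4, 4), 100, dtype=np.uint8)
frame_bg = np.full((4, 4), 100, dtype=np.uint8)      # matches the model
frame_fg = frame_bg.copy()
frame_fg[0, 0] = 200                                 # one changed pixel
mask_bg = vibe_classify(samples, frame_bg)
mask_fg = vibe_classify(samples, frame_fg)
updated = vibe_update(samples.copy(), frame_bg, mask_bg)
```

The conservative update is precisely why a static foreground object eventually "ghosts" into the model, which is the failure mode the paper's frozen-region, long-term background addresses.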
{"title":"Moving object detection based on ViBe long-term background modeling","authors":"Jian Xiong , Jie Wu , Ming Tang , Pengwen Xiong , Yushui Huang , Hang Guo","doi":"10.1016/j.dsp.2024.104976","DOIUrl":"10.1016/j.dsp.2024.104976","url":null,"abstract":"<div><div>The ViBe algorithm is a motion target detection algorithm based on a static background. To address the issue where static foreground objects mistakenly incorporated into the background can destroy the real background, leading to poor target detection or even missed targets, this paper proposes a two-layer model (initial model-optimized model) that uses both forward and backward correlation to acquire long-term background information, thereby improving dynamic target detection performance. The method uses a regular Vibe as the initial model, whose initial background samples are initialised by a random time-based image initialisation method, and the initial background obtained from the initial model is used as an input to the optimised model. Then, based on the relationship between the initial model background and the optimised model background, freeze the background pixel update of the corresponding area of the optimised model, when there is a significant difference between the corresponding area of the static foreground in the initial model background and the optimized model background. By taking advantage of the difference between the update cycles of the front and back models, the long-term real background is obtained. Finally, a long-term background is used to detect dynamic targets and determine the target motion state. In order to verify the effectiveness and accuracy of the proposed method, this paper is validated with VISOR and SBMnet image datasets. An application example of behavioural anomaly monitoring is also given. 
The experimental results show that the Recall of the proposed method is significantly improved, with an average increase of 2.18%, 26.31%, 7.94%, and 6.11% compared to literature [<span><span>36</span></span>], Ant_Vibe, Gc_IviBe method [<span><span>38</span></span>], and the traditional ViBe algorithm, respectively. Compared with traditional ViBe and literature [<span><span>36</span></span>], the average precision is improved by 11.71% and 6.64%, respectively. Additionally, this method has better segmentation accuracy for static foreground objects, with an average FM improvement of 4.94% compared to literature [<span><span>36</span></span>]. From the perspectives of root mean square error (RMSE) and structural similarity (SSIM), the RMSE is reduced by 9.88% and the SSIM is improved by 1.89% compared to the background models generated by traditional ViBe in the object detection process.</div></div>","PeriodicalId":51011,"journal":{"name":"Digital Signal Processing","volume":"158 ","pages":"Article 104976"},"PeriodicalIF":2.9,"publicationDate":"2024-12-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143129042","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2024-12-30 | DOI: 10.1016/j.dsp.2024.104963
Weiwei Bai , Guoqiang Zheng , Yu Mu , Huahong Ma , Zhe Han , Yujun Xue
With the development of 6G networks, enhancing spectrum sensing performance under low signal-to-noise ratio (SNR) conditions has become a crucial research focus. To address the challenge of low detection probability at low SNR, we propose a cooperative spectrum sensing method based on a channel attention mechanism and parallel Convolutional Neural Network (CNN) and Long Short-Term Memory (LSTM) networks. The method uses the parallel CNN and LSTM structure to extract spatial and temporal features, respectively, from the spectrum sensing data. First, a channel attention mechanism is introduced into the CNN to enhance the focus on important features during spatial feature extraction, while the LSTM is applied individually to each secondary user's spectrum sensing data to extract temporal features. Then, the features extracted by the CNN and LSTM are flattened and concatenated, followed by feature-level fusion through a fully connected layer to produce the final spectrum sensing result. Simulation results demonstrate that this method achieves a high detection probability, particularly under low SNR conditions.
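The channel attention step can be sketched in squeeze-and-excitation form: pool each channel to a scalar, pass the descriptor through a small bottleneck, and gate the channels with a sigmoid. This is a generic formulation with illustrative shapes and random weights, not the paper's exact module.

```python
import numpy as np

def channel_attention(feat, w1, w2):
    """Squeeze-and-excitation style gate: global-average-pool each channel,
    pass through a ReLU bottleneck, and scale channels by a sigmoid gate.
    feat: (C, H, W); w1: (C//r, C); w2: (C, C//r)."""
    squeeze = feat.mean(axis=(1, 2))                 # (C,) channel descriptor
    hidden = np.maximum(w1 @ squeeze, 0.0)           # bottleneck + ReLU
    gate = 1.0 / (1.0 + np.exp(-(w2 @ hidden)))      # per-channel gate in (0, 1)
    return feat * gate[:, None, None]

rng = np.random.default_rng(0)
feat = rng.standard_normal((8, 5, 5))                # C=8 feature map
w1 = rng.standard_normal((2, 8)) * 0.1               # reduction ratio r=4
w2 = rng.standard_normal((8, 2)) * 0.1
out = channel_attention(feat, w1, w2)
```

Because the gate lies in (0, 1), attention can only re-weight channels, never amplify them; the learned weights decide which channels' evidence survives into the fusion layer.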
{"title":"Cooperative spectrum sensing method based on channel attention and parallel CNN-LSTM","authors":"Weiwei Bai , Guoqiang Zheng , Yu Mu , Huahong Ma , Zhe Han , Yujun Xue","doi":"10.1016/j.dsp.2024.104963","DOIUrl":"10.1016/j.dsp.2024.104963","url":null,"abstract":"<div><div>With the development of 6G networks, enhancing spectrum sensing performance under low signal-to-noise ratio (SNR) conditions has become a crucial research focus. Addressing the challenge of low detection probability under low SNR, we propose a cooperative spectrum sensing method based on a channel attention mechanism and a parallel Convolutional Neural Network (CNN) and Long Short-Term Memory (LSTM) networks. This method utilizes the parallel structure of CNN and LSTM to extract spatial and temporal features from the spectrum sensing data, respectively. First, a channel attention mechanism is introduced into the CNN to enhance the focus on important features within the spectrum sensing data during spatial feature extraction, while LSTM is applied individually to the spectrum sensing data of each secondary user to extract temporal features. Then, the features extracted by the CNN and LSTM are flattened and concatenated, followed by feature-level fusion through a fully connected layer to produce the final spectrum sensing result. Simulation results demonstrate that this method achieves a high detection probability, particularly under low SNR conditions. 
When the SNR is below -10 dB, the average detection probability of the proposed method improves by 5.83% compared to the Parallel CNN and LSTM method at a false alarm probability of 0.1, and by 7.09% at 0.01.</div></div>","PeriodicalId":51011,"journal":{"name":"Digital Signal Processing","volume":"158 ","pages":"Article 104963"},"PeriodicalIF":2.9,"publicationDate":"2024-12-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143129048","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2024-12-28 | DOI: 10.1016/j.dsp.2024.104964
Weixin Luo, Sannan Yuan
Due to the ability of drone technology to cover wide areas and reach difficult-to-access places, the detection of small targets by Unmanned Aerial Vehicles (UAVs) is crucial in applications such as urban management planning and emergency response. This paper proposes innovations to the YOLOv8 architecture that significantly enhance its performance in small target detection from multi-scale drone perspectives. We introduce Channel Priority Attention Dynamic Snake Convolution and a Dynamic Small Object Detection Head Layer (DyHead-SODL) to improve the model's ability to capture fine details and its detection accuracy. Additionally, we implement an enhanced loss function (MPDIoU) and a Deformable Attention Transformer (DAT) to optimize detection efficiency without increasing computational burden. Experimental results demonstrate significant improvements: on the VisDrone dataset, the proposed method increases mAP50 by 10.5% and mAP95 by 6.9%, reduces the localization error Eloc by 1.27, and decreases the miss rate by 2.42, outperforming existing state-of-the-art detection models while maintaining low computational complexity. The proposed method has also been tested on four large public datasets (DOTA, HRSC, LEVIR, and CARPK), demonstrating its generalization capability. These advances provide an effective solution for small object detection from UAVs and promote the development of single-stage object detection technology.
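MPDIoU augments plain IoU with the squared distances between the two boxes' top-left and bottom-right corners, normalized by the squared image diagonal, which gives a useful gradient even for non-overlapping boxes. A sketch of that formulation follows (box coordinates and image size are illustrative; consult the MPDIoU paper for the exact loss definition):

```python
def mpd_iou(box_a, box_b, img_w, img_h):
    """IoU minus the normalized squared top-left and bottom-right corner
    distances. Boxes are (x1, y1, x2, y2) in pixels."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    iw = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    ih = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = iw * ih
    union = ((ax2 - ax1) * (ay2 - ay1)
             + (bx2 - bx1) * (by2 - by1) - inter)
    iou = inter / union if union > 0 else 0.0
    diag2 = img_w ** 2 + img_h ** 2                 # normalizer
    d_tl = (ax1 - bx1) ** 2 + (ay1 - by1) ** 2      # top-left distance^2
    d_br = (ax2 - bx2) ** 2 + (ay2 - by2) ** 2      # bottom-right distance^2
    return iou - d_tl / diag2 - d_br / diag2

perfect = mpd_iou((0, 0, 10, 10), (0, 0, 10, 10), img_w=100, img_h=100)
shifted = mpd_iou((0, 0, 10, 10), (5, 5, 15, 15), img_w=100, img_h=100)
```

For small objects, a few pixels of corner misalignment barely moves IoU but does move the corner-distance terms, which is why such losses help UAV-scale targets.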
{"title":"Enhanced YOLOv8 for small-object detection in multiscale UAV imagery: Innovations in detection accuracy and efficiency","authors":"Weixin Luo, Sannan Yuan","doi":"10.1016/j.dsp.2024.104964","DOIUrl":"10.1016/j.dsp.2024.104964","url":null,"abstract":"<div><div>Due to the ability of drone technology to cover wide areas and reach difficult-to-access places, the detection of small targets by Unmanned Aerial Vehicles (UAVs) is crucial in various applications such as urban management planning and emergency response. This paper proposes innovations to the YOLOv8 architecture, significantly enhancing its performance in small target detection from multi-scale drone perspectives. We introduce Channel Priority Attention Dynamic Snake Convolution and Dynamic Small Object Detection Head Layer (DyHead-SODL) to improve the model's ability to capture fine details and detection accuracy. Additionally, we implemented an enhanced loss function (MPDIoU) and a Deformable Attention Transformer (DAT) to optimize detection efficiency without increasing computational burden. Experimental results on the Visdrone and RSOD datasets demonstrate significant improvements, with the proposed method increasing mAP50 by 10.5 %, mAP95 by 6.9 %, reducing localization error Eloc by 1.27, and decreasing the miss rate by 2.42 on the Visdrone dataset, outperforming existing state-of-the-art detection models while maintaining low computational complexity. The proposed method has also been tested on four large public datasets: DOTA, HRSC, LEVIR, and CARPK, demonstrating the generalization capability of our model. These advances provide effective solutions for small target detection from drone perspectives and promote the development of single-stage object detection technology. 
These advancements provide an effective solution for small object detection from UAVs, advancing the field of one-stage object detection technology.</div><div>(source code: https://github.com/GuccIceCream/yolov8/tree/master)</div></div>","PeriodicalId":51011,"journal":{"name":"Digital Signal Processing","volume":"158 ","pages":"Article 104964"},"PeriodicalIF":2.9,"publicationDate":"2024-12-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143128986","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
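The abstract above cites an MPDIoU loss for bounding-box regression. As a point of reference, the following is a minimal sketch of the minimum-point-distance IoU as it is commonly formulated in the literature (IoU penalized by the squared distances between matching corner points, normalized by the image diagonal); the exact variant used in the paper is not specified in the abstract, so treat the box format `(x1, y1, x2, y2)` and the normalization as assumptions.

```python
import numpy as np

def mpd_iou(box_a, box_b, img_w, img_h):
    """Minimum-point-distance IoU sketch for axis-aligned boxes (x1, y1, x2, y2)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Standard intersection-over-union.
    ix1, iy1 = max(ax1, bx1), max(ay1, by1)
    ix2, iy2 = min(ax2, bx2), min(ay2, by2)
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    iou = inter / union if union > 0 else 0.0
    # Penalty: squared distances between top-left and bottom-right corners,
    # normalized by the squared image diagonal.
    norm = img_w ** 2 + img_h ** 2
    d_tl = (ax1 - bx1) ** 2 + (ay1 - by1) ** 2
    d_br = (ax2 - bx2) ** 2 + (ay2 - by2) ** 2
    return iou - d_tl / norm - d_br / norm

# A training loss would typically be 1 - mpd_iou(pred, target, W, H).
```

Identical boxes give an MPDIoU of 1; disjoint, displaced boxes are pushed negative by the corner-distance penalty, which keeps the gradient informative even at zero overlap.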
Pub Date : 2024-12-27DOI: 10.1016/j.dsp.2024.104959
Guangtao Cheng , Baoyi Xian , Yifan Liu , Xue Chen , Lianjun Hu , Zhanjie Song
During fire incidents, the quick and accurate identification of smoke is crucial for issuing early warnings and reducing the risk of fire. This paper proposes an accurate and efficient smoke video recognition network based on a novel hierarchical Transformer architecture. We design the SoftPool-based multi-head self-attention (SMHSA) module, which performs self-attention operations on shortened sequences. This approach facilitates the extraction of global features across various smoke patterns while reducing computational complexity and preserving essential feature information. Our hierarchical network architecture integrates SMHSA modules progressively, enhancing the modeling of global dependencies among image patches of different scales. Specifically, shallower layers are dedicated to analyzing small-scale patches, while deeper layers focus on larger-scale patches. This structure optimizes the model's ability to capture multi-scale information, which is critical for accurate smoke recognition in video sequences. Additionally, the self-attention mechanism is implemented on sequences of progressively decreasing lengths, leading to a significant reduction in computational complexity. To support thorough evaluation and advancement in this field, we have created a dedicated smoke video recognition dataset (SVRD) that includes a wide range of scenarios and smoke patterns. Using the SVRD, we conducted extensive experiments to validate the effectiveness of our approach. Our findings clearly demonstrate that the proposed network achieves superior accuracy in smoke recognition while maintaining significantly lower computational costs compared to existing methodologies.
{"title":"A hierarchical Transformer network for smoke video recognition","authors":"Guangtao Cheng , Baoyi Xian , Yifan Liu , Xue Chen , Lianjun Hu , Zhanjie Song","doi":"10.1016/j.dsp.2024.104959","DOIUrl":"10.1016/j.dsp.2024.104959","url":null,"abstract":"<div><div>During fire incidents, the quick and accurate identification of smoke is crucial for issuing early warnings and reducing the risk of fire. This paper proposes an accurate efficient smoke video recognition network based on a novel hierarchical Transformer architecture. We design the SoftPool-based multi-head self-attention (SMHSA) module, which performs self-attention operations on shortened sequences. This approach facilitates the extraction of global features across various smoke patterns while reducing computational complexity and preserving essential feature information. Our hierarchical network architecture integrates SMHSA modules progressively, enhancing the modeling of global dependencies among image patches of different scales. Specifically, shallower layers are dedicated to analyzing small-scale patches, while deeper layers focus on larger-scale patches. This structure optimizes the model's ability to capture multi-scale information, which is critical for accurate smoke recognition in video sequences. Additionally, the self-attention mechanism is implemented on sequences of progressively decreasing lengths, leading to a significant reduction in computational complexity. To support thorough evaluation and advancement in this field, we have created a dedicated smoke video recognition dataset (SVRD) that includes a wide range of scenarios and smoke patterns. Using the SVRD, we conducted extensive experiments to validate the effectiveness of our approach. 
Our findings clearly demonstrate that the proposed network achieves superior accuracy in smoke recognition while maintaining significantly lower computational costs compared to existing methodologies.</div></div>","PeriodicalId":51011,"journal":{"name":"Digital Signal Processing","volume":"158 ","pages":"Article 104959"},"PeriodicalIF":2.9,"publicationDate":"2024-12-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143129045","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
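The SMHSA idea described above — self-attention over SoftPool-shortened sequences — can be illustrated with a NumPy sketch. SoftPool weights each element in a pooling window by the softmax of its own activation; attending against the pooled sequence cuts attention cost from O(N²) to O(N·N/k). This is a single-head toy, and the choice to pool only keys/values (keeping full-length queries) is an assumption; the paper's exact module layout is not given in the abstract.

```python
import numpy as np

def softpool_1d(x, k):
    """SoftPool along the sequence axis: exponentially weighted average per window.

    x: (N, d) token features; k: pooling window size (N assumed divisible by k here).
    """
    n_out = x.shape[0] // k
    windows = x[: n_out * k].reshape(n_out, k, x.shape[1])
    w = np.exp(windows)                       # per-element softmax weights
    return (w * windows).sum(axis=1) / w.sum(axis=1)

def smhsa_like(x, k=2):
    """Single-head sketch of attention over a SoftPool-shortened key/value sequence."""
    d = x.shape[1]
    kv = softpool_1d(x, k)                    # shortened keys/values: (N/k, d)
    scores = x @ kv.T / np.sqrt(d)            # (N, N/k) instead of (N, N)
    scores = np.exp(scores - scores.max(axis=-1, keepdims=True))
    attn = scores / scores.sum(axis=-1, keepdims=True)
    return attn @ kv                          # (N, d)
```

With N = 8 tokens and k = 2, the attention map is 8×4 rather than 8×8; stacking such modules with growing patch sizes mirrors the hierarchical design the abstract describes.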
Pub Date : 2024-12-27DOI: 10.1016/j.dsp.2024.104954
Min-Jen Tsai, Hui-Min Lin, Guan-De Yu
Detecting modified images has become increasingly crucial in combating fake news and protecting people's privacy. This is particularly significant for JPEG images, which are widely used online. Tampering with JPEG images often involves recompression using a different quantization table, which alters the histograms of the original image's discrete cosine transform (DCT) coefficients. This study exploits this double-compression effect to propose a novel deep learning model that combines a convolutional neural network (CNN) with a stacked residual bidirectional long short-term memory (Bi-LSTM) network incorporating self-attention mechanisms. The CNN is first used to learn the characteristics of DCT coefficients and quantization tables extracted from JPEG files. These features are then fed into the stacked residual Bi-LSTM with an attention mechanism to effectively capture the data's long-term forward and backward relationships. By leveraging the strengths of these diverse techniques, we construct a deep Bi-LSTM with up to five layers, which achieves superior predictive performance compared to existing methods. Our model demonstrates its potential for the robust detection and localization of JPEG forgery.
{"title":"Double JPEG compression with forgery detection","authors":"Min-Jen Tsai, Hui-Min Lin, Guan-De Yu","doi":"10.1016/j.dsp.2024.104954","DOIUrl":"10.1016/j.dsp.2024.104954","url":null,"abstract":"<div><div>Detecting modified images has become increasingly crucial in combating fake news and protecting people's privacy. This is particularly significant for JPEG images, which are widely used online. Tampering with JPEG images often involves recompression using a different quantization table, which alters the histograms of the original image's discrete cosine transform (DCT) coefficients. This study exploits this double compression effect to propose a novel deep learning model that combines a CNN and a stacked residual bidirectional long short-term memory (Bi-LSTM) model that incorporates self-attention mechanisms. A CNN model is initially used to learn the characteristics of DCT coefficients and quantization tables extracted from JPEG files. Subsequently, these features are fed into a stacked residual Bi-LSTM model with an attention mechanism to effectively capture the data's long-term forward and backward relationships. By leveraging the strengths of these diverse techniques, we construct a deep Bi-LSTM with up to five layers, which achieves superior predictive performance compared to existing methods. 
Our model demonstrates its potential for the robust detection and localization of JPEG forgery.</div></div>","PeriodicalId":51011,"journal":{"name":"Digital Signal Processing","volume":"158 ","pages":"Article 104954"},"PeriodicalIF":2.9,"publicationDate":"2024-12-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143129040","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
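The double-compression cue described above is usually exposed by histogramming the quantized DCT coefficients of each low-frequency mode: recompression with a different quantization table leaves periodic gaps and peaks in these histograms. A minimal sketch of such a feature extractor follows; the number of modes, bin range, and zigzag ordering are illustrative assumptions, not the paper's exact configuration.

```python
import numpy as np

def dct_histograms(coeffs, n_modes=9, bin_range=10):
    """Per-mode histograms of quantized DCT coefficients, a common double-JPEG feature.

    coeffs: (num_blocks, 8, 8) array of quantized DCT coefficients from a JPEG.
    Returns an (n_modes, 2*bin_range+1) matrix of normalized histograms, one row
    per low-frequency AC mode in zigzag order.
    """
    zigzag = [(0, 1), (1, 0), (2, 0), (1, 1), (0, 2),
              (0, 3), (1, 2), (2, 1), (3, 0)]
    # Integer-centered bins covering [-bin_range, bin_range].
    edges = np.arange(-bin_range, bin_range + 2) - 0.5
    num_blocks = max(coeffs.shape[0], 1)
    rows = []
    for r, c in zigzag[:n_modes]:
        hist, _ = np.histogram(coeffs[:, r, c], bins=edges)
        rows.append(hist / num_blocks)        # normalize by block count
    return np.stack(rows)
```

In a pipeline like the one the abstract outlines, these histogram matrices (together with the quantization table) would form the CNN's input, with the Bi-LSTM modeling dependencies across modes.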