Pub Date : 2026-01-31 DOI: 10.1016/j.dsp.2026.105970
Tianyao Feng, Benying Tan, Muyang Li, Jianpeng Wu, Shuxue Ding
Feature extraction, which focuses on extracting essential characteristics from raw data, plays a critical role in data processing. Although traditional unsupervised dictionary learning is effective in deriving low-dimensional representative features, the learned features often lack discriminative power. To enhance feature discrimination, this paper incorporates label information into the dictionary learning objective function and adopts the Minimax Concave Penalty (MCP) as a regularizer instead of the conventional l0-norm and l1-norm, leading to a novel supervised dictionary learning model termed DCSDL-MCP. To address the resulting optimization challenge, the objective function is first consolidated and reformulated into a form akin to traditional unsupervised dictionary learning. Then, a joint optimization strategy combining the Difference of Convex Functions Algorithm (DCA) and the Iterative Soft Thresholding Algorithm (ISTA) is developed to solve the problem efficiently. Extensive experiments on face recognition and object localization datasets demonstrate that the proposed method achieves superior accuracy and robustness. These results underscore its practical value and broad application potential in real-world scenarios.
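The DCA-plus-ISTA idea named in the abstract can be illustrated on a toy problem. The sketch below is not the authors' DCSDL-MCP algorithm: it assumes a scalar signal and an identity dictionary, and the function names (`mcp`, `grad_h`, `dca_ista_scalar`) are hypothetical. It uses the standard definition of the MCP and the standard decomposition MCP(x) = lam*|x| - h(x) with h convex, so that each DCA step linearises h and the remaining l1 subproblem is solved by soft thresholding.

```python
def mcp(x, lam, gamma):
    """Minimax concave penalty of a scalar x (assumes lam > 0, gamma > 1)."""
    a = abs(x)
    if a <= gamma * lam:
        return lam * a - a * a / (2.0 * gamma)
    return 0.5 * gamma * lam * lam


def grad_h(x, lam, gamma):
    """Gradient of the convex function h, where mcp(x) = lam*|x| - h(x)."""
    if abs(x) <= gamma * lam:
        return x / gamma
    return lam if x > 0 else -lam


def soft_threshold(v, t):
    """Proximal operator of t*|.| (the ISTA shrinkage step)."""
    if v > t:
        return v - t
    if v < -t:
        return v + t
    return 0.0


def dca_ista_scalar(y, lam, gamma, outer=30, inner=5):
    """Approximately minimise 0.5*(y - x)**2 + mcp(x) over a scalar x.

    Toy setting (assumption): identity dictionary, so the smooth term has
    Lipschitz constant 1 and ISTA can use step size 1.
    """
    x = 0.0
    for _ in range(outer):
        g = grad_h(x, lam, gamma)      # DCA: linearise the concave part at x
        for _ in range(inner):         # ISTA on the convex l1 subproblem
            x = soft_threshold(x - ((x - y) - g), lam)
    return x
```

With `lam = 1` and `gamma = 2`, `dca_ista_scalar(3.0, 1.0, 2.0)` returns 3.0, whereas plain soft thresholding of 3.0 at threshold 1 gives 2.0. This is the usual motivation for the MCP over the l1-norm: large coefficients are not shrunk.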
Title: DCSDL-MCP: Discriminative supervised dictionary learning with the minimax concave penalty. Journal: Digital Signal Processing, vol. 174, Article 105970.
Pub Date : 2026-01-30 DOI: 10.1016/j.dsp.2026.105969
Yiheng Liu, Hua Zhang, Xuemei Wang, Qinghai Dong, Xiaode Lyu
Random frequency and pulse repetition interval agile (RFPA) signals are valued for their low probability of intercept (LPI) performance, yet their effective exploitation hinges on efficient coherent integration. Existing methods for RFPA coherent integration, however, typically rely on exhaustive parameter searches, leading to an inherent difficulty in balancing detection performance with computational efficiency. To overcome this limitation, we propose an RFPA-Keystone transform (RFPA-KT) algorithm. The algorithm first performs a search-free phase compensation to correct hopping-induced phase offsets without the need for exhaustive range search, then mitigates Doppler ambiguity through a pre-compensation framework, and finally employs the resampling KT to correct range cell migration (RCM) and achieve signal coherence. Both simulations and semi-physical experiments show that the proposed method achieves a probability of detection (Pd) approaching the maximum-likelihood (ML) performance benchmark under varying noise levels and parameter agility conditions, while significantly reducing the complexity from $O(N_r M_v M)$ to $O(N_r M_v \log_2 M)$. These advantages highlight its potential as an efficient and robust solution for real-time coherent integration in RFPA radar systems.
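To give a feel for the stated complexity reduction from O(N_r M_v M) to O(N_r M_v log2 M), the short calculation below compares the two leading-order terms. The abstract does not define N_r, M_v, or M, so the numbers are purely illustrative and the function names are hypothetical. Constant factors are ignored, so the ratio is only an order-of-magnitude guide.

```python
import math


def search_cost(n_r, m_v, m):
    """Leading-order term of the search-based scheme: N_r * M_v * M."""
    return n_r * m_v * m


def kt_cost(n_r, m_v, m):
    """Leading-order term quoted for RFPA-KT: N_r * M_v * log2(M)."""
    return n_r * m_v * math.log2(m)


def speedup(m):
    """Ratio of the two terms; N_r and M_v cancel, leaving M / log2(M)."""
    return m / math.log2(m)
```

For example, `speedup(1024)` evaluates to 102.4, independent of N_r and M_v, since both cancel in the ratio.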
Title: RFPA-keystone transform: A search-free coherent integration method for random frequency and PRI agile radar. Journal: Digital Signal Processing, vol. 174, Article 105969.
Pub Date : 2026-01-30 DOI: 10.1016/j.dsp.2026.105954
Zhimin Lu, Qing Zhang, Boheng Tian, Fuhua Ge, Chenxi Mo, Rui Guo, Xianbin Duan, Chunming Guo, Pengfei Yu
Multichannel fluorescence imaging plays a pivotal role in cell type identification and pathological diagnosis. However, manual analysis of fluorescence images is prone to misdiagnoses and missed diagnoses. Although AI algorithms hold promise, current methods struggle to extract discriminative features, thereby compromising the accuracy of pathological analysis. This study proposes ASCF-RTDETR, a novel model for precisely detecting epithelial cells in multichannel fluorescence images. ASCF-RTDETR incorporates an Adaptive Multi-Scale Collaborative Feature Fusion (AMFF) module, enabling comprehensive feature interaction through horizontal and vertical dual-path parallel propagation. This is complemented by a High-Efficiency Feature Upsampling Convolution (HFUC) and Multi-Scale Convolution Block (MSCB), enhancing feature representation. Furthermore, a Dynamic Histogram Attention-based Intra-scale Feature Interaction (DHIFI) module is introduced, leveraging bin-wise and frequency-wise dual-path reconstruction to enhance cell boundary features. Concurrently, a lightweight Dual Convolution (DualConv) structure is integrated to reduce computational complexity and provide implicit regularization against imaging noise. Experiments on a self-constructed multichannel fluorescence-labeled epithelial cell dataset demonstrate ASCF-RTDETR’s superior detection performance, achieving a 93.5% mAP50 and 90.7% F1-score, with nearly 50% reduced computational cost compared to baseline models. The model also exhibits strong generalization across multiple public datasets, offering a reliable solution for automated epithelial cell detection and analysis.
Title: ASCF-RTDETR: Adaptive scale collaborative feature learning for epithelial cell detection in multichannel fluorescence images. Journal: Digital Signal Processing, vol. 174, Article 105954.
Style transfer, a pivotal domain in machine vision, has achieved remarkable success in generating Western-style paintings. However, due to the unique “void” (Liubai) aesthetic of Chinese ink painting, the direct application of existing methods often yields irregular artifacts in blank areas and washes out details of brush strokes. To mitigate these limitations, this paper proposes a physically-guided hierarchical attention framework based on CycleGAN. Specifically, we introduce a coarse-to-fine algorithmic design where an inverted brightness-based masking mechanism is first constructed to serve as a spatial prior, explicitly suppressing high-frequency artifacts in void regions based on physical domain characteristics. Building upon this spatial prior, the Convolutional Block Attention Module (CBAM) is integrated into the generator as an adaptive feature modulator, recalibrating weights to adaptively concentrate computational resources on refining semantic foreground textures. Additionally, we incorporate the Learned Perceptual Image Patch Similarity (LPIPS) metric into the cyclic consistency constraint. This perceptually aligned objective resolves the “texture smoothing” issue inherent in pixel-wise losses. Experiments on our curated L2I (Landscape-to-Ink) benchmark dataset show that the model effectively suppresses artifacts and enhances artistic effects, outperforming existing methods. This work offers a robust algorithmic solution for the preservation and innovation of traditional Chinese art. The dataset is available at https://github.com/ww02711/L2I.git.
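One plausible reading of the "inverted brightness-based masking mechanism" is sketched below. The abstract does not give the exact construction, so everything here is an assumption: pixels are RGB triples in [0, 1], brightness is the Rec. 601 luma, and the helper names are hypothetical. Under this reading, bright paper (the void) gets a mask value near 0 and dark ink gets a value near 1, so anything multiplied by the mask is suppressed in blank regions.

```python
def luminance(pixel):
    """Rec. 601 luma of an (r, g, b) pixel with components in [0, 1]."""
    r, g, b = pixel
    return 0.299 * r + 0.587 * g + 0.114 * b


def inverted_brightness_mask(image):
    """Mask is 1 - brightness: near 0 on blank paper, near 1 on ink."""
    return [[1.0 - luminance(p) for p in row] for row in image]


def weight_by_mask(values, mask):
    """Element-wise product, e.g. to damp a per-pixel response in void areas."""
    return [[v * m for v, m in zip(v_row, m_row)]
            for v_row, m_row in zip(values, mask)]
```

How the paper combines such a mask with the CycleGAN generator and CBAM is not stated in the abstract and is not modelled here.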
Title: Artifact-suppressed style transfer for Chinese ink paintings via enhanced CycleGAN. Authors: Shuo Zhang, Shengwen Wang, Hongrui Liu, Yonghua Zhang, Ziqing Huang. Pub Date: 2026-01-29. DOI: 10.1016/j.dsp.2026.105965. Journal: Digital Signal Processing, vol. 174, Article 105965.
Pub Date : 2026-01-29 DOI: 10.1016/j.dsp.2026.105959
Youtao Jiang, Yao Xu, Shaobo Jia, Peng Lin, Xiaoxu Guo, Jianyue Zhu, Zhizhong Zhang
Non-orthogonal multiple access (NOMA)-based two-way relay (TWR) systems can enhance communication coverage and spectral efficiency, but they face challenges in supporting future cellular Internet of Things (IoT) due to the coexistence of heterogeneous rate signals. This paper proposes a mutualistic ambient backscatter communication-aided NOMA scheme for TWR-based cellular IoT, where two cellular users and a relaying user exchange information via physical-layer network coding and NOMA, while IoT devices transmit data using backscatter modulation and cellular radio frequency signals. However, the multi-type interference and complex composite channels in the proposed scheme result in complicated signal-to-interference-plus-noise ratio expressions, which complicate accurate performance characterization. To address this, we derive closed-form expressions for the ergodic sum rate (ESR) using an equivalent transformation of squared generalized-K random variables, and characterize the asymptotic ESR at high signal-to-noise ratio. Simulation results validate the theoretical analysis and demonstrate the ESR gains over conventional orthogonal multiple access, NOMA-based TWR, and symbiotic NOMA-based TWR, while revealing the impacts of the IoT device count, node distance, and power allocation on the ESR.
Title: AmBC-NOMA with physical-layer network coding for mutualistic two-way relay cellular IoT. Journal: Digital Signal Processing, vol. 174, Article 105959.
Pub Date : 2026-01-29 DOI: 10.1016/j.dsp.2026.105967
Changqing Song, Dian Xiao, Wanbing Hao, Wanzhi Ma, Hongzhi Zhao, Shihai Shao
Driven by dual demands of spectrum-intensive military electronic warfare systems and high-spectral-efficiency civilian communications, simultaneous transmit-receive (STAR) array technology has gained significant attention due to its potential for efficient spectrum reuse. However, strong self-interference (SI) between transmit and receive channels degrades the receiver sensitivity, posing a critical technical barrier to its practical implementation. This study systematically reviews the research progress in STAR array SI cancellation technologies, covering five key aspects: SI coupling channels, spatial-domain cancellation, analog-domain cancellation, digital-domain cancellation, and experimental verification. Current state-of-the-art systems demonstrate up to 137.3 dB of isolation for 256 × 256 STAR arrays and 140.5 dB for 4 × 4 arrays, approaching engineering feasibility. Nevertheless, the large-scale deployment of multi-antenna arrays in civil and military applications will expose STAR arrays to more severe challenges from strong near-field SI. Future research should focus on clarifying near-field coupling mechanisms, optimizing spatial degrees of freedom, reducing the complexity of SI reconstruction, and refining compensation strategies for non-ideal factors to advance the deployment of STAR technology.
Title: A Review of self-interference cancellation technologies for simultaneous transmit-receive arrays. Journal: Digital Signal Processing, vol. 174, Article 105967.
Pub Date : 2026-01-29 DOI: 10.1016/j.dsp.2026.105962
Xiang Chen, Shuzhen Zhang, Hailong Song, Qi Yan
Recently, deformable convolutions based on convolutional neural networks have been widely used in hyperspectral image (HSI) classification due to their flexible geometric adaptability and superior local feature extraction capabilities. However, they still face significant challenges in establishing long-range dependencies and capturing global contextual information among pixel sequences. To address these challenges, a novel deformable convolution and Transformer hybrid network (DTHNet) is proposed for HSI classification. Specifically, PCA is first employed to reduce the dimensionality of the original HSI, and a group depth joint convolution block (GDJCB) is utilized to capture the spectral-spatial features of the reduced HSI patches, avoiding the neglect of certain spectral bands. Secondly, a parallel architecture composed of a designed deformable convolution and a Transformer is utilized to jointly extract local-global spectral-spatial features and long-range dependencies in HSI. In the deformable convolution branch, a spectral-spatial convolution block (SSCB) enhanced with simple parameter-free attention (SimAM) is designed to effectively prevent the loss of key information and the generation of redundant features during the convolution. In the Transformer branch, the deep integration of convolutional operations and the self-attention mechanism further promotes more effective extraction of HSI features. Finally, the features from the two branches are fused to obtain a more accurate HSI classification. Experimental results on three widely used HSI datasets demonstrate that the proposed DTHNet outperforms several state-of-the-art HSI classification networks.
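The SimAM attention mentioned in the abstract has a closed form with no learnable parameters. The sketch below shows it in plain Python for a single 2-D channel, following the commonly published SimAM formulation. How DTHNet wires it into the SSCB is not described in the abstract, so this shows the attention weighting only, and the function names and the default `lam` are assumptions.

```python
import math


def simam_weights(channel, lam=1e-4):
    """Per-position SimAM weights for one channel (list of rows).

    Each weight is sigmoid(d / (4 * (var + lam)) + 0.5), where d is the
    squared deviation from the channel mean and var is the sum of squared
    deviations divided by (number of positions - 1).
    """
    values = [v for row in channel for v in row]
    mean = sum(values) / len(values)
    n = len(values) - 1                      # requires at least 2 positions
    var = sum((v - mean) ** 2 for v in values) / n
    weights = []
    for row in channel:
        w_row = []
        for v in row:
            inv_energy = (v - mean) ** 2 / (4.0 * (var + lam)) + 0.5
            w_row.append(1.0 / (1.0 + math.exp(-inv_energy)))
        weights.append(w_row)
    return weights


def simam(channel, lam=1e-4):
    """Reweight the channel: positions far from the mean are emphasised."""
    weights = simam_weights(channel, lam)
    return [[v * w for v, w in zip(v_row, w_row)]
            for v_row, w_row in zip(channel, weights)]
```

Positions that deviate most from the channel mean receive the largest weights, which is the sense in which the attention highlights distinctive responses without adding parameters.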
Title: Deformable convolution and transformer hybrid network for hyperspectral image classification. Journal: Digital Signal Processing, vol. 174, Article 105962.
Pub Date : 2026-01-29 DOI: 10.1016/j.dsp.2026.105968
Yixuan Shen, Mei Da, Lin Jiang
To address the deficiencies of existing infrared image detection models in terms of detection accuracy, computational complexity, detection speed, as well as missed detections and false detections in complex backgrounds, this paper proposes a lightweight infrared small target detection algorithm: YOLO-MBL. Firstly, we design a Dynamic Convolution Multi-Path Fusion Module (DCMP) to replace the original C3k2 module to enhance the feature extraction capability of the network. Secondly, we design the SDI-BiFPN as a feature fusion module in the neck network to capture more comprehensive feature information, thereby effectively avoiding the loss of information during the transmission process. Furthermore, a Lightweight Shared Convolutional Detection Head (LSCD) is introduced to reduce the number of model parameters. Finally, the Wise-MPDIoU loss function is adopted to accelerate the model convergence process and enhance its detection accuracy. To validate the effectiveness of the YOLO-MBL algorithm, we conducted comparative experiments on the FLIR dataset and the HIT-UAV dataset. The experimental results demonstrate that the YOLO-MBL model achieves a 4.6% improvement in detection accuracy (mAP@0.5) on the FLIR dataset, with a parameter reduction of 0.2 M, and reaches an FPS of 81.1. On the HIT-UAV dataset, the model's detection accuracy (mAP@0.5) is enhanced by 3.7%, accompanied by a parameter reduction of 0.2 M, and the FPS attains 84.1. Compared with traditional algorithms and current mainstream one-stage detection algorithms, the YOLO-MBL algorithm demonstrates significant advantages in terms of detection accuracy. The code repository is available at: https://github.com/yixixi12/YOLO-MBL.git.
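The loss named in the abstract builds on MPDIoU, which penalises the distances between matching box corners. The sketch below implements plain MPDIoU as commonly defined, not the paper's Wise-MPDIoU variant, whose weighting is not given in the abstract. Boxes are assumed to be `(x1, y1, x2, y2)` tuples in pixels with non-zero area, and the function names are hypothetical.

```python
def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)


def mpdiou(a, b, img_w, img_h):
    """IoU minus squared top-left and bottom-right corner distances,
    each normalised by the squared image diagonal."""
    d1 = (a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2   # top-left corners
    d2 = (a[2] - b[2]) ** 2 + (a[3] - b[3]) ** 2   # bottom-right corners
    norm = img_w ** 2 + img_h ** 2
    return iou(a, b) - d1 / norm - d2 / norm


def mpdiou_loss(a, b, img_w, img_h):
    """Loss is 1 - MPDIoU: zero for identical boxes."""
    return 1.0 - mpdiou(a, b, img_w, img_h)
```

Unlike plain IoU, the corner terms stay informative when two boxes do not overlap at all, which is one reason such losses are used for small targets.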
Title: YOLO-MBL: An infrared small target detection algorithm based on YOLOv11. Journal: Digital Signal Processing, vol. 174, Article 105968.
Pub Date : 2026-01-29DOI: 10.1016/j.dsp.2026.105966
Zhentao Lin , Bi Zeng , Song Wen , Zihao Chen , Huiting Hu
Traditional Voice Activity Detection (VAD)-based systems frequently encounter challenges in handling speaker overlap within multi-speaker environments, particularly in the context of target speaker Automatic Speech Recognition (ASR). This difficulty arises predominantly from the limitations of front-end VAD modules, which are independently trained to distinguish noise from speech but often introduce insertion and deletion errors, adversely affecting the overall performance of the ASR system. To address this coupling deficiency, we propose an End-to-End Streaming Personal target speaker ASR (SP-ASR) framework that fuses the VAD and ASR components in a streaming fashion. Our architecture introduces two key innovations. First, a Streaming Personal VAD (SP-VAD) module functions as a neural gatekeeper, segmenting audio streams while emphasizing target speaker characteristics through its Contextual Attention and Target Speaker Attention (CA-TSA) mechanism. Second, a Streaming Mask-based ASR (SM-ASR) model is integrated with SP-VAD and fine-tuned using both coarse-grained and fine-grained speaker information to extract speaker-specific transcriptions. Our experiments reveal a remarkable reduction in the concatenated target-speaker Word Error Rate (ctWER), showcasing the superiority of the End-to-End SP-ASR fusion system over conventional ASR systems, especially under conditions with significant speech overlap and noise.
{"title":"End-to-end target speaker speech recognition with voice activity detection fusion","authors":"Zhentao Lin , Bi Zeng , Song Wen , Zihao Chen , Huiting Hu","doi":"10.1016/j.dsp.2026.105966","DOIUrl":"10.1016/j.dsp.2026.105966","url":null,"abstract":"<div><div>Traditional Voice Activity Detection (VAD)-based systems frequently encounter challenges in handling speaker overlap within multi-speaker environments, particularly in the context of target speaker Automatic Speech Recognition (ASR). This difficulty arises predominantly from the limitations of front-end VAD modules, which are independently trained to distinguish noise from speech but often introduce <em>insertion and deletion errors</em>, adversely affecting the overall performance of the ASR system. To address this coupling deficiency, we propose an End-to-End Streaming Personal target speaker ASR (SP-ASR) framework that fuses the VAD and ASR components in a streaming fashion. Our architecture introduces two key innovations. First, a Streaming Personal VAD (SP-VAD) module functions as a neural gatekeeper, segmenting audio streams while emphasizing target speaker characteristics through its Contextual Attention and Target Speaker Attention (CA-TSA) mechanism. Second, a Streaming Mask-based ASR (SM-ASR) model is integrated with SP-VAD and fine-tuned using both coarse-grained and fine-grained speaker information to extract speaker-specific transcriptions. 
Our experiments reveal a remarkable reduction in the concatenated target-speaker Word Error Rate (ctWER), showcasing the superiority of the End-to-End SP-ASR fusion system over conventional ASR systems, especially under conditions with significant speech overlap and noise.</div></div>","PeriodicalId":51011,"journal":{"name":"Digital Signal Processing","volume":"174 ","pages":"Article 105966"},"PeriodicalIF":3.0,"publicationDate":"2026-01-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146081607","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2026-01-29DOI: 10.1016/j.dsp.2026.105953
Kuiwu Wang , Qin Zhang , Xiaolong Hu , Pengfei Wan , Zhenlu Jin
This paper delves into the label matching problem within the Labeled Multi-Bernoulli (LMB) framework for multi-target tracking tasks under a distributed multi-sensor system, emphasizing its pivotal role in the domain of multi-sensor multi-target tracking. The paper elucidates that within the LMB fusion process, label matching and data fusion can be effectively addressed as distinct, independent stages, thereby enhancing system modularity and processing efficiency. Regarding the innovation in fusion strategies, this paper proposes a pruning and merging approach based on the dual correlation between Gaussian component distance and motion direction. By precisely identifying and consolidating redundant information that potentially signifies the same target, this method not only optimizes fusion outcomes but also mitigates the computational burden on the sensor network. To tackle the label matching challenge, this paper devises a statistic rooted in label history similarity for two prevalent communication protocols. This statistic comprehensively considers the survival history of labels, offering a more reliable criterion for assessing label matching quality. Furthermore, to address the global label matching problem, this paper introduces the genetic algorithm as an intelligent optimization tool. Leveraging the iterative search mechanism of the genetic algorithm, this paper achieves optimal or near-optimal solutions for global label matching, significantly boosting the overall system performance. Simulation results demonstrate that the method exhibits strong performance in complex environments characterized by reduced detection probabilities and dense clutter, showcasing robustness and adaptability.
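The pruning-and-merging idea in the abstract above can be illustrated with a minimal sketch. The abstract does not state the gating rule, so everything specific here is an assumption rather than the authors' method: each Gaussian component is reduced to a weight and a mean laid out as [x, y, vx, vy]; two components are treated as the same target when their position distance is below `dist_gate` and the cosine of the angle between their velocities exceeds `cos_gate`; merging uses a weight-averaged mean and omits covariances for brevity. The names `Component`, `same_target`, and `prune_and_merge` are hypothetical.

```python
import math
from dataclasses import dataclass
from typing import List


@dataclass
class Component:
    weight: float
    mean: List[float]  # assumed layout: [x, y, vx, vy]


def same_target(a: Component, b: Component,
                dist_gate: float, cos_gate: float) -> bool:
    """Assumed dual gate: close in position AND similar motion direction."""
    dist = math.hypot(a.mean[0] - b.mean[0], a.mean[1] - b.mean[1])
    speed_a = math.hypot(a.mean[2], a.mean[3])
    speed_b = math.hypot(b.mean[2], b.mean[3])
    if speed_a == 0.0 or speed_b == 0.0:
        # Direction is undefined for a stationary component;
        # fall back to the distance gate alone.
        return dist < dist_gate
    cos = (a.mean[2] * b.mean[2] + a.mean[3] * b.mean[3]) / (speed_a * speed_b)
    return dist < dist_gate and cos > cos_gate


def prune_and_merge(components: List[Component],
                    weight_floor: float = 1e-3,
                    dist_gate: float = 5.0,
                    cos_gate: float = 0.9) -> List[Component]:
    # Prune: drop negligible components, then process strongest first.
    remaining = sorted(
        (c for c in components if c.weight >= weight_floor),
        key=lambda c: c.weight,
        reverse=True,
    )
    merged: List[Component] = []
    while remaining:
        seed = remaining.pop(0)
        group = [seed]
        rest = []
        for c in remaining:
            if same_target(seed, c, dist_gate, cos_gate):
                group.append(c)
            else:
                rest.append(c)
        remaining = rest
        # Merge: summed weight and weight-averaged mean of the group.
        total = sum(c.weight for c in group)
        mean = [sum(c.weight * c.mean[i] for c in group) / total
                for i in range(4)]
        merged.append(Component(total, mean))
    return merged
```

With these assumed gates, two nearby components moving in nearly the same direction collapse into one, while a nearby component moving the opposite way is kept separate, which is the behaviour the distance-plus-direction criterion is meant to provide.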
{"title":"Distributed multi-Sensor multi-Target track matching algorithm based on LMB filter","authors":"Kuiwu Wang , Qin Zhang , Xiaolong Hu , Pengfei Wan , Zhenlu Jin","doi":"10.1016/j.dsp.2026.105953","DOIUrl":"10.1016/j.dsp.2026.105953","url":null,"abstract":"<div><div>This paper delves into the label matching problem within the Labeled Multi-Bernoulli (LMB) framework for multi-target tracking tasks under a distributed multi-sensor system, emphasizing its pivotal role in the domain of multi-sensor multi-target tracking. The paper elucidates that within the LMB fusion process, label matching and data fusion can be effectively addressed as distinct, independent stages, thereby enhancing system modularity and processing efficiency. Regarding the innovation in fusion strategies, this paper proposes a pruning and merging approach based on the dual correlation between Gaussian component distance and motion direction. By precisely identifying and consolidating redundant information that potentially signifies the same target, this method not only optimizes fusion outcomes but also mitigates the computational burden on the sensor network. To tackle the label matching challenge, this paper devises a statistic rooted in label history similarity for two prevalent communication protocols. This statistic comprehensively considers the survival history of labels, offering a more reliable criterion for assessing label matching quality. Furthermore, to address the global label matching problem, this paper introduces the genetic algorithm as an intelligent optimization tool. Leveraging the iterative search mechanism of the genetic algorithm, this paper achieves optimal or near-optimal solutions for global label matching, significantly boosting the overall system performance. 
Simulation results demonstrate that the method exhibits strong performance in complex environments characterized by reduced detection probabilities and dense clutter, showcasing robustness and adaptability.</div></div>","PeriodicalId":51011,"journal":{"name":"Digital Signal Processing","volume":"174 ","pages":"Article 105953"},"PeriodicalIF":3.0,"publicationDate":"2026-01-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146174912","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}