Pub Date: 2025-12-19 DOI: 10.1109/LSP.2025.3646132
Tong Wei;Huiping Huang;Linlong Wu;Chong-Yung Chi;Bhavani Shankar M. R.;Björn Ottersten
This letter addresses the quadratic equality constrained least squares (QEC-LS) problem, a class of non-convex optimization problems that arise in various signal processing and communication applications. We revisit the alternating direction method of multipliers (ADMM) approach to the QEC-LS problem and investigate its convergence and efficiency. Despite the inherent non-convexity, the proposed ADMM algorithm is proved to converge globally, requiring only that the quadratic term equal a positive constant. Numerical results demonstrate that our method achieves global optimality with significantly reduced complexity compared to existing approaches such as semidefinite relaxation and primal-dual methods.
{"title":"Quadratic Equality Constrained Least Squares: Low-Complexity ADMM for Global Optimality","authors":"Tong Wei;Huiping Huang;Linlong Wu;Chong-Yung Chi;Bhavani Shankar M. R.;Björn Ottersten","doi":"10.1109/LSP.2025.3646132","DOIUrl":"https://doi.org/10.1109/LSP.2025.3646132","url":null,"abstract":"This letter addresses the quadratic equality constrained least squares (QEC-LS) problem, a class of non-convex optimization problems that arise in various signal processing and communication applications. We revisit the alternating direction method of multipliers (ADMM) approach to QEC-LS problem and investigate its convergence and efficiency. Despite the inherent non-convexity, the proposed ADMM algorithm is proved to converge globally only requiring the quadratic term equal to a positive constant. Numerical results demonstrate that our method achieves global optimality with significantly reduced complexity compared to existing approaches such as semidefinite relaxation and primal-dual methods.","PeriodicalId":13154,"journal":{"name":"IEEE Signal Processing Letters","volume":"33 ","pages":"361-365"},"PeriodicalIF":3.9,"publicationDate":"2025-12-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=11304554","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145879971","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2025-12-19 DOI: 10.1109/LSP.2025.3646129
Xiaoyu Yan;Chao Yang;Ping An;Xinpeng Huang
With the rising adoption of 360$^\circ$ video in virtual reality (VR) applications, assessing its perceptual quality remains a challenge due to projection-induced distortions in equirectangular projection (ERP) formats. Traditional sliding-window cropping methods often distort high-latitude content and fail to reflect the actual viewing experience. To address this, we propose a novel viewport patch-based video quality assessment (VQA) method. By sampling view directions on the sphere and applying gnomonic projection, our method extracts undistorted and perceptually consistent viewport patches that preserve both spatial fidelity and full-frame coverage. We further design a two-stream network that jointly models high-frequency distortion and residual information over time, enhanced by squeeze-and-excitation (SE) attention to capture spatial-temporal features. Experiments and analysis show that our method significantly improves the accuracy and reliability of 360$^\circ$ VQA, achieving PLCC/SROCC values of 0.9603/0.9628 on the VQA-ODV dataset and 0.9585/0.9400 on the BIT360 dataset, with only 0.22M parameters.
{"title":"Viewport-Patch Extraction Enhanced 360$^circ$ Video Quality Assessment","authors":"Xiaoyu Yan;Chao Yang;Ping An;Xinpeng Huang","doi":"10.1109/LSP.2025.3646129","DOIUrl":"https://doi.org/10.1109/LSP.2025.3646129","url":null,"abstract":"With the rising adoption of 360<inline-formula><tex-math>$^circ$</tex-math></inline-formula> video in virtual reality (VR) applications, assessing its perceptual quality remains a challenge due to projection-induced distortions in equirectangular projection (ERP) formats. Traditional sliding-window cropping methods often distort high-latitude content and fail to reflect the actual viewing experience. To address this, we propose a novel viewport patch-based video quality assessment (VQA) method. By sampling view directions on the sphere and applying gnomonic projection, our method extracts undistorted and perceptually consistent viewport patches that preserve both spatial fidelity and full-frame coverage. We further design a two-stream network that jointly models high-frequency distortion and residual information over time, enhanced by squeeze-and-excitation (SE) attention to capture spatial-temporal features. Experiments and analysis show that our method significantly improves the accuracy and reliability of 360<inline-formula><tex-math>$^circ$</tex-math></inline-formula> VQA, achieving PLCC/SROCC values of 0.9603/0.9628 on the VQA-ODV dataset and 0.9585/0.9400 on the BIT360 dataset, with only 0.22M parameters.","PeriodicalId":13154,"journal":{"name":"IEEE Signal Processing Letters","volume":"33 ","pages":"386-390"},"PeriodicalIF":3.9,"publicationDate":"2025-12-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145929380","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2025-12-18 DOI: 10.1109/LSP.2025.3645580
Chuanxi Xing;Yiwen Hou;Yihan Meng;Tinglong Huang;Weiqiang Li;Minglinhan Hu
To address the high complexity and noise sensitivity of the Temporal Multiple Sparse Bayesian Learning (TMSBL) algorithm in shallow-sea environments, this letter proposes a novel and robust channel estimation scheme. The proposed scheme first denoises the received pilot matrix using the K-Singular Value Decomposition (KSVD) algorithm and then applies Stagewise Orthogonal Matching Pursuit (StOMP) to acquire a robust sparse prior for initializing the TMSBL framework. This structured approach exploits the temporal correlation between channels for joint estimation, while the noise variance is estimated directly from OFDM null subcarriers to enhance stability and efficiency. Simulation results demonstrate the superiority of the proposed method: at an SNR of −10 dB, it reduces the normalized mean square error (NMSE) by more than 94% compared to the standard TMSBL algorithm and cuts the computation time by approximately 95.87%, ensuring higher accuracy and efficiency for underwater acoustic communications.
{"title":"Underwater Acoustic Channel Estimation via Accelerated TMSBL With KSVD-Based Denoising and Robust Initialization","authors":"Chuanxi Xing;Yiwen Hou;Yihan Meng;Tinglong Huang;Weiqiang Li;Minglinhan Hu","doi":"10.1109/LSP.2025.3645580","DOIUrl":"https://doi.org/10.1109/LSP.2025.3645580","url":null,"abstract":"To address the high complexity and noise sensitivity of the Temporal Multiple Sparse Bayesian Learning (TMSBL) algorithm in shallow-sea environments, this letter proposes a novel and robust channel estimation scheme. Our proposed scheme first denoises the received pilot matrix using the K-Singular Value Decomposition (KSVD) algorithm and then leverages the Stagewise Orthogonal Matching Pursuit (StOMP) to acquire a robust sparse prior to initializing the TMSBL framework. This structured approach leverages the temporal correlation between channels for joint estimation, while noise variance is estimated directly from OFDM null subcarriers to enhance stability and efficiency. The simulation results demonstrate the superiority of the proposed method. At an SNR of −10 dB, it reduces the normalized mean square error (NMSE) by more than 94% compared to the standard TMSBL algorithm and reduces the computation time by approximately 95.87%, ensuring higher accuracy and efficiency for underwater acoustic communications.","PeriodicalId":13154,"journal":{"name":"IEEE Signal Processing Letters","volume":"33 ","pages":"431-435"},"PeriodicalIF":3.9,"publicationDate":"2025-12-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145929411","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2025-12-18 DOI: 10.1109/LSP.2025.3645593
Muran Guo;Shenao Gu;Limin Guo;Guifu Yang
To meet the demand for direction-of-arrival (DOA) estimation systems with low cost and low power consumption, this letter proposes a new scheme that adopts spatial compressive sampling and mixed-resolution quantization (MRQ). The number of front-end circuit chains is reduced through spatial compressive sampling, thereby lowering the system cost. Additionally, some channels are quantized with low bit depth, resulting in reduced power consumption. However, spatial compressive sampling, together with MRQ, leads to information loss during the compression procedure. To assess the estimation performance of the proposed scheme, the Cramér-Rao bound (CRB) expression is derived to quantify the performance loss, where a compressive additive quantization noise model is constructed to characterize the effects of MRQ. Overall, the proposed scheme reduces system cost and power consumption at the expense of only marginal precision degradation. Numerical simulations are conducted to validate the performance of the proposed scheme.
{"title":"DOA Estimation Exploiting Compressive Measurements With Mixed-ADCs","authors":"Muran Guo;Shenao Gu;Limin Guo;Guifu Yang","doi":"10.1109/LSP.2025.3645593","DOIUrl":"https://doi.org/10.1109/LSP.2025.3645593","url":null,"abstract":"Aiming at the demands on direction of arrival (DOA) estimation systems with low system cost and power consumption, this letter proposes a new scheme where the spatial compressive sampling and mixed-resolution quantization (MRQ) are adopted. The number of front-end circuit chains is reduced through spatial compressive sampling, thereby lowering the system cost. Additionally, some channels are quantized at low bits, resulting in reduced power consumption. However, spatial compressive sampling, along with MRQ, leads to information loss during the compression procedure. To assess the estimation performance of the proposed scheme, the Cramér-Rao bound (CRB) expression is derived in this letter to quantify performance loss, where the compressive additive quantization noise model is constructed to characterize the effects of MRQ. Overall, the proposed scheme achieves reductions in system cost and power consumption at the expense of only marginal precision degradation. Numerical simulations are conducted to validate the performance of the proposed scheme.","PeriodicalId":13154,"journal":{"name":"IEEE Signal Processing Letters","volume":"33 ","pages":"436-440"},"PeriodicalIF":3.9,"publicationDate":"2025-12-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145929542","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2025-12-16 DOI: 10.1109/LSP.2025.3644313
Zihao Guo;MeiLing Zhong;Shukai Duan;Lidan Wang
Object detection is crucial in remote sensing, surveillance, and autonomous driving. Detecting small objects remains challenging due to limited pixels, redundant backgrounds, and noise from viewpoint and illumination variations. To address these challenges, we propose ESGN-YOLO, a lightweight model with three improvements. The Efficient Feature Fusion Module (EFFM) enhances multi-scale and directional feature extraction. The Shift-Wise Convolution (SWC) Bottleneck refines fine-grained features and suppresses background redundancy. The Group Normalisation Scale Head (GNSH) further improves detection accuracy and efficiency. Experiments on VisDrone2019 and RS-STOD show that ESGN-YOLO achieves superior mAP@0.5 (34.5% and 76%, respectively) with a compact size (3.7M parameters) and moderate computational cost (12.3 GFLOPs). Fast inference confirms its practicality for real-time UAV deployment and small-object detection under resource-constrained conditions.
{"title":"ESGN-YOLO: Enhancing Multi-Scale Small Object Detection via Efficient Feature Fusion and Adaptive Spatial Modeling","authors":"Zihao Guo;MeiLing Zhong;Shukai Duan;Lidan Wang","doi":"10.1109/LSP.2025.3644313","DOIUrl":"https://doi.org/10.1109/LSP.2025.3644313","url":null,"abstract":"Object detection is crucial in remote sensing, surveillance, and autonomous driving. Detecting small objects remains challenging due to limited pixels, redundant backgrounds, and noise from viewpoint and illumination variations. To address these, we propose ESGN-YOLO, a lightweight model with three improvements. The Efficient Feature Fusion Module (EFFM) enhances multi-scale and directional feature extraction. The Shift-Wise Convolution (SWC) Bottleneck refines fine-grained features and suppresses background redundancy. The Group Normalisation Scale Head (GNSH) further improves detection accuracy and efficiency. Experiments on VisDrone2019 and RS-STOD show ESGN-YOLO achieves superior mAP@0.5 (34.5% and 76%) with a compact size (3.7 M parameters) and moderate computational cost (12.3 GFLOPs). Fast inference confirms its practicality for real-time UAV deployment and small-object detection under resource-constrained conditions.","PeriodicalId":13154,"journal":{"name":"IEEE Signal Processing Letters","volume":"33 ","pages":"426-430"},"PeriodicalIF":3.9,"publicationDate":"2025-12-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145929518","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Bird's-eye-view (BEV) occupancy prediction estimates 3D occupied space from sequential sensor data, providing the environment model that underpins downstream planning and decision-making in autonomous driving. Existing methods often rely on dense fusion or naive feature stacking, inflating compute and memory, yielding poorly calibrated probabilities, and exhibiting brittle training under occlusion and long-tail categories. We propose PRISM-Occ, a dual-level sparse Mixture-of-Experts framework for multi-modal BEV occupancy. A path-routed hierarchical router (PRHR) with Sparse Top-K routing activates only a compact set of experts within and across modalities, reducing parameter count while sharpening specialization. A heteroscedastic occupancy head predicts a spatial temperature map to improve calibration, and a simple prior adjustment with a staged hard-sample schedule stabilizes training under occlusion and rare classes. On Occ3D-nuScenes and SurroundOcc, PRISM-Occ achieves state-of-the-art accuracy and better-calibrated probabilities using single-scale 256 × 704 inputs and fixed, lower-resolution backbones, delivering a stronger accuracy-efficiency trade-off with reduced parameters and comparable runtime memory.
{"title":"PRISM-Occ: Path-Routed Integrated Sparse Mixture-of-Experts for Multi-Modal BEV Occupancy Prediction","authors":"Yujia Zhang;Hui Zhu;Chen Hua;Xinkai Kuang;Ziyu Chen;Chunmao Jiang","doi":"10.1109/LSP.2025.3644948","DOIUrl":"https://doi.org/10.1109/LSP.2025.3644948","url":null,"abstract":"Bird's-eye-view (BEV) occupancy prediction estimates 3D occupied space from sequential sensor data, providing the environment model that underpins downstream planning and decision-making in autonomous driving. Existing methods often rely on dense fusion or naive feature stacking, inflating compute and memory, yielding poorly calibrated probabilities, and training brittleness under occlusion and long-tail categories. We propose PRISM-Occ, a dual-level sparse Mixture-of-Experts framework for multi-modal BEV occupancy. A path-routed hierarchical router (PRHR) with Sparse Top-K activates only a compact set of experts within and across modalities, reducing parameter count while sharpening specialization. A heteroscedastic occupancy head predicts a spatial temperature map to improve calibration, and a simple prior adjustment with a staged hard-sample schedule stabilizes training under occlusion and rare classes. On Occ3D-nuScenes and SurroundOcc, PRISM-Occ achieves state-of-the-art accuracy and better-calibrated probabilities using single-scale 256 × 704 inputs and fixed, lower-resolution backbones, delivering a stronger accuracy–efficiency trade-off with reduced parameters and comparable runtime memory.","PeriodicalId":13154,"journal":{"name":"IEEE Signal Processing Letters","volume":"33 ","pages":"381-385"},"PeriodicalIF":3.9,"publicationDate":"2025-12-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145929627","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
As the demand for 3D point clouds grows, the data volume is increasing dramatically. To tackle this challenge, the Moving Picture Experts Group (MPEG) is developing the enhanced geometry-based point cloud compression (Enhanced G-PCC) standard, which uses the Region-Adaptive Hierarchical Transform (RAHT) for highly efficient attribute coding. However, since the geometries of the current frame and the reference frame differ, their octree structures do not match, which degrades inter-prediction performance. Therefore, we propose a virtual reference frame-based inter prediction method that aligns the geometry of the reference frame with that of the current frame. Specifically, the geometry of the virtual reference frame comes from the current frame, while its attribute information comes from the reference frame. Experimental results show that the proposed method significantly increases the proportion of inter-predicted RAHT coefficients and thus achieves average Bjøntegaard Delta rates (BD-rates) of −6.3%, −8.9%, and −8.4% for the Luma, Cb, and Cr components, respectively, under the lossless-geometry, lossy-attribute coding condition, compared to the state-of-the-art Enhanced G-PCC reference software version 28 release candidate 2 (TMC13v28.0-rc2). Under the lossy-geometry, lossy-attribute coding condition, the corresponding BD-rates are −6.5%, −11.3%, and −7.7%, respectively.
{"title":"Virtual Reference Frame-Based Inter Prediction for MPEG Enhanced G-PCC","authors":"Xingjian Zhang;Yuxuan Wei;Zhe Liu;Zehan Wang;Hui Yuan","doi":"10.1109/LSP.2025.3644314","DOIUrl":"https://doi.org/10.1109/LSP.2025.3644314","url":null,"abstract":"As the demand for 3D point clouds grows, the data volume is growing dramatically. To tackle this challenge, the Moving Picture Expert Group (MPEG) is developing the enhanced geometry-based point cloud compression (Enhanced G-PCC) standard, which uses Region-Adaptive Hierarchical Transform (RAHT) for highly efficient attribute coding. However, since the geometry of the current frame and the reference frame is different, the octree structure between them does not match, which affects the performance of inter prediction. Therefore, we propose a virtual reference frame-based inter prediction method by aligning the geometry of the reference frame and the current frame. Specifically, the geometry of the virtual reference frame comes from the current frame, while its attribute information comes from the reference frame. Experimental results show that the proposed method can significantly increase the proportion of inter predicted RAHT coefficients and thus achieve average Bjøntegaard Delta Rates (BD-rates) of −6.3%, −8.9%, and −8.4% for the Luma, Cb, and Cr components, respectively, under the lossless geometry and lossy attribute coding condition, compared to the state-of-the-art Enhanced G-PCC reference software version 28 release candidate 2 (TMC13v28.0-rc2). For the coding condition of lossy geometry and lossy attribute, the corresponding BD-rates are −6.5%, −11.3%, and −7.7%, respectively.","PeriodicalId":13154,"journal":{"name":"IEEE Signal Processing Letters","volume":"33 ","pages":"301-305"},"PeriodicalIF":3.9,"publicationDate":"2025-12-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145830897","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2025-12-15 DOI: 10.1109/LSP.2025.3644669
Ahmed Ali Abbasi;Namrata Vaswani
We introduce and precisely formulate the Low Rank Columnwise matrix Sensing (LRCS) problem for the setting where some of the observed data is scrambled / permuted / shuffled / unlabeled. Shuffled LRCS is more difficult than LRCS alone because it involves three sets of unknown variables, one of which is discrete. Our proposed algorithm for solving it is the first multi-block generalization of the Alternating GD and Minimization (AltGDmin) algorithm, which was introduced in recent work for fast LRCS. Since this is a new problem, no existing solutions are available. We also develop an AltMin solution and provide extensive numerical comparisons demonstrating that the proposed AltGDmin-based method is much faster than AltMin. As baselines, we use AltGDmin-LRCS and AltMin-LRCS on a collapsed version of the problem, which reduces to an LRCS problem. Our experiments show that these baselines fail when the number of available measurements is small, while our proposed method still works. Finally, we bound the per-iteration time complexity of our algorithm and provide a guarantee for its initialization step.
{"title":"Locally Shuffled Low Rank Column-Wise Sensing","authors":"Ahmed Ali Abbasi;Namrata Vaswani","doi":"10.1109/LSP.2025.3644669","DOIUrl":"https://doi.org/10.1109/LSP.2025.3644669","url":null,"abstract":"We introduce and precisely formulate the Low Rank Columnwise matrix Sensing (LRCS) problem when some of the observed data is scrambled / permuted / shuffled / unlabeled. Shuffled LRCS is a more difficult problem than just LRCS because there are three unknown variable sets and one of them is discrete. Our proposed algorithm for solving it is the first multi-block generalization of the Alternating GD and Minimization (AltGDmin) algorithm that was introduced in recent work for fast LRCS. Since this is a new problem, no solutions exist. We also develop the AltMin solution and provide extensive numerical comparisons demonstrating that the proposed AltGDmin-based method is much faster than AltMin. As baseline, we use AltGDmin-LRCS and AltMin-LRCS for a collapsed version of this problem, which becomes an LRCS problem. Our experiments show that, when the available number of measurements is small, this fails, while our proposed method works. Finally, we bound the per-iteration time complexity of our algorithm and also provide a guarantee for its initialization step.","PeriodicalId":13154,"journal":{"name":"IEEE Signal Processing Letters","volume":"33 ","pages":"446-450"},"PeriodicalIF":3.9,"publicationDate":"2025-12-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145929378","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2025-12-15 DOI: 10.1109/LSP.2025.3644315
Shiqin Li;Jing Hu;Zhao Zhao;Zhiyong Xu
In distributed sound source enhancement (SSE) tasks using microphone array nodes, the state-of-the-art node-specific distributed generalized sidelobe canceler (NS-DGSC) algorithm has achieved remarkable performance in simultaneously enhancing multiple desired sources. However, its assumption of an equal number of nodes and sources usually does not hold in outdoor applications. This letter proposes an extended NS-DGSC (ENS-DGSC) algorithm to tackle this issue. A correlation check module is introduced to handle scenarios where the nodes outnumber or match the sources. Furthermore, a temporal alignment module using two different strategies is designed to address time delays among nodes. Evaluations reveal that the proposed ENS-DGSC not only retains the advantages of the NS-DGSC but also provides superior enhancement performance when there are more nodes than sources.
{"title":"Extended Node-Specific Distributed Generalized Sidelobe Canceler for Outdoor Wireless Acoustic Sensor Networks","authors":"Shiqin Li;Jing Hu;Zhao Zhao;Zhiyong Xu","doi":"10.1109/LSP.2025.3644315","DOIUrl":"https://doi.org/10.1109/LSP.2025.3644315","url":null,"abstract":"In distributed sound source enhancement (SSE) tasks using microphone array nodes, state-of-the-art node-specific distributed generalized sidelobe canceler (NS-DGSC) algorithm has achieved remarkable performance for simultaneously enhancing multiple desired sources. However, its assumption of an equal number of nodes and sources usually does not hold in outdoor applications. This letter proposes an extended NS-DGSC (ENS-DGSC) algorithm to tackle this issue. A correlation check module is introduced to handle scenarios where nodes outnumber or match sources. Furthermore, a temporal alignment module using two different strategies is designed to address time delays among nodes. Evaluations reveal that the proposed ENS-DGSC not only retains advantages of the NS-DGSC, but also provides superior enhancement performance with more nodes than sources.","PeriodicalId":13154,"journal":{"name":"IEEE Signal Processing Letters","volume":"33 ","pages":"306-310"},"PeriodicalIF":3.9,"publicationDate":"2025-12-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145830892","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Prompt-based learning has shown promise in vision-language tracking (VLT), yet existing methods often rely on either explicit or implicit prompting alone, limiting fine-grained cross-modal alignment. Moreover, Low-Rank Adaptation (LoRA)-based fine-tuning in prior work typically focuses on visual-only adaptation, overlooking language semantics. To address these issues, we propose a unified VLT framework that integrates Explicit-Implicit Prompt Injection (EIPI) and Semantic-Guided Latent LoRA (SGLL). EIPI introduces semantic prompts to facilitate robust and context-sensitive target modeling through two pathways: the explicit prompts are constructed through interaction between multi-modal target representations and the search region, while the implicit prompts are learned from linguistic features via a lightweight bottleneck network. SGLL then extends standard LoRA by introducing learnable queries in the latent space, allowing residual modulation based on language-visual semantics without retraining the full model. This dual design yields a parameter-efficient tracker with strong cross-modal adaptability. Extensive experiments show that our method outperforms prior prompt-based approaches while maintaining high efficiency.
{"title":"Explicit-Implicit Prompt Injection and Semantic-Guided Latent LoRA for Vision-Language Tracking","authors":"Jiapeng Zhang;Ying Wei;Yongfeng Li;Gang Yang;Qiaohong Hao","doi":"10.1109/LSP.2025.3643354","DOIUrl":"https://doi.org/10.1109/LSP.2025.3643354","url":null,"abstract":"Prompt-based learning has shown promise in visual-language tracking (VLT), yet existing methods often rely on either explicit or implicit prompting alone, limiting fine-grained cross-modal alignment. Moreover, Low-Rank Adaptation (LoRA) -based fine-tuning in prior work typically focuses on visual-only adaptation, overlooking language semantics. To address these issues, we propose a unified VLT framework that integrates Explicit-Implicit Prompt Injection (EIPI) and Semantic-Guided Latent LoRA (SGLL). EIPI introduces semantic prompts to facilitate robust and context-sensitive target modeling through two pathways. The explicit prompts are constructed by interact between multi-modal target representations with the search region, while implicit prompts are learned from linguistic features via a lightweight bottleneck network. Then, SGLL extends standard LoRA by introducing learnable queries in the latent space, allowing residual modulation based on language-visual semantics without retraining the full model. This dual design yields a parameter-efficient tracker with strong cross-modal adaptability. Extensive experiments show our method outperforms prior prompt-based approaches while maintaining high efficiency.","PeriodicalId":13154,"journal":{"name":"IEEE Signal Processing Letters","volume":"33 ","pages":"376-380"},"PeriodicalIF":3.9,"publicationDate":"2025-12-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145929575","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}