
IEEE Signal Processing Letters: Latest Articles

ESGN-YOLO: Enhancing Multi-Scale Small Object Detection via Efficient Feature Fusion and Adaptive Spatial Modeling
IF 3.9 | Zone 2, Engineering & Technology | Q2 ENGINEERING, ELECTRICAL & ELECTRONIC | Pub Date: 2025-12-16 | DOI: 10.1109/LSP.2025.3644313
Zihao Guo;MeiLing Zhong;Shukai Duan;Lidan Wang
Object detection is crucial in remote sensing, surveillance, and autonomous driving. Detecting small objects remains challenging due to limited pixels, redundant backgrounds, and noise from viewpoint and illumination variations. To address these, we propose ESGN-YOLO, a lightweight model with three improvements. The Efficient Feature Fusion Module (EFFM) enhances multi-scale and directional feature extraction. The Shift-Wise Convolution (SWC) Bottleneck refines fine-grained features and suppresses background redundancy. The Group Normalisation Scale Head (GNSH) further improves detection accuracy and efficiency. Experiments on VisDrone2019 and RS-STOD show ESGN-YOLO achieves superior mAP@0.5 (34.5% and 76%) with a compact size (3.7 M parameters) and moderate computational cost (12.3 GFLOPs). Fast inference confirms its practicality for real-time UAV deployment and small-object detection under resource-constrained conditions.
IEEE Signal Processing Letters, vol. 33, pp. 426-430.
Citations: 0
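Note on the metric in the ESGN-YOLO entry above: mAP@0.5 counts a detection as correct when its intersection over union (IoU) with a ground-truth box is at least 0.5. The sketch below is not taken from the paper; the function name `iou` and the example boxes are illustrative assumptions. It shows why small objects are sensitive to localisation error: for an 8 x 8 pixel box, a 3-pixel horizontal shift is already enough to fall below the 0.5 threshold.

```python
def iou(box_a, box_b):
    """Intersection over union of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Width and height of the overlap region (zero if the boxes do not overlap).
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0


# Hypothetical 8 x 8 ground-truth box and two shifted predictions.
ground_truth = (0, 0, 8, 8)
shift_2px = iou(ground_truth, (2, 0, 10, 8))  # 48 / 80, still a match at IoU >= 0.5
shift_3px = iou(ground_truth, (3, 0, 11, 8))  # 40 / 88, no longer a match
print(shift_2px, shift_3px)
```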
PRISM-Occ: Path-Routed Integrated Sparse Mixture-of-Experts for Multi-Modal BEV Occupancy Prediction
IF 3.9 | Zone 2, Engineering & Technology | Q2 ENGINEERING, ELECTRICAL & ELECTRONIC | Pub Date: 2025-12-16 | DOI: 10.1109/LSP.2025.3644948
Yujia Zhang;Hui Zhu;Chen Hua;Xinkai Kuang;Ziyu Chen;Chunmao Jiang
Bird's-eye-view (BEV) occupancy prediction estimates 3D occupied space from sequential sensor data, providing the environment model that underpins downstream planning and decision-making in autonomous driving. Existing methods often rely on dense fusion or naive feature stacking, which inflates compute and memory, yields poorly calibrated probabilities, and makes training brittle under occlusion and long-tail categories. We propose PRISM-Occ, a dual-level sparse Mixture-of-Experts framework for multi-modal BEV occupancy. A path-routed hierarchical router (PRHR) with Sparse Top-K activates only a compact set of experts within and across modalities, reducing parameter count while sharpening specialization. A heteroscedastic occupancy head predicts a spatial temperature map to improve calibration, and a simple prior adjustment with a staged hard-sample schedule stabilizes training under occlusion and rare classes. On Occ3D-nuScenes and SurroundOcc, PRISM-Occ achieves state-of-the-art accuracy and better-calibrated probabilities using single-scale 256 × 704 inputs and fixed, lower-resolution backbones, delivering a stronger accuracy–efficiency trade-off with reduced parameters and comparable runtime memory.
IEEE Signal Processing Letters, vol. 33, pp. 381-385.
Citations: 0
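The PRISM-Occ entry above mentions two generic building blocks: sparse top-K expert activation and a predicted temperature that rescales logits for calibration. The sketch below illustrates both in their textbook form, assuming scalar inputs and toy experts. It is not the paper's PRHR router or occupancy head, and all function names are hypothetical.

```python
import math


def softmax(values):
    """Numerically stable softmax over a list of floats."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def top_k_gate(router_logits, k):
    """Keep the k largest router logits, softmax over them, zero out the rest."""
    order = sorted(range(len(router_logits)),
                   key=lambda i: router_logits[i], reverse=True)
    chosen = order[:k]
    weights = softmax([router_logits[i] for i in chosen])
    gates = [0.0] * len(router_logits)
    for index, weight in zip(chosen, weights):
        gates[index] = weight
    return gates


def moe_output(x, experts, router_logits, k):
    """Weighted sum of the selected experts only; inactive experts are never called."""
    gates = top_k_gate(router_logits, k)
    return sum(g * expert(x) for g, expert in zip(gates, experts) if g > 0.0)


def temperature_scaled_probs(logits, temperature):
    """Divide logits by a temperature before softmax; a larger value flattens the output."""
    return softmax([z / temperature for z in logits])
```

With `k` much smaller than the number of experts, only a few expert functions run per input, which is the source of the compute savings that sparse routing aims for. In a spatial setting the temperature would be one value per BEV cell rather than a single scalar.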
Virtual Reference Frame-Based Inter Prediction for MPEG Enhanced G-PCC
IF 3.9 | Zone 2, Engineering & Technology | Q2 ENGINEERING, ELECTRICAL & ELECTRONIC | Pub Date: 2025-12-15 | DOI: 10.1109/LSP.2025.3644314
Xingjian Zhang;Yuxuan Wei;Zhe Liu;Zehan Wang;Hui Yuan
As the demand for 3D point clouds grows, the data volume is growing dramatically. To tackle this challenge, the Moving Picture Experts Group (MPEG) is developing the enhanced geometry-based point cloud compression (Enhanced G-PCC) standard, which uses the Region-Adaptive Hierarchical Transform (RAHT) for highly efficient attribute coding. However, because the geometry of the current frame differs from that of the reference frame, their octree structures do not match, which degrades the performance of inter prediction. Therefore, we propose a virtual reference frame-based inter prediction method that aligns the geometry of the reference frame with that of the current frame. Specifically, the geometry of the virtual reference frame comes from the current frame, while its attribute information comes from the reference frame. Experimental results show that the proposed method can significantly increase the proportion of inter-predicted RAHT coefficients and thus achieve average Bjøntegaard Delta Rates (BD-rates) of −6.3%, −8.9%, and −8.4% for the Luma, Cb, and Cr components, respectively, under the lossless geometry and lossy attribute coding condition, compared to the state-of-the-art Enhanced G-PCC reference software version 28 release candidate 2 (TMC13v28.0-rc2). For the coding condition of lossy geometry and lossy attribute, the corresponding BD-rates are −6.5%, −11.3%, and −7.7%, respectively.
IEEE Signal Processing Letters, vol. 33, pp. 301-305.
Citations: 0
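The virtual-reference-frame entry above states that the virtual frame takes its geometry from the current frame and its attributes from the reference frame, but the abstract does not say how attributes are mapped onto the new geometry. The sketch below assumes a brute-force nearest-neighbour transfer purely for illustration; the function name and data layout are hypothetical and may differ from the method in the letter.

```python
def build_virtual_reference(current_points, reference_points, reference_attrs):
    """Return (point, attribute) pairs: geometry from the current frame,
    attributes copied from the nearest reference-frame point (assumed mapping)."""
    virtual = []
    for point in current_points:
        # Index of the reference point with the smallest squared Euclidean distance.
        nearest = min(
            range(len(reference_points)),
            key=lambda j: sum((a - b) ** 2
                              for a, b in zip(point, reference_points[j])),
        )
        virtual.append((point, reference_attrs[nearest]))
    return virtual


# Toy frames: two current points, two reference points carrying luma values.
current = [(0, 0, 0), (10, 0, 0)]
reference = [(1, 0, 0), (9, 1, 0)]
luma = [100, 200]
print(build_virtual_reference(current, reference, luma))
```

Because the virtual frame shares the current frame's point positions, an octree built on it has the same occupancy structure as the current frame, which is the alignment property the abstract relies on.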
Locally Shuffled Low Rank Column-Wise Sensing
IF 3.9 | Zone 2, Engineering & Technology | Q2 ENGINEERING, ELECTRICAL & ELECTRONIC | Pub Date: 2025-12-15 | DOI: 10.1109/LSP.2025.3644669
Ahmed Ali Abbasi;Namrata Vaswani
We introduce and precisely formulate the Low Rank Columnwise matrix Sensing (LRCS) problem when some of the observed data is scrambled / permuted / shuffled / unlabeled. Shuffled LRCS is a more difficult problem than plain LRCS because there are three unknown variable sets and one of them is discrete. Our proposed algorithm for solving it is the first multi-block generalization of the Alternating GD and Minimization (AltGDmin) algorithm that was introduced in recent work for fast LRCS. Since this is a new problem, no solutions exist. We also develop the AltMin solution and provide extensive numerical comparisons demonstrating that the proposed AltGDmin-based method is much faster than AltMin. As a baseline, we use AltGDmin-LRCS and AltMin-LRCS for a collapsed version of this problem, which becomes an LRCS problem. Our experiments show that, when the available number of measurements is small, this baseline fails, while our proposed method works. Finally, we bound the per-iteration time complexity of our algorithm and also provide a guarantee for its initialization step.
IEEE Signal Processing Letters, vol. 33, pp. 446-450.
Citations: 0
Extended Node-Specific Distributed Generalized Sidelobe Canceler for Outdoor Wireless Acoustic Sensor Networks
IF 3.9 | Zone 2, Engineering & Technology | Q2 ENGINEERING, ELECTRICAL & ELECTRONIC | Pub Date: 2025-12-15 | DOI: 10.1109/LSP.2025.3644315
Shiqin Li;Jing Hu;Zhao Zhao;Zhiyong Xu
In distributed sound source enhancement (SSE) tasks using microphone array nodes, the state-of-the-art node-specific distributed generalized sidelobe canceler (NS-DGSC) algorithm has achieved remarkable performance in simultaneously enhancing multiple desired sources. However, its assumption of an equal number of nodes and sources usually does not hold in outdoor applications. This letter proposes an extended NS-DGSC (ENS-DGSC) algorithm to tackle this issue. A correlation check module is introduced to handle scenarios where nodes outnumber or match sources. Furthermore, a temporal alignment module using two different strategies is designed to address time delays among nodes. Evaluations reveal that the proposed ENS-DGSC not only retains the advantages of the NS-DGSC, but also provides superior enhancement performance when there are more nodes than sources.
IEEE Signal Processing Letters, vol. 33, pp. 306-310.
Citations: 0
Explicit-Implicit Prompt Injection and Semantic-Guided Latent LoRA for Vision-Language Tracking
IF 3.9 | Zone 2, Engineering & Technology | Q2 ENGINEERING, ELECTRICAL & ELECTRONIC | Pub Date: 2025-12-12 | DOI: 10.1109/LSP.2025.3643354
Jiapeng Zhang;Ying Wei;Yongfeng Li;Gang Yang;Qiaohong Hao
Prompt-based learning has shown promise in vision-language tracking (VLT), yet existing methods often rely on either explicit or implicit prompting alone, limiting fine-grained cross-modal alignment. Moreover, Low-Rank Adaptation (LoRA)-based fine-tuning in prior work typically focuses on visual-only adaptation, overlooking language semantics. To address these issues, we propose a unified VLT framework that integrates Explicit-Implicit Prompt Injection (EIPI) and Semantic-Guided Latent LoRA (SGLL). EIPI introduces semantic prompts to facilitate robust and context-sensitive target modeling through two pathways. The explicit prompts are constructed through interaction between multi-modal target representations and the search region, while the implicit prompts are learned from linguistic features via a lightweight bottleneck network. Then, SGLL extends standard LoRA by introducing learnable queries in the latent space, allowing residual modulation based on language-visual semantics without retraining the full model. This dual design yields a parameter-efficient tracker with strong cross-modal adaptability. Extensive experiments show our method outperforms prior prompt-based approaches while maintaining high efficiency.
IEEE Signal Processing Letters, vol. 33, pp. 376-380.
Citations: 0
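For readers unfamiliar with the standard LoRA that the SGLL entry above builds on: a frozen weight matrix W is augmented with a trainable low-rank update, giving an output of W x + (alpha / r) B A x, where r is the rank. The sketch below shows only this standard form with plain Python lists; the semantic-guided latent queries described in the abstract are not modelled, and the function names are hypothetical.

```python
def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector (list of floats)."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]


def lora_forward(W, A, B, x, alpha):
    """Standard LoRA forward pass: frozen W plus a scaled low-rank update B @ A.

    W is d_out x d_in (frozen), A is r x d_in, B is d_out x r (both trainable).
    """
    rank = len(A)
    base = matvec(W, x)               # frozen path
    update = matvec(B, matvec(A, x))  # low-rank path, computed without forming B @ A
    scale = alpha / rank
    return [b + scale * u for b, u in zip(base, update)]


# Toy example with d_in = d_out = 2 and rank 1.
W = [[1.0, 0.0], [0.0, 1.0]]
A = [[1.0, 1.0]]
B = [[0.5], [0.0]]
print(lora_forward(W, A, B, [2.0, 3.0], alpha=1.0))
```

Only A and B are trained, so the number of trainable parameters is r * (d_in + d_out) instead of d_in * d_out. With B initialised to zeros, as is usual for LoRA, the adapted layer starts out identical to the frozen one.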
Shallow Neural Network Training via Atomic Norms and Semidefinite Programming
IF 3.9 | Zone 2, Engineering & Technology | Q2 ENGINEERING, ELECTRICAL & ELECTRONIC | Pub Date: 2025-12-12 | DOI: 10.1109/LSP.2025.3643361
Andrew J. Christensen;Ananya Sen Gupta
Neural networks have achieved remarkable results across numerous scientific domains because of their ability to uncover complex patterns. However, despite their effectiveness, these networks rely on heuristic training of highly non-convex objective functions, limiting theoretical understanding and practical reliability. Recent work has shown that shallow neural networks with scalar outputs can be formulated as convex optimization problems, bridging empirical success with theory. In this work, we build upon this framework for vector-valued outputs, introducing a convex formulation for two-layer ReLU networks based on an atomic norm and expressible as a semidefinite program (SDP). This yields a principled convex relaxation of multi-output networks that is both expressive and tractable. We validate the approach using standard SDP solvers, demonstrating its feasibility. These results extend convex neural network training beyond scalar outputs and provide a foundation for scalable, robust alternatives to current heuristic deep learning methods. Our method achieved a 7.3% increase in classification accuracy compared to a baseline convex multi-output network.
IEEE Signal Processing Letters, vol. 33, pp. 321-325.
Citations: 0
Simple Self-Organizing Map With Vision Transformers
IF 3.9 | Zone 2, Engineering & Technology | Q2 ENGINEERING, ELECTRICAL & ELECTRONIC | Pub Date: 2025-12-12 | DOI: 10.1109/LSP.2025.3643388
Alan Luo;Kaiwen Yuan
Vision Transformers (ViTs) have demonstrated exceptional performance in various vision tasks. However, they tend to underperform on smaller datasets due to their inherent lack of inductive biases. Current approaches address this limitation implicitly—often by pairing ViTs with pretext tasks or by distilling knowledge from convolutional neural networks (CNNs) to strengthen the prior. In contrast, Self-Organizing Maps (SOMs), a widely adopted self-supervised framework, are inherently structured to preserve topology and spatial organization, making them a promising candidate to directly address the limitations of ViTs in limited or small training datasets. Despite this potential, equipping SOMs with modern deep learning architectures remains largely unexplored. In this study, we conduct a novel exploration on how Vision Transformers (ViTs) and Self-Organizing Maps (SOMs) can empower each other, aiming to bridge this critical research gap. Our findings demonstrate that these architectures can synergistically enhance each other, leading to significantly improved performance in both unsupervised and supervised tasks.
IEEE Signal Processing Letters, vol. 33, pp. 331-335.
Citations: 0
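The topology-preserving behaviour attributed to Self-Organizing Maps in the entry above comes from their update rule: the unit closest to the input (the best-matching unit) moves toward it, and its grid neighbours move too, by an amount that decays with grid distance. The sketch below is one step of the classic SOM with a Gaussian neighbourhood, not the ViT-coupled variant studied in the letter; the function name and toy values are hypothetical.

```python
import math


def som_step(weights, grid, x, learning_rate, sigma):
    """One classic SOM update. Mutates `weights` in place and returns the BMU index.

    weights: list of prototype vectors, one per map unit
    grid:    list of 2D grid coordinates, one per map unit
    """
    # Best-matching unit: the prototype closest to the input in feature space.
    bmu = min(range(len(weights)),
              key=lambda j: sum((w - xi) ** 2 for w, xi in zip(weights[j], x)))
    for j, prototype in enumerate(weights):
        # Gaussian neighbourhood measured on the map grid, not in feature space.
        grid_dist_sq = sum((a - b) ** 2 for a, b in zip(grid[j], grid[bmu]))
        influence = math.exp(-grid_dist_sq / (2.0 * sigma ** 2))
        weights[j] = [w + learning_rate * influence * (xi - w)
                      for w, xi in zip(prototype, x)]
    return bmu


# Toy map with two units on a 1 x 2 grid.
prototypes = [[0.0, 0.0], [1.0, 1.0]]
coordinates = [(0, 0), (0, 1)]
winner = som_step(prototypes, coordinates, [0.9, 0.9], learning_rate=0.5, sigma=1.0)
print(winner, prototypes)
```

Because neighbours on the grid are pulled toward the same inputs, nearby map units end up representing similar inputs, which is the spatial prior the abstract contrasts with the weak inductive bias of ViTs.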
Nonintrusive Watermarking for CycleGAN
IF 3.9 | Zone 2, Engineering & Technology | Q2 ENGINEERING, ELECTRICAL & ELECTRONIC | Pub Date: 2025-12-12 | DOI: 10.1109/LSP.2025.3643348
Yebin Zheng;Haonan An;Guang Hua;Yongming Chen;Zhiping Lin
Generative adversarial networks (GANs) are a set of powerful generative models, among which CycleGAN, featuring the unique cycle-consistency loss, has gained special popularity. However, this unique structure and the cycle-consistency loss make watermarking CycleGAN particularly challenging, rendering existing deep neural network (DNN) watermarking methods, whether model-agnostic or GAN-specific, inapplicable. Meanwhile, existing DNN watermarking methods are intrusive in nature, requiring direct or indirect modification of model parameters for watermark embedding, which raises fidelity concerns. To solve the above problems, we propose the first nonintrusive and robust watermarking method for CycleGAN. We empirically show that without modifying the CycleGAN model, a user-defined watermark image can still be extracted from model outputs using a dedicated watermark decoder. Extensive experimental results verify that while achieving the so-called absolute fidelity, the proposed method is robust to various attacks, from image post-processing to model stealing.
IEEE Signal Processing Letters, vol. 33, pp. 256-260.
Citations: 0
Concentration Inequalities for Semidefinite Least Squares Based on Data
IF 3.9 Zone 2 Engineering & Technology Q2 ENGINEERING, ELECTRICAL & ELECTRONIC Pub Date: 2025-12-11 DOI: 10.1109/LSP.2025.3643385
Filippo Fabiani;Andrea Simonetto
We study data-driven least squares (LS) problems with semidefinite (SD) constraints and derive finite-sample guarantees on the spectrum of their optimal solutions when these constraints are relaxed. In particular, we provide a high-confidence bound allowing one to solve a simpler program in place of the full SDLS problem, while ensuring that the eigenvalues of the resulting solution are $\varepsilon$-close to those enforced by the SD constraints. The developed certificate, which consistently shrinks as the amount of data increases, turns out to be easy to compute, distribution-free, and only requires independent and identically distributed samples. Moreover, when the SDLS is used to learn an unknown quadratic function, we establish bounds on the error between a gradient descent iterate minimizing the surrogate cost obtained with no SD constraints and the true minimizer.
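As a rough illustration of the setting this abstract describes (not the authors' code, and not their certificate), the sketch below fits a symmetric matrix Q in y ≈ xᵀQx by ordinary least squares with the semidefinite constraint dropped, then measures how far the smallest eigenvalue of the fit falls below zero. The function names, the Gaussian sampling, the noise level, and the sample sizes are all assumptions made for this example.

```python
import numpy as np

def fit_symmetric_ls(X, y):
    """Unconstrained LS fit of a symmetric Q in y ~ x^T Q x (no PSD constraint)."""
    n, d = X.shape
    iu = np.triu_indices(d)
    # x^T Q x = sum_i Q_ii x_i^2 + 2 * sum_{i<j} Q_ij x_i x_j,
    # so off-diagonal features are doubled.
    scale = np.where(iu[0] == iu[1], 1.0, 2.0)
    Phi = X[:, iu[0]] * X[:, iu[1]] * scale
    theta, *_ = np.linalg.lstsq(Phi, y, rcond=None)
    Q = np.zeros((d, d))
    Q[iu] = theta                       # fill the upper triangle
    return Q + Q.T - np.diag(np.diag(Q))  # mirror it to get a symmetric matrix

def min_eig_violation(Q):
    """Amount by which the smallest eigenvalue is negative (0.0 if Q is PSD)."""
    return max(0.0, -float(np.linalg.eigvalsh(Q)[0]))

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    d = 3
    A = rng.standard_normal((d, d))
    Q_true = A @ A.T  # hypothetical PSD ground truth
    for n in (20, 200, 2000):
        X = rng.standard_normal((n, d))
        y = np.einsum("ni,ij,nj->n", X, Q_true, X) + 0.1 * rng.standard_normal(n)
        Q_hat = fit_symmetric_ls(X, y)
        print(n, min_eig_violation(Q_hat), np.linalg.norm(Q_hat - Q_true))
```

The printed violation is only an empirical quantity for one random draw; the paper's contribution is a finite-sample bound on such spectral deviations, which this sketch does not compute.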
IEEE Signal Processing Letters, vol. 33, pp. 326-330, published 2025-12-11. https://doi.org/10.1109/LSP.2025.3643385
Citations: 0