Pub Date : 2025-12-16 DOI: 10.1109/LSP.2025.3644313
Zihao Guo;MeiLing Zhong;Shukai Duan;Lidan Wang
Object detection is crucial in remote sensing, surveillance, and autonomous driving. Detecting small objects remains challenging due to limited pixels, redundant backgrounds, and noise from viewpoint and illumination variations. To address these challenges, we propose ESGN-YOLO, a lightweight model with three improvements. The Efficient Feature Fusion Module (EFFM) enhances multi-scale and directional feature extraction. The Shift-Wise Convolution (SWC) Bottleneck refines fine-grained features and suppresses background redundancy. The Group Normalisation Scale Head (GNSH) further improves detection accuracy and efficiency. Experiments on VisDrone2019 and RS-STOD show that ESGN-YOLO achieves superior mAP@0.5 (34.5% and 76%, respectively) with a compact size (3.7 M parameters) and moderate computational cost (12.3 GFLOPs). Fast inference confirms its practicality for real-time UAV deployment and small-object detection under resource-constrained conditions.
{"title":"ESGN-YOLO: Enhancing Multi-Scale Small Object Detection via Efficient Feature Fusion and Adaptive Spatial Modeling","authors":"Zihao Guo;MeiLing Zhong;Shukai Duan;Lidan Wang","doi":"10.1109/LSP.2025.3644313","DOIUrl":"https://doi.org/10.1109/LSP.2025.3644313","url":null,"abstract":"Object detection is crucial in remote sensing, surveillance, and autonomous driving. Detecting small objects remains challenging due to limited pixels, redundant backgrounds, and noise from viewpoint and illumination variations. To address these, we propose ESGN-YOLO, a lightweight model with three improvements. The Efficient Feature Fusion Module (EFFM) enhances multi-scale and directional feature extraction. The Shift-Wise Convolution (SWC) Bottleneck refines fine-grained features and suppresses background redundancy. The Group Normalisation Scale Head (GNSH) further improves detection accuracy and efficiency. Experiments on VisDrone2019 and RS-STOD show ESGN-YOLO achieves superior mAP@0.5 (34.5% and 76%) with a compact size (3.7 M parameters) and moderate computational cost (12.3 GFLOPs). Fast inference confirms its practicality for real-time UAV deployment and small-object detection under resource-constrained conditions.","PeriodicalId":13154,"journal":{"name":"IEEE Signal Processing Letters","volume":"33 ","pages":"426-430"},"PeriodicalIF":3.9,"publicationDate":"2025-12-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145929518","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
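The abstract above names group normalisation as the basis of the GNSH but does not describe the head itself. As a rough illustration of the underlying operation only, the following is a minimal sketch of plain group normalisation over a feature map stored as a list of channels. The function name `group_norm`, the list-of-lists layout, and the omission of the learned per-channel scale and shift are all assumptions made for this example, not details from the paper.

```python
import math

def group_norm(channels, num_groups, eps=1e-5):
    """Normalise a feature map given as C channels of flattened spatial values.

    Channels are split into num_groups contiguous groups; each group is
    shifted to zero mean and scaled to unit variance using its own statistics.
    The learned per-channel scale and shift are left out of this sketch.
    """
    if len(channels) % num_groups != 0:
        raise ValueError("number of channels must be divisible by num_groups")
    per_group = len(channels) // num_groups
    out = []
    for g in range(num_groups):
        group = channels[g * per_group:(g + 1) * per_group]
        # Statistics are pooled over every value in the group.
        values = [v for ch in group for v in ch]
        mean = sum(values) / len(values)
        var = sum((v - mean) ** 2 for v in values) / len(values)
        inv_std = 1.0 / math.sqrt(var + eps)
        out.extend([[(v - mean) * inv_std for v in ch] for ch in group])
    return out

# Hypothetical 4-channel feature map with 2 spatial positions per channel.
features = [[1.0, 2.0], [3.0, 4.0], [10.0, 10.0], [10.0, 14.0]]
normed = group_norm(features, num_groups=2)
```

Because the statistics are computed per sample and per group, the result does not depend on batch size, which is one common reason for preferring this normalisation in small-batch detection training.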
Bird's-eye-view (BEV) occupancy prediction estimates 3D occupied space from sequential sensor data, providing the environment model that underpins downstream planning and decision-making in autonomous driving. Existing methods often rely on dense fusion or naive feature stacking, which inflates compute and memory, yields poorly calibrated probabilities, and makes training brittle under occlusion and long-tail categories. We propose PRISM-Occ, a dual-level sparse Mixture-of-Experts framework for multi-modal BEV occupancy. A path-routed hierarchical router (PRHR) with Sparse Top-K activates only a compact set of experts within and across modalities, reducing parameter count while sharpening specialization. A heteroscedastic occupancy head predicts a spatial temperature map to improve calibration, and a simple prior adjustment with a staged hard-sample schedule stabilizes training under occlusion and rare classes. On Occ3D-nuScenes and SurroundOcc, PRISM-Occ achieves state-of-the-art accuracy and better-calibrated probabilities using single-scale 256 × 704 inputs and fixed, lower-resolution backbones, delivering a stronger accuracy–efficiency trade-off with reduced parameters and comparable runtime memory.
{"title":"PRISM-Occ: Path-Routed Integrated Sparse Mixture-of-Experts for Multi-Modal BEV Occupancy Prediction","authors":"Yujia Zhang;Hui Zhu;Chen Hua;Xinkai Kuang;Ziyu Chen;Chunmao Jiang","doi":"10.1109/LSP.2025.3644948","DOIUrl":"https://doi.org/10.1109/LSP.2025.3644948","url":null,"abstract":"Bird's-eye-view (BEV) occupancy prediction estimates 3D occupied space from sequential sensor data, providing the environment model that underpins downstream planning and decision-making in autonomous driving. Existing methods often rely on dense fusion or naive feature stacking, inflating compute and memory, yielding poorly calibrated probabilities, and training brittleness under occlusion and long-tail categories. We propose PRISM-Occ, a dual-level sparse Mixture-of-Experts framework for multi-modal BEV occupancy. A path-routed hierarchical router (PRHR) with Sparse Top-K activates only a compact set of experts within and across modalities, reducing parameter count while sharpening specialization. A heteroscedastic occupancy head predicts a spatial temperature map to improve calibration, and a simple prior adjustment with a staged hard-sample schedule stabilizes training under occlusion and rare classes. 
On Occ3D-nuScenes and SurroundOcc, PRISM-Occ achieves state-of-the-art accuracy and better-calibrated probabilities using single-scale 256 × 704 inputs and fixed, lower-resolution backbones, delivering a stronger accuracy–efficiency trade-off with reduced parameters and comparable runtime memory.","PeriodicalId":13154,"journal":{"name":"IEEE Signal Processing Letters","volume":"33 ","pages":"381-385"},"PeriodicalIF":3.9,"publicationDate":"2025-12-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145929627","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
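The Sparse Top-K activation mentioned in the PRISM-Occ abstract can be illustrated with a generic top-k gating sketch. This is not the paper's path-routed hierarchical router, whose details the abstract does not give; the names `top_k_gate` and `sparse_moe`, the scalar toy experts, and the choice to renormalise with a softmax over the selected logits are assumptions made for illustration.

```python
import math

def top_k_gate(logits, k):
    """Keep the k largest router logits and softmax over them only.

    Returns one weight per expert; unselected experts get exactly 0.0.
    """
    if not 1 <= k <= len(logits):
        raise ValueError("k must be between 1 and the number of experts")
    # Indices of the k highest-scoring experts (ties keep the lower index first).
    chosen = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    # Numerically stable softmax restricted to the chosen experts.
    peak = max(logits[i] for i in chosen)
    exps = {i: math.exp(logits[i] - peak) for i in chosen}
    total = sum(exps.values())
    return [exps[i] / total if i in exps else 0.0 for i in range(len(logits))]

def sparse_moe(x, experts, router_logits, k=2):
    """Weighted sum of the outputs of the selected experts only."""
    weights = top_k_gate(router_logits, k)
    out = 0.0
    for w, expert in zip(weights, experts):
        if w > 0.0:  # experts with zero weight are never evaluated
            out += w * expert(x)
    return out

# Hypothetical scalar experts and router scores.
experts = [lambda x: x + 1.0, lambda x: 2.0 * x, lambda x: -x, lambda x: x * x]
router_logits = [0.1, 2.0, -1.0, 2.0]
gate = top_k_gate(router_logits, k=2)
y = sparse_moe(3.0, experts, router_logits, k=2)
```

Only the experts with non-zero gate weight are called, which is where the compute saving of sparse routing comes from in this simplified picture.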
As the demand for 3D point clouds grows, the data volume is growing dramatically. To tackle this challenge, the Moving Picture Experts Group (MPEG) is developing the enhanced geometry-based point cloud compression (Enhanced G-PCC) standard, which uses the Region-Adaptive Hierarchical Transform (RAHT) for highly efficient attribute coding. However, since the geometry of the current frame and that of the reference frame differ, their octree structures do not match, which degrades the performance of inter prediction. Therefore, we propose a virtual reference frame-based inter prediction method that aligns the geometry of the reference frame with that of the current frame. Specifically, the geometry of the virtual reference frame comes from the current frame, while its attribute information comes from the reference frame. Experimental results show that the proposed method can significantly increase the proportion of inter predicted RAHT coefficients and thus achieve average Bjøntegaard Delta Rates (BD-rates) of −6.3%, −8.9%, and −8.4% for the Luma, Cb, and Cr components, respectively, under the lossless geometry and lossy attribute coding condition, compared to the state-of-the-art Enhanced G-PCC reference software version 28 release candidate 2 (TMC13v28.0-rc2). For the coding condition of lossy geometry and lossy attribute, the corresponding BD-rates are −6.5%, −11.3%, and −7.7%, respectively.
{"title":"Virtual Reference Frame-Based Inter Prediction for MPEG Enhanced G-PCC","authors":"Xingjian Zhang;Yuxuan Wei;Zhe Liu;Zehan Wang;Hui Yuan","doi":"10.1109/LSP.2025.3644314","DOIUrl":"https://doi.org/10.1109/LSP.2025.3644314","url":null,"abstract":"As the demand for 3D point clouds grows, the data volume is growing dramatically. To tackle this challenge, the Moving Picture Expert Group (MPEG) is developing the enhanced geometry-based point cloud compression (Enhanced G-PCC) standard, which uses Region-Adaptive Hierarchical Transform (RAHT) for highly efficient attribute coding. However, since the geometry of the current frame and the reference frame is different, the octree structure between them does not match, which affects the performance of inter prediction. Therefore, we propose a virtual reference frame-based inter prediction method by aligning the geometry of the reference frame and the current frame. Specifically, the geometry of the virtual reference frame comes from the current frame, while its attribute information comes from the reference frame. Experimental results show that the proposed method can significantly increase the proportion of inter predicted RAHT coefficients and thus achieve average Bjøntegaard Delta Rates (BD-rates) of −6.3%, −8.9%, and −8.4% for the Luma, Cb, and Cr components, respectively, under the lossless geometry and lossy attribute coding condition, compared to the state-of-the-art Enhanced G-PCC reference software version 28 release candidate 2 (TMC13v28.0-rc2). 
For the coding condition of lossy geometry and lossy attribute, the corresponding BD-rates are −6.5%, −11.3%, and −7.7%, respectively.","PeriodicalId":13154,"journal":{"name":"IEEE Signal Processing Letters","volume":"33 ","pages":"301-305"},"PeriodicalIF":3.9,"publicationDate":"2025-12-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145830897","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2025-12-15 DOI: 10.1109/LSP.2025.3644669
Ahmed Ali Abbasi;Namrata Vaswani
We introduce and precisely formulate the Low Rank Columnwise matrix Sensing (LRCS) problem when some of the observed data is scrambled / permuted / shuffled / unlabeled. Shuffled LRCS is a more difficult problem than plain LRCS because there are three unknown variable sets and one of them is discrete. Our proposed algorithm for solving it is the first multi-block generalization of the Alternating GD and Minimization (AltGDmin) algorithm that was introduced in recent work for fast LRCS. Since this is a new problem, no prior solutions exist. We also develop the AltMin solution and provide extensive numerical comparisons demonstrating that the proposed AltGDmin-based method is much faster than AltMin. As a baseline, we use AltGDmin-LRCS and AltMin-LRCS for a collapsed version of this problem, which becomes an LRCS problem. Our experiments show that, when the available number of measurements is small, this baseline fails, while our proposed method works. Finally, we bound the per-iteration time complexity of our algorithm and also provide a guarantee for its initialization step.
{"title":"Locally Shuffled Low Rank Column-Wise Sensing","authors":"Ahmed Ali Abbasi;Namrata Vaswani","doi":"10.1109/LSP.2025.3644669","DOIUrl":"https://doi.org/10.1109/LSP.2025.3644669","url":null,"abstract":"We introduce and precisely formulate the Low Rank Columnwise matrix Sensing (LRCS) problem when some of the observed data is scrambled / permuted / shuffled / unlabeled. Shuffled LRCS is a more difficult problem than just LRCS because there are three unknown variable sets and one of them is discrete. Our proposed algorithm for solving it is the first multi-block generalization of the Alternating GD and Minimization (AltGDmin) algorithm that was introduced in recent work for fast LRCS. Since this is a new problem, no solutions exist. We also develop the AltMin solution and provide extensive numerical comparisons demonstrating that the proposed AltGDmin-based method is much faster than AltMin. As baseline, we use AltGDmin-LRCS and AltMin-LRCS for a collapsed version of this problem, which becomes an LRCS problem. Our experiments show that, when the available number of measurements is small, this fails, while our proposed method works. Finally, we bound the per-iteration time complexity of our algorithm and also provide a guarantee for its initialization step.","PeriodicalId":13154,"journal":{"name":"IEEE Signal Processing Letters","volume":"33 ","pages":"446-450"},"PeriodicalIF":3.9,"publicationDate":"2025-12-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145929378","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2025-12-15 DOI: 10.1109/LSP.2025.3644315
Shiqin Li;Jing Hu;Zhao Zhao;Zhiyong Xu
In distributed sound source enhancement (SSE) tasks using microphone array nodes, the state-of-the-art node-specific distributed generalized sidelobe canceler (NS-DGSC) algorithm has achieved remarkable performance for simultaneously enhancing multiple desired sources. However, its assumption of an equal number of nodes and sources usually does not hold in outdoor applications. This letter proposes an extended NS-DGSC (ENS-DGSC) algorithm to tackle this issue. A correlation check module is introduced to handle scenarios where nodes outnumber or match sources. Furthermore, a temporal alignment module using two different strategies is designed to address time delays among nodes. Evaluations reveal that the proposed ENS-DGSC not only retains the advantages of the NS-DGSC, but also provides superior enhancement performance when there are more nodes than sources.
{"title":"Extended Node-Specific Distributed Generalized Sidelobe Canceler for Outdoor Wireless Acoustic Sensor Networks","authors":"Shiqin Li;Jing Hu;Zhao Zhao;Zhiyong Xu","doi":"10.1109/LSP.2025.3644315","DOIUrl":"https://doi.org/10.1109/LSP.2025.3644315","url":null,"abstract":"In distributed sound source enhancement (SSE) tasks using microphone array nodes, state-of-the-art node-specific distributed generalized sidelobe canceler (NS-DGSC) algorithm has achieved remarkable performance for simultaneously enhancing multiple desired sources. However, its assumption of an equal number of nodes and sources usually does not hold in outdoor applications. This letter proposes an extended NS-DGSC (ENS-DGSC) algorithm to tackle this issue. A correlation check module is introduced to handle scenarios where nodes outnumber or match sources. Furthermore, a temporal alignment module using two different strategies is designed to address time delays among nodes. Evaluations reveal that the proposed ENS-DGSC not only retains advantages of the NS-DGSC, but also provides superior enhancement performance with more nodes than sources.","PeriodicalId":13154,"journal":{"name":"IEEE Signal Processing Letters","volume":"33 ","pages":"306-310"},"PeriodicalIF":3.9,"publicationDate":"2025-12-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145830892","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Prompt-based learning has shown promise in vision-language tracking (VLT), yet existing methods often rely on either explicit or implicit prompting alone, limiting fine-grained cross-modal alignment. Moreover, Low-Rank Adaptation (LoRA)-based fine-tuning in prior work typically focuses on visual-only adaptation, overlooking language semantics. To address these issues, we propose a unified VLT framework that integrates Explicit-Implicit Prompt Injection (EIPI) and Semantic-Guided Latent LoRA (SGLL). EIPI introduces semantic prompts to facilitate robust and context-sensitive target modeling through two pathways. The explicit prompts are constructed through interaction between multi-modal target representations and the search region, while the implicit prompts are learned from linguistic features via a lightweight bottleneck network. Then, SGLL extends standard LoRA by introducing learnable queries in the latent space, allowing residual modulation based on language-visual semantics without retraining the full model. This dual design yields a parameter-efficient tracker with strong cross-modal adaptability. Extensive experiments show that our method outperforms prior prompt-based approaches while maintaining high efficiency.
{"title":"Explicit-Implicit Prompt Injection and Semantic-Guided Latent LoRA for Vision-Language Tracking","authors":"Jiapeng Zhang;Ying Wei;Yongfeng Li;Gang Yang;Qiaohong Hao","doi":"10.1109/LSP.2025.3643354","DOIUrl":"https://doi.org/10.1109/LSP.2025.3643354","url":null,"abstract":"Prompt-based learning has shown promise in visual-language tracking (VLT), yet existing methods often rely on either explicit or implicit prompting alone, limiting fine-grained cross-modal alignment. Moreover, Low-Rank Adaptation (LoRA) -based fine-tuning in prior work typically focuses on visual-only adaptation, overlooking language semantics. To address these issues, we propose a unified VLT framework that integrates Explicit-Implicit Prompt Injection (EIPI) and Semantic-Guided Latent LoRA (SGLL). EIPI introduces semantic prompts to facilitate robust and context-sensitive target modeling through two pathways. The explicit prompts are constructed by interact between multi-modal target representations with the search region, while implicit prompts are learned from linguistic features via a lightweight bottleneck network. Then, SGLL extends standard LoRA by introducing learnable queries in the latent space, allowing residual modulation based on language-visual semantics without retraining the full model. This dual design yields a parameter-efficient tracker with strong cross-modal adaptability. 
Extensive experiments show our method outperforms prior prompt-based approaches while maintaining high efficiency.","PeriodicalId":13154,"journal":{"name":"IEEE Signal Processing Letters","volume":"33 ","pages":"376-380"},"PeriodicalIF":3.9,"publicationDate":"2025-12-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145929575","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
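For readers unfamiliar with the standard LoRA that SGLL is said to extend, the following minimal sketch shows the usual low-rank residual update, y = W x + (alpha / r) B (A x), with the pretrained weight W kept frozen. It does not implement SGLL's learnable latent queries, which the abstract does not specify; the helper names `matvec` and `lora_forward` and the toy matrices are assumptions for illustration.

```python
def matvec(M, v):
    """Multiply a matrix (list of rows) by a vector (list of floats)."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def lora_forward(W, A, B, x, alpha, r):
    """Standard LoRA forward pass.

    W: frozen pretrained weight, shape (d_out, d_in)
    A: trainable down-projection, shape (r, d_in)
    B: trainable up-projection, shape (d_out, r)
    The low-rank path is scaled by alpha / r and added as a residual.
    """
    base = matvec(W, x)
    low_rank = matvec(B, matvec(A, x))
    scale = alpha / r
    return [b + scale * l for b, l in zip(base, low_rank)]

# Hypothetical toy sizes: d_in = 3, d_out = 2, rank r = 1.
W = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
A = [[1.0, 1.0, 1.0]]
B = [[0.5], [-0.5]]
x = [1.0, 2.0, 3.0]

y = lora_forward(W, A, B, x, alpha=2.0, r=1)

# With B initialised to zero, the adapted layer reproduces the frozen layer.
y_init = lora_forward(W, A, [[0.0], [0.0]], x, alpha=2.0, r=1)
```

Only A and B are trained, so the number of trainable parameters is r * (d_in + d_out) rather than d_in * d_out, which is the source of the parameter efficiency in the standard formulation.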
Pub Date : 2025-12-12 DOI: 10.1109/LSP.2025.3643361
Andrew J. Christensen;Ananya Sen Gupta
Neural networks have achieved remarkable results across numerous scientific domains because of their ability to uncover complex patterns. However, despite their effectiveness, these networks rely on heuristic training of highly non-convex objective functions, limiting theoretical understanding and practical reliability. Recent work has shown that shallow neural networks with scalar outputs can be formulated as convex optimization problems, bridging empirical success with theory. In this work, we build upon this framework for vector-valued outputs, introducing a convex formulation for two-layer ReLU networks based on an atomic norm and expressible as a semidefinite program (SDP). This yields a principled convex relaxation of multi-output networks that is both expressive and tractable. We validate the approach using standard SDP solvers, demonstrating its feasibility. These results extend convex neural network training beyond scalar outputs and provide a foundation for scalable, robust alternatives to current heuristic deep learning methods. Our method achieved a 7.3% increase in classification accuracy compared to a baseline convex multi-output network.
{"title":"Shallow Neural Network Training via Atomic Norms and Semidefinite Programming","authors":"Andrew J. Christensen;Ananya Sen Gupta","doi":"10.1109/LSP.2025.3643361","DOIUrl":"https://doi.org/10.1109/LSP.2025.3643361","url":null,"abstract":"Neural networks have achieved remarkable results across numerous scientific domains because of their ability to uncover complex patterns. However, despite their effectiveness, these networks rely on heuristic training of highly non-convex objective functions, limiting theoretical understanding and practical reliability. Recent work has shown that shallow neural networks with scalar outputs can be formulated as convex optimization problems, bridging empirical success with theory. In this work, we build upon this framework for vector-valued outputs, introducing a convex formulation for two-layer ReLU networks based on an atomic norm and expressible as a semidefinite program (SDP). This yields a principled convex relaxation of multi-output networks that is both expressive and tractable. We validate the approach using standard SDP solvers, demonstrating its feasibility. These results extend convex neural network training beyond scalar outputs and provide a foundation for scalable, robust alternatives to current heuristic deep learning methods. 
Our method achieved a 7.3% increase in classification accuracy compared to a baseline convex multi-output network.","PeriodicalId":13154,"journal":{"name":"IEEE Signal Processing Letters","volume":"33 ","pages":"321-325"},"PeriodicalIF":3.9,"publicationDate":"2025-12-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145886602","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2025-12-12 DOI: 10.1109/LSP.2025.3643388
Alan Luo;Kaiwen Yuan
Vision Transformers (ViTs) have demonstrated exceptional performance in various vision tasks. However, they tend to underperform on smaller datasets due to their inherent lack of inductive biases. Current approaches address this limitation implicitly, often by pairing ViTs with pretext tasks or by distilling knowledge from convolutional neural networks (CNNs) to strengthen the prior. In contrast, Self-Organizing Maps (SOMs), a widely adopted self-supervised framework, are inherently structured to preserve topology and spatial organization, making them a promising candidate to directly address the limitations of ViTs on small training datasets. Despite this potential, equipping SOMs with modern deep learning architectures remains largely unexplored. In this study, we explore how ViTs and SOMs can empower each other, aiming to bridge this research gap. Our findings demonstrate that these architectures can synergistically enhance each other, leading to significantly improved performance in both unsupervised and supervised tasks.
{"title":"Simple Self-Organizing Map With Vision Transformers","authors":"Alan Luo;Kaiwen Yuan","doi":"10.1109/LSP.2025.3643388","DOIUrl":"https://doi.org/10.1109/LSP.2025.3643388","url":null,"abstract":"Vision Transformers (ViTs) have demonstrated exceptional performance in various vision tasks. However, they tend to underperform on smaller datasets due to their inherent lack of inductive biases. Current approaches address this limitation implicitly—often by pairing ViTs with pretext tasks or by distilling knowledge from convolutional neural networks (CNNs) to strengthen the prior. In contrast, Self-Organizing Maps (SOMs), a widely adopted self-supervised framework, are inherently structured to preserve topology and spatial organization, making them a promising candidate to directly address the limitations of ViTs in limited or small training datasets. Despite this potential, equipping SOMs with modern deep learning architectures remains largely unexplored. In this study, we conduct a novel exploration on how Vision Transformers (ViTs) and Self-Organizing Maps (SOMs) can empower each other, aiming to bridge this critical research gap. Our findings demonstrate that these architectures can synergistically enhance each other, leading to significantly improved performance in both unsupervised and supervised tasks.","PeriodicalId":13154,"journal":{"name":"IEEE Signal Processing Letters","volume":"33 ","pages":"331-335"},"PeriodicalIF":3.9,"publicationDate":"2025-12-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145886676","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
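The topology-preserving behaviour of SOMs described in the abstract above comes from the classical Kohonen update, sketched below. How the paper couples a SOM with a ViT is not stated in the abstract, so this example covers only the generic SOM step; the function names, the 2 x 2 grid, and the idea that the input vector `x` might be an embedding are assumptions for illustration.

```python
import math

def best_matching_unit(weights, x):
    """Return the (row, col) of the grid node whose weight vector is closest to x."""
    best, best_d = None, float("inf")
    for r, row in enumerate(weights):
        for c, w in enumerate(row):
            d = sum((wi - xi) ** 2 for wi, xi in zip(w, x))
            if d < best_d:
                best, best_d = (r, c), d
    return best

def som_update(weights, x, lr=0.5, sigma=1.0):
    """One Kohonen update, applied in place.

    Every node is pulled toward x, weighted by a Gaussian of its grid
    distance to the best-matching unit, so grid neighbours move together.
    """
    br, bc = best_matching_unit(weights, x)
    for r, row in enumerate(weights):
        for c, w in enumerate(row):
            h = math.exp(-((r - br) ** 2 + (c - bc) ** 2) / (2.0 * sigma ** 2))
            row[c] = [wi + lr * h * (xi - wi) for wi, xi in zip(w, x)]
    return (br, bc)

# Hypothetical 2 x 2 map with 2-dimensional weight vectors.
grid = [[[0.0, 0.0], [1.0, 0.0]],
        [[0.0, 1.0], [1.0, 1.0]]]
x = [0.9, 0.1]
bmu = som_update(grid, x, lr=0.5, sigma=1.0)
```

The best-matching unit moves halfway toward the input, while the other nodes move by smaller amounts that decay with grid distance; repeating this over many inputs is what produces the spatially organised map.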
Pub Date : 2025-12-12 DOI: 10.1109/LSP.2025.3643348
Yebin Zheng;Haonan An;Guang Hua;Yongming Chen;Zhiping Lin
Generative adversarial networks (GANs) are a set of powerful generative models, among which CycleGAN, featuring the unique cycle-consistency loss, has gained special popularity. However, this unique structure and the cycle-consistency loss make watermarking CycleGAN particularly challenging, rendering existing deep neural network (DNN) watermarking methods, whether model-agnostic or GAN-specific, inapplicable. Meanwhile, existing DNN watermarking methods are intrusive in nature, requiring direct or indirect modification of model parameters for watermark embedding, which raises fidelity concerns. To solve the above problems, we propose the first nonintrusive and robust watermarking method for CycleGAN. We empirically show that without modifying the CycleGAN model, a user-defined watermark image can still be extracted from model outputs using a dedicated watermark decoder. Extensive experimental results verify that while achieving the so-called absolute fidelity, the proposed method is robust to various attacks, from image post-processing to model stealing.
{"title":"Nonintrusive Watermarking for CycleGAN","authors":"Yebin Zheng;Haonan An;Guang Hua;Yongming Chen;Zhiping Lin","doi":"10.1109/LSP.2025.3643348","DOIUrl":"https://doi.org/10.1109/LSP.2025.3643348","url":null,"abstract":"Generative adversarial networks (GANs) are a set of powerful generative models, among which CycleGAN, featuring the unique cycle-consistency loss, has gained special popularity. However, this unique structure and the cycle-consistency loss make watermarking CycleGAN particularly challenging, rendering existing deep neural network (DNN) watermarking methods, whether model-agnostic or GAN-specific, inapplicable. Meanwhile, existing DNN watermarking methods are intrusive in nature, requiring direct or indirect modification of model parameters for watermark embedding, which raises fidelity concerns. To solve the above problems, we propose the first nonintrusive and robust watermarking method for CycleGAN. We empirically show that without modifying the CycleGAN model, a user-defined watermark image can still be extracted from model outputs using a dedicated watermark decoder. Extensive experimental results verify that while achieving the so-called absolute fidelity, the proposed method is robust to various attacks, from image post-processing to model stealing.","PeriodicalId":13154,"journal":{"name":"IEEE Signal Processing Letters","volume":"33 ","pages":"256-260"},"PeriodicalIF":3.9,"publicationDate":"2025-12-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145830886","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2025-12-11 DOI: 10.1109/LSP.2025.3643385
Filippo Fabiani;Andrea Simonetto
We study data-driven least squares (LS) problems with semidefinite (SD) constraints and derive finite-sample guarantees on the spectrum of their optimal solutions when these constraints are relaxed. In particular, we provide a high-confidence bound allowing one to solve a simpler program in place of the full SDLS problem, while ensuring that the eigenvalues of the resulting solution are $\varepsilon$-close to those enforced by the SD constraints. The developed certificate, which consistently shrinks as the amount of data increases, turns out to be easy to compute and distribution-free, and it only requires independent and identically distributed samples. Moreover, when the SDLS is used to learn an unknown quadratic function, we establish bounds on the error between a gradient descent iterate minimizing the surrogate cost obtained with no SD constraints and the true minimizer.
{"title":"Concentration Inequalities for Semidefinite Least Squares Based on Data","authors":"Filippo Fabiani;Andrea Simonetto","doi":"10.1109/LSP.2025.3643385","DOIUrl":"https://doi.org/10.1109/LSP.2025.3643385","url":null,"abstract":"We study data-driven least squares (LS) problems with semidefinite (SD) constraints and derive finite-sample guarantees on the spectrum of their optimal solutions when these constraints are relaxed. In particular, we provide a high confidence bound allowing one to solve a simpler program in place of the full SDLS problem, while ensuring that the eigenvalues of the resulting solution are <inline-formula><tex-math>$varepsilon$</tex-math></inline-formula>-close of those enforced by the SD constraints. The developed certificate, which consistently shrinks as the number of data increases, turns out to be easy-to-compute, distribution-free, and only requires independent and identically distributed samples. Moreover, when the SDLS is used to learn an unknown quadratic function, we establish bounds on the error between a gradient descent iterate minimizing the surrogate cost obtained with no SD constraints and the true minimizer.","PeriodicalId":13154,"journal":{"name":"IEEE Signal Processing Letters","volume":"33 ","pages":"326-330"},"PeriodicalIF":3.9,"publicationDate":"2025-12-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145886622","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}