
IEEE open journal of signal processing: latest publications

Adversarial Robustness of Self-Supervised Learning Features
IF 2.9 Q2 ENGINEERING, ELECTRICAL & ELECTRONIC Pub Date: 2025-04-21 DOI: 10.1109/OJSP.2025.3562797
Nicholas Mehlman;Shri Narayanan
As deep learning models have proliferated, concerns about their reliability and security have also increased. One significant challenge is understanding adversarial perturbations, which can alter a model's predictions despite being very small in magnitude. Prior work has proposed that this phenomenon results from a fundamental deficit in supervised learning, by which classifiers exploit whatever input features are more predictive, regardless of whether or not these features are robust to adversarial attacks. In this paper, we consider feature robustness in the context of contrastive self-supervised learning methods that have become especially common in recent years. Our findings suggest that the features learned during self-supervision are, in fact, more resistant to adversarial perturbations than those generated from supervised learning. However, we also find that these self-supervised features exhibit poorer inter-class disentanglement, limiting their contribution to overall classifier robustness.
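As a rough illustration of the kind of robustness evaluation the abstract describes, the sketch below attacks a linear probe trained on top of a frozen feature encoder with a one-step FGSM perturbation and compares clean versus adversarial accuracy. The `encoder`, `probe`, and attack settings are stand-ins, not the paper's actual protocol.

```python
import torch
import torch.nn.functional as F

def probe_robustness(encoder, probe, x, y, eps=4 / 255):
    """Clean vs. FGSM accuracy of a linear probe over frozen features.

    encoder: frozen feature extractor (e.g., a contrastive SSL backbone)
    probe:   linear classifier trained on the frozen features
    x, y:    batch of images in [0, 1] and integer class labels
    """
    x = x.clone().requires_grad_(True)
    logits = probe(encoder(x))
    F.cross_entropy(logits, y).backward()
    # One-step sign attack on the pixels, white-box through both modules.
    x_adv = (x + eps * x.grad.sign()).clamp(0.0, 1.0).detach()
    clean_acc = (logits.argmax(1) == y).float().mean().item()
    with torch.no_grad():
        adv_acc = (probe(encoder(x_adv)).argmax(1) == y).float().mean().item()
    return clean_acc, adv_acc
```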
Citations: 0
Array Design for Angle of Arrival Estimation Using the Worst-Case Two-Target Cramér-Rao Bound
IF 2.9 Q2 ENGINEERING, ELECTRICAL & ELECTRONIC Pub Date: 2025-04-07 DOI: 10.1109/OJSP.2025.3558686
Costas A. Kokke;Mario Coutino;Richard Heusdens;Geert Leus
Sparse array design is used to help reduce computational, hardware, and power requirements compared to uniform arrays while maintaining acceptable performance. Although minimizing the Cramér-Rao bound has been adopted previously for sparse sensing, prior formulations did not consider multiple targets and unknown target directions. To handle the unknown target directions when optimizing the Cramér-Rao bound, we propose to use the worst-case Cramér-Rao bound of two uncorrelated equal-power sources with arbitrary angles. This new worst-case two-target Cramér-Rao bound metric has some resemblance to the peak sidelobe level metric, which is commonly used in unknown multi-target scenarios. We cast the sensor selection problem for 3-D arrays using the worst-case two-target Cramér-Rao bound as a convex semi-definite program and obtain the binary selection by randomized rounding. We illustrate the proposed method through numerical examples, comparing it to solutions obtained by minimizing the single-target Cramér-Rao bound, solutions obtained by minimizing the Cramér-Rao bound for known target angles, the concentric rectangular array, and the boundary array. We show that our method selects a combination of edge and center elements, which contrasts with solutions obtained by minimizing the single-target Cramér-Rao bound. The proposed selections also exhibit lower peak sidelobe levels without the need for sidelobe level constraints.
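The last stage of the pipeline described above, randomized rounding of the relaxed selection vector, can be sketched generically as follows. The sampling scheme, the number of draws, and the `cost_fn` hook are assumptions; the abstract does not specify the exact rounding procedure.

```python
import numpy as np

def randomized_rounding(w, K, cost_fn, n_draws=200, seed=0):
    """Round a relaxed selection w in [0, 1]^N to a binary K-sensor choice.

    w:       relaxed (continuous) solution of the convex program, w_i >= 0
    K:       sensor budget
    cost_fn: scores a boolean selection, e.g., its worst-case two-target
             Cramér-Rao bound (lower is better)
    """
    rng = np.random.default_rng(seed)
    p = w / w.sum()                      # sample sensors proportionally to w
    best_sel, best_cost = None, np.inf
    for _ in range(n_draws):
        idx = rng.choice(len(w), size=K, replace=False, p=p)
        sel = np.zeros(len(w), dtype=bool)
        sel[idx] = True
        cost = cost_fn(sel)
        if cost < best_cost:
            best_sel, best_cost = sel, cost
    return best_sel
```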
Citations: 0
Unified Analysis of Decentralized Gradient Descent: A Contraction Mapping Framework
IF 2.9 Q2 ENGINEERING, ELECTRICAL & ELECTRONIC Pub Date: 2025-04-02 DOI: 10.1109/OJSP.2025.3557332
Erik G. Larsson;Nicolò Michelusi
The decentralized gradient descent (DGD) algorithm, and its sibling, diffusion, are workhorses in decentralized machine learning, distributed inference and estimation, and multi-agent coordination. We propose a novel, principled framework for the analysis of DGD and diffusion for strongly convex, smooth objectives, and arbitrary undirected topologies, using contraction mappings coupled with a result called the mean Hessian theorem (MHT). The use of these tools yields tight convergence bounds, both in the noise-free and noisy regimes. While these bounds are qualitatively similar to results found in the literature, our approach using contractions together with the MHT decouples the algorithm dynamics (how quickly the algorithm converges to its fixed point) from its asymptotic convergence properties (how far the fixed point is from the global optimum). This yields a simple, intuitive analysis that is accessible to a broader audience. Extensions are provided to multiple local gradient updates, time-varying step sizes, noisy gradients (stochastic DGD and diffusion), communication noise, and random topologies.
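For orientation, the iteration under analysis is simple to state: each agent mixes its iterate with its neighbors' (through a doubly stochastic matrix W) and then takes a local gradient step. A minimal sketch, with the contraction-mapping analysis itself omitted:

```python
import numpy as np

def dgd(W, grads, x0, step, iters=500):
    """Decentralized gradient descent: x_{k+1} = W x_k - step * g(x_k).

    W:     (n, n) doubly stochastic mixing matrix of the undirected network
    grads: list of n callables; grads[i](x_i) is agent i's local gradient
    x0:    (n, d) initial iterates, one row per agent
    """
    x = x0.copy()
    for _ in range(iters):
        g = np.stack([grads[i](x[i]) for i in range(len(grads))])
        x = W @ x - step * g  # consensus mixing, then local descent
    return x
```

With a fixed step size, the iterates converge to a fixed point whose distance from the global optimum scales with the step size, which is exactly the dynamics-versus-asymptotics split the paper's framework decouples.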
Citations: 0
VAMP-Based Kalman Filtering Under Non-Gaussian Process Noise
IF 2.9 Q2 ENGINEERING, ELECTRICAL & ELECTRONIC Pub Date: 2025-04-02 DOI: 10.1109/OJSP.2025.3557271
Tiancheng Gao;Mohamed Akrout;Faouzi Bellili;Amine Mezghani
Estimating time-varying signals becomes particularly challenging in the face of non-Gaussian (e.g., sparse) and/or rapidly time-varying process noise. By building upon the recent progress in the approximate message passing (AMP) paradigm, this paper unifies the vector variant of AMP (i.e., VAMP) with the Kalman filter (KF) into a unified message passing framework. The new algorithm (coined VAMP-KF) does not restrict the process noise to a specific structure (e.g., same support over time), thereby accounting for non-Gaussian process noise sources that are uncorrelated both component-wise and over time. For the sake of theoretical performance prediction, we conduct a state evolution (SE) analysis of the proposed algorithm and show its consistency with the asymptotic empirical mean-squared error (MSE). Numerical results using sparse noise dynamics with different sparsity ratios demonstrate unambiguously the effectiveness of the proposed VAMP-KF algorithm and its superiority over state-of-the-art algorithms both in terms of reconstruction accuracy and computational complexity.
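As a point of reference, the classical (Gaussian) Kalman filter that VAMP-KF generalizes performs the predict/update cycle below; the paper's message-passing treatment of non-Gaussian process noise is not reproduced here.

```python
import numpy as np

def kalman_step(x, P, A, Q, H, R, y):
    """One textbook Kalman predict/update step with Gaussian noise.

    x, P: previous state estimate and covariance
    A, Q: state transition matrix and process-noise covariance
    H, R: observation matrix and measurement-noise covariance
    y:    new measurement
    """
    # Predict through the linear dynamics.
    x_pred = A @ x
    P_pred = A @ P @ A.T + Q
    # Update with the measurement.
    S = H @ P_pred @ H.T + R             # innovation covariance
    K = P_pred @ H.T @ np.linalg.inv(S)  # Kalman gain
    x_new = x_pred + K @ (y - H @ x_pred)
    P_new = (np.eye(len(x_new)) - K @ H) @ P_pred
    return x_new, P_new
```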
Citations: 0
Swin Transformer With Spatial and Local Context Augmentation for Enhanced Semantic Segmentation of Remote Sensing Images
IF 2.9 Q2 ENGINEERING, ELECTRICAL & ELECTRONIC Pub Date: 2025-03-23 DOI: 10.1109/OJSP.2025.3573202
Rong-Xing Ding;Yi-Han Xu;Gang Yu;Wen Zhou;Ding Zhou
Semantic segmentation of remote sensing images is extensively used in crop cover and type analysis, and environmental monitoring. In the semantic segmentation of remote sensing images, owing to their specificity, not only local context is required; global context information also plays an important role. Inspired by the powerful global modelling capability of the Swin Transformer, we propose the LSENet network, which follows the encoder-decoder architecture of the UNet network. In the encoding phase, we propose a spatial enhancement module (SEM), which helps the Swin Transformer further enhance feature extraction by encoding spatial information. In the decoding stage, we propose a local enhancement module (LEM), embedded in the Swin Transformer to help the network obtain more local semantic information and classify pixels more accurately; in edge regions in particular, adding the LEM yields smoother edges. The experimental results on the Vaihingen and Potsdam datasets demonstrate the effectiveness of our proposed method. Specifically, the mIoU metric is 78.58% on the Potsdam dataset, 72.59% on the Vaihingen dataset and 64.49% on the OpenEarthMap dataset.
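The abstract does not spell out the SEM and LEM designs. Purely as an illustration of what a convolutional local-context branch inside a transformer stage can look like, a hypothetical module (PyTorch) might be:

```python
import torch.nn as nn

class LocalEnhance(nn.Module):
    """Hypothetical local-context branch: a depthwise 3x3 convolution
    added residually to transformer features in NCHW layout. This
    illustrates the general idea only; it is not the paper's LEM."""

    def __init__(self, dim):
        super().__init__()
        self.dw = nn.Conv2d(dim, dim, kernel_size=3, padding=1, groups=dim)
        self.norm = nn.BatchNorm2d(dim)
        self.act = nn.GELU()

    def forward(self, x):
        return x + self.act(self.norm(self.dw(x)))
```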
Citations: 0
Streaming LiDAR Scene Flow Estimation
IF 2.9 Q2 ENGINEERING, ELECTRICAL & ELECTRONIC Pub Date: 2025-03-23 DOI: 10.1109/OJSP.2025.3572759
Mazen Abdelfattah;Z. Jane Wang;Rabab Ward
Safe navigation of autonomous vehicles requires accurate and rapid understanding of their dynamic 3D environment. Scene flow estimation models this dynamic environment by predicting point motion between sequential point cloud scans, and is crucial for safe navigation. Existing state-of-the-art scene flow estimation methods, based on test-time optimization, achieve high accuracy but suffer from significant latency, limiting their applicability in real-time onboard systems. This latency stems from both the iterative test-time optimization process and the inherent delay of waiting for the LiDAR to acquire a complete $360^\circ$ scan. To overcome this bottleneck, we introduce a novel streaming scene flow framework leveraging the sequential nature of LiDAR slice acquisition, demonstrating a dramatic reduction in end-to-end latency. Instead of waiting for the full $360^\circ$ scan, our method immediately estimates scene flow using each LiDAR slice once it is captured. To mitigate the reduced context of individual slices, we propose a novel contextual augmentation technique that expands the target slice by a small angular margin, incorporating crucial slice boundary information. Furthermore, to enhance test-time optimization within our streaming framework, our novel initialization scheme 'warm-starts' the current optimization using optimized parameters from the preceding slice. This achieves substantial speedups while maintaining, and in some cases surpassing, full-scan accuracy. We rigorously evaluate our approach on the challenging Waymo and Argoverse datasets, demonstrating significant latency reduction without compromising scene flow quality. This work paves the way for deploying high-accuracy, real-time scene flow algorithms in autonomous driving, advancing the field towards more responsive and safer autonomous systems.
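As an illustration of the slice-plus-margin idea, the sketch below selects one azimuth slice of a point cloud and widens it by a small angular margin on both sides; the slice width, margin value, and array layout are assumptions, not the paper's settings.

```python
import numpy as np

def azimuth_slice(points, start_deg, width_deg, margin_deg=5.0):
    """Return the points of one LiDAR azimuth slice, expanded by a margin.

    points: (N, 3) array of x, y, z coordinates from one sweep
    """
    az = np.degrees(np.arctan2(points[:, 1], points[:, 0])) % 360.0
    lo = (start_deg - margin_deg) % 360.0
    hi = (start_deg + width_deg + margin_deg) % 360.0
    # Handle slices that wrap past the 0/360 degree boundary.
    mask = (az >= lo) & (az < hi) if lo < hi else (az >= lo) | (az < hi)
    return points[mask]
```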
Citations: 0
Appearance Estimation and Image Segmentation via Tensor Factorization
IF 2.9 Q2 ENGINEERING, ELECTRICAL & ELECTRONIC Pub Date: 2025-03-23 DOI: 10.1109/OJSP.2025.3572820
Jeova Farias Sales Rocha Neto
Image Segmentation is one of the core tasks in Computer Vision, and solving it often depends on modeling the image appearance data via the color distributions of each of its constituent regions. Whereas many segmentation algorithms handle the appearance model dependence using alternation or implicit methods, we propose here a new approach to estimate the models directly from the image without prior information on the underlying segmentation. Our method uses local high-order color statistics from the image as the input to a tensor factorization-based estimator for latent variable models. This approach is able to estimate models in multi-region images and automatically output the regions' proportions without prior user interaction, overcoming the drawbacks of a prior attempt at this problem. We also demonstrate the performance of our proposed method in many challenging synthetic and real imaging scenarios and show that it leads to an efficient segmentation algorithm.
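The abstract does not define the statistic exactly; as a toy example of the kind of local high-order color statistic a tensor factorization could consume, consider the joint histogram of neighboring pixel triples:

```python
import numpy as np

def triple_cooccurrence_tensor(gray, bins=16):
    """Toy third-order local statistic: a joint histogram of each pixel
    with its right and lower neighbors over quantized intensity bins.
    A CP-type factorization of such a tensor can expose latent
    per-region distributions; this construction is illustrative only.

    gray: (H, W) image with values in [0, 1]
    """
    q = np.clip((gray * bins).astype(int), 0, bins - 1)
    a = q[:-1, :-1].ravel()  # pixel
    b = q[:-1, 1:].ravel()   # right neighbor
    c = q[1:, :-1].ravel()   # lower neighbor
    T = np.zeros((bins, bins, bins))
    np.add.at(T, (a, b, c), 1.0)
    return T / T.sum()
```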
Citations: 0
Adaptive Motion Vector Resolutions in Raw Plenoptic Video Coding
IF 2.7 Q2 ENGINEERING, ELECTRICAL & ELECTRONIC Pub Date: 2025-03-22 DOI: 10.1109/OJSP.2025.3572840
Thuc Nguyen Huu;Vinh Van Duong;Jonghoon Yim;Byeungwoo Jeon
This paper addresses the unique challenges of compressing raw plenoptic video, in which the inherent hexagonal micro-image layout and sparse distribution of motion vectors (MVs) often diminish the coding efficiency of conventional block-based motion compensation. To mitigate excessive overhead from motion vector difference (MVD) signaling, we use three specialized MV resolutions: a hexagonal-lattice (HL) alignment that matches the micro-image structure, an integer-pel resolution, and a quarter-pel resolution. We then develop a rate-distortion (RD)-optimized scheme that adaptively selects the most suitable MV resolution at the coding unit level. By integrating our approach into the Versatile Video Coding (VVC) framework, the proposed method reduces MVD bits significantly while preserving high prediction accuracy. Experiments using two comprehensive plenoptic camera datasets — lenslet 1.0 and lenslet 2.0 — demonstrate substantial gains over the VVC anchor, achieving average Bjontegaard–Delta rate savings of 5.90% and 1.80%, respectively. These results confirm that combining HL and conventional resolutions in an RD-optimized manner substantially enhances motion prediction efficiency for raw plenoptic video.
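As a minimal sketch of the coding-unit-level decision described above, assume the encoder has already measured the distortion and bit cost of each candidate resolution; the candidate names and numbers below are illustrative, not taken from the paper.

```python
def select_mv_resolution(candidates, lam):
    """Pick the MV resolution minimizing the RD cost J = D + lambda * R.

    candidates: dict mapping a resolution name to (distortion, bits)
    lam:        Lagrangian multiplier trading distortion against rate
    """
    return min(candidates, key=lambda k: candidates[k][0] + lam * candidates[k][1])

# Hypothetical per-CU measurements for the three resolutions.
costs = {
    "hexagonal-lattice": (120.0, 14),
    "integer-pel": (150.0, 10),
    "quarter-pel": (100.0, 26),
}
best = select_mv_resolution(costs, lam=2.0)  # -> "hexagonal-lattice"
```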
Citations: 0
Snapshot Hyperspectral Imaging With Co-Designed Optics, Color Filter Array, and Unrolled Network
IF 2.9 Q2 ENGINEERING, ELECTRICAL & ELECTRONIC Pub Date: 2025-03-21 DOI: 10.1109/OJSP.2025.3571675
Ayoung Kim;Ugur Akpinar;Erdem Sahin;Atanas Gotchev
We propose a novel snapshot hyperspectral imaging method that incorporates co-designed optics, a color filter array (CFA), and an unrolled post-processing network through end-to-end learning. The camera optics consists of a fixed refractive lens and a diffractive optical element (DOE). The learned DOE and CFA efficiently encode the hyperspectral data cube on the sensor via phase and amplitude modulation at the camera aperture and sensor planes, respectively. Subsequently, the unrolled network reconstructs the hyperspectral images from the sensor signal with high accuracy. We conduct extensive simulations to analyze and validate the performance of the proposed method for several CFA models and in non-ideal imaging conditions. We demonstrate that the Gaussian model is effective for parameterizing the spectral transmission functions of CFA pixels, providing high reconstruction accuracy and being relatively easy to implement. Furthermore, we show that learned CFA patterns are effective when optimally coupled with co-designed diffractive-refractive optics. We evaluate the robustness of our method against sensor noise and potential inaccuracies in the fabrication of the DOE and CFA. Our results show that our method achieves superior reconstruction quality compared to state-of-the-art methods, excelling in both spatial and spectral detail recovery and maintaining robustness against realistic noise levels.
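The abstract reports that a Gaussian model is effective for parameterizing the CFA pixels' spectral transmission. A minimal sketch of such a parameterization, with hypothetical center and width values standing in for the learnable parameters:

```python
import numpy as np

def gaussian_cfa_response(wavelengths, center, width, peak=1.0):
    """Gaussian spectral transmission of one CFA pixel; center, width,
    and peak play the role of learnable parameters."""
    return peak * np.exp(-0.5 * ((wavelengths - center) / width) ** 2)

# Example: a green-like filter sampled across the visible range.
wl = np.linspace(400.0, 700.0, 31)  # nm
t_green = gaussian_cfa_response(wl, center=550.0, width=40.0)
```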
Citations: 0
WaViT-CDC: Wavelet Vision Transformer With Central Difference Convolutions for Spatial-Frequency Deepfake Detection
IF 2.9 Q2 ENGINEERING, ELECTRICAL & ELECTRONIC Pub Date: 2025-03-20 DOI: 10.1109/OJSP.2025.3571679
Nour Eldin Alaa Badr;Jean-Christophe Nebel;Darrel Greenhill;Xing Liang
The increasing popularity of generative AI has led to a significant rise in deepfake content, creating an urgent need for generalized and reliable deepfake detection methods. Since existing approaches rely on either spatial-domain features or frequency-domain features, they struggle to generalize across unseen datasets, especially those with subtle manipulations. To address these challenges, a novel end-to-end Wavelet Central Difference Convolutional Vision Transformer framework is designed to enhance spatial-frequency deepfake detection. Unlike previous methods, this approach applies the Discrete Wavelet Transform for multi-level frequency decomposition and Central Difference Convolution to capture local fine-grained discrepancies and focus on texture variances, while also incorporating Vision Transformers for global contextual understanding. The Frequency-Spatial Feature Fusion Attention module integrates these features, enabling the effective detection of fake artifacts. Moreover, in contrast to earlier work, subtle perturbations to both the spatial and frequency domains are introduced to further improve generalization. Cross-dataset generalization evaluations demonstrate that WaViT-CDC outperforms state-of-the-art methods when trained on low-quality and high-quality face images, achieving average performance increases of 2.5% and 4.5%, respectively, on challenging high-resolution, real-world datasets such as Celeb-DF and WildDeepfake.
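Central Difference Convolution has a widely used formulation that blends a vanilla convolution with a central-difference term sensitive to local texture gradients; a sketch of that standard operator follows (the paper's exact variant may differ in detail).

```python
import torch.nn as nn
import torch.nn.functional as F

class CentralDifferenceConv(nn.Module):
    """Central difference convolution: output = conv(x) - theta * c(x),
    where c(x) applies each kernel's summed weights at the center pixel.
    theta = 0 recovers an ordinary convolution."""

    def __init__(self, in_ch, out_ch, theta=0.7):
        super().__init__()
        self.conv = nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1, bias=False)
        self.theta = theta

    def forward(self, x):
        out = self.conv(x)
        # A 1x1 convolution with each kernel's total weight, applied at
        # the center location, implements the central-difference term.
        w_sum = self.conv.weight.sum(dim=(2, 3), keepdim=True)
        return out - self.theta * F.conv2d(x, w_sum)
```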
Citations: 0