
IEEE Open Journal of Signal Processing: Latest Publications

Mask Optimization for Image Inpainting Using No-Reference Image Quality Assessment
IF 2.7 · Q2 ENGINEERING, ELECTRICAL & ELECTRONIC · Pub Date: 2025-06-05 · DOI: 10.1109/OJSP.2025.3577089
Taiki Uchiyama;Mariko Isogawa
Image inpainting is a technique designed to remove unwanted regions from images and restore them. It is expected to find use in a variety of applications, including image editing, virtual reality (VR), mixed reality (MR), and augmented reality (AR). Typically, the inpainting process is based on missing regions predefined by user-applied masks. However, the specified areas may not always be ideal for inpainting, and the quality of the inpainting results varies depending on the annotated masked region. Therefore, this paper addresses the task of generating masks that improve inpainting results. To this end, we propose a method that utilizes No-Reference Image Quality Assessment (NR-IQA), which can score image quality without a reference image, to generate masked regions that maximize inpainting quality.
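A minimal sketch of the idea as described in the abstract (not the authors' code): search over candidate masks, inpaint with each, and keep the mask whose result scores highest under an NR-IQA model. Here inpaint and nr_iqa_score are hypothetical stand-ins for any inpainting network and any no-reference quality model.

from scipy.ndimage import binary_dilation

def best_mask(image, user_mask, inpaint, nr_iqa_score, n_iters=10):
    # Greedily dilate the user-provided mask while the NR-IQA score improves.
    mask = user_mask.astype(bool)
    best_score = nr_iqa_score(inpaint(image, mask))
    for _ in range(n_iters):
        candidate = binary_dilation(mask)                # grow the masked region
        score = nr_iqa_score(inpaint(image, candidate))  # score the inpainted result
        if score <= best_score:                          # stop once quality saturates
            break
        mask, best_score = candidate, score
    return mask, best_score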
Citations: 0
Enhancing Learning-Based Cross-Modality Prediction for Lossless Medical Imaging Compression
IF 2.9 · Q2 ENGINEERING, ELECTRICAL & ELECTRONIC · Pub Date: 2025-04-28 · DOI: 10.1109/OJSP.2025.3564830
Daniel S. Nicolau;Lucas A. Thomaz;Luis M. N. Tavora;Sergio M. M. Faria
Multimodal medical imaging, which involves the simultaneous acquisition of different modalities, enhances diagnostic accuracy and provides comprehensive visualization of anatomy and physiology. However, it also significantly increases data size, posing storage and transmission challenges. Standard image codecs fail to properly exploit cross-modality redundancies, limiting coding efficiency. In this paper, a novel approach is proposed to enhance the compression gain and reduce the computational complexity of a lossless cross-modality coding scheme for multimodal image pairs. The scheme uses a deep learning-based approach with Image-to-Image translation, based on a Generative Adversarial Network architecture, to generate an estimated image of one modality from its cross-modal pair. Two approaches to inter-modal prediction are considered: one using the original and the estimated images in the inter-prediction scheme, and another using a weighted sum of both images. Subsequently, a decider based on a Convolutional Neural Network is employed to estimate, before the coding step, which of the two alternatives is the better coding approach. A novel loss function that considers both the decision accuracy and the compression gain of the chosen prediction approach is applied to improve the decision-making task. Experimental results on PET-CT and PET-MRI datasets demonstrate that the proposed approach improves compression efficiency by 11.76% and 4.61%, respectively, compared with single-modality intra coding in Versatile Video Coding. Additionally, the approach reduces computational complexity by almost half compared to selecting the more compression-efficient scheme after testing both.
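The two inter-modal prediction candidates described above can be sketched as follows; for illustration, a brute-force oracle (code both residuals, keep the cheaper one) stands in for the CNN decider, which the paper trains precisely to avoid coding both candidates. encode_residual is a hypothetical entropy coder returning a bit count, and the inputs are numpy arrays.

def choose_predictor(target, estimate, reference, encode_residual, w=0.5):
    # Candidate A: the GAN-estimated cross-modal image itself.
    pred_a = estimate
    # Candidate B: a weighted sum of the estimate and the reference image.
    pred_b = w * estimate + (1 - w) * reference
    bits_a = encode_residual(target - pred_a)
    bits_b = encode_residual(target - pred_b)
    return ("estimate", pred_a) if bits_a <= bits_b else ("weighted", pred_b)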
Citations: 0
Content-Adaptive Inference for State-of-the-Art Learned Video Compression
IF 2.9 · Q2 ENGINEERING, ELECTRICAL & ELECTRONIC · Pub Date: 2025-04-28 · DOI: 10.1109/OJSP.2025.3564817
Ahmet Bilican;M. Akın Yılmaz;A. Murat Tekalp
While the BD-rate performance of recent learned video codec models in both low-delay and random-access modes exceeds that of the respective modes of traditional codecs on average over common benchmarks, the performance improvement for individual videos with complex/large motion is much smaller than for scenes with simple motion. This is related to the inability of a learned encoder model to generalize to motion-vector ranges that have not been seen in the training set, which causes loss of performance in both flow-field coding and frame prediction and coding. As a remedy, we propose a generic (model-agnostic) framework to control the scale of motion vectors in a scene during inference (encoding), approximately matching the range of motion vectors in the test and training videos by adaptively downsampling frames. This results in down-scaled motion vectors, enabling: i) better flow estimation and hence frame prediction, and ii) more efficient flow compression. We show that the proposed framework for content-adaptive inference improves the BD-rate performance of the already state-of-the-art low-delay video codec DCVC-FM by up to 41% on individual videos without any model fine-tuning. We present ablation studies showing that measures of motion and scene complexity can be used to predict the effectiveness of the proposed framework.
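A rough sketch of the model-agnostic scaling step, under assumptions not in the abstract: a flow estimator supplies motion magnitudes for the scene, and mv_max_train denotes the largest motion magnitude seen during training. The frame is downsampled by the mildest scale that brings the scene's motion into that range.

import numpy as np

def pick_scale(flow, mv_max_train=32.0, scales=(1.0, 0.75, 0.5, 0.25)):
    # flow: (H, W, 2) motion field estimated for the current scene.
    mv_mag = np.percentile(np.linalg.norm(flow, axis=-1), 95)
    for s in scales:                     # try the mildest downsampling first
        if mv_mag * s <= mv_max_train:   # scaled motion fits the trained range
            return s
    return scales[-1]                    # fall back to the strongest downsampling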
Citations: 0
Adversarial Robustness of Self-Supervised Learning Features
IF 2.9 · Q2 ENGINEERING, ELECTRICAL & ELECTRONIC · Pub Date: 2025-04-21 · DOI: 10.1109/OJSP.2025.3562797
Nicholas Mehlman;Shri Narayanan
As deep learning models have proliferated, concerns about their reliability and security have also increased. One significant challenge is understanding adversarial perturbations, which can alter a model's predictions despite being very small in magnitude. Prior work has proposed that this phenomenon results from a fundamental deficit in supervised learning, by which classifiers exploit whatever input features are more predictive, regardless of whether or not these features are robust to adversarial attacks. In this paper, we consider feature robustness in the context of contrastive self-supervised learning methods that have become especially common in recent years. Our findings suggest that the features learned during self-supervision are, in fact, more resistant to adversarial perturbations than those generated from supervised learning. However, we also find that these self-supervised features exhibit poorer inter-class disentanglement, limiting their contribution to overall classifier robustness.
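A minimal FGSM-style probe of the kind of comparison described above, sketched under assumptions: a frozen encoder (supervised or self-supervised) with a linear classifier on top, attacked in input space. This illustrates the evaluation pattern, not the paper's exact protocol.

import torch

def fgsm_accuracy(encoder, probe, x, y, eps=4 / 255):
    # Accuracy of probe(encoder(.)) on FGSM-perturbed inputs in [0, 1].
    encoder.eval(); probe.eval()
    x_adv = x.clone().detach().requires_grad_(True)
    loss = torch.nn.functional.cross_entropy(probe(encoder(x_adv)), y)
    loss.backward()
    with torch.no_grad():
        x_adv = (x_adv + eps * x_adv.grad.sign()).clamp(0, 1)
        acc = (probe(encoder(x_adv)).argmax(dim=1) == y).float().mean()
    return acc.item()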
Citations: 0
Array Design for Angle of Arrival Estimation Using the Worst-Case Two-Target Cramér-Rao Bound
IF 2.9 · Q2 ENGINEERING, ELECTRICAL & ELECTRONIC · Pub Date: 2025-04-07 · DOI: 10.1109/OJSP.2025.3558686
Costas A. Kokke;Mario Coutino;Richard Heusdens;Geert Leus
Sparse array design helps reduce computational, hardware, and power requirements compared to uniform arrays while maintaining acceptable performance. Although minimizing the Cramér-Rao bound has previously been adopted for sparse sensing, prior formulations did not consider multiple targets or unknown target directions. To handle unknown target directions when optimizing the Cramér-Rao bound, we propose to use the worst-case Cramér-Rao bound of two uncorrelated equal-power sources with arbitrary angles. This new worst-case two-target Cramér-Rao bound metric bears some resemblance to the peak sidelobe level metric commonly used in unknown multi-target scenarios. We cast the sensor selection problem for 3-D arrays using the worst-case two-target Cramér-Rao bound as a convex semi-definite program and obtain the binary selection by randomized rounding. We illustrate the proposed method through numerical examples, comparing it to solutions obtained by minimizing the single-target Cramér-Rao bound, minimizing the Cramér-Rao bound for known target angles, the concentric rectangular array, and the boundary array. We show that our method selects a combination of edge and center elements, in contrast with solutions obtained by minimizing the single-target Cramér-Rao bound. The proposed selections also exhibit lower peak sidelobe levels without the need for sidelobe level constraints.
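The randomized-rounding step mentioned above can be sketched as follows, assuming the semidefinite relaxation has produced a fractional selection vector w and that crb_worst_case evaluates the worst-case two-target bound for a binary selection (both are placeholders here).

import numpy as np

def round_selection(w, k, crb_worst_case, n_draws=100, seed=0):
    # Draw binary selections with P(select sensor i) = w[i]; keep the best
    # draw that meets the budget of k active sensors.
    rng = np.random.default_rng(seed)
    best_sel, best_val = None, np.inf
    for _ in range(n_draws):
        sel = rng.random(w.shape) < w
        if sel.sum() != k:               # enforce the sensor budget
            continue
        val = crb_worst_case(sel)
        if val < best_val:
            best_sel, best_val = sel, val
    return best_sel, best_val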
Citations: 0
Unified Analysis of Decentralized Gradient Descent: A Contraction Mapping Framework
IF 2.9 · Q2 ENGINEERING, ELECTRICAL & ELECTRONIC · Pub Date: 2025-04-02 · DOI: 10.1109/OJSP.2025.3557332
Erik G. Larsson;Nicolò Michelusi
The decentralized gradient descent (DGD) algorithm, and its sibling, diffusion, are workhorses in decentralized machine learning, distributed inference and estimation, and multi-agent coordination. We propose a novel, principled framework for the analysis of DGD and diffusion for strongly convex, smooth objectives, and arbitrary undirected topologies, using contraction mappings coupled with a result called the mean Hessian theorem (MHT). The use of these tools yields tight convergence bounds, both in the noise-free and noisy regimes. While these bounds are qualitatively similar to results found in the literature, our approach using contractions together with the MHT decouples the algorithm dynamics (how quickly the algorithm converges to its fixed point) from its asymptotic convergence properties (how far the fixed point is from the global optimum). This yields a simple, intuitive analysis that is accessible to a broader audience. Extensions are provided to multiple local gradient updates, time-varying step sizes, noisy gradients (stochastic DGD and diffusion), communication noise, and random topologies.
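For concreteness, the DGD iteration analyzed above has the standard form X_{k+1} = W X_k - alpha * grad f(X_k). A minimal sketch for strongly convex quadratic local objectives f_i(x) = 0.5 * ||A_i x - b_i||^2 (the problem instance is an arbitrary choice for illustration):

import numpy as np

def dgd(W, A, b, alpha=0.01, n_iters=500):
    # W: (n, n) doubly stochastic mixing matrix for an undirected topology.
    # A, b: per-agent data defining f_i(x) = 0.5 * ||A[i] x - b[i]||^2.
    n, d = len(A), A[0].shape[1]
    X = np.zeros((n, d))                 # row i is agent i's iterate
    for _ in range(n_iters):
        grads = np.stack([A[i].T @ (A[i] @ X[i] - b[i]) for i in range(n)])
        X = W @ X - alpha * grads        # mix with neighbors, then descend
    return X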
Citations: 0
VAMP-Based Kalman Filtering Under Non-Gaussian Process Noise
IF 2.9 · Q2 ENGINEERING, ELECTRICAL & ELECTRONIC · Pub Date: 2025-04-02 · DOI: 10.1109/OJSP.2025.3557271
Tiancheng Gao;Mohamed Akrout;Faouzi Bellili;Amine Mezghani
Estimating time-varying signals becomes particularly challenging in the face of non-Gaussian (e.g., sparse) and/or rapidly time-varying process noise. Building upon recent progress in the approximate message passing (AMP) paradigm, this paper combines the vector variant of AMP (i.e., VAMP) with the Kalman filter (KF) within a unified message-passing framework. The new algorithm (coined VAMP-KF) does not restrict the process noise to a specific structure (e.g., the same support over time), thereby accounting for non-Gaussian process noise sources that are uncorrelated both component-wise and over time. For theoretical performance prediction, we conduct a state evolution (SE) analysis of the proposed algorithm and show its consistency with the asymptotic empirical mean-squared error (MSE). Numerical results using sparse noise dynamics with different sparsity ratios demonstrate unambiguously the effectiveness of the proposed VAMP-KF algorithm and its superiority over state-of-the-art algorithms in terms of both reconstruction accuracy and computational complexity.
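For reference, a sketch of the classical Kalman recursion that VAMP-KF generalizes; under Gaussian process noise Q this is exact, and it is the baseline that the proposed message-passing algorithm replaces when the noise is non-Gaussian.

import numpy as np

def kf_step(x, P, y, F, H, Q, R):
    # One predict/update cycle of the linear-Gaussian Kalman filter.
    x_pred = F @ x                        # state prediction
    P_pred = F @ P @ F.T + Q              # covariance prediction
    S = H @ P_pred @ H.T + R              # innovation covariance
    K = P_pred @ H.T @ np.linalg.inv(S)   # Kalman gain
    x_new = x_pred + K @ (y - H @ x_pred)
    P_new = (np.eye(len(x)) - K @ H) @ P_pred
    return x_new, P_new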
Citations: 0
Swin Transformer With Spatial and Local Context Augmentation for Enhanced Semantic Segmentation of Remote Sensing Images
IF 2.9 · Q2 ENGINEERING, ELECTRICAL & ELECTRONIC · Pub Date: 2025-03-23 · DOI: 10.1109/OJSP.2025.3573202
Rong-Xing Ding;Yi-Han Xu;Gang Yu;Wen Zhou;Ding Zhou
Semantic segmentation of remote sensing images is extensively used in crop cover and type analysis and in environmental monitoring. Owing to the specific characteristics of remote sensing images, their semantic segmentation requires not only local context; global context information also plays an important role. Inspired by the powerful global modelling capability of the Swin Transformer, we propose the LSENet network, which follows the encoder-decoder architecture of the UNet network. In the encoding phase, we propose a spatial enhancement module (SEM), which helps the Swin Transformer further enhance feature extraction by encoding spatial information. In the decoding stage, we propose a local enhancement module (LEM), which is embedded in the Swin Transformer to help the network obtain more local semantic information and thus classify pixels more accurately; in edge regions in particular, adding the LEM yields smoother edges. Experimental results on the Vaihingen and Potsdam datasets demonstrate the effectiveness of our proposed method. Specifically, the mIoU metric is 78.58% on the Potsdam dataset, 72.59% on the Vaihingen dataset, and 64.49% on the OpenEarthMap dataset.
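Purely illustrative: one plausible shape for the kind of lightweight spatial gate that the SEM description suggests, reweighting spatial positions of a feature map using channel-pooled statistics. This is an assumption about the general pattern, not the paper's SEM/LEM definition.

import torch
import torch.nn as nn

class SpatialGate(nn.Module):
    # Computes a per-position attention map from channel statistics and
    # multiplies it back onto the features.
    def __init__(self):
        super().__init__()
        self.conv = nn.Conv2d(2, 1, kernel_size=7, padding=3)

    def forward(self, x):                   # x: (B, C, H, W) feature map
        avg = x.mean(dim=1, keepdim=True)   # channel-average map
        mx, _ = x.max(dim=1, keepdim=True)  # channel-max map
        gate = torch.sigmoid(self.conv(torch.cat([avg, mx], dim=1)))
        return x * gate                     # reweight spatial positions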
Citations: 0
Streaming LiDAR Scene Flow Estimation
IF 2.9 · Q2 ENGINEERING, ELECTRICAL & ELECTRONIC · Pub Date: 2025-03-23 · DOI: 10.1109/OJSP.2025.3572759
Mazen Abdelfattah;Z. Jane Wang;Rabab Ward
Safe navigation of autonomous vehicles requires accurate and rapid understanding of their dynamic 3D environment. Scene flow estimation models this dynamic environment by predicting point motion between sequential point cloud scans, and is crucial for safe navigation. Existing state-of-the-art scene flow estimation methods, based on test-time optimization, achieve high accuracy but suffer from significant latency, limiting their applicability in real-time onboard systems. This latency stems from both the iterative test-time optimization process and the inherent delay of waiting for the LiDAR to acquire a complete 360° scan. To overcome this bottleneck, we introduce a novel streaming scene flow framework leveraging the sequential nature of LiDAR slice acquisition, demonstrating a dramatic reduction in end-to-end latency. Instead of waiting for the full 360° scan, our method immediately estimates scene flow using each LiDAR slice once it is captured. To mitigate the reduced context of individual slices, we propose a novel contextual augmentation technique that expands the target slice by a small angular margin, incorporating crucial slice boundary information. Furthermore, to enhance test-time optimization within our streaming framework, our novel initialization scheme 'warm-starts' the current optimization using optimized parameters from the preceding slice. This achieves substantial speedups while maintaining, and in some cases surpassing, full-scan accuracy. We rigorously evaluate our approach on the challenging Waymo and Argoverse datasets, demonstrating significant latency reduction without compromising scene flow quality. This work paves the way for deploying high-accuracy, real-time scene flow algorithms in autonomous driving, advancing the field towards more responsive and safer autonomous systems.
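The streaming pattern described above reduces to a simple loop; optimize_flow is a placeholder for any test-time-optimization scene-flow estimator that accepts a warm-start parameter vector.

def streaming_flow(slices, optimize_flow, init_params=None):
    # Process LiDAR slices in acquisition order, warm-starting each slice's
    # optimization from the parameters fitted to the previous slice.
    params, flows = init_params, []
    for s in slices:
        flow, params = optimize_flow(s, warm_start=params)
        flows.append(flow)   # emit flow immediately, no wait for a full scan
    return flows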
Citations: 0
Appearance Estimation and Image Segmentation via Tensor Factorization
IF 2.9 · Q2 ENGINEERING, ELECTRICAL & ELECTRONIC · Pub Date: 2025-03-23 · DOI: 10.1109/OJSP.2025.3572820
Jeova Farias Sales Rocha Neto
Image Segmentation is one of the core tasks in Computer Vision, and solving it often depends on modeling the image appearance data via the color distributions of each of its constituent regions. Whereas many segmentation algorithms handle the appearance-model dependence using alternation or implicit methods, we propose here a new approach that estimates these models directly from the image without prior information about the underlying segmentation. Our method uses local high-order color statistics from the image as input to a tensor factorization-based estimator for latent variable models. This approach can estimate models in multi-region images and automatically output the regions' proportions without prior user interaction, overcoming the drawbacks of earlier attempts at this problem. We also demonstrate the performance of our proposed method in many challenging synthetic and real imaging scenarios and show that it leads to an efficient segmentation algorithm.
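A sketch of the moment-tensor idea under stated assumptions: colors are quantized into n_bins indices, one color triple is sampled per local window to build a third-order co-occurrence tensor, and a CP decomposition (here via tensorly) recovers per-region color distributions and mixing proportions. Bin count, window size, and the triple-sampling rule are illustrative choices, not the paper's estimator.

import numpy as np
from tensorly.decomposition import parafac

def region_color_models(labels_q, n_bins, n_regions, window=5):
    # labels_q: (H, W) array of integer color-bin indices.
    H, W = labels_q.shape
    T = np.zeros((n_bins, n_bins, n_bins))
    for i in range(0, H - window, window):
        for j in range(0, W - window, window):
            patch = labels_q[i:i + window, j:j + window].ravel()
            T[patch[0], patch[1], patch[2]] += 1   # one color triple per window
    T /= T.sum()                                   # empirical third-order moments
    weights, factors = parafac(T, rank=n_regions, n_iter_max=200)
    return weights, factors   # mixing weights and three color-pmf factor sets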
Citations: 0