
Latest publications in IEEE Open Journal of Signal Processing

Cross-Dataset Head-Related Transfer Function Harmonization Based on Perceptually Relevant Loss Function
IF 2.7 Q2 ENGINEERING, ELECTRICAL & ELECTRONIC Pub Date : 2025-07-17 DOI: 10.1109/OJSP.2025.3590248
Jiale Zhao;Dingding Yao;Junfeng Li
Head-Related Transfer Functions (HRTFs) play a vital role in binaural spatial audio rendering. With the release of numerous HRTF datasets in recent years, abundant data has become available to support deep-learning-based HRTF research. However, measurement discrepancies across datasets introduce significant variations in the data, and directly merging these datasets may lead to systematic biases. The recent Listener Acoustic Personalization Challenge 2024 (European Signal Processing Conference) addressed this issue, with the task of harmonizing different datasets so that a dataset classifier reaches lower accuracy while various localization metrics stay within prescribed thresholds. To mitigate cross-dataset differences, this paper proposes a neural network-based HRTF harmonization approach aimed at eliminating dataset-specific properties embedded in the original measurements. The proposed method uses a perceptually relevant loss function that jointly constrains multiple objectives, including interaural level differences, auditory-filter excitation patterns, and classification accuracy. Experimental results on eight datasets demonstrate that the proposed approach effectively minimizes distributional disparities between datasets while largely preserving localization performance. The classification accuracy for harmonized HRTFs across datasets drops to as low as 31%, indicating a significant reduction in cross-dataset discrepancies. The proposed method ranked first in the challenge, which validates its effectiveness.
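As an illustration of how such a multi-term objective can be assembled, the sketch below combines an interaural-level-difference term, a coarse band-energy stand-in for auditory-filter excitation patterns, and an entropy term that pushes a dataset classifier toward chance. It is a minimal PyTorch sketch under assumed shapes and weights (magnitude spectra of shape (batch, 2, n_freq), illustrative lambdas), not the authors' loss.

import torch
import torch.nn.functional as F

def ild(mag):
    # interaural level difference per frequency bin, in dB
    left, right = mag[:, 0], mag[:, 1]
    return 20.0 * (torch.log10(left + 1e-8) - torch.log10(right + 1e-8))

def band_energies(mag, n_bands=32):
    # coarse stand-in for auditory-filter excitation patterns: banded log energies
    b, c, f = mag.shape
    trimmed = mag[..., : f - f % n_bands]
    bands = trimmed.reshape(b, c, n_bands, -1).pow(2).mean(-1)
    return 10.0 * torch.log10(bands + 1e-8)

def harmonization_loss(mag_out, mag_ref, dataset_logits, lambdas=(1.0, 1.0, 0.1)):
    l_ild = F.l1_loss(ild(mag_out), ild(mag_ref))
    l_exc = F.l1_loss(band_energies(mag_out), band_energies(mag_ref))
    p = F.softmax(dataset_logits, dim=-1)
    l_cls = (p * torch.log(p + 1e-8)).sum(-1).mean()   # negative entropy: minimizing it pushes the classifier toward chance
    return lambdas[0] * l_ild + lambdas[1] * l_exc + lambdas[2] * l_cls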
{"title":"Cross-Dataset Head-Related Transfer Function Harmonization Based on Perceptually Relevant Loss Function","authors":"Jiale Zhao;Dingding Yao;Junfeng Li","doi":"10.1109/OJSP.2025.3590248","DOIUrl":"https://doi.org/10.1109/OJSP.2025.3590248","url":null,"abstract":"Head-Related Transfer Functions (HRTFs) play a vital role in binaural spatial audio rendering. With the release of numerous HRTF datasets in recent years, abundant data has become available to support HRTF-related research based on deep learning. However, measurement discrepancies across different datasets introduce significant variations in the data and directly merging these datasets may lead to systematic biases. The recent Listener Acoustic Personalization Challenge 2024 (European Signal Processing Conference) dealt with this issue, with the task of harmonizing different datasets to achieve lower classification accuracy while meeting thresholds over various localization metrics. To mitigate cross-dataset differences, this paper proposes a neural network-based HRTF harmonization approach aimed at eliminating dataset-specific properties embedded in the original measurements. The proposed method utilizes a perceptually relevant loss function, which jointly constrains multiple objectives, including interaural level differences, auditory-filter excitation patterns, and classification accuracy. Experimental results based on eight datasets demonstrate that the proposed approach can effectively minimize distributional disparities between datasets while mostly preserving localization performance. The classification accuracy for harmonized HRTFs between different datasets is reduced to as low as 31%, indicating a significant reduction in cross-dataset discrepancies. The proposed method ranked first in this challenge, which validates its effectiveness.","PeriodicalId":73300,"journal":{"name":"IEEE open journal of signal processing","volume":"6 ","pages":"865-875"},"PeriodicalIF":2.7,"publicationDate":"2025-07-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=11082560","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144739816","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
Tell Me What You See: Text-Guided Real-World Image Denoising
IF 2.7 Q2 ENGINEERING, ELECTRICAL & ELECTRONIC Pub Date : 2025-07-14 DOI: 10.1109/OJSP.2025.3588715
Erez Yosef;Raja Giryes
Image reconstruction from noisy sensor measurements is challenging, and many methods have been proposed to address it. Yet, most approaches focus on learning robust natural image priors while modeling the scene's noise statistics. In extremely low-light conditions, these methods often remain insufficient, and additional information is needed, such as multiple captures or, as suggested here, a scene description. As an alternative, we propose using a text-based description of the scene as an additional prior, something the photographer can easily provide. Inspired by the remarkable success of text-guided diffusion models in image generation, we show that adding image caption information significantly improves image denoising and reconstruction for both synthetic and real-world images. All code and data will be made publicly available upon publication.
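To make the idea of a caption as an extra prior concrete, here is a minimal sketch of conditioning a small convolutional denoiser on a text embedding through feature-wise scale and shift (FiLM). The architecture, the 512-dimensional text embedding, and the residual formulation are assumptions for illustration, not the paper's model; the text encoder is assumed to be any frozen model that returns a fixed-size embedding.

import torch
import torch.nn as nn

class TextConditionedDenoiser(nn.Module):
    def __init__(self, channels=64, text_dim=512):
        super().__init__()
        self.inp = nn.Conv2d(3, channels, 3, padding=1)
        self.film = nn.Linear(text_dim, 2 * channels)          # per-channel scale and shift from the caption
        self.body = nn.Sequential(nn.Conv2d(channels, channels, 3, padding=1), nn.ReLU(),
                                  nn.Conv2d(channels, channels, 3, padding=1), nn.ReLU())
        self.out = nn.Conv2d(channels, 3, 3, padding=1)

    def forward(self, noisy, text_emb):                        # noisy: (B,3,H,W), text_emb: (B,text_dim)
        h = self.inp(noisy)
        scale, shift = self.film(text_emb).chunk(2, dim=-1)
        h = h * (1 + scale[..., None, None]) + shift[..., None, None]
        return noisy + self.out(self.body(h))                  # predict a residual correction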
{"title":"Tell Me What You See: Text-Guided Real-World Image Denoising","authors":"Erez Yosef;Raja Giryes","doi":"10.1109/OJSP.2025.3588715","DOIUrl":"https://doi.org/10.1109/OJSP.2025.3588715","url":null,"abstract":"Image reconstruction from noisy sensor measurements is challenging and many methods have been proposed for it. Yet, most approaches focus on learning robust natural image priors while modeling the scene’s noise statistics. In extremely low-light conditions, these methods often remain insufficient. Additional information is needed, such as multiple captures or, as suggested here, scene description. As an alternative, we propose using a text-based description of the scene as an additional prior, something the photographer can easily provide. Inspired by the remarkable success of text-guided diffusion models in image generation, we show that adding image caption information significantly improves image denoising and reconstruction for both synthetic and real-world images. All code and data will be made publicly available upon publication.","PeriodicalId":73300,"journal":{"name":"IEEE open journal of signal processing","volume":"6 ","pages":"890-899"},"PeriodicalIF":2.7,"publicationDate":"2025-07-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=11078899","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144750882","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
Learning Graph Structures With Autoregressive Graph Signal Models
IF 2.7 Q2 ENGINEERING, ELECTRICAL & ELECTRONIC Pub Date : 2025-07-11 DOI: 10.1109/OJSP.2025.3588447
Kyle Donoghue;Ashkan Ashrafi
This paper presents a novel approach to graph learning, GL-AR, which leverages estimated autoregressive coefficients to recover undirected graph structures from time-series graph signals with propagation delay. GL-AR can discern graph structures where propagation between vertices is delayed, mirroring the dynamics of many real-world systems. This is achieved by utilizing the autoregressive coefficients of time-series graph signals in GL-AR’s learning algorithm. Existing graph learning techniques typically minimize the smoothness of a graph signal on a recovered graph structure to learn instantaneous relationships. GL-AR extends this approach by showing that minimizing smoothness with autoregressive coefficients can additionally recover relationships with propagation delay. The efficacy of GL-AR is demonstrated through applications to both synthetic and real-world datasets. Specifically, this work introduces the Graph-Tensor Method, a novel technique for generating synthetic time-series graph signals that represent edges as transfer functions. This method, along with real-world data from the National Climatic Data Center, is used to evaluate GL-AR’s performance in recovering undirected graph structures. Results indicate that GL-AR’s use of autoregressive coefficients enables it to outperform state-of-the-art graph learning techniques in scenarios with nonzero propagation delays. Furthermore, GL-AR’s performance is optimized by a new automated parameter selection algorithm, which eliminates the need for computationally intensive trial-and-error methods.
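The core idea, treating per-vertex autoregressive coefficients as the features from which a graph is learned, can be sketched as follows. This is a simplified illustration (least-squares AR fits followed by a Gaussian-kernel similarity graph with arbitrary sigma and threshold), not the GL-AR algorithm or its automated parameter selection.

import numpy as np

def ar_coefficients(x, order=4):
    # x: (n_nodes, T) time series; returns least-squares AR coefficients of shape (n_nodes, order)
    n, T = x.shape
    coeffs = np.zeros((n, order))
    for i in range(n):
        cols = [x[i, order - k - 1 : T - k - 1] for k in range(order)]   # lag-(k+1) regressors
        A = np.stack(cols, axis=1)
        y = x[i, order:]
        coeffs[i], *_ = np.linalg.lstsq(A, y, rcond=None)
    return coeffs

def graph_from_features(feats, sigma=1.0, threshold=0.1):
    d2 = ((feats[:, None, :] - feats[None, :, :]) ** 2).sum(-1)
    W = np.exp(-d2 / (2 * sigma ** 2))                                   # similarity of AR coefficient vectors
    np.fill_diagonal(W, 0.0)
    W[W < threshold] = 0.0
    return W

# W = graph_from_features(ar_coefficients(signals))   # signals: (n_nodes, T)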
{"title":"Learning Graph Structures With Autoregressive Graph Signal Models","authors":"Kyle Donoghue;Ashkan Ashrafi","doi":"10.1109/OJSP.2025.3588447","DOIUrl":"https://doi.org/10.1109/OJSP.2025.3588447","url":null,"abstract":"This paper presents a novel approach to graph learning, GL-AR, which leverages estimated autoregressive coefficients to recover undirected graph structures from time-series graph signals with propagation delay. GL-AR can discern graph structures where propagation between vertices is delayed, mirroring the dynamics of many real-world systems. This is achieved by utilizing the autoregressive coefficients of time-series graph signals in GL-AR’s learning algorithm. Existing graph learning techniques typically minimize the smoothness of a graph signal on a recovered graph structure to learn instantaneous relationships. GL-AR extends this approach by showing that minimizing smoothness with autoregressive coefficients can additionally recover relationships with propagation delay. The efficacy of GL-AR is demonstrated through applications to both synthetic and real-world datasets. Specifically, this work introduces the Graph-Tensor Method, a novel technique for generating synthetic time-series graph signals that represent edges as transfer functions. This method, along with real-world data from the National Climatic Data Center, is used to evaluate GL-AR’s performance in recovering undirected graph structures. Results indicate that GL-AR’s use of autoregressive coefficients enables it to outperform state-of-the-art graph learning techniques in scenarios with nonzero propagation delays. Furthermore, GL-AR’s performance is optimized by a new automated parameter selection algorithm, which eliminates the need for computationally intensive trial-and-error methods.","PeriodicalId":73300,"journal":{"name":"IEEE open journal of signal processing","volume":"6 ","pages":"838-855"},"PeriodicalIF":2.7,"publicationDate":"2025-07-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=11078159","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144725118","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
A Factor Graph Approach to Variational Sparse Gaussian Processes
IF 2.9 Q2 ENGINEERING, ELECTRICAL & ELECTRONIC Pub Date : 2025-07-02 DOI: 10.1109/OJSP.2025.3585440
Hoang Minh Huu Nguyen;İsmaıl Şenöz;Bert De Vries
A Variational Sparse Gaussian Process (VSGP) is a sophisticated nonparametric probabilistic model that has gained significant popularity since its inception. The VSGP model is often employed as a component of larger models or in a modified form across numerous applications. However, re-deriving the update equations for inference in these variations is technically challenging, which hinders broader adoption. In a separate line of research, message passing-based inference in factor graphs has emerged as an efficient framework for automated Bayesian inference. Despite its advantages, message passing techniques have not yet been applied to VSGP-based models due to the lack of a suitable representation for VSGP models in factor graphs. To address this limitation, we introduce a Sparse Gaussian Process (SGP) node within a Forney-style factor graph (FFG). We derive variational message passing update rules for the SGP node, enabling automated and efficient inference for VSGP-based models. We validate the update rules and illustrate the benefits of the SGP node through experiments in various Gaussian Process applications.
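For readers unfamiliar with the model that the SGP node encapsulates, the sketch below evaluates the standard variational sparse GP predictive equations given inducing inputs Z and a variational distribution q(u) = N(m, S). This is the textbook SVGP posterior with an RBF kernel, not the factor-graph message-passing rules derived in the paper.

import numpy as np

def rbf(a, b, lengthscale=1.0, variance=1.0):
    d2 = ((a[:, None, :] - b[None, :, :]) ** 2).sum(-1)
    return variance * np.exp(-0.5 * d2 / lengthscale ** 2)

def svgp_predict(x_test, Z, m, S, lengthscale=1.0, variance=1.0, jitter=1e-6):
    # q(f*) has mean Kfu Kuu^{-1} m and covariance Kff - Kfu Kuu^{-1} Kuf + Kfu Kuu^{-1} S Kuu^{-1} Kuf
    Kuu = rbf(Z, Z, lengthscale, variance) + jitter * np.eye(len(Z))
    Kfu = rbf(x_test, Z, lengthscale, variance)
    Kff = rbf(x_test, x_test, lengthscale, variance)
    A = Kfu @ np.linalg.inv(Kuu)
    mean = A @ m
    cov = Kff - A @ Kfu.T + A @ S @ A.T
    return mean, cov

# Z: (M,d) inducing inputs, m: (M,), S: (M,M), x_test: (N,d)
# mean, cov = svgp_predict(x_test, Z, m, S)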
{"title":"A Factor Graph Approach to Variational Sparse Gaussian Processes","authors":"Hoang Minh Huu Nguyen;İsmaıl Şenöz;Bert De Vries","doi":"10.1109/OJSP.2025.3585440","DOIUrl":"https://doi.org/10.1109/OJSP.2025.3585440","url":null,"abstract":"A Variational Sparse Gaussian Process (VSGP) is a sophisticated nonparametric probabilistic model that has gained significant popularity since its inception. The VSGP model is often employed as a component of larger models or in a modified form across numerous applications. However, re-deriving the update equations for inference in these variations is technically challenging, which hinders broader adoption. In a separate line of research, message passing-based inference in factor graphs has emerged as an efficient framework for automated Bayesian inference. Despite its advantages, message passing techniques have not yet been applied to VSGP-based models due to the lack of a suitable representation for VSGP models in factor graphs. To address this limitation, we introduce a Sparse Gaussian Process (SGP) node within a Forney-style factor graph (FFG). We derive variational message passing update rules for the SGP node, enabling automated and efficient inference for VSGP-based models. We validate the update rules and illustrate the benefits of the SGP node through experiments in various Gaussian Process applications.","PeriodicalId":73300,"journal":{"name":"IEEE open journal of signal processing","volume":"6 ","pages":"815-837"},"PeriodicalIF":2.9,"publicationDate":"2025-07-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=11063321","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144680885","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
Model Predictive Control Algorithm for Video Coding and Uplink Delivery in Delay-Critical Applications
IF 2.7 Q2 ENGINEERING, ELECTRICAL & ELECTRONIC Pub Date : 2025-06-30 DOI: 10.1109/OJSP.2025.3584672
Mourad Aklouf;Frédéric Dufaux;Michel Kieffer;Marc Lény
Emerging applications such as remote car driving, drone control, or distant mobile robot operation impose a very tight constraint on the delay between the acquisition of a video frame by a camera embedded in the operated device and its display at the remote controller. This paper introduces a new frame-level video encoder rate control technique for ultra-low-latency video coding and delivery. A Model Predictive Control approach, exploiting the buffer level at the transmitter and an estimate of the transmission rate, is used to determine the target encoding rate of each video frame to adapt with minimum delay to sudden variations of the transmission channel characteristics. Then, an $R-(QP,D)$ model of the rate $R$ of the current frame to be encoded as a function of its quantization parameter (QP) and of the distortion $D$ of the reference frame is used to get the QP matching the target rate. This QP is then fed to the video coder. The proposed approach is compared to reference algorithms, namely PANDA, FESTIVE, BBA, and BOLA, some of which have been adapted to the considered server-driven low-latency coding and transmission scenario. Simulation results based on 4G bandwidth traces show that the proposed algorithm outperforms the others at different glass-to-glass delay constraints, considering several video quality metrics.
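A toy version of the control idea, choosing the next frame's target encoding rate by simulating the transmitter buffer over a short horizon under an estimated channel rate, is sketched below. The candidate grid, horizon, and cost weights are arbitrary illustrative choices, not the paper's controller or its R-(QP,D) model.

import numpy as np

def predict_buffer(buf_bits, enc_rate, channel_rate, frame_time, horizon):
    traj = []
    for _ in range(horizon):
        buf_bits = max(0.0, buf_bits + enc_rate * frame_time - channel_rate * frame_time)
        traj.append(buf_bits)
    return np.array(traj)

def choose_encoding_rate(buf_bits, channel_rate_est, frame_time=1 / 30,
                         target_buf_bits=0.0, horizon=5,
                         candidates=np.linspace(0.2e6, 8e6, 40)):
    costs = []
    for r in candidates:
        traj = predict_buffer(buf_bits, r, channel_rate_est, frame_time, horizon)
        # penalize predicted buffer (i.e. delay) deviation, reward a higher rate (quality); weights are illustrative
        costs.append(np.mean((traj - target_buf_bits) ** 2) - 1e3 * r)
    return candidates[int(np.argmin(costs))]

# target_rate = choose_encoding_rate(buf_bits=50e3, channel_rate_est=4e6)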
{"title":"Model Predictive Control Algorithm for Video Coding and Uplink Delivery in Delay-Critical Applications","authors":"Mourad Aklouf;Frédéric Dufaux;Michel Kieffer;Marc Lény","doi":"10.1109/OJSP.2025.3584672","DOIUrl":"https://doi.org/10.1109/OJSP.2025.3584672","url":null,"abstract":"Emerging applications such as remote car driving, drone control, or distant mobile robot operation impose a very tight constraint on the delay between the acquisition of a video frame by a camera embedded in the operated device and its display at the remote controller. This paper introduces a new frame-level video encoder rate control technique for ultra-low-latency video coding and delivery. A Model Predictive Control approach, exploiting the buffer level at the transmitter and an estimate of the transmission rate, is used to determine the target encoding rate of each video frame to adapt with minimum delay to sudden variations of the transmission channel characteristics. Then, an <inline-formula><tex-math>$R-(QP,D)$</tex-math></inline-formula> model of the rate <inline-formula><tex-math>$R$</tex-math></inline-formula> of the current frame to be encoded as a function of its quantization parameter (QP) and of the distortion <inline-formula><tex-math>$D$</tex-math></inline-formula> of the reference frame is used to get the QP matching the target rate. This QP is then fed to the video coder. The proposed approach is compared to reference algorithms, namely PANDA, FESTIVE, BBA, and BOLA, some of which have been adapted to the considered server-driven low-latency coding and transmission scenario. Simulation results based on 4G bandwidth traces show that the proposed algorithm outperforms the others at different glass-to-glass delay constraints, considering several video quality metrics.","PeriodicalId":73300,"journal":{"name":"IEEE open journal of signal processing","volume":"6 ","pages":"876-889"},"PeriodicalIF":2.7,"publicationDate":"2025-06-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=11059858","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144750801","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
Leveraging Cold Diffusion for the Decomposition of Identically Distributed Superimposed Images
IF 2.9 Q2 ENGINEERING, ELECTRICAL & ELECTRONIC Pub Date : 2025-06-27 DOI: 10.1109/OJSP.2025.3583963
Helena Montenegro;Jaime S. Cardoso
With the growing adoption of Deep Learning for imaging tasks in biometrics and healthcare, it becomes increasingly important to ensure privacy when using and sharing images of people. Several works enable privacy-preserving image sharing by anonymizing the images so that the corresponding individuals are no longer recognizable. Most works average images or their embeddings as an anonymization technique, relying on the assumption that the average operation is irreversible. Recently, cold diffusion models, based on the popular denoising diffusion probabilistic models, have succeeded in reversing deterministic transformations on images. In this work, we leverage cold diffusion to decompose superimposed images, empirically demonstrating that it is possible to obtain two or more identically-distributed images given their average. We propose novel sampling strategies for this task and show their efficacy on three datasets. Our findings highlight the risks of averaging images as an anonymization technique and argue for the use of alternative anonymization strategies.
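The decomposition builds on the generic cold-diffusion sampling loop, which the sketch below reproduces with a degradation that linearly blends an estimate toward the observed average. The restoration network restore(x, t) and the blending schedule are placeholders under our own assumptions, not the paper's exact operators or its proposed sampling strategies.

import torch

def degrade(x0, avg, t, T):
    alpha = t / T                       # 0: untouched estimate, 1: fully the superimposed average
    return (1 - alpha) * x0 + alpha * avg

def cold_diffusion_decompose(avg, restore, T=50):
    x = avg.clone()                     # start from the superimposed (averaged) image
    for t in range(T, 0, -1):
        x0_hat = restore(x, t)          # model's current estimate of one source image
        x = x - degrade(x0_hat, avg, t, T) + degrade(x0_hat, avg, t - 1, T)
    return x                            # estimate of one component; with two sources the other is about 2*avg - x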
{"title":"Leveraging Cold Diffusion for the Decomposition of Identically Distributed Superimposed Images","authors":"Helena Montenegro;Jaime S. Cardoso","doi":"10.1109/OJSP.2025.3583963","DOIUrl":"https://doi.org/10.1109/OJSP.2025.3583963","url":null,"abstract":"With the growing adoption of Deep Learning for imaging tasks in biometrics and healthcare, it becomes increasingly important to ensure privacy when using and sharing images of people. Several works enable privacy-preserving image sharing by anonymizing the images so that the corresponding individuals are no longer recognizable. Most works average images or their embeddings as an anonymization technique, relying on the assumption that the average operation is irreversible. Recently, cold diffusion models, based on the popular denoising diffusion probabilistic models, have succeeded in reversing deterministic transformations on images. In this work, we leverage cold diffusion to decompose superimposed images, empirically demonstrating that it is possible to obtain two or more identically-distributed images given their average. We propose novel sampling strategies for this task and show their efficacy on three datasets. Our findings highlight the risks of averaging images as an anonymization technique and argue for the use of alternative anonymization strategies.","PeriodicalId":73300,"journal":{"name":"IEEE open journal of signal processing","volume":"6 ","pages":"784-794"},"PeriodicalIF":2.9,"publicationDate":"2025-06-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=11054277","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144606192","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
Tiny-VPS: Tiny Video Panoptic Segmentation Standing on the Shoulder of Giant-VPS
IF 2.9 Q2 ENGINEERING, ELECTRICAL & ELECTRONIC Pub Date : 2025-06-20 DOI: 10.1109/OJSP.2025.3581840
Qingfeng Liu;Mostafa El-Khamy;Kee-Bong Song
Video Panoptic Segmentation (VPS) is the most challenging video segmentation task, as it requires accurate labeling of every pixel in each frame, as well as identifying the multiple instances and tracking them across frames. In this paper, we explore state-of-the-art solutions for VPS at both the giant model regime for offline or server processing and the tiny model regime for online or edge computing. We designed Giant-VPS which achieved the first place solution in the 2024 Pixel Level Video Understanding in the Wild (PVUW) challenge. Our Giant-VPS builds on top of MinVIS and deploys the DINOv2-giant vision foundation model with a carefully designed ViT (Vision Transformer) adapter. For mobile and edge devices, we designed the Tiny-VPS model and show that our novel ViT-adapter distillation from the Giant-VPS model can further improve the accuracy of Tiny-VPS. Our Tiny-VPS is the first, in the sub-20 GFLOPS regime, to achieve competitive accuracy on VPS and VSS (Video Semantic Segmentation) benchmarks.
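As a rough picture of feature distillation from a giant teacher's ViT adapter to a tiny student, the sketch below projects student features to the teacher's channel width and matches them with an MSE loss; the layer choice, projection, and weighting are assumptions for illustration, not the paper's recipe.

import torch
import torch.nn as nn
import torch.nn.functional as F

class AdapterDistillLoss(nn.Module):
    def __init__(self, student_dim, teacher_dim):
        super().__init__()
        self.proj = nn.Conv2d(student_dim, teacher_dim, kernel_size=1)   # match channel widths

    def forward(self, student_feats, teacher_feats):
        s = self.proj(student_feats)
        t = teacher_feats.detach()                                       # teacher is frozen for distillation
        if s.shape[-2:] != t.shape[-2:]:
            s = F.interpolate(s, size=t.shape[-2:], mode="bilinear", align_corners=False)
        return F.mse_loss(s, t)

# total_loss = segmentation_loss + 0.5 * distill(student_feats, teacher_feats)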
{"title":"Tiny-VPS: Tiny Video Panoptic Segmentation Standing on the Shoulder of Giant-VPS","authors":"Qingfeng Liu;Mostafa El-Khamy;Kee-Bong Song","doi":"10.1109/OJSP.2025.3581840","DOIUrl":"https://doi.org/10.1109/OJSP.2025.3581840","url":null,"abstract":"Video Panoptic Segmentation (VPS) is the most challenging video segmentation task, as it requires accurate labeling of every pixel in each frame, as well as identifying the multiple instances and tracking them across frames. In this paper, we explore state-of-the-art solutions for VPS at both the giant model regime for offline or server processing and the tiny model regime for online or edge computing. We designed Giant-VPS which achieved the first place solution in the 2024 Pixel Level Video Understanding in the Wild (PVUW) challenge. Our Giant-VPS builds on top of MinVIS and deploys the DINOv2-giant vision foundation model with a carefully designed ViT (Vision Transformer) adapter. For mobile and edge devices, we designed the Tiny-VPS model and show that our novel ViT-adapter distillation from the Giant-VPS model can further improve the accuracy of Tiny-VPS. Our Tiny-VPS is the first, in the sub-20 GFLOPS regime, to achieve competitive accuracy on VPS and VSS (Video Semantic Segmentation) benchmarks.","PeriodicalId":73300,"journal":{"name":"IEEE open journal of signal processing","volume":"6 ","pages":"803-814"},"PeriodicalIF":2.9,"publicationDate":"2025-06-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=11045393","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144623936","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
Biorthogonal Lattice Tunable Wavelet Units and Their Implementation in Convolutional Neural Networks for Computer Vision Problems
IF 2.9 Q2 ENGINEERING, ELECTRICAL & ELECTRONIC Pub Date : 2025-06-18 DOI: 10.1109/OJSP.2025.3580967
An D. Le;Shiwei Jin;Sungbal Seo;You-Suk Bae;Truong Q. Nguyen
This work introduces a universal tunable wavelet unit constructed with a biorthogonal lattice structure, designed to enhance image classification and anomaly detection in convolutional neural networks by reducing information loss during pooling. The unit uses the biorthogonal lattice structure to modify convolution, pooling, and down-sampling operations. Implemented in residual neural networks with 18 layers, it improved detection accuracy on CIFAR10 (by 2.67%), ImageNet1K (by 1.85%), and the Describable Textures dataset (by 11.81%), showcasing its advantages in detecting detailed features. Similar gains are achieved in the implementations for residual neural networks with 34 and 50 layers. For anomaly detection on the MVTec Anomaly Detection and TUKPCB datasets, the proposed method achieved competitive performance and better anomaly localization.
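To illustrate what a tunable wavelet split used in place of pooling can look like, here is a minimal lifting-style two-channel analysis with learnable predict and update coefficients along the last axis; keeping the low-pass branch halves the resolution. This is a generic lifting sketch under our own simplifications, not the paper's biorthogonal lattice unit.

import torch
import torch.nn as nn

class TunableLiftingPool1d(nn.Module):
    def __init__(self):
        super().__init__()
        self.p = nn.Parameter(torch.tensor(0.5))    # predict coefficient
        self.u = nn.Parameter(torch.tensor(0.25))   # update coefficient

    def forward(self, x):                           # x: (..., L) with L even
        even, odd = x[..., 0::2], x[..., 1::2]
        detail = odd - self.p * even                # high-pass branch
        approx = even + self.u * detail             # low-pass branch
        return approx, detail                       # keep approx as the downsampled features

# approx, detail = TunableLiftingPool1d()(torch.randn(1, 64, 32))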
{"title":"Biorthogonal Lattice Tunable Wavelet Units and Their Implementation in Convolutional Neural Networks for Computer Vision Problems","authors":"An D. Le;Shiwei Jin;Sungbal Seo;You-Suk Bae;Truong Q. Nguyen","doi":"10.1109/OJSP.2025.3580967","DOIUrl":"https://doi.org/10.1109/OJSP.2025.3580967","url":null,"abstract":"This work introduces a universal wavelet unit constructed with a biorthogonal lattice structure which is a novel tunable wavelet unit to enhance image classification and anomaly detection in convolutional neural networks by reducing information loss during pooling. The unit employs a biorthogonal lattice structure to modify convolution, pooling, and down-sampling operations. Implemented in residual neural networks with 18 layers, it improved detection accuracy on CIFAR10 (by 2.67% ), ImageNet1K (by 1.85% ), and the Describable Textures dataset (by 11.81% ), showcasing its advantages in detecting detailed features. Similar gains are achieved in the implementations for residual neural networks with 34 layers and 50 layers. For anomaly detection on the MVTec Anomaly Detection and TUKPCB datasets, the proposed method achieved a competitive performance and better anomaly localization.","PeriodicalId":73300,"journal":{"name":"IEEE open journal of signal processing","volume":"6 ","pages":"768-783"},"PeriodicalIF":2.9,"publicationDate":"2025-06-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=11039659","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144634816","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
In-Scene Calibration of Poisson Noise Parameters for Phase Image Recovery
IF 2.9 Q2 ENGINEERING, ELECTRICAL & ELECTRONIC Pub Date : 2025-06-13 DOI: 10.1109/OJSP.2025.3579650
Achour Idoughi;Sreelakshmi Sreeharan;Chen Zhang;Joseph Raffoul;Hui Wang;Keigo Hirakawa
In sensor metrology, noise parameters governing the stochastic nature of photon detectors play a critical role in characterizing the aleatoric uncertainty of computational imaging systems such as indirect time-of-flight cameras, structured light imaging, and division-of-time polarimetric imaging. Standard calibration procedures exist for extracting the noise parameters using calibration targets, but they are inconvenient or impractical for frequent updates. To keep up with noise parameters that are dynamically affected by sensor settings (e.g., exposure and gain) as well as environmental factors (e.g., temperature), we propose an In-Scene Calibration of Poisson Noise Parameters (ISC-PNP) method that does not require calibration targets. The main challenge lies in the heteroskedastic nature of the noise and the confounding influence of scene content. To address this, our method leverages global joint statistics of Poisson sensor data, which can be interpreted as a binomial random variable. We experimentally confirm that the noise parameters extracted by the proposed ISC-PNP and the standard calibration procedure are well-matched.
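For context, the sketch below shows the classic two-frame mean-variance (photon-transfer) estimate of Poisson gain and offset from an arbitrary static scene, which conveys what calibrating noise parameters from scene data means; the paper's ISC-PNP method instead relies on global binomial statistics of paired Poisson samples and does not reduce to this regression.

import numpy as np

def estimate_gain_offset(frame1, frame2, n_bins=50):
    # frame1/frame2: two raw captures of the same static scene at identical settings
    f1 = frame1.astype(np.float64)
    f2 = frame2.astype(np.float64)
    mean_proxy = (0.5 * (f1 + f2)).ravel()
    var_proxy = (0.5 * (f1 - f2) ** 2).ravel()      # differencing cancels the (static) scene content
    bins = np.quantile(mean_proxy, np.linspace(0, 1, n_bins + 1))
    idx = np.digitize(mean_proxy, bins[1:-1])
    m, v = [], []
    for k in range(n_bins):
        sel = idx == k
        if sel.any():
            m.append(mean_proxy[sel].mean())
            v.append(var_proxy[sel].mean())
    gain, offset = np.polyfit(m, v, 1)              # var ≈ gain * mean + offset
    return gain, offset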
{"title":"In-Scene Calibration of Poisson Noise Parameters for Phase Image Recovery","authors":"Achour Idoughi;Sreelakshmi Sreeharan;Chen Zhang;Joseph Raffoul;Hui Wang;Keigo Hirakawa","doi":"10.1109/OJSP.2025.3579650","DOIUrl":"https://doi.org/10.1109/OJSP.2025.3579650","url":null,"abstract":"In sensor metrology, noise parameters governing the stochastic nature of photon detectors play critical role in characterizing the aleatoric uncertainty of computational imaging systems such as indirect time-of-flight cameras, structured light imaging, and division-of-time polarimetric imaging. Standard calibration procedures exists for extracting the noise parameters using calibration targets, but they are inconvenient or impractical for frequent updates. To keep up with noise parameters that are dynamically affected by sensor settings (e.g. exposure and gain) as well as environmental factors (e.g. temperature), we propose an In-Scene Calibration of Poisson Noise Parameters (ISC-PNP) method that does not require calibration targets. The main challenge lies in the heteroskedastic nature of the noise and the confounding influence of scene content. To address this, our method leverages global joint statistics of Poisson sensor data, which can be interpreted as a binomial random variable. We experimentally confirm that the noise parameters extracted by the proposed ISC-PNP and the standard calibration procedure are well-matched.","PeriodicalId":73300,"journal":{"name":"IEEE open journal of signal processing","volume":"6 ","pages":"682-690"},"PeriodicalIF":2.9,"publicationDate":"2025-06-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=11034763","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144511201","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
Continuous Relaxation of Discontinuous Shrinkage Operator: Proximal Inclusion and Conversion
IF 2.9 Q2 ENGINEERING, ELECTRICAL & ELECTRONIC Pub Date : 2025-06-13 DOI: 10.1109/OJSP.2025.3579646
Masahiro Yukawa
We present a principled way of deriving a continuous relaxation of a given discontinuous shrinkage operator, which is based on two fundamental results, proximal inclusion and conversion. Using our results, the discontinuous operator is converted, via double inversion, to a continuous operator; more precisely, the associated “set-valued” operator is converted to a “single-valued” Lipschitz continuous operator. The first illustrative example is the firm shrinkage operator which can be derived as a continuous relaxation of the hard shrinkage operator. We also derive a new operator as a continuous relaxation of the discontinuous shrinkage operator associated with the so-called reverse ordered weighted $\ell_{1}$ (ROWL) penalty. Numerical examples demonstrate potential advantages of the continuous relaxation.
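The paper's first example, firm shrinkage as a continuous relaxation of hard shrinkage, can be written down directly. The sketch below implements hard, soft, and firm shrinkage, where lam < mu are the two thresholds: the firm operator is zero up to lam, ramps linearly up to mu, and is the identity beyond, approaching hard shrinkage as mu tends to lam and soft shrinkage as mu grows.

import numpy as np

def hard_shrink(x, lam):
    return np.where(np.abs(x) > lam, x, 0.0)

def soft_shrink(x, lam):
    return np.sign(x) * np.maximum(np.abs(x) - lam, 0.0)

def firm_shrink(x, lam, mu):
    a = np.abs(x)
    mid = np.sign(x) * mu * (a - lam) / (mu - lam)   # linear ramp between the two thresholds
    return np.where(a <= lam, 0.0, np.where(a <= mu, mid, x))

# x = np.linspace(-3, 3, 601); y = firm_shrink(x, lam=1.0, mu=2.0)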
{"title":"Continuous Relaxation of Discontinuous Shrinkage Operator: Proximal Inclusion and Conversion","authors":"Masahiro Yukawa","doi":"10.1109/OJSP.2025.3579646","DOIUrl":"https://doi.org/10.1109/OJSP.2025.3579646","url":null,"abstract":"We present a principled way of deriving a continuous relaxation of a given discontinuous shrinkage operator, which is based on two fundamental results, proximal inclusion and conversion. Using our results, the discontinuous operator is converted, via double inversion, to a continuous operator; more precisely, the associated “set-valued” operator is converted to a “single-valued” Lipschitz continuous operator. The first illustrative example is the firm shrinkage operator which can be derived as a continuous relaxation of the hard shrinkage operator. We also derive a new operator as a continuous relaxation of the discontinuous shrinkage operator associated with the so-called reverse ordered weighted <inline-formula><tex-math>$ell _{1}$</tex-math></inline-formula> (ROWL) penalty. Numerical examples demonstrate potential advantages of the continuous relaxation.","PeriodicalId":73300,"journal":{"name":"IEEE open journal of signal processing","volume":"6 ","pages":"753-767"},"PeriodicalIF":2.9,"publicationDate":"2025-06-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=11034740","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144581587","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0