Pub Date: 2026-02-02; DOI: 10.1109/TIP.2026.3658223
Ming Jin;Richang Hong
In video-text cross-domain retrieval, the generalization ability of retrieval models is key to their performance and practical applicability. However, existing retrieval models exhibit significant deficiencies in cross-domain generalization. On one hand, models tend to overfit the specific training domain, resulting in poor cross-domain matching and significantly reduced retrieval accuracy on data from different, new, or mixed domains. On the other hand, although data augmentation is a vital strategy for enhancing generalization, most existing methods focus on unimodal augmentation and fail to fully exploit the multimodal correlations between video and text. As a result, the augmented data lack semantic diversity, which further limits the model's ability to understand and perform in complex cross-domain scenarios. To address these challenges, this paper proposes a collaborative augmentation approach named MDA-MAA, which comprises two core modules: the Masked Attention Augmentation (MAA) module and the Multimodal Diffusion Augmentation (MDA) module. The MAA module masks the original video frame features and uses an attention mechanism to predict the masked features, effectively reducing overfitting to the training data and enhancing generalization. The MDA module generates subtitles from video frames and uses the LLaMA model to infer comprehensive video captions. These captions, combined with the original video frames, are fed into a diffusion model for joint learning, ultimately generating semantically enriched augmented video frames. This process leverages the multimodal relationship between video and text to increase the diversity of the training data distribution.
Experimental results demonstrate that this collaborative augmentation method significantly improves the performance of video-text cross-domain retrieval models, validating its effectiveness in enhancing model generalization.
Title: "MDA-MAA: A Collaborative Augmentation Approach for Generalizing Cross-Domain Retrieval" (IEEE Transactions on Image Processing, vol. 35, pp. 1595-1606).
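The masked-attention idea in the MAA module can be pictured in a few lines: mask a subset of frame features and reconstruct each masked one as an attention-weighted combination of the visible features. The sketch below is a toy plain-Python version; the mean-of-visible query, the 25% masking ratio, and the function names are illustrative assumptions, not the paper's implementation.

```python
import math
import random

def attend_reconstruct(feats, mask_idx):
    """Predict one masked frame feature as an attention-weighted
    combination of the remaining (visible) frame features.
    The query is taken as the mean of the visible features,
    a deliberate simplification of a learned query."""
    visible = [f for i, f in enumerate(feats) if i != mask_idx]
    dim = len(visible[0])
    query = [sum(f[d] for f in visible) / len(visible) for d in range(dim)]
    # Scaled dot-product scores, then a numerically stable softmax.
    scores = [sum(q * v for q, v in zip(query, f)) / math.sqrt(dim)
              for f in visible]
    m = max(scores)
    weights = [math.exp(s - m) for s in scores]
    z = sum(weights)
    weights = [w / z for w in weights]
    return [sum(w * f[d] for w, f in zip(weights, visible))
            for d in range(dim)]

def masked_augment(feats, mask_ratio=0.25, seed=0):
    """Replace a random subset of frame features with attention
    predictions, yielding a regularized, augmented sequence."""
    rng = random.Random(seed)
    n_mask = max(1, int(len(feats) * mask_ratio))
    masked = set(rng.sample(range(len(feats)), n_mask))
    return [attend_reconstruct(feats, i) if i in masked else f
            for i, f in enumerate(feats)]
```

Because each reconstructed feature is a convex combination of the visible features, the augmented sequence stays inside the original feature range, which is what makes the perturbation gentle enough to act as regularization.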
Pub Date: 2026-01-30; DOI: 10.1109/TIP.2026.3653202
Onur Keleş;A. Murat Tekalp
Neural networks commonly employ the McCulloch-Pitts neuron model, which is a linear model followed by a point-wise non-linear activation. Various researchers have already advanced inherently non-linear neuron models, such as quadratic neurons, generalized operational neurons, generative neurons, and super neurons, which offer stronger non-linearity than point-wise activation functions. In this paper, we introduce a novel non-linear neuron model called Padé neurons (Paons), inspired by Padé approximants. Paons offer several advantages, such as diversity of non-linearity, since each Paon learns a different non-linear function of its inputs, and layer efficiency, since Paons provide stronger non-linearity in far fewer layers than piecewise linear approximation. Furthermore, Paons include all previously proposed neuron models as special cases, so any neuron model in any network can be replaced by Paons. We note that there has been a proposal to employ the Padé approximation as a generalized point-wise activation function, which is fundamentally different from our model. To validate the efficacy of Paons, in our experiments, we replace classic neurons in some well-known neural image super-resolution, compression, and classification models based on the ResNet architecture with Paons. Our comprehensive experimental results and analyses demonstrate that neural models built with Paons provide better or equal performance than their classic counterparts with fewer layers. The PyTorch implementation code for Paon is open-sourced at https://github.com/onur-keles/Paon
Title: "Padé Neurons for Efficient Neural Models" (IEEE Transactions on Image Processing, vol. 35, pp. 1508-1520).
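For intuition, a Padé-approximant-style neuron outputs a ratio of two learned responses of its input rather than a linear response passed through a fixed activation. The minimal sketch below assumes first-order numerator and denominator terms with an absolute-value stabilizer in the denominator (a common trick to keep it away from zero); the paper's actual parameterization, including its convolutional form, will differ.

```python
def paon(x, num_w, num_b, den_w, den_b, eps=1e-6):
    """Toy Padé neuron: output = affine(x) / (1 + |affine(x)|).
    The ratio makes the neuron inherently non-linear, with no
    point-wise activation function applied afterwards.
    Parameter names and the stabilization are assumptions."""
    num = sum(w * xi for w, xi in zip(num_w, x)) + num_b
    den = 1.0 + abs(sum(w * xi for w, xi in zip(den_w, x)) + den_b)
    return num / (den + eps)
```

Doubling the input does not double the output when the denominator weights are non-zero, which is exactly the non-linearity a McCulloch-Pitts neuron lacks before its activation.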
Pub Date: 2026-01-30; DOI: 10.1109/TIP.2026.3657183
Xinran Qin;Yuning Cui;Shangquan Sun;Ruoyu Chen;Wenqi Ren;Alois Knoll;Xiaochun Cao
Multi-modal image fusion (MMIF) aims to integrate complementary information from heterogeneous sensor modalities. However, substantial cross-modality discrepancies hinder joint scene representation and lead to semantic degradation in the fused output. To address this limitation, we propose C2MFuse, a novel framework designed to preserve content while ensuring cross-modality consistency. To the best of our knowledge, this is the first MMIF approach to explicitly disentangle style and content representations across modalities for image fusion. C2MFuse introduces a content-preserving style normalization mechanism that suppresses modality-specific variations while maintaining the underlying scene structure. The normalized features are then progressively aggregated to enhance fine-grained details and improve content completeness. In light of the lack of ground truth and the inherent ambiguity of the fused distribution, we further align the fused representation with a well-defined source modality, thereby enhancing semantic consistency and reducing distributional uncertainty. Additionally, we introduce an adaptive consistency loss with learnable transformation, which provides dynamic, modality-aware supervision by enforcing global consistency across heterogeneous inputs. Extensive experiments on five datasets across three representative MMIF tasks demonstrate that C2MFuse achieves efficient and high-quality fusion, surpasses existing methods, and generalizes effectively to downstream visual applications.
Title: "Disentangle to Fuse: Toward Content Preservation and Cross-Modality Consistency for Multi-Modality Image Fusion" (IEEE Transactions on Image Processing, vol. 35, pp. 1756-1770).
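One way to picture content-preserving style normalization is as removing the first- and second-order statistics (a proxy for modality "style") from each channel while keeping the relative spatial structure (the "content"). The sketch below is an instance-norm-style simplification over a flat channel, not C2MFuse's learned mechanism.

```python
import math

def style_normalize(feat, eps=1e-5):
    """Normalize one channel of a feature map to zero mean and
    unit variance, suppressing modality-specific statistics
    while preserving the ordering and relative structure of
    activations. A simplified stand-in for the paper's
    content-preserving style normalization."""
    mean = sum(feat) / len(feat)
    var = sum((v - mean) ** 2 for v in feat) / len(feat)
    return [(v - mean) / math.sqrt(var + eps) for v in feat]
```

Two modalities with very different brightness or contrast map to the same statistical range after this step, which is the precondition for aggregating their features without one modality dominating.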
Pub Date: 2026-01-29; DOI: 10.1109/TIP.2026.3657636
Yun Liu;Tao Li;Chunping Tan;Wenqi Ren;Cosmin Ancuti;Weisi Lin
Image dehazing, a crucial task in low-level vision, supports numerous practical applications, such as autonomous driving, remote sensing, and surveillance. This paper proposes IHDCP, a novel Inverted Haze Density Correction Prior for efficient single image dehazing. It is observed that the medium transmission can be effectively modeled from the inverted haze density map using correction functions with various gamma coefficients. Based on this observation, a pixel-wise gamma correction coefficient is introduced to formulate the transmission as a function of the inverted haze density map. To estimate the transmission, IHDCP is first incorporated into the classic atmospheric scattering model (ASM), leading to a transcendental equation that is subsequently simplified to a quadratic form with a single unknown parameter via a Taylor expansion. Then, boundary constraints are designed to estimate this model parameter, and the gamma correction coefficient map is derived via Vieta's formulas. Finally, the haze-free result is recovered through ASM inversion. Experimental results on diverse synthetic and real-world datasets verify that our algorithm not only provides visually appealing dehazing performance with high computational efficiency, but also outperforms several state-of-the-art dehazing approaches in both subjective and objective evaluations. Moreover, our IHDCP generalizes well to various types of degraded scenes. Our code is available at https://github.com/TaoLi-TL/IHDCP.
Title: "IHDCP: Single Image Dehazing Using Inverted Haze Density Correction Prior" (IEEE Transactions on Image Processing, vol. 35, pp. 1448-1461).
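The prior can be illustrated numerically: treat the transmission as a gamma correction of the inverted haze density, t = (1 - density)^gamma, then invert the atmospheric scattering model I = J*t + A*(1 - t). In this toy single-pixel sketch the per-pixel gamma is given directly rather than estimated via the boundary constraints and Vieta-based derivation described above.

```python
def dehaze_pixel(I, A, density, gamma, t_min=0.05):
    """Recover scene radiance J for one pixel under the
    atmospheric scattering model, with transmission modeled as
    a gamma correction of the inverted haze density (the IHDCP
    idea in miniature). t_min guards against division by a
    near-zero transmission in dense haze."""
    t = max((1.0 - density) ** gamma, t_min)
    return (I - A) / t + A
```

As a sanity check, synthesizing a hazy pixel from J = 0.2, A = 0.9, t = 0.5 and dehazing it with the matching density and gamma recovers the original radiance exactly.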
Pub Date: 2026-01-28; DOI: 10.1109/TIP.2026.3655121
Tianfang Zhang;Lei Li;Yang Zhou;Wentao Liu;Chen Qian;Jenq-Neng Hwang;Xiangyang Ji
Vision Transformers (ViTs) mark a revolutionary advance in neural networks with their token mixer's powerful global context capability. However, pairwise token affinities and complex matrix operations limit deployment in resource-constrained scenarios and real-time applications, such as mobile devices, despite considerable effort in previous works. In this paper, we introduce CAS-ViT: Convolutional Additive Self-attention Vision Transformers, to balance efficiency and performance in mobile applications. First, we argue that the ability of token mixers to obtain global contextual information hinges on multiple information interactions, such as the spatial and channel domains. We then propose the Convolutional Additive Token Mixer (CATM), which employs underlying spatial and channel attention as novel interaction forms and eliminates costly operations such as matrix multiplication and Softmax. We introduce a hybrid architecture of Convolutional Additive Self-attention (CAS) blocks, using CATM in each block, and further build a family of lightweight networks that can be easily extended to various downstream tasks. Finally, we evaluate CAS-ViT across a variety of vision tasks, including image classification, object detection, instance segmentation, and semantic segmentation. Our M and T models achieve 83.0%/84.1% top-1 accuracy with only 12M/21M parameters on ImageNet-1K. Meanwhile, throughput evaluations on GPUs, ONNX, and iPhones also demonstrate superior results compared with other state-of-the-art backbones. Extensive experiments demonstrate that our approach achieves a better balance of performance, inference efficiency, and ease of deployment. Our code and model are available at: https://github.com/Tianfang-Zhang/CAS-ViT
Title: "CAS-ViT: Convolutional Additive Self-Attention Vision Transformers for Efficient Mobile Applications" (IEEE Transactions on Image Processing, vol. 35, pp. 1899-1909).
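The additive-mixing idea, replacing pairwise token affinities and Softmax with cheap per-channel and per-token gates that are summed, can be sketched as below. The sigmoid gates over simple mean statistics are simplified stand-ins for CATM's learned convolutions; no token-token matrix multiplication occurs anywhere.

```python
import math

def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))

def additive_token_mixer(tokens):
    """Toy additive token mixing: a channel gate (one value per
    channel, from a global per-channel mean) and a spatial gate
    (one value per token, from a per-token mean) are combined
    additively and applied to each token. Cost is linear in the
    number of tokens, unlike pairwise self-attention."""
    n, d = len(tokens), len(tokens[0])
    chan_gate = [sigmoid(sum(t[c] for t in tokens) / n) for c in range(d)]
    spat_gate = [sigmoid(sum(t) / d) for t in tokens]
    return [[t[c] * (chan_gate[c] + spat_gate[i]) for c in range(d)]
            for i, t in enumerate(tokens)]
```

The output keeps the input's token-by-channel shape, so such a mixer can drop into a block wherever a self-attention layer would sit.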
Pub Date: 2026-01-28; DOI: 10.1109/TIP.2026.3657171
Hao Chen;Haoran Zhou;Yunshu Zhang;Zheng Lin;Yongjian Deng
In the RGB-D vision community, extensive research has focused on designing multi-modal learning strategies and fusion structures. However, the complementarity and fusion mechanisms in RGB-D models remain an opaque box. In this paper, we present an analytical framework and a novel score for dissecting RGB-D learning. Our approach measures the proposed semantic variance and feature similarity across modalities and levels, conducting visual and quantitative analyses of multi-modal learning through comprehensive experiments. Specifically, we investigate the consistency and specialty of features across modalities, the evolution rules within each modality, and the collaboration logic used when optimizing an RGB-D model. Our studies reveal and verify several important findings, such as the discrepancy in cross-modal features and a hybrid multi-modal cooperation rule that leverages consistency and specialty simultaneously for complementary inference. We also showcase the versatility of the proposed RGB-D dissection method and introduce a straightforward fusion strategy based on our findings, which delivers significant enhancements across various tasks and even other multi-modal data.
Title: "Dissecting RGB-D Learning for Improved Multi-Modal Fusion" (IEEE Transactions on Image Processing, vol. 35, pp. 1846-1857).
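A minimal instance of the per-level cross-modal measurement such an analysis relies on is a cosine similarity between RGB and depth features taken at the same layer; the paper's proposed score and semantic-variance measure are more elaborate, so treat this as the simplest possible probe.

```python
import math

def cross_modal_similarity(rgb_feat, depth_feat):
    """Cosine similarity between flattened RGB and depth feature
    vectors from the same network level. Values near 1 indicate
    consistent (redundant) cross-modal features; values near 0
    indicate modality-specific (complementary) features."""
    dot = sum(a * b for a, b in zip(rgb_feat, depth_feat))
    na = math.sqrt(sum(a * a for a in rgb_feat))
    nb = math.sqrt(sum(b * b for b in depth_feat))
    return dot / (na * nb)
```

Tracking this value layer by layer is one way to observe where a model's modalities agree and where they specialize, which is the kind of evidence the consistency-versus-specialty findings rest on.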
Pub Date: 2026-01-28; DOI: 10.1109/TIP.2026.3652003
Guangzhao Dai;Shuo Wang;Hao Zhao;Bin Zhu;Qianru Sun;Xiangbo Shu
Vision-and-Language Navigation in continuous environments (VLN-CE) requires an embodied robot to navigate to a target destination by following a natural language instruction. Most existing methods use panoramic RGB-D cameras for 360° observation of environments. However, these methods struggle in real-world applications because of the higher cost of panoramic RGB-D cameras. This paper studies a low-cost and practical VLN-CE setting, i.e., using monocular cameras with a limited field of view, which means "Look Less" for visual observations and environment semantics. We propose the ThinkMatter framework for monocular VLN-CE, which motivates monocular robots to "Think More" by 1) generating novel views and 2) integrating instruction semantics. Specifically, we achieve the former with the proposed 3DGS-based panoramic generation, which renders novel views at each step from past observation collections. We achieve the latter with the proposed occupancy-instruction semantic enhancement, which integrates the spatial semantics of occupancy maps with the textual semantics of language instructions. These operations give monocular robots wider environment perception as well as transparent semantic connections to the instruction. Extensive experiments in both simulated and real-world environments demonstrate the effectiveness of ThinkMatter, providing a promising practice for real-world navigation.
Title: "ThinkMatter: Panoramic-Aware Instructional Semantics for Monocular Vision-and-Language Navigation" (IEEE Transactions on Image Processing, vol. 35, pp. 1937-1950).
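As a loose illustration of occupancy-instruction matching, one could score each candidate direction's occupancy/semantic embedding against the instruction embedding and pick the best match. `pick_direction` and its plain dot-product criterion are hypothetical simplifications; ThinkMatter fuses these semantics inside its policy rather than with a hand-written argmax.

```python
def pick_direction(instruction_vec, direction_vecs):
    """Score each candidate direction's semantic embedding
    against the instruction embedding by dot product and return
    the index of the best-matching direction. A toy stand-in
    for learned occupancy-instruction semantic fusion."""
    scores = [sum(a * b for a, b in zip(instruction_vec, d))
              for d in direction_vecs]
    return max(range(len(scores)), key=scores.__getitem__)
```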
Pub Date: 2026-01-21; DOI: 10.1109/TIP.2026.3654367
Xiang Fang;Zizhuo Li;Jiayi Ma
Recent advancements have led the image matching community to increasingly focus on obtaining subpixel-level correspondences in a detector-free manner, i.e., semi-dense feature matching. Existing methods tend to overfocus on low-level local features while ignoring equally important high-level semantic information. To tackle these shortcomings, we propose SigMa, a semantic similarity-guided semi-dense feature matching method, which leverages the strengths of both local features and high-level semantic features. First, we design a dual-branch feature extractor, comprising a convolutional network and a vision foundation model, to extract low-level local features and high-level semantic features, respectively. To fully retain the advantages of these two features and effectively integrate them, we also introduce a cross-domain feature adapter, which overcomes their spatial-resolution mismatches, channel dimensionality variations, and inter-domain gaps. Furthermore, we observe that applying the transformer to the whole feature map is unnecessary because of the similarity of local representations. We therefore design a guided pooling method based on semantic similarity. This strategy performs attention computation over selected, highly semantically similar regions, minimizing information loss while maintaining computational efficiency. Extensive experiments on multiple datasets demonstrate that our method achieves a competitive accuracy-efficiency trade-off across various tasks and exhibits strong generalization capabilities across different datasets. Additionally, we conduct a series of ablation studies and analysis experiments to validate the effectiveness and rationality of our method's design. Our code is publicly available at https://github.com/ShineFox/SigMa
"SigMa: Semantic Similarity-Guided Semi-Dense Feature Matching," by Xiang Fang, Zizhuo Li, and Jiayi Ma. IEEE Transactions on Image Processing, vol. 35, pp. 872–887.
Unsupervised Domain Adaptation (UDA) person search aims to adapt models trained on labeled source data to unlabeled target domains. Existing approaches typically rely on clustering-based proxy learning, but their performance is often undermined by unreliable pseudo-supervision. This unreliability mainly stems from two challenges: (i) spectral shift bias, where low- and high-frequency components behave differently under domain shifts but are rarely considered, degrading feature stability; and (ii) static proxy updates, which make clustering proxies highly sensitive to noise and less adaptable to domain shifts. To address these challenges, we propose the Reliable Pseudo-supervision in UDA Person Search (RPPS) framework. At the feature level, a Dual-branch Wavelet Enhancement Module (DWEM) embedded in the backbone applies discrete wavelet transform (DWT) to decompose features into low- and high-frequency components, followed by differentiated enhancements that improve cross-domain robustness and discriminability. At the proxy level, a Dynamic Confidence-weighted Clustering Proxy (DCCP) employs confidence-guided initialization and a two-stage online–offline update strategy to stabilize proxy optimization and suppress proxy noise. Extensive experiments on the CUHK-SYSU and PRW benchmarks demonstrate that RPPS achieves state-of-the-art performance and strong robustness, underscoring the importance of enhancing pseudo-supervision reliability in UDA person search. Our code is accessible at https://github.com/zqx951102/RPPS
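The DWEM decomposition described above can be sketched in NumPy under a simplifying assumption: a one-level Haar wavelet and fixed per-band gains stand in for the paper's learned, differentiated enhancements. The function names are illustrative, not from the RPPS code:

```python
import numpy as np

def haar_dwt2(x):
    """One-level 2D Haar DWT of a feature map x (H, W), H and W even.
    Returns the low-frequency LL band and high-frequency (LH, HL, HH) bands."""
    a, b = x[0::2, :], x[1::2, :]
    lo_r, hi_r = (a + b) / 2.0, (a - b) / 2.0  # row-wise average / detail
    def split_cols(y):
        c, d = y[:, 0::2], y[:, 1::2]
        return (c + d) / 2.0, (c - d) / 2.0
    ll, lh = split_cols(lo_r)
    hl, hh = split_cols(hi_r)
    return ll, (lh, hl, hh)

def haar_idwt2(ll, bands):
    """Exact inverse of haar_dwt2."""
    lh, hl, hh = bands
    def merge_cols(lo, hi):
        y = np.empty((lo.shape[0], lo.shape[1] * 2))
        y[:, 0::2], y[:, 1::2] = lo + hi, lo - hi
        return y
    lo_r, hi_r = merge_cols(ll, lh), merge_cols(hl, hh)
    x = np.empty((lo_r.shape[0] * 2, lo_r.shape[1]))
    x[0::2], x[1::2] = lo_r + hi_r, lo_r - hi_r
    return x

def differentiated_enhance(x, low_gain=1.0, high_gain=1.5):
    """Scale the two frequency regimes separately, then reconstruct:
    low frequencies carry domain-stable structure, high frequencies
    carry discriminative detail."""
    ll, (lh, hl, hh) = haar_dwt2(x)
    return haar_idwt2(low_gain * ll, tuple(high_gain * s for s in (lh, hl, hh)))
```

Because the forward and inverse transforms are exact inverses, setting both gains to 1.0 reproduces the input, so any change in the output is attributable to the band-wise treatment alone.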
"Reliable Pseudo-Supervision for Unsupervised Domain Adaptive Person Search," by Qixian Zhang, Duoqian Miao, Qi Zhang, Xuan Tan, Hongyun Zhang, and Cairong Zhao. IEEE Transactions on Image Processing, vol. 35, pp. 915–929.
Pub Date: 2026-01-21 DOI: 10.1109/TIP.2026.3654473
Zhong Ji;Rongshuai Wei;Jingren Liu;Yanwei Pang;Jungong Han
Self-Explainable Models (SEMs) rely on Prototypical Concept Learning (PCL) to make their visual recognition processes more interpretable, but they often struggle in data-scarce settings where insufficient training samples lead to suboptimal performance. To address this limitation, we propose a Few-Shot Prototypical Concept Classification (FSPCC) framework that systematically mitigates two key challenges under low-data regimes: parametric imbalance and representation misalignment. Specifically, our approach leverages a Mixture of LoRA Experts (MoLE) for parameter-efficient adaptation, ensuring a balanced allocation of trainable parameters between the backbone and the PCL module. Meanwhile, cross-module concept guidance enforces tight alignment between the backbone’s feature representations and the prototypical concept activation patterns. In addition, we incorporate a multi-level feature preservation strategy that fuses spatial and semantic cues across various layers, thereby enriching the learned representations and mitigating the challenges posed by limited data availability. Finally, to enhance interpretability and minimize concept overlap, we introduce a geometry-aware concept discrimination loss that enforces orthogonality among concepts, encouraging more disentangled and transparent decision boundaries. Experimental results on six popular benchmarks (CUB-200-2011, mini-ImageNet, CIFAR-FS, Stanford Cars, FGVC-Aircraft, and DTD) demonstrate that our approach consistently outperforms existing SEMs by a notable margin, with 4.2%–8.7% relative gains in 5-way 5-shot classification. These findings highlight the efficacy of coupling concept learning with few-shot adaptation to achieve both higher accuracy and clearer model interpretability, paving the way for more transparent visual recognition systems.
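One common way to realize an orthogonality-enforcing concept loss like the one described above is to drive the Gram matrix of normalized concept prototypes toward the identity. The sketch below is a minimal NumPy version of that idea, not the paper's geometry-aware formulation; the function name is illustrative:

```python
import numpy as np

def concept_orthogonality_loss(prototypes):
    """Penalize overlap between concept prototypes.

    prototypes: (C, D) array of C concept vectors.
    After L2 normalization, the Gram matrix P @ P.T holds pairwise cosine
    similarities; for fully disentangled concepts it equals the identity.
    Returns the squared Frobenius norm of (P @ P.T - I).
    """
    p = prototypes / np.linalg.norm(prototypes, axis=1, keepdims=True)
    gram = p @ p.T                      # (C, C) cosine-similarity matrix
    return float(np.sum((gram - np.eye(len(p))) ** 2))
```

The loss is zero exactly when all concept directions are mutually orthogonal, and it grows as concepts collapse onto one another, which is the failure mode the discrimination loss is meant to prevent.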
"Interpretable Few-Shot Image Classification via Prototypical Concept-Guided Mixture of LoRA Experts," by Zhong Ji, Rongshuai Wei, Jingren Liu, Yanwei Pang, and Jungong Han. IEEE Transactions on Image Processing, vol. 35, pp. 930–942.