IEEE Transactions on Image Processing: Latest Articles, Page 4

Recent Advances in 3D Object Detection in the Era of Deep Neural Networks: A Survey

IF 10.6; Zone 1, Computer Science; Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

IEEE Transactions on Image Processing

Pub Date : 2019-11-28 DOI: 10.1109/TIP.2019.2955239

Mohammad Muntasir Rahman, Yanhao Tan, Jian Xue, K. Lu

With the rapid development of deep learning and other powerful tools, 3D object detection has made great progress and become one of the fastest-growing fields in computer vision. Many automated applications, such as robotic navigation, autonomous driving, and virtual or augmented reality systems, require accurate 3D object localization and detection. To meet this requirement, many methods have been proposed to improve the performance of 3D object localization and detection. Despite recent efforts, 3D object detection remains a very challenging task due to occlusion, viewpoint variations, scale changes, and limited information in 3D scenes. In this paper, we present a comprehensive review of recent state-of-the-art approaches to 3D object detection. We start with some basic concepts, then describe some of the available datasets designed to facilitate the performance evaluation of 3D object detection algorithms. Next, we review the state-of-the-art technologies in this area, highlighting their contributions, importance, and limitations as a guide for future research. Finally, we provide a quantitative comparison of the results of the state-of-the-art methods on popular public datasets.

Volume 29, pages 2947-2962.

Citations: 43

High-Order Feature Learning for Multi-Atlas Based Label Fusion: Application to Brain Segmentation With MRI

IF 10.6; Zone 1, Computer Science; Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

IEEE Transactions on Image Processing

Pub Date : 2019-11-12 DOI: 10.1109/TIP.2019.2952079

Liang Sun, Wei Shao, Mingliang Wang, Daoqiang Zhang, Mingxia Liu

Multi-atlas based segmentation methods have proven effective for segmenting brain regions of interest (ROIs). They propagate labels from multiple atlases to a target image based on the similarity between patches in the target image and in the atlas images. Most existing multi-atlas based methods use image intensity features to calculate the similarity between a pair of image patches for label fusion. However, low-level image intensity features alone cannot adequately characterize the complex appearance patterns (e.g., the high-order relationship between voxels within a patch) of brain magnetic resonance (MR) images. To address this issue, this paper develops a high-order feature learning framework for multi-atlas based label fusion, where high-order features of image patches are extracted and fused for segmenting ROIs of structural brain MR images. Specifically, an unsupervised feature learning method (i.e., the mean-covariance restricted Boltzmann machine, mcRBM) is employed to learn high-order features (i.e., mean and covariance features) of patches in brain MR images. Then, a group-fused sparsity dictionary learning method is proposed to jointly calculate the voting weights for label fusion, based on the learned high-order features and the original image intensity features. The proposed method is compared with several state-of-the-art label fusion methods on the ADNI, NIREP and LONI-LPBA40 datasets. The Dice ratios achieved by our method are 88.30%, 88.83%, 79.54% and 81.02% on the left and right hippocampus on the ADNI, NIREP and LONI-LPBA40 datasets, respectively, while the best Dice ratios yielded by the other methods are 86.51%, 87.39%, 78.48% and 79.65% on the three datasets, respectively.
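The Dice ratio reported in this abstract is the standard overlap measure between a predicted and a reference segmentation, 2|A ∩ B| / (|A| + |B|). The following is a minimal sketch of that formula, assuming binary masks stored as NumPy arrays; the helper name `dice_ratio` and the toy masks are illustrative and do not come from the paper.

```python
import numpy as np

def dice_ratio(pred, truth):
    """Dice ratio 2|A ∩ B| / (|A| + |B|) for two binary masks."""
    pred = np.asarray(pred, dtype=bool)
    truth = np.asarray(truth, dtype=bool)
    total = pred.sum() + truth.sum()
    if total == 0:
        # Both masks are empty: treated as perfect agreement (a convention assumed here).
        return 1.0
    return 2.0 * np.logical_and(pred, truth).sum() / total

# Toy 1D "segmentation": the masks have 4 and 5 voxels and share 3 of them.
pred = np.array([1, 1, 1, 1, 0, 0, 0, 0])
truth = np.array([0, 1, 1, 1, 1, 1, 0, 0])
score = dice_ratio(pred, truth)  # 2 * 3 / (4 + 5)
```

A value of 1 means the two masks coincide and 0 means they do not overlap, so the percentages above can be read as the fraction of agreement with the manual labels.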

Volume 29, pages 2702-2713.

Citations: 28

Repeated Look-up Tables.

IF 10.6; Zone 1, Computer Science; Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

IEEE Transactions on Image Processing

Pub Date : 2019-10-29 DOI: 10.1109/TIP.2019.2949245

Erik Reinhard, Elena Garces, Jurgen Stauder

Efficient hardware implementations routinely approximate mathematical functions with look-up tables, while keeping the error of the approximation under control. For a certain class of commonly occurring 1D functions, namely monotonically increasing or decreasing functions, we found that it is possible to approximate such functions by repeated application of a very low resolution 1D look-up table. There are many advantages to cascading multiple identical LUTs, including the promise of a very simple hardware design and the use of standard linear interpolation. Further, the complexity associated with unequal bin sizes can be avoided. We show that for realistic applications, including gamma correction, high dynamic range encoding and decoding curves, as well as tone mapping and inverse tone mapping applications, multiple cascaded look-up tables can reduce the approximation error by more than 50% compared to a single look-up table with the same total memory footprint.
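The idea of applying one small table several times can be sketched as follows. This is only an illustration under stated assumptions, not the authors' construction: the target is a gamma curve f(x) = x^p on [0, 1], the table stores the functional root g(x) = x^(p^(1/n)) so that n applications of g compose to f, and the table size of 9 entries is arbitrary. The names `make_lut` and `apply_lut` are hypothetical.

```python
import numpy as np

def make_lut(func, size):
    """Sample func on an equally spaced grid over [0, 1]."""
    xs = np.linspace(0.0, 1.0, size)
    return xs, func(xs)

def apply_lut(x, xs, ys, repeats=1):
    """Apply the same LUT `repeats` times with standard linear interpolation."""
    y = np.asarray(x, dtype=float)
    for _ in range(repeats):
        y = np.interp(y, xs, ys)
    return y

p = 1.0 / 2.2          # target curve f(x) = x**p (gamma correction)
n = 3                  # number of cascaded applications (assumed)
q = p ** (1.0 / n)     # g(x) = x**q satisfies g(g(g(x))) = x**p
xs, ys = make_lut(lambda v: v ** q, size=9)

x = np.linspace(0.0, 1.0, 1001)
approx = apply_lut(x, xs, ys, repeats=n)
max_err = np.max(np.abs(approx - x ** p))
```

Only one 9-entry table is stored regardless of `n`, and every stage uses equal bin sizes. How the table contents should be chosen to minimize the error is the subject of the paper and is not reproduced here.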

Citations: 0

Two-Dimensional Quaternion Sparse Discriminant Analysis.

IF 10.6; Zone 1, Computer Science; Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

IEEE Transactions on Image Processing

Pub Date : 2019-10-28 DOI: 10.1109/TIP.2019.2947775

Xiaolin Xiao, Yongyong Chen, Yue-Jiao Gong, Yicong Zhou

Linear discriminant analysis has been combined with various representations and measurements for dimension reduction and feature extraction. In this paper, we propose two-dimensional quaternion sparse discriminant analysis (2D-QSDA), which meets the requirements of representing RGB and RGB-D images. 2D-QSDA advances in three aspects: 1) by including sparse regularization, 2D-QSDA relies only on the important variables and thus generalizes well to out-of-sample data unseen during the training phase; 2) benefiting from the quaternion representation, 2D-QSDA preserves the high-order correlation among different image channels and provides a unified approach to extracting features from RGB and RGB-D images; 3) the spatial structure of the input images is retained via matrix-based processing. We tackle the constrained trace ratio problem of 2D-QSDA by solving a corresponding constrained trace difference problem, which is then transformed into a quaternion sparse regression (QSR) model. Afterward, we reformulate the QSR model in an equivalent complex form to avoid processing the complicated structure of quaternions. A nested iterative algorithm is designed to learn the solution of 2D-QSDA in the complex space, and we then convert this solution back to the quaternion domain. To improve the separability of 2D-QSDA, we further propose 2D-QSDAw, which uses weighted pairwise between-class distances. Extensive experiments on RGB and RGB-D databases demonstrate the effectiveness of 2D-QSDA and 2D-QSDAw compared with peer competitors.

Volume 29.

Citations: 0

BMAN: Bidirectional Multi-scale Aggregation Networks for Abnormal Event Detection.

IF 10.6; Zone 1, Computer Science; Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

IEEE Transactions on Image Processing

Pub Date : 2019-10-24 DOI: 10.1109/TIP.2019.2948286

Sangmin Lee, Hak Gu Kim, Yong Man Ro

Abnormal event detection is an important task in video surveillance systems. In this paper, we propose a novel bidirectional multi-scale aggregation network (BMAN) for abnormal event detection. The proposed BMAN learns spatiotemporal patterns of normal events and detects deviations from the learned normal patterns as abnormalities. The BMAN consists of two main parts: an inter-frame predictor and an appearance-motion joint detector. The inter-frame predictor is devised to encode normal patterns; it generates an inter-frame using attention-based bidirectional multi-scale aggregation. With this feature aggregation, the normal pattern encoding becomes robust to object scale variations and complex motions. Based on the encoded normal patterns, abnormal events are detected by the appearance-motion joint detector, which considers both the appearance and the motion characteristics of scenes. Comprehensive experiments show that the proposed method outperforms the existing state-of-the-art methods. The resulting abnormal event detection is interpretable because it shows visually where the detected events occur. Further, we validate the effectiveness of the proposed network designs through an ablation study and feature visualization.

Volume 29.

Citations: 0

Low cost gaze estimation: knowledge-based solutions.

IF 10.6; Zone 1, Computer Science; Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

IEEE Transactions on Image Processing

Pub Date : 2019-10-18 DOI: 10.1109/TIP.2019.2946452

Ion Martinikorena, Andoni Larumbe-Bergera, Mikel Ariz, Sonia Porta, Rafael Cabeza, Arantxa Villanueva

Eye tracking in low-resolution scenarios is not a completely solved problem to date. Using eye tracking in a mobile device is a challenging objective that would allow this technology to spread to unexplored fields. In this paper, a knowledge-based approach is presented to solve gaze estimation in low-resolution settings. Understanding the high-resolution paradigm makes it possible to propose alternative models for gaze estimation. Three models are presented as solutions for gaze estimation in remote low-resolution systems: a geometrical model, an interpolation model and a compound model. Since this work considers head position essential to improving gaze accuracy, a method for head pose estimation is also proposed. The methods are validated in an optimal framework, the I2Head database, which combines head and gaze data. The experimental validation of the models demonstrates their sensitivity to image processing inaccuracies, which is critical in the case of the geometrical model. Static and extreme movement scenarios are analyzed, showing the higher robustness of the compound and geometrical models in the presence of user displacement. Accuracy values of about 3° have been obtained, increasing to values close to 5° in extreme displacement settings, results fully comparable with the state of the art.
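Gaze accuracy figures such as the 3° and 5° quoted here are usually angular errors between an estimated and a reference gaze direction. The sketch below shows that computation under the assumption that both directions are given as 3D vectors; the abstract does not state the exact evaluation protocol, and the function name `angular_error_deg` is hypothetical.

```python
import numpy as np

def angular_error_deg(estimated, reference):
    """Angle in degrees between two 3D gaze direction vectors."""
    est = np.asarray(estimated, dtype=float)
    ref = np.asarray(reference, dtype=float)
    cos_angle = np.dot(est, ref) / (np.linalg.norm(est) * np.linalg.norm(ref))
    # Clip to guard against rounding slightly outside [-1, 1].
    return float(np.degrees(np.arccos(np.clip(cos_angle, -1.0, 1.0))))

# A reference direction along the optical axis and an estimate tilted by 3 degrees.
reference = np.array([0.0, 0.0, 1.0])
estimate = np.array([np.tan(np.radians(3.0)), 0.0, 1.0])
error = angular_error_deg(estimate, reference)
```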

Volume 29.

Citations: 0

Self-Motion-Assisted Tensor Completion Method for Background Initialization in Complex Video Sequences.

IF 10.6; Zone 1, Computer Science; Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

IEEE Transactions on Image Processing

Pub Date : 2019-10-17 DOI: 10.1109/TIP.2019.2946098

Ibrahim Kajo, Nidal Kamel, Yassine Ruichek

The background initialization (BI) problem has attracted the attention of researchers in different image/video processing fields. Recently, a tensor-based technique called spatiotemporal slice-based singular value decomposition (SS-SVD) has been proposed for background initialization. SS-SVD applies the SVD to the tensor slices and estimates the background from low-rank information. Despite its efficiency in background initialization, the performance of SS-SVD requires further improvement for complex sequences with challenges such as stationary foreground objects (SFOs), illumination changes, low frame rate, and clutter. In this paper, a self-motion-assisted tensor completion method is proposed to overcome the limitations of SS-SVD in complex video sequences and enhance the visual appearance of the initialized background. With the proposed method, the motion information, extracted from the sparse portion of the tensor slices, is combined with the low-rank information of SS-SVD to eliminate existing artifacts in the initialized background. Efficient blending schemes between the low-rank (background) and sparse (foreground) information of the tensor slices are developed for scenarios such as SFO removal, lighting variation processing, low frame-rate processing, crowdedness estimation, and best frame selection. The performance of the proposed method on video sequences with complex scenarios is compared with the top-ranked state-of-the-art techniques in the field of background initialization. The results not only validate the improved performance on the majority of the tested challenges but also demonstrate that the proposed method can initialize the background in less computational time.
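The low-rank versus sparse split that this abstract relies on can be illustrated with a plain matrix SVD. The sketch below is deliberately simpler than SS-SVD and than the proposed completion method: it stacks whole frames as columns of one matrix instead of working on spatiotemporal tensor slices, and it assumes a static camera. The function name `lowrank_background`, the rank-1 choice and the synthetic video are illustrative assumptions.

```python
import numpy as np

def lowrank_background(frames, rank=1):
    """Split a (T, H, W) video into a low-rank background and a residual.

    Frames are stacked as columns; the truncated SVD keeps the dominant
    (background) component and the remainder holds moving foreground.
    """
    t, h, w = frames.shape
    data = frames.reshape(t, h * w).T.astype(float)      # pixels x frames
    u, s, vt = np.linalg.svd(data, full_matrices=False)
    low = (u[:, :rank] * s[:rank]) @ vt[:rank]           # rank-limited approximation
    background = np.median(low, axis=1).reshape(h, w)    # one image for all frames
    residual = (data - low).T.reshape(t, h, w)           # sparse (foreground) part
    return background, residual

# Synthetic clip: a fixed gradient background with one bright pixel that moves
# to a different location in every frame.
true_bg = np.linspace(0.2, 0.8, 64).reshape(8, 8)
frames = np.stack([true_bg.copy() for _ in range(20)])
for t in range(20):
    frames[t, t % 8, t // 8] = 1.0

background, residual = lowrank_background(frames)
mean_abs_error = np.abs(background - true_bg).mean()
```

A stationary foreground object would end up in the low-rank part under this naive split, which is one of the failure cases the abstract says the motion information is meant to address.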

Volume 29.

Citations: 0

Robust Low-Rank Tensor Minimization via a New Tensor Spectral k-Support Norm.

IF 10.6; Zone 1, Computer Science; Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

IEEE Transactions on Image Processing

Pub Date : 2019-10-15 DOI: 10.1109/TIP.2019.2946445

Jian Lou, Yiu-Ming Cheung

Recently, based on a new tensor algebraic framework for third-order tensors, the tensor singular value decomposition (t-SVD) and its associated tubal rank definition have shed new light on low-rank tensor modeling. Its applications to robust image/video recovery and background modeling show promising performance due to its superior capability in modeling cross-channel/frame information. Under the t-SVD framework, we propose a new tensor norm, called the tensor spectral k-support norm (TSP-k), obtained through an alternative convex relaxation. As an interpolation between the existing tensor nuclear norm (TNN) and tensor Frobenius norm (TFN), it is able to simultaneously drive minor singular values to zero to induce low-rankness and capture more global information to better preserve intrinsic structure. We provide the proximal operator and the polar operator for the TSP-k norm as key optimization blocks, along with two showcase optimization algorithms for medium- and large-size tensors. Experiments on medium- and large-size synthetic, image and video datasets all verify the superiority of the TSP-k norm and the effectiveness of both optimization methods in comparison with existing counterparts.
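For readers unfamiliar with the t-SVD framework, the tubal rank and the tensor nuclear norm can both be computed from the frontal slices of the tensor after a Fourier transform along the third mode. The sketch below covers only these two existing quantities, not the proposed TSP-k norm. The 1/n3 scaling of the TNN follows one common convention and is an assumption here, and the function names are hypothetical.

```python
import numpy as np

def tubal_rank(tensor, tol=1e-8):
    """Tubal rank: the largest rank among the Fourier-domain frontal slices."""
    fourier = np.fft.fft(tensor, axis=2)
    return max(
        int(np.linalg.matrix_rank(fourier[:, :, k], tol=tol))
        for k in range(tensor.shape[2])
    )

def tensor_nuclear_norm(tensor):
    """TNN as the mean nuclear norm of the Fourier-domain frontal slices.

    The division by n3 is one common convention (assumed here).
    """
    fourier = np.fft.fft(tensor, axis=2)
    n3 = tensor.shape[2]
    return sum(np.linalg.norm(fourier[:, :, k], "nuc") for k in range(n3)) / n3

# A 5 x 4 x 3 tensor whose frontal slices are multiples of one rank-2 matrix.
left = np.arange(10, dtype=float).reshape(5, 2) + 1.0
right = np.arange(8, dtype=float).reshape(2, 4) + 1.0
base = left @ right
tensor = np.stack([base * (k + 1) for k in range(3)], axis=2)

rank = tubal_rank(tensor)
tnn = tensor_nuclear_norm(tensor)
```

Because every tube of this toy tensor is a multiple of (1, 2, 3), each Fourier-domain slice is a scalar multiple of `base`, so the tubal rank equals the matrix rank of `base`.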

Volume 29.

Citations: 0

Graph Sequence Recurrent Neural Network for Vision-based Freezing of Gait Detection

IF 10.6; Tier 1, Computer Science; Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

IEEE Transactions on Image Processing

Pub Date : 2019-10-15 DOI: 10.1109/TIP.2019.2946469

Kun Hu, Zhiyong Wang, Wei Wang, Kaylena A Ehgoetz Martens, Liang Wang, Tieniu Tan, Simon J G Lewis, David Dagan Feng

Freezing of gait (FoG) is one of the most common symptoms of Parkinson's disease (PD), a neurodegenerative disorder that affects millions of people around the world. Accurate assessment of FoG is critical for managing PD and for evaluating the efficacy of treatments. Currently, the assessment of FoG requires well-trained experts to perform time-consuming annotations via vision-based observations. Thus, automatic FoG detection algorithms are needed. In this study, we formulate vision-based FoG detection as a fine-grained graph sequence modelling task by representing the anatomic joints in each temporal segment with a directed graph, since FoG events can be observed through the motion patterns of joints. A novel deep learning method, the graph sequence recurrent neural network (GS-RNN), is proposed to characterize FoG patterns by devising graph recurrent cells, which take graph sequences of dynamic structures as inputs. For cases in which prior edge annotations are not available, a data-driven adjacency estimation method is further proposed. To the best of our knowledge, this is one of the first studies on vision-based FoG detection using deep neural networks designed for graph sequences of dynamic structures. Experimental results on more than 150 videos collected from 45 patients demonstrate promising performance of the proposed GS-RNN for FoG detection, with an AUC value of 0.90.
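
The graph sequence formulation can be made concrete with a small data-structure sketch: one directed graph per temporal segment, with joints as vertices. The abstract does not describe the adjacency estimator, so `estimate_adjacency` below is a hypothetical stand-in (a row-wise softmax over dot-product similarity) and not the authors' method. The names `JointGraph` and `build_graph_sequence` are likewise assumptions.

```python
import math
from dataclasses import dataclass
from typing import List


@dataclass
class JointGraph:
    features: List[List[float]]   # one feature vector per joint (vertex)
    adjacency: List[List[float]]  # adjacency[i][j]: weight of directed edge i -> j


def estimate_adjacency(features: List[List[float]]) -> List[List[float]]:
    """Hypothetical estimator: row-wise softmax of dot products, no self-loops."""
    n = len(features)
    if n < 2:
        raise ValueError("need at least two joints")
    adjacency = []
    for i in range(n):
        scores = [
            -math.inf if j == i
            else sum(a * b for a, b in zip(features[i], features[j]))
            for j in range(n)
        ]
        peak = max(scores)  # subtract the maximum for numerical stability
        weights = [math.exp(s - peak) for s in scores]
        total = sum(weights)
        adjacency.append([w / total for w in weights])
    return adjacency


def build_graph_sequence(segments: List[List[List[float]]]) -> List[JointGraph]:
    """Turn per-segment joint features into a sequence of directed graphs."""
    return [JointGraph(f, estimate_adjacency(f)) for f in segments]


# Assumed toy input: 2 temporal segments, 3 joints, 2-D features per joint.
segments = [
    [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]],
    [[0.5, 0.5], [1.0, 0.0], [0.0, 2.0]],
]
graph_sequence = build_graph_sequence(segments)
```

Because each row is normalized separately, the weight from joint i to joint j generally differs from the weight from j to i, so the resulting graphs are directed and their structure can change from segment to segment.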

Citations: 0

Blind Quality Metric of DIBR-Synthesized Images in the Discrete Wavelet Transform Domain

IF 10.6; Tier 1, Computer Science; Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

IEEE Transactions on Image Processing

Pub Date : 2019-10-10 DOI: 10.1109/TIP.2019.2945675

Guangcheng Wang, Zhongyuan Wang, Ke Gu, Leida Li, Zhifang Xia, Lifang Wu

Free viewpoint video (FVV) has received considerable attention owing to its widespread applications in areas such as immersive entertainment, remote surveillance, and distance education. Since FVV images are synthesized via a depth image-based rendering (DIBR) procedure in a "blind" environment (without reference images), a real-time and reliable blind quality assessment metric is urgently required. However, existing image quality assessment metrics are insensitive to the geometric distortions introduced by DIBR. In this research, a novel blind quality assessment method for DIBR-synthesized images is proposed based on measuring geometric distortion, global sharpness, and image complexity. First, a DIBR-synthesized image is decomposed into wavelet subbands using the discrete wavelet transform. Then, the Canny operator is employed to detect the edges of the binarized low-frequency subband and high-frequency subbands. The edge similarities between the binarized low-frequency subband and high-frequency subbands are further computed to quantify geometric distortions in DIBR-synthesized images. Second, the log-energies of the wavelet subbands are calculated to evaluate global sharpness in DIBR-synthesized images. Third, a hybrid filter combining the autoregressive and bilateral filters is adopted to compute image complexity. Finally, the overall quality score is derived by normalizing the geometric distortion and global sharpness by the image complexity. Experiments show that the proposed method is superior to competing state-of-the-art reference-free quality models for DIBR-synthesized images.
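
The wavelet decomposition and log-energy steps can be sketched in a few lines. The example below uses a single-level orthonormal Haar transform on a grayscale image stored as nested lists. The abstract names neither the wavelet family nor the exact log-energy formula, so the Haar filter, the formula log10(1 + mean squared coefficient), and the names `haar_dwt2` and `log_energy` are all assumptions made for illustration.

```python
import math
from typing import Dict, List

Matrix = List[List[float]]


def haar_dwt2(image: Matrix) -> Dict[str, Matrix]:
    """Single-level 2-D orthonormal Haar transform (assumed wavelet choice)."""
    rows, cols = len(image), len(image[0])
    if rows % 2 or cols % 2:
        raise ValueError("image height and width must be even")
    # Band names are labels only; naming conventions differ between libraries.
    bands: Dict[str, Matrix] = {"LL": [], "LH": [], "HL": [], "HH": []}
    for r in range(0, rows, 2):
        out: Dict[str, List[float]] = {name: [] for name in bands}
        for c in range(0, cols, 2):
            a, b = image[r][c], image[r][c + 1]
            e, f = image[r + 1][c], image[r + 1][c + 1]
            out["LL"].append((a + b + e + f) / 2)  # local average (low frequency)
            out["LH"].append((a - b + e - f) / 2)  # difference across columns
            out["HL"].append((a + b - e - f) / 2)  # difference across rows
            out["HH"].append((a - b - e + f) / 2)  # diagonal difference
        for name in bands:
            bands[name].append(out[name])
    return bands


def log_energy(band: Matrix) -> float:
    """Assumed log-energy: log10 of one plus the mean squared coefficient."""
    count = sum(len(row) for row in band)
    mean_square = sum(x * x for row in band for x in row) / count
    return math.log10(1.0 + mean_square)


# Assumed toy input: a 4x4 intensity ramp.
image = [[float(4 * r + c) for c in range(4)] for r in range(4)]
subbands = haar_dwt2(image)
sharpness_features = {name: log_energy(band) for name, band in subbands.items()}
```

A perfectly flat image puts all of its energy in the low-frequency band and yields zero log-energy in the three detail bands, which is the sense in which detail-band energy can serve as a sharpness cue.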

Citations: 0