
IEEE Transactions on Pattern Analysis and Machine Intelligence: Latest Publications

A Memory- and Accuracy-Aware Gaussian Parameter-Based Stereo Matching Using Confidence Measure.
IF 23.6  CAS Tier 1 (Computer Science)  Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE  Pub Date: 2021-06-01  Epub Date: 2021-05-11  DOI: 10.1109/TPAMI.2019.2959613
Yeongmin Lee, Chong-Min Kyung

Accurate stereo matching requires a large amount of memory at a high bandwidth, which restricts its use in resource-limited systems such as mobile devices. This problem is compounded by the recent trend of applications requiring significantly higher pixel resolution and more disparity levels. To alleviate this, we present a memory-efficient and robust stereo matching algorithm. For cost aggregation, we employ the semiglobal parametric approach, which significantly reduces the memory bandwidth by representing the costs of all disparities as a Gaussian mixture model. All costs on multiple paths in an image are aggregated by updating the Gaussian parameters. The aggregation is performed during the scanning in the forward and backward directions. To reduce the amount of memory for the intermediate results during the forward scan, we suggest storing only the Gaussian parameters which contribute significantly to the final disparity selection. We also propose a method to enhance the overall procedure through a learning-based confidence measure. The random forest framework is used to train various features which are extracted from the cost and intensity profile. The experimental results on the KITTI dataset show that the proposed method reduces the memory requirement to less than 3 percent of that of semiglobal matching (SGM) while providing a robust depth map compared to those of state-of-the-art SGM-based algorithms.
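As a rough, self-contained illustration of the cost-compression idea (representing a per-pixel disparity cost curve with a few Gaussian modes instead of storing every disparity level), here is a minimal NumPy sketch. The fitting rule, mode count, and toy cost curve are assumptions for illustration, not the authors' implementation:

```python
import numpy as np

def cost_curve_to_gaussians(costs, k=3):
    """Compress one pixel's disparity cost curve into k Gaussian modes.

    Keeps the k lowest-cost local minima and stores a (mean, sigma, weight)
    triple per mode, so 3*k numbers replace the full cost vector over all
    disparity levels. Illustrative only; not the paper's fitting procedure.
    """
    d = np.arange(len(costs), dtype=float)
    minima = [i for i in range(1, len(costs) - 1)
              if costs[i] <= costs[i - 1] and costs[i] <= costs[i + 1]]
    minima = sorted(minima, key=lambda i: costs[i])[:k]
    modes = []
    for i in minima:
        lo, hi = max(0, i - 2), min(len(costs), i + 3)
        w = np.exp(-costs[lo:hi])                  # soft confidence weights
        mu = np.sum(w * d[lo:hi]) / np.sum(w)
        var = np.sum(w * (d[lo:hi] - mu) ** 2) / np.sum(w) + 1e-3
        modes.append((mu, np.sqrt(var), float(np.exp(-costs[i]))))
    return modes

def gaussians_to_cost(modes, n_disp):
    """Reconstruct an approximate cost curve (negated mixture likelihood)."""
    d = np.arange(n_disp, dtype=float)
    lik = np.zeros(n_disp)
    for mu, sigma, weight in modes:
        lik += weight * np.exp(-0.5 * ((d - mu) / sigma) ** 2)
    return -np.log(lik + 1e-9)

rng = np.random.default_rng(0)
costs = rng.random(64) + 0.2 * np.abs(np.arange(64) - 21)   # best disparity near 21
modes = cost_curve_to_gaussians(costs, k=3)
approx = gaussians_to_cost(modes, 64)
print("stored values:", 3 * len(modes), "instead of", len(costs))
print("argmin of original vs. compressed curve:", int(np.argmin(costs)), int(np.argmin(approx)))
```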

{"title":"A Memory- and Accuracy-Aware Gaussian Parameter-Based Stereo Matching Using Confidence Measure.","authors":"Yeongmin Lee,&nbsp;Chong-Min Kyung","doi":"10.1109/TPAMI.2019.2959613","DOIUrl":"https://doi.org/10.1109/TPAMI.2019.2959613","url":null,"abstract":"<p><p>Accurate stereo matching requires a large amount of memory at a high bandwidth, which restricts its use in resource-limited systems such as mobile devices. This problem is compounded by the recent trend of applications requiring significantly high pixel resolution and disparity levels. To alleviate this, we present a memory-efficient and robust stereo matching algorithm. For cost aggregation, we employ the semiglobal parametric approach, which significantly reduces the memory bandwidth by representing the costs of all disparities as a Gaussian mixture model. All costs on multiple paths in an image are aggregated by updating the Gaussian parameters. The aggregation is performed during the scanning in the forward and backward directions. To reduce the amount of memory for the intermediate results during the forward scan, we suggest to store only the Gaussian parameters which contribute significantly to the final disparity selection. We also propose a method to enhance the overall procedure through a learning-based confidence measure. The random forest framework is used to train various features which are extracted from the cost and intensity profile. The experimental results on KITTI dataset show that the proposed method reduces the memory requirement to less than 3 percent of that of semiglobal matching (SGM) while providing a robust depth map compared to those of state-of-the-art SGM-based algorithms.</p>","PeriodicalId":13426,"journal":{"name":"IEEE Transactions on Pattern Analysis and Machine Intelligence","volume":"43 6","pages":"1845-1858"},"PeriodicalIF":23.6,"publicationDate":"2021-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1109/TPAMI.2019.2959613","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"37484221","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 3
High Speed and High Dynamic Range Video with an Event Camera.
IF 23.6  CAS Tier 1 (Computer Science)  Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE  Pub Date: 2021-06-01  Epub Date: 2021-05-11  DOI: 10.1109/TPAMI.2019.2963386
Henri Rebecq, Rene Ranftl, Vladlen Koltun, Davide Scaramuzza

Event cameras are novel sensors that report brightness changes in the form of a stream of asynchronous "events" instead of intensity frames. They offer significant advantages with respect to conventional cameras: high temporal resolution, high dynamic range, and no motion blur. While the stream of events encodes in principle the complete visual signal, the reconstruction of an intensity image from a stream of events is an ill-posed problem in practice. Existing reconstruction approaches are based on hand-crafted priors and strong assumptions about the imaging process as well as the statistics of natural images. In this work we propose to learn to reconstruct intensity images from event streams directly from data instead of relying on any hand-crafted priors. We propose a novel recurrent network to reconstruct videos from a stream of events, and train it on a large amount of simulated event data. During training we propose to use a perceptual loss to encourage reconstructions to follow natural image statistics. We further extend our approach to synthesize color images from color event streams. Our quantitative experiments show that our network surpasses state-of-the-art reconstruction methods by a large margin in terms of image quality, while comfortably running in real-time. We show that the network is able to synthesize high-framerate videos of high-speed phenomena (e.g., a bullet hitting an object) and is able to provide high dynamic range reconstructions in challenging lighting conditions. As an additional contribution, we demonstrate the effectiveness of our reconstructions as an intermediate representation for event data. We show that off-the-shelf computer vision algorithms can be applied to our reconstructions for tasks such as object classification and visual-inertial odometry and that this strategy consistently outperforms algorithms that were specifically designed for event data. We release the reconstruction code, a pre-trained model and the datasets to enable further research.
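For readers unfamiliar with event data, the following NumPy sketch shows the naive hand-crafted baseline that learned reconstruction replaces: directly accumulating signed events into a brightness-change frame. The synthetic event stream and the function name are assumptions for illustration, not code from the paper:

```python
import numpy as np

def events_to_frame(events, height, width, t0, t1):
    """Naively integrate polarity events into a brightness-change image.

    `events` is an (N, 4) array of (x, y, t, polarity in {-1, +1}).
    Direct accumulation like this is the simple hand-crafted baseline that a
    learned recurrent reconstruction improves on; it only sketches how the
    asynchronous stream relates to an intensity-like frame.
    """
    frame = np.zeros((height, width), dtype=np.float32)
    in_window = (events[:, 2] >= t0) & (events[:, 2] < t1)
    for x, y, _, p in events[in_window]:
        frame[int(y), int(x)] += p
    return frame

# synthetic stream of random events over a 64x48 sensor
rng = np.random.default_rng(1)
n = 5000
events = np.stack([
    rng.integers(0, 64, n),          # x coordinates
    rng.integers(0, 48, n),          # y coordinates
    np.sort(rng.random(n)),          # timestamps in [0, 1)
    rng.choice([-1.0, 1.0], n),      # polarity of each brightness change
], axis=1)
frame = events_to_frame(events, height=48, width=64, t0=0.0, t1=0.5)
print(frame.shape, float(frame.min()), float(frame.max()))
```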

{"title":"High Speed and High Dynamic Range Video with an Event Camera.","authors":"Henri Rebecq,&nbsp;Rene Ranftl,&nbsp;Vladlen Koltun,&nbsp;Davide Scaramuzza","doi":"10.1109/TPAMI.2019.2963386","DOIUrl":"https://doi.org/10.1109/TPAMI.2019.2963386","url":null,"abstract":"<p><p>Event cameras are novel sensors that report brightness changes in the form of a stream of asynchronous \"events\" instead of intensity frames. They offer significant advantages with respect to conventional cameras: high temporal resolution, high dynamic range, and no motion blur. While the stream of events encodes in principle the complete visual signal, the reconstruction of an intensity image from a stream of events is an ill-posed problem in practice. Existing reconstruction approaches are based on hand-crafted priors and strong assumptions about the imaging process as well as the statistics of natural images. In this work we propose to learn to reconstruct intensity images from event streams directly from data instead of relying on any hand-crafted priors. We propose a novel recurrent network to reconstruct videos from a stream of events, and train it on a large amount of simulated event data. During training we propose to use a perceptual loss to encourage reconstructions to follow natural image statistics. We further extend our approach to synthesize color images from color event streams. Our quantitative experiments show that our network surpasses state-of-the-art reconstruction methods by a large margin in terms of image quality ( ), while comfortably running in real-time. We show that the network is able to synthesize high framerate videos ( frames per second) of high-speed phenomena (e.g., a bullet hitting an object) and is able to provide high dynamic range reconstructions in challenging lighting conditions. As an additional contribution, we demonstrate the effectiveness of our reconstructions as an intermediate representation for event data. We show that off-the-shelf computer vision algorithms can be applied to our reconstructions for tasks such as object classification and visual-inertial odometry and that this strategy consistently outperforms algorithms that were specifically designed for event data. We release the reconstruction code, a pre-trained model and the datasets to enable further research.</p>","PeriodicalId":13426,"journal":{"name":"IEEE Transactions on Pattern Analysis and Machine Intelligence","volume":"43 6","pages":"1964-1980"},"PeriodicalIF":23.6,"publicationDate":"2021-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1109/TPAMI.2019.2963386","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"37512254","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 325
Geodesic Multi-Class SVM with Stiefel Manifold Embedding.
IF 23.6  CAS Tier 1 (Computer Science)  Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE  Pub Date: 2021-03-30  DOI: 10.1109/TPAMI.2021.3069498
Rui Zhang, Xuelong Li, Hongyuan Zhang, Ziheng Jiao

The manifold of geodesics plays an essential role in characterizing the intrinsic data geometry. However, the existing SVM methods have largely neglected the manifold structure. As such, functional degeneration may occur due to potentially polluted training data. Even worse, the entire SVM model might collapse in the presence of excessive training contamination. To address these issues, this paper devises a manifold SVM method based on a novel ξ-measure geodesic, whose primary design objective is to extract and preserve the data manifold structure in the presence of training noise. To further cope with overly contaminated training data, we introduce Kullback-Leibler (KL) regularization with a steerable sparsity constraint. In this way, each loss weight is adaptively obtained by obeying the prior distribution and sparse activation during model training for robust fitting. Moreover, the optimal scale for the Stiefel manifold can be automatically learned to improve the model flexibility. Accordingly, extensive experiments verify and validate the superiority of the proposed method.
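A minimal sketch of the "adaptive loss weights under a KL prior" idea, assuming a plain linear multi-class SVM and a closed-form exponential reweighting of samples; the paper's ξ-measure geodesic, Stiefel-manifold embedding, and exact regularizer are not reproduced here:

```python
import numpy as np

def kl_reweighted_svm(X, y, n_classes, lam=1.0, lr=0.1, epochs=50, seed=0):
    """Toy multi-class linear SVM whose per-sample weights are refit with a
    KL-regularized rule, softly suppressing contaminated samples.

    Sketch only: w_i is proportional to prior_i * exp(-loss_i / lam), the
    standard closed-form solution of a KL-regularized weighting problem.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    W = rng.normal(scale=0.01, size=(n_classes, d))
    prior = np.full(n, 1.0 / n)                    # uniform prior over samples
    w = prior.copy()
    for _ in range(epochs):
        scores = X @ W.T                           # (n, n_classes)
        margins = scores - scores[np.arange(n), y][:, None] + 1.0
        margins[np.arange(n), y] = 0.0
        losses = np.maximum(margins, 0.0).max(axis=1)        # multi-class hinge
        w = prior * np.exp(-losses / lam)          # KL-regularized reweighting
        w /= w.sum()
        grad = np.zeros_like(W)                    # weighted subgradient step
        viol = margins.argmax(axis=1)
        for i in np.where(losses > 0)[0]:
            grad[viol[i]] += w[i] * X[i]
            grad[y[i]] -= w[i] * X[i]
        W -= lr * grad
    return W, w

# two clean Gaussian blobs plus a few mislabeled (contaminated) samples
rng = np.random.default_rng(2)
X = np.vstack([rng.normal(-2, 1, (50, 2)), rng.normal(2, 1, (50, 2))])
y = np.array([0] * 50 + [1] * 50)
y[:5] = 1                                          # corrupt some labels
W, w = kl_reweighted_svm(X, y, n_classes=2)
print("mean weight, contaminated vs. clean:", float(w[:5].mean()), float(w[5:].mean()))
```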

{"title":"Geodesic Multi-Class SVM with Stiefel Manifold Embedding.","authors":"Rui Zhang, Xuelong Li, Hongyuan Zhang, Ziheng Jiao","doi":"10.1109/TPAMI.2021.3069498","DOIUrl":"10.1109/TPAMI.2021.3069498","url":null,"abstract":"<p><p>Manifold of geodesic plays an essential role in characterizing the intrinsic data geometry. However, the existing SVM methods have largely neglected the manifold structure. As such, functional degeneration may occur due to the potential polluted training. Even worse, the entire SVM model might collapse in the presence of excessive training contamination. To address these issues, this paper devises a manifold SVM method based on a novel ξ -measure geodesic, whose primary design objective is to extract and preserve the data manifold structure in the presence of training noises. To further cope with overly contaminated training data, we introduce Kullback-Leibler (KL) regularization with steerable sparsity constraint. In this way, each loss weight is adaptively obtained by obeying the prior distribution and sparse activation during model training for robust fitting. Moreover, the optimal scale for Stiefel manifold can be automatically learned to improve the model flexibility. Accordingly, extensive experiments verify and validate the superiority of the proposed method.</p>","PeriodicalId":13426,"journal":{"name":"IEEE Transactions on Pattern Analysis and Machine Intelligence","volume":"PP ","pages":""},"PeriodicalIF":23.6,"publicationDate":"2021-03-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"25531112","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
FakeCatcher: Detection of Synthetic Portrait Videos using Biological Signals.
IF 23.6  CAS Tier 1 (Computer Science)  Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE  Pub Date: 2020-07-15  DOI: 10.1109/TPAMI.2020.3009287
Umur Aybars Ciftci, Ilke Demir, Lijun Yin

The recent proliferation of fake portrait videos poses direct threats to society, law, and privacy [1]. Believing the fake video of a politician, distributing fake pornographic content of celebrities, and fabricating impersonated fake videos as evidence in courts are just a few real-world consequences of deep fakes. We present a novel approach to detect synthetic content in portrait videos, as a preventive solution for the emerging threat of deep fakes. In other words, we introduce a deep fake detector. We observe that detectors blindly utilizing deep learning are not effective in catching fake content, as generative models produce formidably realistic results. Our key assertion is that biological signals hidden in portrait videos can be used as an implicit descriptor of authenticity, because they are neither spatially nor temporally preserved in fake content. To prove and exploit this assertion, we first engage several signal transformations for the pairwise separation problem, achieving 99.39% accuracy. Second, we utilize those findings to formulate a generalized classifier for fake content, by analyzing proposed signal transformations and corresponding feature sets. Third, we generate novel signal maps and employ a CNN to improve our traditional classifier for detecting synthetic content. Lastly, we release an "in the wild" dataset of fake portrait videos that we collected as a part of our evaluation process. We evaluate FakeCatcher on several datasets, achieving 96%, 94.65%, 91.50%, and 91.07% accuracy on Face Forensics [2], Face Forensics++ [3], CelebDF [4], and our new Deep Fakes Dataset, respectively. In addition, our approach produces a significantly superior detection rate against baselines, and does not depend on the source, generator, or properties of the fake content. We also analyze signals from various facial regions, under image distortions, with varying segment durations, from different generators, against unseen datasets, and under several dimensionality reduction techniques.
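The kind of biological signal the abstract refers to can be illustrated with a crude remote-photoplethysmography (rPPG) extraction: per-frame green-channel means from a face crop, band-passed to the plausible heart-rate range. This is only a stand-in for the richer, multi-region signals and learned classifiers FakeCatcher actually uses; the synthetic clip below is an assumption for demonstration:

```python
import numpy as np

def rppg_signal(face_frames, fps=30.0, low_hz=0.7, high_hz=4.0):
    """Extract a crude rPPG trace from a stack of face crops (T, H, W, 3):
    per-frame green-channel mean, detrended, then band-passed to roughly
    42-240 bpm via FFT masking."""
    g = face_frames[..., 1].mean(axis=(1, 2))          # (T,) green means
    g = g - g.mean()                                   # remove DC / skin tone
    spec = np.fft.rfft(g)
    freqs = np.fft.rfftfreq(len(g), d=1.0 / fps)
    spec[(freqs < low_hz) | (freqs > high_hz)] = 0.0   # keep heart-rate band
    return np.fft.irfft(spec, n=len(g))

# synthetic 10 s clip: constant skin tone + tiny 1.2 Hz (72 bpm) pulse + noise
rng = np.random.default_rng(3)
t = np.arange(300) / 30.0
pulse = 0.5 * np.sin(2 * np.pi * 1.2 * t)
frames = np.full((300, 32, 32, 3), 120.0)
frames[..., 1] += pulse[:, None, None] + rng.normal(0, 0.2, (300, 32, 32))
sig = rppg_signal(frames)
peak_hz = np.fft.rfftfreq(len(sig), 1 / 30.0)[np.argmax(np.abs(np.fft.rfft(sig)))]
print("dominant frequency (Hz):", round(float(peak_hz), 2))   # expected ~1.2
```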

{"title":"FakeCatcher: Detection of Synthetic Portrait Videos using Biological Signals.","authors":"Umur Aybars Ciftci, Ilke Demir, Lijun Yin","doi":"10.1109/TPAMI.2020.3009287","DOIUrl":"10.1109/TPAMI.2020.3009287","url":null,"abstract":"<p><p>The recent proliferation of fake portrait videos poses direct threats on society, law, and privacy [1]. Believing the fake video of a politician, distributing fake pornographic content of celebrities, fabricating impersonated fake videos as evidence in courts are just a few real world consequences of deep fakes. We present a novel approach to detect synthetic content in portrait videos, as a preventive solution for the emerging threat of deep fakes. In other words, we introduce a deep fake detector. We observe that detectors blindly utilizing deep learning are not effective in catching fake content, as generative models produce formidably realistic results. Our key assertion follows that biological signals hidden in portrait videos can be used as an implicit descriptor of authenticity, because they are neither spatially nor temporally preserved in fake content. To prove and exploit this assertion, we first engage several signal transformations for the pairwise separation problem, achieving 99.39% accuracy. Second, we utilize those findings to formulate a generalized classifier for fake content, by analyzing proposed signal transformations and corresponding feature sets. Third, we generate novel signal maps and employ a CNN to improve our traditional classifier for detecting synthetic content. Lastly, we release an \"in the wild\" dataset of fake portrait videos that we collected as a part of our evaluation process. We evaluate FakeCatcher on several datasets, resulting with 96%, 94.65%, 91.50%, and 91.07% accuracies, on Face Forensics [2], Face Forensics++ [3], CelebDF [4], and on our new Deep Fakes Dataset respectively. In addition, our approach produces a significantly superior detection rate against baselines, and does not depend on the source, generator, or properties of the fake content. We also analyze signals from various facial regions, under image distortions, with varying segment durations, from different generators, against unseen datasets, and under several dimensionality reduction techniques.</p>","PeriodicalId":13426,"journal":{"name":"IEEE Transactions on Pattern Analysis and Machine Intelligence","volume":"PP ","pages":""},"PeriodicalIF":23.6,"publicationDate":"2020-07-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"38228143","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
Differential 3D Facial Recognition: Adding 3D to Your State-of-the-Art 2D Method.
IF 23.6  CAS Tier 1 (Computer Science)  Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE  Pub Date: 2020-07-01  Epub Date: 2020-04-13  DOI: 10.1109/TPAMI.2020.2986951
J Matias Di Martino, Fernando Suzacq, Mauricio Delbracio, Qiang Qiu, Guillermo Sapiro

Active illumination is a prominent complement to enhance 2D face recognition and make it more robust, e.g., to spoofing attacks and low-light conditions. In the present work we show that it is possible to adopt active illumination to enhance state-of-the-art 2D face recognition approaches with 3D features, while bypassing the complicated task of 3D reconstruction. The key idea is to project a high-spatial-frequency pattern over the test face, which allows us to simultaneously recover real 3D information plus a standard 2D facial image. Therefore, state-of-the-art 2D face recognition solutions can be transparently applied, while complementary 3D facial features are extracted from the high-frequency component of the input image. Experimental results on the ND-2006 dataset show that the proposed ideas can significantly boost face recognition performance and dramatically improve the robustness to spoofing attacks.
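A small sketch of the frequency-split idea, assuming a Gaussian low-pass in the Fourier domain: the low band approximates a standard 2D face image usable by an off-the-shelf recognizer, while the high band retains the projected pattern that carries the depth cue. The calibrated projector-camera geometry of the paper is not modeled, and the synthetic "face" and stripe pattern are illustrative assumptions:

```python
import numpy as np

def split_frequencies(image, sigma=3.0):
    """Split an image into a low-frequency part (near-standard 2D face image)
    and a high-frequency residual (where a projected pattern would live),
    using a Gaussian low-pass applied in the Fourier domain."""
    h, w = image.shape
    fy = np.fft.fftfreq(h)[:, None]                 # cycles per pixel
    fx = np.fft.fftfreq(w)[None, :]
    lowpass = np.exp(-2 * (np.pi * sigma) ** 2 * (fx ** 2 + fy ** 2))
    low = np.real(np.fft.ifft2(np.fft.fft2(image) * lowpass))
    return low, image - low

# synthetic capture: smooth "face" plus a high-frequency projected stripe pattern
yy, xx = np.mgrid[0:128, 0:128]
face = np.exp(-((xx - 64) ** 2 + (yy - 64) ** 2) / (2 * 30.0 ** 2))
pattern = 0.05 * np.sin(2 * np.pi * xx / 4.0)       # 4-pixel-period stripes
low, high = split_frequencies(face + pattern)
ratio = float(np.mean(high * pattern) / np.mean(pattern ** 2))
print("fraction of pattern energy kept in the high band:", round(ratio, 2))  # ~1.0
```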

{"title":"Differential 3D Facial Recognition: Adding 3D to Your State-of-the-Art 2D Method.","authors":"J Matias Di Martino, Fernando Suzacq, Mauricio Delbracio, Qiang Qiu, Guillermo Sapiro","doi":"10.1109/TPAMI.2020.2986951","DOIUrl":"10.1109/TPAMI.2020.2986951","url":null,"abstract":"<p><p>Active illumination is a prominent complement to enhance 2D face recognition and make it more robust, e.g., to spoofing attacks and low-light conditions. In the present work we show that it is possible to adopt active illumination to enhance state-of-the-art 2D face recognition approaches with 3D features, while bypassing the complicated task of 3D reconstruction. The key idea is to project over the test face a high spatial frequency pattern, which allows us to simultaneously recover real 3D information plus a standard 2D facial image. Therefore, state-of-the-art 2D face recognition solution can be transparently applied, while from the high frequency component of the input image, complementary 3D facial features are extracted. Experimental results on ND-2006 dataset show that the proposed ideas can significantly boost face recognition performance and dramatically improve the robustness to spoofing attacks.</p>","PeriodicalId":13426,"journal":{"name":"IEEE Transactions on Pattern Analysis and Machine Intelligence","volume":"42 7","pages":"1582-1593"},"PeriodicalIF":23.6,"publicationDate":"2020-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7892197/pdf/nihms-1668137.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"9150865","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
Table of Contents
IF 23.6  CAS Tier 1 (Computer Science)  Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE  Pub Date: 2020-07-01  DOI: 10.1109/tpami.2020.2995283
{"title":"Table of Contents","authors":"","doi":"10.1109/tpami.2020.2995283","DOIUrl":"https://doi.org/10.1109/tpami.2020.2995283","url":null,"abstract":"","PeriodicalId":13426,"journal":{"name":"IEEE Transactions on Pattern Analysis and Machine Intelligence","volume":" ","pages":""},"PeriodicalIF":23.6,"publicationDate":"2020-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1109/tpami.2020.2995283","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"46180837","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
Recurrent Temporal Aggregation Framework for Deep Video Inpainting.
IF 23.6  CAS Tier 1 (Computer Science)  Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE  Pub Date: 2020-05-01  Epub Date: 2019-12-11  DOI: 10.1109/TPAMI.2019.2958083
Dahun Kim, Sanghyun Woo, Joon-Young Lee, In So Kweon

Video inpainting aims to fill in spatio-temporal holes in videos with plausible content. Despite tremendous progress on deep learning-based inpainting of a single image, it is still challenging to extend these methods to the video domain due to the additional time dimension. In this paper, we propose a recurrent temporal aggregation framework for fast deep video inpainting. In particular, we construct an encoder-decoder model, where the encoder takes multiple reference frames which can provide visible pixels revealed from the scene dynamics. These hints are aggregated and fed into the decoder. We apply a recurrent feedback in an auto-regressive manner to enforce temporal consistency in the video results. We propose two architectural designs based on this framework. Our first model is a blind video decaptioning network (BVDNet) that is designed to automatically remove and inpaint text overlays in videos without any mask information. Our BVDNet won first place in the ECCV Chalearn 2018 LAP Inpainting Competition Track 2: Video Decaptioning. Second, we propose a network for more general video inpainting (VINet) to deal with more arbitrary and larger holes. Video results demonstrate the advantage of our framework compared to state-of-the-art methods both qualitatively and quantitatively. The codes are available at https://github.com/mcahny/Deep-Video-Inpainting, and https://github.com/shwoo93/video_decaptioning.
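A copy-and-blend toy version of the aggregation idea, assuming perfectly aligned frames: masked pixels are filled from co-located visible pixels in reference frames plus the previously inpainted output (the auto-regressive feedback). The learned encoder-decoder, feature alignment, and flow handling of BVDNet/VINet are not represented; the toy sequence and function name are assumptions:

```python
import numpy as np

def aggregate_fill(current, mask, references, previous_output=None):
    """Fill masked pixels in `current` by averaging the co-located values of
    reference frames and, auto-regressively, of the previous output.

    `mask` is True where pixels are missing. Copy-and-blend stand-in only:
    it ignores motion, which a learned model handles with aligned features.
    """
    stack = list(references)
    if previous_output is not None:
        stack.append(previous_output)              # temporal-consistency hint
    filled = current.copy()
    filled[mask] = np.mean([f[mask] for f in stack], axis=0)
    return filled

# toy sequence: a static gradient with a hole that drifts right over time,
# so the missing pixels of each frame are visible in the other frames
frames = [np.tile(np.linspace(0.0, 1.0, 64), (64, 1)) for _ in range(4)]
out = None
for t, frame in enumerate(frames):
    mask = np.zeros((64, 64), dtype=bool)
    mask[20:40, 10 + 5 * t:30 + 5 * t] = True
    corrupted = frame.copy()
    corrupted[mask] = 0.0
    refs = [f for i, f in enumerate(frames) if i != t]
    out = aggregate_fill(corrupted, mask, refs, previous_output=out)
    print("frame", t, "max fill error:", float(np.abs(out - frame).max()))
```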

{"title":"Recurrent Temporal Aggregation Framework for Deep Video Inpainting.","authors":"Dahun Kim,&nbsp;Sanghyun Woo,&nbsp;Joon-Young Lee,&nbsp;In So Kweon","doi":"10.1109/TPAMI.2019.2958083","DOIUrl":"https://doi.org/10.1109/TPAMI.2019.2958083","url":null,"abstract":"<p><p>Video inpainting aims to fill in spatio-temporal holes in videos with plausible content. Despite tremendous progress on deep learning-based inpainting of a single image, it is still challenging to extend these methods to video domain due to the additional time dimension. In this paper, we propose a recurrent temporal aggregation framework for fast deep video inpainting. In particular, we construct an encoder-decoder model, where the encoder takes multiple reference frames which can provide visible pixels revealed from the scene dynamics. These hints are aggregated and fed into the decoder. We apply a recurrent feedback in an auto-regressive manner to enforce temporal consistency in the video results. We propose two architectural designs based on this framework. Our first model is a blind video decaptioning network (BVDNet) that is designed to automatically remove and inpaint text overlays in videos without any mask information. Our BVDNet wins the first place in the ECCV Chalearn 2018 LAP Inpainting Competition Track 2: Video Decaptioning. Second, we propose a network for more general video inpainting (VINet) to deal with more arbitrary and larger holes. Video results demonstrate the advantage of our framework compared to state-of-the-art methods both qualitatively and quantitatively. The codes are available at https://github.com/mcahny/Deep-Video-Inpainting, and https://github.com/shwoo93/video_decaptioning.</p>","PeriodicalId":13426,"journal":{"name":"IEEE Transactions on Pattern Analysis and Machine Intelligence","volume":"42 5","pages":"1038-1052"},"PeriodicalIF":23.6,"publicationDate":"2020-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1109/TPAMI.2019.2958083","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"37452544","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 31
Structured Label Inference for Visual Understanding.
IF 23.6  CAS Tier 1 (Computer Science)  Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE  Pub Date: 2020-05-01  Epub Date: 2019-01-16  DOI: 10.1109/TPAMI.2019.2893215
Nelson Nauata, Hexiang Hu, Guang-Tong Zhou, Zhiwei Deng, Zicheng Liao, Greg Mori

Visual data such as images and videos contain a rich source of structured semantic labels as well as a wide range of interacting components. Visual content could be assigned with fine-grained labels describing major components, coarse-grained labels depicting high level abstractions, or a set of labels revealing attributes. Such categorization over different, interacting layers of labels evinces the potential for a graph-based encoding of label information. In this paper, we exploit this rich structure for performing graph-based inference in label space for a number of tasks: multi-label image and video classification and action detection in untrimmed videos. We consider the use of the Bidirectional Inference Neural Network (BINN) and Structured Inference Neural Network (SINN) for performing graph-based inference in label space and propose a Long Short-Term Memory (LSTM) based extension for exploiting activity progression on untrimmed videos. The methods were evaluated on (i) the Animal with Attributes (AwA), Scene Understanding (SUN) and NUS-WIDE datasets for multi-label image classification, (ii) the first two releases of the YouTube-8M large scale dataset for multi-label video classification, and (iii) the THUMOS'14 and MultiTHUMOS video datasets for action detection. Our results demonstrate the effectiveness of structured label inference in these challenging tasks, achieving significant improvements against baselines.
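The bidirectional inference over interacting label layers can be sketched with fixed-weight message passing between a coarse and a fine label level; the actual BINN/SINN learn these propagation weights end to end (and add an LSTM for untrimmed videos). The toy hierarchy and function name below are illustrative assumptions:

```python
import numpy as np

def bidirectional_label_inference(fine_logits, coarse_logits, A, steps=3, alpha=0.5):
    """Refine fine- and coarse-level label scores with bidirectional message
    passing on a label graph, where A[i, j] = 1 means fine label j is a child
    of coarse label i. Fixed-weight, non-learned sketch only."""
    fine = fine_logits.astype(float)
    coarse = coarse_logits.astype(float)
    children = np.maximum(A.sum(axis=1), 1.0)      # children per coarse label
    parents = np.maximum(A.sum(axis=0), 1.0)       # parents per fine label
    for _ in range(steps):
        m_fine = (A.T @ coarse) / parents          # top-down evidence
        m_coarse = (A @ fine) / children           # bottom-up evidence
        fine = (1 - alpha) * fine_logits + alpha * m_fine
        coarse = (1 - alpha) * coarse_logits + alpha * m_coarse
    return fine, coarse

# 2 coarse labels ("animal", "vehicle"), 4 fine labels (cat, dog, car, bus)
A = np.array([[1, 1, 0, 0],
              [0, 0, 1, 1]], dtype=float)
fine_logits = np.array([0.2, 0.1, 2.0, -0.5])      # "car" is confidently present
coarse_logits = np.array([0.0, 0.0])               # coarse scores uninformative
fine, coarse = bidirectional_label_inference(fine_logits, coarse_logits, A)
print("refined coarse scores:", np.round(coarse, 2))   # "vehicle" now > "animal"
```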

{"title":"Structured Label Inference for Visual Understanding.","authors":"Nelson Nauata,&nbsp;Hexiang Hu,&nbsp;Guang-Tong Zhou,&nbsp;Zhiwei Deng,&nbsp;Zicheng Liao,&nbsp;Greg Mori","doi":"10.1109/TPAMI.2019.2893215","DOIUrl":"https://doi.org/10.1109/TPAMI.2019.2893215","url":null,"abstract":"<p><p>Visual data such as images and videos contain a rich source of structured semantic labels as well as a wide range of interacting components. Visual content could be assigned with fine-grained labels describing major components, coarse-grained labels depicting high level abstractions, or a set of labels revealing attributes. Such categorization over different, interacting layers of labels evinces the potential for a graph-based encoding of label information. In this paper, we exploit this rich structure for performing graph-based inference in label space for a number of tasks: multi-label image and video classification and action detection in untrimmed videos. We consider the use of the Bidirectional Inference Neural Network (BINN) and Structured Inference Neural Network (SINN) for performing graph-based inference in label space and propose a Long Short-Term Memory (LSTM) based extension for exploiting activity progression on untrimmed videos. The methods were evaluated on (i) the Animal with Attributes (AwA), Scene Understanding (SUN) and NUS-WIDE datasets for multi-label image classification, (ii) the first two releases of the YouTube-8M large scale dataset for multi-label video classification, and (iii) the THUMOS'14 and MultiTHUMOS video datasets for action detection. Our results demonstrate the effectiveness of structured label inference in these challenging tasks, achieving significant improvements against baselines.</p>","PeriodicalId":13426,"journal":{"name":"IEEE Transactions on Pattern Analysis and Machine Intelligence","volume":"42 5","pages":"1257-1271"},"PeriodicalIF":23.6,"publicationDate":"2020-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1109/TPAMI.2019.2893215","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"36875059","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 17
Shared Multi-View Data Representation for Multi-Domain Event Detection.
IF 23.6  CAS Tier 1 (Computer Science)  Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE  Pub Date: 2020-05-01  Epub Date: 2019-01-18  DOI: 10.1109/TPAMI.2019.2893953
Zhenguo Yang, Qing Li, Wenyin Liu, Jianming Lv

Internet platforms provide new ways for people to share experiences, generating massive amounts of data related to various real-world concepts. In this paper, we present an event detection framework to discover real-world events from multiple data domains, including online news media and social media. As multi-domain data possess multiple data views that are heterogeneous, initial dictionaries consisting of labeled data samples are exploited to align the multi-view data. Furthermore, a shared multi-view data representation (SMDR) model is devised, which learns underlying and intrinsic structures shared among the data views by considering the structures underlying the data, data variations, and the informativeness of dictionaries. SMDR incorporates various constraints in the objective function, including shared representation, low-rank, local invariance, reconstruction error, and dictionary independence constraints. Given the data representations achieved by SMDR, class-wise residual models are designed to discover the events underlying the data based on the reconstruction residuals. Extensive experiments conducted on two real-world event detection datasets, i.e., the Multi-domain and Multi-modality Event Detection dataset and the MediaEval Social Event Detection 2014 dataset, indicate the effectiveness of the proposed approaches.
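The class-wise residual step can be illustrated with least-squares coding against per-class dictionaries, assigning a sample to the class with the smallest reconstruction residual. The shared multi-view representation (SMDR) and its constraints are not modeled in this toy; the dictionaries and sample below are assumptions:

```python
import numpy as np

def residual_classify(x, class_dictionaries):
    """Assign a sample to the class whose dictionary reconstructs it with the
    smallest residual (least-squares coding against each class dictionary)."""
    residuals = []
    for D in class_dictionaries:                   # D has shape (dim, n_atoms)
        coef, *_ = np.linalg.lstsq(D, x, rcond=None)
        residuals.append(float(np.linalg.norm(x - D @ coef)))
    return int(np.argmin(residuals)), residuals

# toy: two event classes living in different 2-D subspaces of R^5
rng = np.random.default_rng(4)
D0 = rng.normal(size=(5, 2))
D1 = rng.normal(size=(5, 2))
x = D1 @ np.array([0.7, -1.2]) + rng.normal(scale=0.01, size=5)   # near class 1
label, res = residual_classify(x, [D0, D1])
print("predicted class:", label, "residuals:", np.round(res, 3))
```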

{"title":"Shared Multi-View Data Representation for Multi-Domain Event Detection.","authors":"Zhenguo Yang,&nbsp;Qing Li,&nbsp;Wenyin Liu,&nbsp;Jianming Lv","doi":"10.1109/TPAMI.2019.2893953","DOIUrl":"https://doi.org/10.1109/TPAMI.2019.2893953","url":null,"abstract":"<p><p>Internet platforms provide new ways for people to share experiences, generating massive amounts of data related to various real-world concepts. In this paper, we present an event detection framework to discover real-world events from multiple data domains, including online news media and social media. As multi-domain data possess multiple data views that are heterogeneous, initial dictionaries consisting of labeled data samples are exploited to align the multi-view data. Furthermore, a shared multi-view data representation (SMDR) model is devised, which learns underlying and intrinsic structures shared among the data views by considering the structures underlying the data, data variations, and informativeness of dictionaries. SMDR incorpvarious constraints in the objective function, including shared representation, low-rank, local invariance, reconstruction error, and dictionary independence constraints. Given the data representations achieved by SMDR, class-wise residual models are designed to discover the events underlying the data based on the reconstruction residuals. Extensive experiments conducted on two real-world event detection datasets, i.e., Multi-domain and Multi-modality Event Detection dataset, and MediaEval Social Event Detection 2014 dataset, indicating the effectiveness of the proposed approaches.</p>","PeriodicalId":13426,"journal":{"name":"IEEE Transactions on Pattern Analysis and Machine Intelligence","volume":"42 5","pages":"1243-1256"},"PeriodicalIF":23.6,"publicationDate":"2020-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1109/TPAMI.2019.2893953","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"36928018","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 50
Table of Contents
IF 23.6  CAS Tier 1 (Computer Science)  Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE  Pub Date: 2020-05-01  DOI: 10.1109/tpami.2020.2979381
{"title":"Table of Contents","authors":"","doi":"10.1109/tpami.2020.2979381","DOIUrl":"https://doi.org/10.1109/tpami.2020.2979381","url":null,"abstract":"","PeriodicalId":13426,"journal":{"name":"IEEE Transactions on Pattern Analysis and Machine Intelligence","volume":" ","pages":""},"PeriodicalIF":23.6,"publicationDate":"2020-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1109/tpami.2020.2979381","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"41446029","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0