
2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR): Latest Publications

Growing a Brain: Fine-Tuning by Increasing Model Capacity
Pub Date: 2017-07-21 | DOI: 10.1109/CVPR.2017.323
Yu-Xiong Wang, Deva Ramanan, M. Hebert
CNNs have made an undeniable impact on computer vision through the ability to learn high-capacity models with large annotated training sets. One of their remarkable properties is the ability to transfer knowledge from a large source dataset to a (typically smaller) target dataset. This is usually accomplished through fine-tuning a fixed-size network on new target data. Indeed, virtually every contemporary visual recognition system makes use of fine-tuning to transfer knowledge from ImageNet. In this work, we analyze what components and parameters change during fine-tuning, and discover that increasing model capacity allows for more natural model adaptation through fine-tuning. By making an analogy to developmental learning, we demonstrate that growing a CNN with additional units, either by widening existing layers or deepening the overall network, significantly outperforms classic fine-tuning approaches. But in order to properly grow a network, we show that newly-added units must be appropriately normalized to allow for a pace of learning that is consistent with existing units. We empirically validate our approach on several benchmark datasets, producing state-of-the-art results.
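As a rough illustration of the growing step, the sketch below widens a fully-connected layer with extra units and rescales the new rows so their average weight norm matches that of the existing units, one simple way to keep the pace of learning comparable. The function name, the norm-matching rule, and the NumPy setting are illustrative assumptions, not the paper's exact normalization scheme.

```python
import numpy as np

def widen_fc_layer(W, b, n_new, rng=None):
    """Append n_new output units to a fully-connected layer.

    W: (n_out, n_in) weight matrix, b: (n_out,) bias vector.
    New rows are randomly initialized and then rescaled so their average norm
    matches the existing units (an illustrative normalization; the paper's
    scheme for pacing new vs. old units is more specific).
    """
    rng = rng or np.random.default_rng(0)
    n_in = W.shape[1]
    W_add = rng.standard_normal((n_new, n_in)) / np.sqrt(n_in)
    target_norm = np.linalg.norm(W, axis=1).mean()
    W_add *= target_norm / (np.linalg.norm(W_add, axis=1, keepdims=True) + 1e-12)
    b_add = np.zeros(n_new)
    return np.vstack([W, W_add]), np.concatenate([b, b_add])

# Example: grow a 512-unit layer to 640 units before fine-tuning on target data.
W, b = np.random.randn(512, 2048), np.zeros(512)
W2, b2 = widen_fc_layer(W, b, n_new=128)
assert W2.shape == (640, 2048) and b2.shape == (640,)
```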
Citations: 132
Generative Hierarchical Learning of Sparse FRAME Models
Pub Date: 2017-07-21 | DOI: 10.1109/CVPR.2017.209
Jianwen Xie, Yifei Xu, Erik Nijkamp, Y. Wu, Song-Chun Zhu
This paper proposes a method for generative learning of hierarchical random field models. The resulting model, which we call the hierarchical sparse FRAME (Filters, Random field, And Maximum Entropy) model, is a generalization of the original sparse FRAME model by decomposing it into multiple parts that are allowed to shift their locations, scales and rotations, so that the resulting model becomes a hierarchical deformable template. The model can be trained by an EM-type algorithm that alternates the following two steps: (1) Inference: Given the current model, we match it to each training image by inferring the unknown locations, scales, and rotations of the object and its parts by recursive sum-max maps, and (2) Re-learning: Given the inferred geometric configurations of the objects and their parts, we re-learn the model parameters by maximum likelihood estimation via stochastic gradient algorithm. Experiments show that the proposed method is capable of learning meaningful and interpretable templates that can be used for object detection, classification and clustering.
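As a minimal illustration of the sum-max idea behind the inference step, the sketch below scores a one-level deformable template: each part's filter-response map is maximized over small local shifts (the MAX step), and the shifted part scores are accumulated at the object origin (the SUM step). The function, the wrap-around alignment via np.roll, and the single scale are simplifying assumptions; the paper's maps are recursive and also search over scales and rotations.

```python
import numpy as np
from scipy.ndimage import maximum_filter

def sum_max_score_map(part_responses, part_offsets, max_shift=3):
    """Object score map for a one-level deformable template (illustrative only).

    part_responses: list of 2D filter-response maps, one per part.
    part_offsets:   list of (dy, dx) nominal part placements w.r.t. the object origin.
    Each part may shift locally by up to max_shift pixels.
    """
    score = np.zeros_like(part_responses[0], dtype=np.float64)
    for resp, (dy, dx) in zip(part_responses, part_offsets):
        local_max = maximum_filter(resp, size=2 * max_shift + 1)  # MAX over local shifts
        # Align each part's best local score back to the object origin
        # (np.roll wraps at the borders, which is acceptable for a sketch).
        score += np.roll(local_max, shift=(-dy, -dx), axis=(0, 1))
    return score

# Example: two parts placed above and below the object origin.
responses = [np.random.rand(64, 64), np.random.rand(64, 64)]
scores = sum_max_score_map(responses, part_offsets=[(-8, 0), (8, 0)])
```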
Citations: 3
Light Field Reconstruction Using Deep Convolutional Network on EPI
Pub Date: 2017-07-21 | DOI: 10.1109/CVPR.2017.178
Gaochang Wu, Mandan Zhao, Liangyong Wang, Qionghai Dai, Tianyou Chai, Yebin Liu
In this paper, we take advantage of the clear texture structure of the epipolar plane image (EPI) in the light field data and model the problem of light field reconstruction from a sparse set of views as a CNN-based angular detail restoration on EPI. We indicate that one of the main challenges in sparsely sampled light field reconstruction is the information asymmetry between the spatial and angular domain, where the detail portion in the angular domain is damaged by undersampling. To balance the spatial and angular information, the spatial high frequency components of an EPI are removed using EPI blur before feeding it to the network. Finally, a non-blind deblur operation is used to recover the spatial detail suppressed by the EPI blur. We evaluate our approach on several datasets including synthetic scenes, real-world scenes and challenging microscope light field data. We demonstrate the high performance and robustness of the proposed framework compared with the state-of-the-art algorithms. We also show a further application for depth enhancement by using the reconstructed light field.
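A rough sketch of the two hand-crafted ends of this pipeline is shown below: a 1-D Gaussian "EPI blur" along the spatial axis before the CNN, and a simple FFT-based Wiener deconvolution standing in for the final non-blind deblur. The kernel width, the SNR constant, and the Wiener form are assumptions for illustration; the paper's blur kernel and deblur operator are its own.

```python
import numpy as np
from scipy.ndimage import gaussian_filter1d

def epi_blur(epi, sigma=1.5):
    """Suppress spatial high frequencies of an EPI (axis 0: angular, axis 1: spatial)."""
    return gaussian_filter1d(epi, sigma=sigma, axis=1)

def wiener_deblur(epi, sigma=1.5, snr=100.0):
    """Non-blind deconvolution of the same 1-D Gaussian blur along the spatial axis."""
    n = epi.shape[1]
    x = np.arange(n) - n // 2
    psf = np.exp(-0.5 * (x / sigma) ** 2)
    psf /= psf.sum()
    H = np.fft.fft(np.fft.ifftshift(psf))            # blur transfer function
    W = np.conj(H) / (np.abs(H) ** 2 + 1.0 / snr)    # Wiener filter
    return np.real(np.fft.ifft(np.fft.fft(epi, axis=1) * W[None, :], axis=1))

# epi: a 2D slice of the light field, e.g. shape (n_views, n_pixels)
epi = np.random.rand(9, 256)
restored = wiener_deblur(epi_blur(epi))   # the restoration CNN would sit between these steps
```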
Citations: 178
Video Desnowing and Deraining Based on Matrix Decomposition
Pub Date: 2017-07-21 | DOI: 10.1109/CVPR.2017.303
Weihong Ren, Jiandong Tian, Zhi Han, Antoni B. Chan, Yandong Tang
Existing snow/rain removal methods often fail for heavy snow/rain and dynamic scenes. One reason for the failure is the assumption that all the snowflakes/rain streaks are sparse in snow/rain scenes. The other is that existing methods often cannot differentiate moving objects from snowflakes/rain streaks. In this paper, we propose a model based on matrix decomposition for video desnowing and deraining to solve the problems mentioned above. We divide snowflakes/rain streaks into two categories: sparse ones and dense ones. With background fluctuations and optical flow information, the detection of moving objects and sparse snowflakes/rain streaks is formulated as a multi-label Markov Random Field (MRF). As for dense snowflakes/rain streaks, they are considered to obey a Gaussian distribution. The snowflakes/rain streaks, including sparse ones and dense ones, in scene backgrounds are removed by a low-rank representation of the backgrounds. Meanwhile, a group sparsity term in our model is designed to filter snow/rain pixels within the moving objects. Experimental results show that our proposed model performs better than the state-of-the-art methods for snow and rain removal.
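The low-rank background plus sparse residual split at the core of such a model can be sketched with a generic alternating proximal scheme, shown below; the penalty weights, iteration count, and stopping rule are illustrative, and the paper's full model additionally uses MRF labelling for moving objects and a Gaussian term for dense streaks.

```python
import numpy as np

def soft_threshold(x, t):
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

def lowrank_sparse_split(D, tau=5.0, lam=0.05, n_iter=50):
    """Split a video matrix D (pixels x frames) into a low-rank background L and a
    sparse component S (candidate snow/rain and moving objects) by alternating
    proximal steps on 0.5||D-L-S||_F^2 + tau||L||_* + lam||S||_1."""
    L = np.zeros_like(D)
    S = np.zeros_like(D)
    for _ in range(n_iter):
        U, s, Vt = np.linalg.svd(D - S, full_matrices=False)
        L = (U * soft_threshold(s, tau)) @ Vt        # singular-value thresholding
        S = soft_threshold(D - L, lam)               # element-wise shrinkage
    return L, S

# D: video reshaped to a (num_pixels, num_frames) matrix
D = np.random.rand(1000, 40)
L, S = lowrank_sparse_split(D)
```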
Citations: 129
Beyond Instance-Level Image Retrieval: Leveraging Captions to Learn a Global Visual Representation for Semantic Retrieval
Pub Date: 2017-07-21 | DOI: 10.1109/CVPR.2017.560
Albert Gordo, Diane Larlus
Querying with an example image is a simple and intuitive interface to retrieve information from a visual database. Most of the research in image retrieval has focused on the task of instance-level image retrieval, where the goal is to retrieve images that contain the same object instance as the query image. In this work we move beyond instance-level retrieval and consider the task of semantic image retrieval in complex scenes, where the goal is to retrieve images that share the same semantics as the query image. We show that, despite its subjective nature, the task of semantically ranking visual scenes is consistently implemented across a pool of human annotators. We also show that a similarity based on human-annotated region-level captions is highly correlated with the human ranking and constitutes a good computable surrogate. Following this observation, we learn a visual embedding of the images where the similarity in the visual space is correlated with their semantic similarity surrogate. We further extend our model to learn a joint embedding of visual and textual cues that allows one to query the database using a text modifier in addition to the query image, adapting the results to the modifier. Finally, our model can ground the ranking decisions by showing regions that contributed the most to the similarity between pairs of images, providing a visual explanation of the similarity.
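One simple way to couple a caption-based similarity surrogate to a visual embedding, sketched below, is a margin ranking (triplet) loss where the positive and negative for each anchor are picked from the caption-similarity matrix. The sampling rule, margin value, and cosine scoring are illustrative assumptions rather than the paper's exact training objective.

```python
import numpy as np

def caption_guided_triplet_losses(vis_emb, caption_sim, margin=0.1):
    """vis_emb: (N, d) visual embeddings; caption_sim: (N, N) caption-based similarity."""
    vis = vis_emb / np.linalg.norm(vis_emb, axis=1, keepdims=True)
    v_sim = vis @ vis.T                       # cosine similarity in the visual space
    losses = []
    for i in range(len(vis)):
        order = np.argsort(caption_sim[i])    # ascending semantic similarity
        # High-similarity positive, low-similarity negative
        # (assumes self-similarity is the maximum, hence order[-2]).
        pos, neg = order[-2], order[0]
        losses.append(max(0.0, margin + v_sim[i, neg] - v_sim[i, pos]))
    return np.array(losses)

# Minimizing the mean of these losses pushes the visual space to agree with the
# caption-derived semantic ranking.
emb = np.random.randn(8, 128)
cap = np.random.rand(8, 8); cap = (cap + cap.T) / 2
print(caption_guided_triplet_losses(emb, cap).mean())
```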
Citations: 81
Dynamic FAUST: Registering Human Bodies in Motion
Pub Date: 2017-07-21 | DOI: 10.1109/CVPR.2017.591
Federica Bogo, J. Romero, Gerard Pons-Moll, Michael J. Black
While the ready availability of 3D scan data has influenced research throughout computer vision, less attention has focused on 4D data, that is 3D scans of moving non-rigid objects, captured over time. To be useful for vision research, such 4D scans need to be registered, or aligned, to a common topology. Consequently, extending mesh registration methods to 4D is important. Unfortunately, no ground-truth datasets are available for quantitative evaluation and comparison of 4D registration methods. To address this we create a novel dataset of high-resolution 4D scans of human subjects in motion, captured at 60 fps. We propose a new mesh registration method that uses both 3D geometry and texture information to register all scans in a sequence to a common reference topology. The approach exploits consistency in texture over both short and long time intervals and deals with temporal offsets between shape and texture capture. We show how using geometry alone results in significant errors in alignment when the motions are fast and non-rigid. We evaluate the accuracy of our registration and provide a dataset of 40,000 raw and aligned meshes. Dynamic FAUST extends the popular FAUST dataset to dynamic 4D data, and is available for research purposes at http://dfaust.is.tue.mpg.de.
Citations: 304
Single Image Reflection Suppression
Pub Date: 2017-07-21 | DOI: 10.1109/CVPR.2017.190
Nikolaos Arvanitopoulos, R. Achanta, S. Süsstrunk
Reflections are a common artifact in images taken through glass windows. Automatically removing the reflection artifacts after the picture is taken is an ill-posed problem. Attempts to solve this problem using optimization schemes therefore rely on various prior assumptions from the physical world. Instead of removing reflections from a single image, which has met with limited success so far, we propose a novel approach to suppress reflections. It is based on a Laplacian data fidelity term and an l-zero gradient sparsity term imposed on the output. With experiments on artificial and real-world images we show that our reflection suppression method performs better than the state-of-the-art reflection removal techniques.
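Taken at face value, these two terms suggest an objective of the following form for the output image T given the input I; this is a plausible reading of the abstract, and the paper's exact formulation and weighting may differ.

```latex
\min_{T}\ \lVert \Delta T - \Delta I \rVert_2^2 \;+\; \lambda\,\lVert \nabla T \rVert_0
```

Here \Delta denotes the Laplacian operator, so the data term asks the output to preserve the input's second-order structure, while the l-zero penalty on gradients favours piecewise-flat regions and thereby suppresses low-contrast reflection edges.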
Citations: 106
3D Point Cloud Registration for Localization Using a Deep Neural Network Auto-Encoder
Pub Date: 2017-07-21 | DOI: 10.1109/CVPR.2017.265
Gil Elbaz, Tamar Avraham, A. Fischer
We present an algorithm for registration between a large-scale point cloud and a close-proximity scanned point cloud, providing a localization solution that is fully independent of prior information about the initial positions of the two point cloud coordinate systems. The algorithm, denoted LORAX, selects super-points (local subsets of points) and describes the geometric structure of each with a low-dimensional descriptor. These descriptors are then used to infer potential matching regions for an efficient coarse registration process, followed by a fine-tuning stage. The set of super-points is selected by covering the point clouds with overlapping spheres, and then filtering out those of low-quality or nonsalient regions. The descriptors are computed using state-of-the-art unsupervised machine learning, utilizing the technology of deep neural network based auto-encoders. This novel framework provides a strong alternative to the common practice of using manually designed key-point descriptors for coarse point cloud registration. Utilizing super-points instead of key-points allows the available geometrical data to be better exploited to find the correct transformation. Encoding local 3D geometric structures using a deep neural network auto-encoder instead of traditional descriptors continues the trend seen in other computer vision applications and indeed leads to superior results. The algorithm is tested on challenging point cloud registration datasets, and its advantages over previous approaches as well as its robustness to density changes, noise, and missing data are shown.
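To illustrate the descriptor idea, here is a toy auto-encoder that compresses a super-point, rasterized as a small depth image, into a low-dimensional code usable for coarse matching; the architecture, input encoding, grid size, and code dimension are all assumptions for illustration and are not LORAX's actual network.

```python
import torch
import torch.nn as nn

class SuperPointAutoEncoder(nn.Module):
    """Toy auto-encoder: 32x32 depth-image rendering of a super-point -> 16-D code."""
    def __init__(self, grid=32, code_dim=16):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Flatten(),
            nn.Linear(grid * grid, 256), nn.ReLU(),
            nn.Linear(256, code_dim),
        )
        self.decoder = nn.Sequential(
            nn.Linear(code_dim, 256), nn.ReLU(),
            nn.Linear(256, grid * grid),
        )

    def forward(self, x):                       # x: (batch, 1, grid, grid)
        code = self.encoder(x)
        recon = self.decoder(code).view(x.shape)
        return code, recon

# After unsupervised training with a reconstruction loss, codes of super-points from
# the two clouds can be compared (e.g. by L2 distance) to propose coarse matches.
model = SuperPointAutoEncoder()
codes, recon = model(torch.rand(4, 1, 32, 32))
```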
Citations: 178
Image Splicing Detection via Camera Response Function Analysis
Pub Date: 2017-07-21 | DOI: 10.1109/CVPR.2017.203
Can Chen, Scott McCloskey, Jingyi Yu
Recent advances in image manipulation techniques have made image forgery detection increasingly more challenging. An important component in such tools is to fake motion and/or defocus blurs through boundary splicing and copy-move operators, to emulate wide aperture and slow shutter effects. In this paper, we present a new technique based on the analysis of the camera response function (CRF) for efficient and robust splicing and copy-move forgery detection and localization. We first analyze how non-linear CRFs affect edges in terms of the intensity-gradient bivariable histograms. We show distinguishable shape differences on real vs. forged blurs near edges after a splicing operation. Based on our analysis, we introduce a deep-learning framework to detect and localize forged edges. In particular, we show the problem can be transformed to a handwriting recognition problem and resolved by using a convolutional neural network. We generate a large dataset of forged images produced by splicing followed by retouching, and comprehensive experiments show our proposed method outperforms the state-of-the-art techniques in accuracy and robustness.
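The local statistic at the heart of this analysis can be sketched as a bivariate histogram of pixel intensity versus gradient magnitude over an edge patch, whose shape is bent by a non-linear CRF; the binning, 8-bit intensity range, and normalisation below are illustrative assumptions, not the paper's exact feature.

```python
import numpy as np

def intensity_gradient_histogram(patch, n_bins=32):
    """Bivariate (intensity, gradient-magnitude) histogram for an edge patch."""
    patch = patch.astype(np.float64)
    gy, gx = np.gradient(patch)
    grad = np.sqrt(gx ** 2 + gy ** 2)
    hist, _, _ = np.histogram2d(
        patch.ravel(), grad.ravel(),
        bins=n_bins,
        range=[[0.0, 255.0], [0.0, grad.max() + 1e-8]],  # assumes 8-bit intensities
    )
    return hist / hist.sum()

# patch: a grayscale edge patch with values in [0, 255]
patch = np.random.rand(32, 32) * 255
h = intensity_gradient_histogram(patch)
```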
Citations: 39
Specular Highlight Removal in Facial Images
Pub Date: 2017-07-21 | DOI: 10.1109/CVPR.2017.297
Chen Li, Stephen Lin, Kun Zhou, K. Ikeuchi
We present a method for removing specular highlight reflections in facial images that may contain varying illumination colors. This is accurately achieved through the use of physical and statistical properties of human skin and faces. We employ a melanin and hemoglobin based model to represent the diffuse color variations in facial skin, and utilize this model to constrain the highlight removal solution in a manner that is effective even for partially saturated pixels. The removal of highlights is further facilitated through estimation of directionally variant illumination colors over the face, which is done while taking advantage of a statistically-based approximation of facial geometry. An important practical feature of the proposed method is that the skin color model is utilized in a way that does not require color calibration of the camera. Moreover, this approach does not require assumptions commonly needed in previous highlight removal techniques, such as uniform illumination color or piecewise-constant surface colors. We validate this technique through comparisons to existing methods for removing specular highlights.
Citations: 29