
2015 IEEE International Conference on Computer Vision (ICCV): Latest Publications

Category-Blind Human Action Recognition: A Practical Recognition System
Pub Date: 2015-12-07 DOI: 10.1109/ICCV.2015.505
Wenbo Li, Longyin Wen, M. Chuah, Siwei Lyu
Existing human action recognition systems for 3D sequences obtained from depth cameras are designed to cope with only one action category, either single-person actions or two-person interactions, and are difficult to extend to scenarios where both categories co-exist. In this paper, we propose the category-blind human recognition method (CHARM), which can recognize a human action without making assumptions about the action category. In our CHARM approach, we represent a human action class (either a single-person action or a two-person interaction) using a co-occurrence of motion primitives. Subsequently, we classify an action instance by matching its motion-primitive co-occurrence patterns to each class representation; the matching task is formulated as a set of maximum clique problems. We conduct extensive evaluations of CHARM using three datasets covering single-person actions, two-person interactions, and their mixtures. Experimental results show that CHARM performs favorably compared with several state-of-the-art single-person action and two-person interaction methods, without making explicit assumptions about the action category.
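The abstract casts co-occurrence matching as maximum clique problems but gives no construction details. Below is a minimal Python sketch (using networkx) of one way to score an action instance against a class representation via a maximum clique in a correspondence graph; the motion-primitive representation and the compatibility test are illustrative assumptions, not the paper's exact formulation.

```python
# Minimal sketch: scoring one action instance against one class representation
# by casting co-occurrence matching as a maximum clique problem (networkx).
# The motion-primitive representation and the compatibility test below are
# illustrative assumptions, not the paper's exact formulation.
import itertools
import networkx as nx

def clique_match_score(instance_pairs, class_pairs, compatible):
    """instance_pairs / class_pairs: lists of co-occurring primitive pairs.
    compatible(i_pair, c_pair): True if an instance pair may correspond to a
    class pair. Returns the size of the largest set of mutually consistent
    correspondences (maximum clique in the correspondence graph)."""
    G = nx.Graph()
    # Nodes: candidate correspondences (one instance pair matched to one class pair).
    nodes = [(i, c) for i, c in itertools.product(range(len(instance_pairs)),
                                                  range(len(class_pairs)))
             if compatible(instance_pairs[i], class_pairs[c])]
    G.add_nodes_from(nodes)
    # Edges: two correspondences are jointly usable if they do not reuse the
    # same instance pair or the same class pair (one-to-one matching).
    for (i1, c1), (i2, c2) in itertools.combinations(nodes, 2):
        if i1 != i2 and c1 != c2:
            G.add_edge((i1, c1), (i2, c2))
    if G.number_of_nodes() == 0:
        return 0
    return max(len(q) for q in nx.find_cliques(G))

# Toy usage: primitive pairs are just (label, label) tuples here.
inst = [("raise_arm", "step_forward"), ("raise_arm", "turn_head")]
cls  = [("raise_arm", "step_forward"), ("kick", "turn_head")]
same = lambda a, b: a == b
print(clique_match_score(inst, cls, same))  # -> 1
```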
Citations: 67
Multi-View Complementary Hash Tables for Nearest Neighbor Search
Pub Date: 2015-12-07 DOI: 10.1109/ICCV.2015.132
Xianglong Liu, Lei Huang, Cheng Deng, Jiwen Lu, B. Lang
Recent years have witnessed the success of hashing techniques in fast nearest neighbor search. In practice, many applications (e.g., visual search, object detection, and image matching) have enjoyed the benefits of complementary hash tables and information fusion over multiple views. However, most prior research has focused on compact hash code learning; little work has studied how to build multiple complementary hash tables, much less how to adaptively integrate information stemming from multiple views. In this paper, we present a novel multi-view complementary hash table method that learns complementary hash tables from data with multiple views. For a single multi-view table, using exemplar-based feature fusion, we approximate the inherent data similarities with a low-rank matrix and learn discriminative hash functions efficiently. To build complementary tables while maintaining scalable training and fast out-of-sample extension, an exemplar reweighting scheme is introduced to update the induced low-rank similarity in the sequential table construction framework; this brings mutual benefits between tables by placing greater importance on exemplars shared by mis-separated neighbors. Extensive experiments on three large-scale image datasets demonstrate that the proposed method significantly outperforms various naive solutions and state-of-the-art multi-table methods.
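The abstract mentions approximating data similarities with a low-rank matrix built from exemplars and learning hash functions from it, but does not give the optimization. The sketch below follows the standard anchor-graph-hashing recipe as a stand-in: exemplar affinities define a low-rank similarity, and hash bits come from its top eigenvectors. The naive view concatenation and all parameter choices are assumptions, not the paper's multi-view objective.

```python
# Sketch of exemplar-based low-rank similarity + hashing in the spirit of
# anchor graph hashing -- a stand-in for illustration, not the paper's
# actual multi-view objective. Views are fused by simple concatenation here.
import numpy as np

def exemplar_hash(X_views, exemplar_idx, n_bits=16, sigma=1.0):
    X = np.hstack(X_views)                      # naive multi-view fusion
    A = X[exemplar_idx]                         # exemplar (anchor) points
    # RBF affinities to exemplars, row-normalised: Z is n x m, so the implied
    # similarity S = Z diag(Z^T 1)^-1 Z^T is low-rank (rank <= m).
    d2 = ((X[:, None, :] - A[None, :, :]) ** 2).sum(-1)
    Z = np.exp(-d2 / (2 * sigma ** 2))
    Z /= Z.sum(axis=1, keepdims=True)
    lam = Z.sum(axis=0)                         # diag(Z^T 1)
    M = (Z / np.sqrt(lam)).T @ (Z / np.sqrt(lam))    # small m x m matrix
    w, V = np.linalg.eigh(M)
    idx = np.argsort(w)[::-1][1:n_bits + 1]     # skip the trivial top eigenvector
    Y = (Z / np.sqrt(lam)) @ V[:, idx] / np.sqrt(np.maximum(w[idx], 1e-12))
    return np.sign(Y)                           # n x n_bits binary codes

# Toy usage with two random "views".
rng = np.random.default_rng(0)
views = [rng.normal(size=(100, 8)), rng.normal(size=(100, 4))]
codes = exemplar_hash(views, exemplar_idx=rng.choice(100, 10, replace=False), n_bits=8)
print(codes.shape)  # (100, 8)
```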
Citations: 52
Fine-Grained Change Detection of Misaligned Scenes with Varied Illuminations
Pub Date: 2015-12-07 DOI: 10.1109/ICCV.2015.149
Wei Feng, Fei-Peng Tian, Qian Zhang, N. Zhang, Liang Wan, Ji-zhou Sun
Detecting fine-grained, subtle changes within a scene is critically important in practice. Previous change detection methods, which focus on detecting large-scale significant changes, cannot do this well. This paper proposes a feasible end-to-end approach to this challenging problem. We start from active camera relocation, which quickly relocates the camera to nearly the same pose and position as in the previous observation. To guarantee sensitivity and accuracy in detecting minute changes, in each observation we capture a group of images under multiple illuminations, which need only be roughly aligned to the lighting conditions of the previous observation. Given two observations, we formulate fine-grained change detection as a joint optimization over three related factors: the normal-aware lighting difference, the camera geometry correction flow, and the real scene change mask. We solve for the three factors in a coarse-to-fine manner and achieve reliable change decisions through rank minimization. We build three real-world datasets to benchmark fine-grained change detection of misaligned scenes under varied lighting conditions. Extensive experiments show the superior performance of our approach over state-of-the-art change detection methods and its ability to distinguish real scene changes from false ones caused by lighting variations.
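The abstract states that reliable change decisions are achieved by rank minimization, without further detail. As a generic stand-in, the sketch below casts this as robust PCA: a matrix of roughly aligned image differences is split into a low-rank part (residual lighting/geometry effects) and a sparse part (candidate real changes) via nuclear-norm minimization with an inexact augmented Lagrangian; this is not the paper's exact model.

```python
# Sketch: rank minimization for change decision, cast as robust PCA
# (low-rank + sparse decomposition via an inexact augmented Lagrangian).
# A generic stand-in for the paper's rank-minimization step, not its exact model.
import numpy as np

def svt(M, tau):
    """Singular value thresholding: prox of the nuclear norm."""
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    return U @ np.diag(np.maximum(s - tau, 0)) @ Vt

def shrink(M, tau):
    """Soft thresholding: prox of the l1 norm."""
    return np.sign(M) * np.maximum(np.abs(M) - tau, 0)

def rpca(D, lam=None, mu=None, n_iter=200, tol=1e-7):
    """Decompose D ~ L (low rank) + S (sparse). Columns of D could be the
    vectorised differences of corresponding multi-illumination image pairs."""
    m, n = D.shape
    lam = lam or 1.0 / np.sqrt(max(m, n))
    mu = mu or 0.25 * m * n / np.abs(D).sum()
    L = np.zeros_like(D); S = np.zeros_like(D); Y = np.zeros_like(D)
    for _ in range(n_iter):
        L = svt(D - S + Y / mu, 1.0 / mu)
        S = shrink(D - L + Y / mu, lam / mu)
        R = D - L - S
        Y += mu * R
        if np.linalg.norm(R) <= tol * np.linalg.norm(D):
            break
    return L, S

# Toy usage: the sparse part S flags candidate "real" changes.
rng = np.random.default_rng(0)
D = rng.normal(size=(50, 3)) @ rng.normal(size=(3, 20))   # low-rank background
D[10, 5] += 10.0                                          # one injected change
L, S = rpca(D)
print(np.unravel_index(np.abs(S).argmax(), S.shape))      # expected to flag (10, 5)
```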
Citations: 32
Multiple Granularity Descriptors for Fine-Grained Categorization
Pub Date: 2015-12-07 DOI: 10.1109/ICCV.2015.276
Dequan Wang, Zhiqiang Shen, Jie Shao, Wei Zhang, X. Xue, Zeyu Zhang
Fine-grained categorization, which aims to distinguish subordinate-level categories such as bird species or dog breeds, is an extremely challenging task. This is due to two main issues: how to localize discriminative regions for recognition, and how to learn sophisticated features for representation. Neither is easy to handle when labeled data is insufficient. We leverage the fact that a subordinate-level object already has other labels in its ontology tree. These "free" labels can be used to train a series of CNN-based classifiers, each specialized at one grain level. The internal representations of these networks have different regions of interest, allowing the construction of multi-grained descriptors that encode informative and discriminative features covering all grain levels. Our multiple granularity framework can be learned with the weakest supervision, requiring only image-level labels and avoiding the use of labor-intensive bounding box or part annotations. Experimental results on three challenging fine-grained image datasets demonstrate that our approach outperforms state-of-the-art algorithms, including those requiring strong labels.
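The framework trains one CNN-based classifier per grain level and combines their internal representations into a multi-grained descriptor. Below is a minimal PyTorch sketch of that idea; the shared backbone, layer sizes, and the simple sum of per-level cross-entropy losses are assumptions for illustration (the paper may train separate networks per level).

```python
# Minimal PyTorch sketch of per-grain-level classifiers whose features are
# concatenated into a multi-grained descriptor. The shared backbone and layer
# sizes are assumptions for illustration, not the paper's architecture.
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiGrainNet(nn.Module):
    def __init__(self, n_classes_per_level=(13, 37, 200), feat_dim=256):
        super().__init__()
        self.backbone = nn.Sequential(            # stand-in for a CNN trunk
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        # One feature extractor + classifier per grain level
        # (e.g. order, family, species for birds).
        self.level_feats = nn.ModuleList(
            [nn.Sequential(nn.Linear(64, feat_dim), nn.ReLU())
             for _ in n_classes_per_level])
        self.level_heads = nn.ModuleList(
            [nn.Linear(feat_dim, c) for c in n_classes_per_level])

    def forward(self, x):
        h = self.backbone(x)
        feats = [f(h) for f in self.level_feats]
        logits = [head(f) for head, f in zip(self.level_heads, feats)]
        descriptor = torch.cat(feats, dim=1)      # multi-grained descriptor
        return logits, descriptor

# Training uses only image-level labels: one cross-entropy term per grain level.
model = MultiGrainNet()
x = torch.randn(4, 3, 224, 224)
labels = [torch.randint(0, c, (4,)) for c in (13, 37, 200)]
logits, desc = model(x)
loss = sum(F.cross_entropy(l, y) for l, y in zip(logits, labels))
print(desc.shape, float(loss))
```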
Citations: 199
Automatic Thumbnail Generation Based on Visual Representativeness and Foreground Recognizability
Pub Date: 2015-12-07 DOI: 10.1109/ICCV.2015.37
Jingwei Huang, Huarong Chen, Bin Wang, Stephen Lin
We present an automatic thumbnail generation technique based on two essential considerations: how well the thumbnail visually represents the original photograph, and how well the foreground can be recognized after the cropping and downsizing steps of thumbnailing. These factors, while important for the image-indexing purpose of thumbnails, have largely been ignored in previous methods, which instead are designed to highlight salient content while disregarding the effects of downsizing. We propose a set of image features for modeling these two considerations and learn how to balance their relative effects on thumbnail generation by training on image pairs composed of photographs and their corresponding thumbnails created by an expert photographer. Experiments show the effectiveness of this approach on a variety of images, as well as its advantages over related techniques.
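To make the two considerations concrete, here is a small Python sketch that scores candidate crops with a representativeness proxy (color-histogram intersection with the full image) and a recognizability proxy (foreground area surviving cropping and downsizing, given a foreground mask). Both proxies, the fixed weights, and the mask are illustrative assumptions; the paper learns the balance from expert-made photo/thumbnail pairs.

```python
# Sketch: scoring candidate thumbnail crops by (i) visual representativeness,
# approximated by colour-histogram intersection with the full image, and
# (ii) foreground recognizability, approximated by the foreground area that
# survives cropping and downsizing. Proxies, weights, and the mask are
# illustrative assumptions, not the paper's learned features.
import numpy as np

def hist_similarity(img, crop, bins=8):
    def h(a):
        hist, _ = np.histogramdd(a.reshape(-1, 3), bins=bins, range=[(0, 256)] * 3)
        return hist.ravel() / hist.sum()
    p, q = h(img), h(crop)
    return np.minimum(p, q).sum()                 # histogram intersection

def recognizability(fg_mask, box, thumb_px=100 * 100):
    x0, y0, x1, y1 = box
    kept = fg_mask[y0:y1, x0:x1].sum()
    scale = thumb_px / ((x1 - x0) * (y1 - y0))    # downsizing factor
    return kept * scale / max(fg_mask.sum(), 1)   # fraction of foreground, scaled

def score_crop(img, fg_mask, box, w=(0.5, 0.5)):
    x0, y0, x1, y1 = box
    crop = img[y0:y1, x0:x1]
    return w[0] * hist_similarity(img, crop) + w[1] * recognizability(fg_mask, box)

# Toy usage: pick the better of two candidate boxes on a random "image".
rng = np.random.default_rng(0)
img = rng.integers(0, 256, size=(240, 320, 3)).astype(float)
fg = np.zeros((240, 320)); fg[80:160, 120:200] = 1   # hypothetical foreground mask
boxes = [(100, 60, 220, 180), (0, 0, 120, 120)]
print(max(boxes, key=lambda b: score_crop(img, fg, b)))
```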
Citations: 20
Rolling Shutter Super-Resolution
Pub Date: 2015-12-07 DOI: 10.1109/ICCV.2015.71
Abhijith Punnappurath, Vijay Rengarajan, A. Rajagopalan
Classical multi-image super-resolution (SR) algorithms, designed for CCD cameras, assume that the motion among the images is global. But CMOS sensors, which have increasingly started to replace their more expensive CCD counterparts in many applications, violate this assumption whenever the camera moves relative to the scene during the exposure of an image, because of their row-wise acquisition mechanism. In this paper, we study the hitherto unexplored topic of multi-image SR with CMOS cameras. We first develop an SR observation model that accounts for the row-wise distortions, the "rolling shutter" (RS) effect, observed in images captured by non-stationary CMOS cameras. We then propose a unified RS-SR framework to obtain an RS-free high-resolution image (and the row-wise motion) from distorted low-resolution images. We demonstrate the efficacy of the proposed scheme using synthetic data as well as real images captured with a hand-held CMOS camera. Quantitative and qualitative assessments reveal that our method significantly advances the state of the art.
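The core of the observation model is that each row of a low-resolution frame sees the latent high-resolution image under its own motion before downsampling. The sketch below implements a simplified forward model with per-row translations only; the paper's actual model and the inverse (super-resolution) problem are more general.

```python
# Sketch of a rolling-shutter observation model: each row of the low-resolution
# frame sees the latent high-resolution image under its own (row-wise) motion
# before downsampling. Pure per-row translation is assumed here for simplicity;
# the model in the paper is more general.
import numpy as np
from scipy.ndimage import shift as nd_shift

def rs_downsample(hr, row_motion, factor=2):
    """hr: (H, W) latent image. row_motion: function mapping a low-res row index
    to a (dy, dx) camera translation active while that row was read out."""
    H, W = hr.shape
    h, w = H // factor, W // factor
    lr = np.zeros((h, w))
    for r in range(h):
        warped = nd_shift(hr, row_motion(r), order=1, mode='nearest')
        # Average the block of HR pixels that integrates into LR row r.
        block = warped[r * factor:(r + 1) * factor, :]
        lr[r] = block.reshape(factor, w, factor).mean(axis=(0, 2))
    return lr

# Toy usage: a linear horizontal drift over the exposure produces the classic
# rolling-shutter skew in the generated low-resolution frame.
hr = np.zeros((64, 64)); hr[:, 30:34] = 1.0            # vertical bar
lr = rs_downsample(hr, row_motion=lambda r: (0.0, 0.3 * r))
print(lr.shape)  # (32, 32)
```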
Citations: 13
HICO: A Benchmark for Recognizing Human-Object Interactions in Images
Pub Date: 2015-12-07 DOI: 10.1109/ICCV.2015.122
Yu-Wei Chao, Zhan Wang, Yugeng He, Jiaxuan Wang, Jia Deng
We introduce a new benchmark "Humans Interacting with Common Objects" (HICO) for recognizing human-object interactions (HOI). We demonstrate the key features of HICO: a diverse set of interactions with common object categories, a list of well-defined, sense-based HOI categories, and an exhaustive labeling of co-occurring interactions with an object category in each image. We perform an in-depth analysis of representative current approaches and show that DNNs enjoy a significant edge. In addition, we show that semantic knowledge can significantly improve HOI recognition, especially for uncommon categories.
Citations: 258
Where to Buy It: Matching Street Clothing Photos in Online Shops
Pub Date: 2015-12-07 DOI: 10.1109/ICCV.2015.382
M. Kiapour, Xufeng Han, S. Lazebnik, A. Berg, Tamara L. Berg
In this paper, we define a new task, Exact Street to Shop, where our goal is to match a real-world example of a garment item to the same item in an online shop. This is an extremely challenging task due to visual differences between street photos (pictures of people wearing clothing in everyday uncontrolled settings) and online shop photos (pictures of clothing items on people, mannequins, or in isolation, captured by professionals in more controlled settings). We collect a new dataset for this application containing 404,683 shop photos collected from 25 different online retailers and 20,357 street photos, providing a total of 39,479 clothing item matches between street and shop photos. We develop three different methods for Exact Street to Shop retrieval, including two deep learning baseline methods, and a method to learn a similarity measure between the street and shop domains. Experiments demonstrate that our learned similarity significantly outperforms our baselines that use existing deep learning based representations.
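The abstract mentions learning a similarity measure between the street and shop domains on top of deep features. As a generic stand-in (not necessarily the paper's architecture or loss), here is a two-branch embedding with a contrastive loss in PyTorch.

```python
# Generic sketch of learning a cross-domain (street vs. shop) similarity
# measure with a two-branch network and a contrastive loss. A common stand-in,
# not necessarily the exact architecture or loss used in the paper.
import torch
import torch.nn as nn
import torch.nn.functional as F

class DomainEmbed(nn.Module):
    """Maps precomputed CNN features (e.g. 4096-d) into a shared embedding;
    one branch per domain so street and shop statistics can differ."""
    def __init__(self, in_dim=4096, out_dim=256):
        super().__init__()
        self.street = nn.Sequential(nn.Linear(in_dim, out_dim), nn.ReLU(),
                                    nn.Linear(out_dim, out_dim))
        self.shop = nn.Sequential(nn.Linear(in_dim, out_dim), nn.ReLU(),
                                  nn.Linear(out_dim, out_dim))

    def forward(self, street_feat, shop_feat):
        return (F.normalize(self.street(street_feat), dim=1),
                F.normalize(self.shop(shop_feat), dim=1))

def contrastive_loss(a, b, same_item, margin=0.5):
    d = (a - b).pow(2).sum(dim=1).sqrt()
    pos = same_item * d.pow(2)                       # pull matching pairs together
    neg = (1 - same_item) * F.relu(margin - d).pow(2)  # push non-matches apart
    return (pos + neg).mean()

# Toy usage on random "CNN features"; same = 1 for matching garments.
model = DomainEmbed()
street, shop = torch.randn(8, 4096), torch.randn(8, 4096)
same = torch.randint(0, 2, (8,)).float()
ea, eb = model(street, shop)
loss = contrastive_loss(ea, eb, same)
loss.backward()
print(float(loss))
```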
Citations: 428
Model-Based Tracking at 300Hz Using Raw Time-of-Flight Observations
Pub Date: 2015-12-07 DOI: 10.1109/ICCV.2015.408
Jan Stühmer, Sebastian Nowozin, A. Fitzgibbon, R. Szeliski, Travis Perry, S. Acharya, D. Cremers, J. Shotton
Consumer depth cameras have dramatically improved our ability to track rigid, articulated, and deformable 3D objects in real time. However, depth cameras have a limited temporal resolution (frame rate) that restricts the accuracy and robustness of tracking, especially for fast or unpredictable motion. In this paper, we show how to perform model-based object tracking that allows us to reconstruct the object's depth at an order-of-magnitude higher frame rate through simple modifications to an off-the-shelf depth camera. We focus on phase-based time-of-flight (ToF) sensing, which reconstructs each low-frame-rate depth image from a set of short-exposure 'raw' infrared captures. These raw captures are taken in quick succession near the beginning of each depth frame and differ in the modulation of their active illumination. We make two contributions. First, we detail how to perform model-based tracking against these raw captures. Second, we show that by reprogramming the camera to space the raw captures uniformly in time, we obtain a 10x higher frame rate and thereby improve the ability to track fast-moving objects.
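For background on what the 'raw' captures are, the sketch below shows the textbook four-phase ToF depth reconstruction from correlation measurements at 0°, 90°, 180°, and 270° phase offsets. The paper's contribution is to track the model against these raw captures directly (and to re-space them in time), which is not reproduced here.

```python
# Textbook phase-based time-of-flight reconstruction from four raw correlation
# captures at 0/90/180/270 degree phase offsets. This only illustrates the
# standard per-frame depth reconstruction that the paper's tracker bypasses.
import numpy as np

C = 299_792_458.0  # speed of light, m/s

def tof_depth(q0, q90, q180, q270, f_mod=20e6):
    """q*: raw correlation images (same shape). f_mod: modulation frequency."""
    phase = np.arctan2(q270 - q90, q0 - q180)          # in (-pi, pi]
    phase = np.mod(phase, 2 * np.pi)                   # wrap to [0, 2*pi)
    amplitude = 0.5 * np.hypot(q270 - q90, q0 - q180)  # signal strength
    depth = C * phase / (4 * np.pi * f_mod)            # up to the ambiguity range
    return depth, amplitude

# Toy usage: simulate the four captures for a 3 m target and recover the depth.
f_mod, true_depth = 20e6, 3.0
phi = 4 * np.pi * f_mod * true_depth / C
offsets = np.array([0, np.pi / 2, np.pi, 3 * np.pi / 2])
raws = [np.full((4, 4), 100 + 50 * np.cos(phi + o)) for o in offsets]
depth, amp = tof_depth(*raws, f_mod=f_mod)
print(round(float(depth[0, 0]), 3))  # ~3.0 (ambiguity range c/(2*f_mod) = 7.49 m)
```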
Citations: 20
You are Here: Mimicking the Human Thinking Process in Reading Floor-Plans
Pub Date: 2015-12-07 DOI: 10.1109/ICCV.2015.255
Hang Chu, Dong Ki Kim, Tsuhan Chen
A human can easily find his or her way in an unfamiliar building by walking around and reading the floor-plan. We try to mimic and automate this human thinking process. More precisely, we introduce a new and useful task: locating a user in the floor-plan using only a camera and the floor-plan, without any other prior information. We address the problem with a novel matching-localization algorithm inspired by human logic. We demonstrate through experiments that our method outperforms state-of-the-art floor-plan-based localization methods by a large margin, while also being highly efficient for real-time applications.
Citations: 21