
2018 15th Conference on Computer and Robot Vision (CRV): Latest Publications

Grading Prenatal Hydronephrosis from Ultrasound Imaging Using Deep Convolutional Neural Networks
Pub Date: 2018-05-08 DOI: 10.1109/CRV.2018.00021
Kiret Dhindsa, Lauren C. Smail, M. McGrath, Luis H. Braga, S. Becker, R. Sonnadara
We evaluate the performance of a Deep Convolutional Neural Network in grading the severity of prenatal hydronephrosis (PHN), one of the most common congenital urological anomalies, from renal ultrasound images. We present results on a variety of classification tasks based on clinically defined grades of severity, including predicting, with approximately 80% accuracy, whether or not an ultrasound image represents a case at high risk for further complications requiring surgical intervention. The prediction rates obtained by the model are well beyond the rates of agreement among trained clinicians, suggesting that this work can lead to a useful diagnostic aid.
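The listing includes no code, but a minimal PyTorch sketch of a CNN severity grader of the kind the abstract describes might look as follows; the layer sizes, single-channel 224x224 input, and five-grade output are illustrative assumptions, not the authors' architecture.

```python
import torch
import torch.nn as nn

class UltrasoundGrader(nn.Module):
    """Toy CNN that maps a grayscale renal ultrasound image to severity
    logits. Purely illustrative; not the model from the paper."""
    def __init__(self, num_grades=5):  # number of severity grades is assumed
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(64, 128, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.classifier = nn.Linear(128, num_grades)

    def forward(self, x):
        return self.classifier(self.features(x).flatten(1))

model = UltrasoundGrader()
logits = model(torch.randn(8, 1, 224, 224))  # batch of 8 grayscale images
print(logits.shape)  # torch.Size([8, 5])
```

For the binary high-risk task reported above, num_grades would simply be set to 2 and the network trained with cross-entropy.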
Citations: 8
Real-Time 3D Face Verification with a Consumer Depth Camera
Pub Date: 2018-05-08 DOI: 10.1109/CRV.2018.00020
Gregory P. Meyer, M. Do
We present a system for accurate real-time 3D face verification using a low-quality consumer depth camera. To verify the identity of a subject, we build a high-quality reference model offline by fitting a 3D morphable model to a sequence of low-quality depth images. At runtime, we compare the similarity between the reference model and a single depth image by aligning the model to the image and measuring differences between every point on the two facial surfaces. The model and the image will not match exactly due to sensor noise, occlusions, and changes in expression, hairstyle, and eye-wear; therefore, we leverage a data-driven approach to determine whether or not the model and the image match. We train a random decision forest to verify the identity of a subject, where the point-to-point distances between the reference model and the depth image are used as input features to the classifier. Our approach runs in real-time and is designed to continuously authenticate a user as he or she uses the device. In addition, our proposed method outperforms existing 2D and 3D face verification methods on a benchmark data set.
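As a rough illustration of the final verification step, the sketch below trains a random decision forest on vectors of point-to-point distances, as the abstract describes; the synthetic distance data, the number of surface points, and the forest size are assumptions, and the model-to-image alignment step is omitted entirely.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
n_points, n_pairs = 500, 200  # points per aligned surface, training pairs (assumed)

# Stand-in for point-to-point distances between an aligned reference model
# and a depth image: genuine pairs yield small distances, impostors large.
genuine = np.abs(rng.normal(0.0, 1.0, (n_pairs, n_points)))
impostor = np.abs(rng.normal(3.0, 1.5, (n_pairs, n_points)))

X = np.vstack([genuine, impostor])
y = np.array([1] * n_pairs + [0] * n_pairs)  # 1 = same identity

forest = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)
probe = np.abs(rng.normal(0.0, 1.0, (1, n_points)))
print("match probability:", forest.predict_proba(probe)[0, 1])
```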
Citations: 2
Indoor Localization in Dynamic Human Environments Using Visual Odometry and Global Pose Refinement
Pub Date: 2018-05-08 DOI: 10.1109/CRV.2018.00057
Raghavender Sahdev, B. Chen, John K. Tsotsos
Indoor localization is a primary task for social robots. We are particularly interested in how to solve this problem for a mobile robot using primarily vision sensors. This work examines a critical issue in generalizing approaches for static environments to dynamic ones: (i) it considers how to deal with dynamic users in the environment who obscure landmarks that are key to safe navigation, and (ii) it considers how standard localization approaches for static environments can be augmented to deal with dynamic agents (e.g., humans). We propose an approach that integrates wheel odometry with stereo visual odometry and performs a global pose refinement to overcome errors previously accumulated by the visual and wheel odometry. We evaluate our approach through a series of controlled experiments to see how localization performance varies with an increasing number of dynamic agents present in the scene.
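A heavily simplified sketch of the idea follows: blend wheel and stereo visual odometry increments, then apply a global correction. The 2D pose representation, fixed blending weight, and correction gain are all assumptions; the paper's formulation is more involved.

```python
import numpy as np

def compose(pose, delta):
    """Compose a 2D pose (x, y, theta) with a relative motion increment."""
    x, y, th = pose
    dx, dy, dth = delta
    return np.array([x + dx * np.cos(th) - dy * np.sin(th),
                     y + dx * np.sin(th) + dy * np.cos(th),
                     th + dth])

def fuse(wheel_delta, visual_delta, w_visual=0.7):
    """Blend the two odometry increments; the fixed weight is a placeholder."""
    return w_visual * np.asarray(visual_delta) + (1 - w_visual) * np.asarray(wheel_delta)

pose = np.zeros(3)
for wheel, visual in [((0.10, 0.0, 0.01), (0.09, 0.0, 0.012))] * 50:
    pose = compose(pose, fuse(wheel, visual))

# Global pose refinement: pull the drifted dead-reckoned pose toward a
# globally estimated one (e.g., from place recognition), with gain alpha.
global_estimate = np.array([4.9, 0.1, 0.55])
alpha = 0.5  # correction gain (assumed)
pose = pose + alpha * (global_estimate - pose)
print(pose)
```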
Citations: 7
Deep Learning-Driven Depth from Defocus via Active Multispectral Quasi-Random Projections with Complex Subpatterns
Pub Date: 2018-05-08 DOI: 10.1109/CRV.2018.00048
A. Ma, A. Wong, David A Clausi
A promising approach to depth from defocus (DfD) involves actively projecting a quasi-random point pattern onto an object and assessing the blurriness of the point projection as captured by a camera to recover the depth of the scene. Recently, it was found that the depth inference can be made not only faster but also more accurate by leveraging deep learning approaches to computationally model and predict depth based on the quasi-random point projections as captured by a camera. Motivated by the fact that deep learning techniques can automatically learn useful features from the captured image of the projection, in this paper we present an extension of this quasi-random projection approach to DfD by introducing the use of a new quasi-random projection pattern consisting of complex subpatterns instead of points. The design and choice of the subpattern used in the quasi-random projection is a key factor in the ability to achieve improved depth recovery with high fidelity. Experimental results using quasi-random projection patterns composed of a variety of non-conventional subpattern designs on complex surfaces showed that the use of complex subpatterns in the quasi-random projection pattern can significantly improve depth reconstruction quality compared to a point pattern.
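To make the pattern-design idea concrete, the sketch below stamps a small cross-shaped subpattern at quasi-random locations generated by a Halton sequence; the Halton generator, pattern resolution, and cross subpattern are illustrative choices, not the authors' design.

```python
import numpy as np

def halton(n, base):
    """First n values of the 1-D Halton low-discrepancy sequence."""
    seq = np.zeros(n)
    for i in range(1, n + 1):
        f, r, k = 1.0, 0.0, i
        while k > 0:
            f /= base
            r += f * (k % base)
            k //= base
        seq[i - 1] = r
    return seq

h, w, n_stamps, s = 256, 256, 120, 5
pattern = np.zeros((h, w))
cross = np.zeros((s, s)); cross[s // 2, :] = 1; cross[:, s // 2] = 1  # complex subpattern

ys = (halton(n_stamps, 2) * (h - s)).astype(int)
xs = (halton(n_stamps, 3) * (w - s)).astype(int)
for y, x in zip(ys, xs):
    pattern[y:y + s, x:x + s] = np.maximum(pattern[y:y + s, x:x + s], cross)
```

Replacing `cross` with a single bright pixel recovers the point-pattern baseline the paper compares against.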
Citations: 0
Simple Real-Time Multi-face Tracking Based on Convolutional Neural Networks
Pub Date: 2018-05-08 DOI: 10.1109/CRV.2018.00054
Xile Li, J. Lang
We present a simple real-time system that can track multiple faces in live video, broadcasts, real-time conference recordings, and similar settings. Our proposed tracking system comprises three parts: face detection, feature extraction, and tracking. We employ a previously proposed cascaded Multi-Task Convolutional Neural Network (MTCNN) to detect faces and a simple CNN to extract the features of detected faces, and we show that a shallow network for face tracking based on the extracted feature maps is sufficient. Our multi-face tracker runs in real-time without any on-line training. We do not adjust any parameters for different input videos, and the tracker's run-time does not significantly increase with the number of faces being tracked, so it is easy to deploy in new real-time applications. We evaluate our tracker on two commonly used metrics against five recent face trackers. Our simple tracker performs competitively with these trackers despite occlusions in the videos and false positives or false negatives during face detection.
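A toy stand-in for the tracking stage is sketched below: detections in the current frame are greedily matched to existing tracks by cosine similarity of their feature vectors. The 128-dimensional features, similarity threshold, and greedy assignment are assumptions, not the paper's shallow tracking network.

```python
import numpy as np

def match_faces(track_feats, det_feats, threshold=0.6):
    """Greedily assign current detections to tracks by cosine similarity.
    Returns a dict mapping track index -> detection index."""
    matches, used = {}, set()
    for i, t in enumerate(track_feats):
        sims = [float(t @ d / (np.linalg.norm(t) * np.linalg.norm(d)))
                for d in det_feats]
        j = int(np.argmax(sims))
        if sims[j] > threshold and j not in used:
            matches[i] = j
            used.add(j)
    return matches

rng = np.random.default_rng(1)
tracks = [rng.normal(size=128) for _ in range(3)]
dets = [tracks[2] + 0.05 * rng.normal(size=128),
        tracks[0] + 0.05 * rng.normal(size=128)]
print(match_faces(tracks, dets))  # e.g. {0: 1, 2: 0}; track 1 is unmatched
```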
Citations: 2
Deep Neural Networks: A Comparison on Different Computing Platforms
Pub Date: 2018-05-08 DOI: 10.1109/CRV.2018.00060
M. Modasshir, Alberto Quattrini Li, Ioannis M. Rekleitis
Deep Neural Networks (DNNs) have gained tremendous popularity in recent years for several computer vision tasks, including classification and object detection. Such techniques have been able to achieve human-level performance in many tasks and have produced results of unprecedented accuracy. As DNNs have intense computational requirements in the majority of applications, they utilize a cluster of computers or a cutting-edge Graphical Processing Unit (GPU), often with excessive power consumption and heat generation. In many robotics applications the above requirements prove to be a challenge, as on-board power is limited and heat dissipation is always a problem. In underwater robotics in particular, where space is limited, the above two requirements have proven prohibitive. As the first study of its kind, this paper analyzes and compares the performance of several state-of-the-art DNNs on different platforms. With a focus on the underwater domain, the capabilities of the Jetson TX2 from NVIDIA and the Neural Compute Stick from Intel are of particular interest. Experiments on standard datasets show how the different platforms perform on an actual robotic system, providing insights into the current state of the art in embedded systems. Based on these results, we propose guidelines for choosing the appropriate platform and network architecture for a robotic system.
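Cross-platform comparisons of this kind rest on a simple timed inference loop. A minimal sketch follows, using a stock torchvision model as a stand-in for the networks the paper tests; run the same loop on each target (workstation GPU, Jetson TX2, CPU-only host) and compare. The model choice and timing parameters are arbitrary.

```python
import time
import torch
import torchvision.models as models

def benchmark(model, input_shape=(1, 3, 224, 224), warmup=10, runs=50):
    """Average single-image inference latency in seconds on this machine."""
    device = "cuda" if torch.cuda.is_available() else "cpu"
    model = model.eval().to(device)
    x = torch.randn(*input_shape, device=device)
    with torch.no_grad():
        for _ in range(warmup):          # warm up caches / cuDNN autotuning
            model(x)
        if device == "cuda":
            torch.cuda.synchronize()     # don't time queued, unfinished kernels
        start = time.perf_counter()
        for _ in range(runs):
            model(x)
        if device == "cuda":
            torch.cuda.synchronize()
    return (time.perf_counter() - start) / runs

print(f"{benchmark(models.mobilenet_v2()) * 1000:.1f} ms per frame")
```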
Citations: 16
Hierarchical Feature Map Characterization in Fashion Interpretation
Pub Date: 2018-05-08 DOI: 10.1109/CRV.2018.00022
M. Ziaeefard, J. Camacaro, C. Bessega
Recent advances in computer vision have been driven by the introduction of convolutional neural networks (ConvNets). Almost all existing methods that use hand-crafted features have been re-examined with ConvNets, achieving state-of-the-art results on various tasks. However, how ConvNet features lead to outstanding performance is not yet fully interpretable to humans. In this paper, we propose a Hierarchical Feature Map Characterization (HFMC) pipeline in which semantic concepts are mapped to subsets of kernels based on feature maps and corresponding filter responses. We take a closer look at ConvNet feature maps and analyze how taking different sets of feature maps into account affects output accuracy. We first determine a set of kernels named Generic kernels and prune them from the network. We then extract a set of Semantic kernels and analyze their effect on the results. Generic kernels and Semantic kernels are extracted based on the co-occurrence and energy activation levels of feature maps in the network. To evaluate our proposed method, we design a visual recommendation system and apply our HFMC network to retrieve similar styles for query clothing items on the DeepFashion dataset. Extensive experiments demonstrate the effectiveness of our approach on the task of style retrieval for fashion products.
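The activation-energy half of the kernel-selection criterion can be sketched directly: compute each kernel's mean absolute feature-map activation over a batch and flag low-energy kernels as pruning candidates. The threshold rule below is an assumption, and the co-occurrence analysis the paper also uses is omitted.

```python
import torch
import torch.nn as nn

conv = nn.Conv2d(3, 16, 3, padding=1)   # stand-in for one ConvNet layer
images = torch.randn(32, 3, 64, 64)     # stand-in batch of inputs

with torch.no_grad():
    fmaps = conv(images)                       # (32, 16, 64, 64)
    energy = fmaps.abs().mean(dim=(0, 2, 3))   # mean activation energy per kernel

# Kernels whose energy falls below a threshold (this rule is an assumption)
# become candidates for the "Generic kernel" set pruned from the network.
threshold = energy.mean() - energy.std()
candidates = (energy < threshold).nonzero(as_tuple=True)[0]
print("prune candidates:", candidates.tolist())
```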
Citations: 5
Exploiting Symmetries of Distributions in CNNs and Folded Coding
Pub Date: 2018-05-01 DOI: 10.1109/CRV.2018.00017
Ehsan Emad Marvasti, Amir Emad Marvasti, H. Foroosh
We introduce the concept of "Folded Coding" for continuous univariate distributions, estimating the distribution and coding the samples simultaneously. Folded Coding assumes symmetries in the distribution and requires significantly fewer parameters than conventional models when the symmetry assumption is satisfied. We incorporate the mechanics of Folded Coding into Convolutional Neural Networks (CNNs) in the form of layers referred to as Binary Expanded ReLU (BEReLU) Shared Convolutions and Instance Fully Connected (I-FC). BEReLU and I-FC force the network to have symmetric functionality in the space of samples, so similar patterns of prediction are applied to regions of the space where the model has no observed samples. We experimented with BEReLU on generic networks using different parameter sizes on CIFAR-10 and CIFAR-100. Our experiments show increased accuracy for models equipped with the BEReLU layer when there are fewer parameters, while the performance of models with the BEReLU layer remains similar to that of the original network as the number of parameters increases. The experiments provide further evidence that estimating distribution symmetry is part of CNNs' functionality.
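The abstract does not spell out how BEReLU works, so the sketch below is only a guess at the flavour of such a layer: a CReLU-style expansion that emits ReLU(x) and ReLU(-x) on separate channels, so that shared downstream weights see both halves of a symmetric distribution. This is an assumption, not the paper's definition.

```python
import torch
import torch.nn as nn

class SymmetricExpandReLU(nn.Module):
    """Emit ReLU(x) and ReLU(-x) on separate channels (CReLU-like).
    Illustrative only; not the BEReLU layer defined in the paper."""
    def forward(self, x):
        return torch.cat([torch.relu(x), torch.relu(-x)], dim=1)

x = torch.randn(4, 8, 16, 16)
y = SymmetricExpandReLU()(x)
print(y.shape)  # torch.Size([4, 16, 16, 16]); channel count doubles
```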
Citations: 0
Multi-projector Resolution Enhancement Through Biased Interpolation
Pub Date: 2018-05-01 DOI: 10.1109/CRV.2018.00035
Andrew Hryniowski, Ibrahim Ben Daya, A. Gawish, Mark Lamm, A. Wong, P. Fieguth
Projecting the same content with multiple overlapping projectors provides several advantages over using a single projector: increased brightness to overcome ambient light or projection-surface anomalies, redundancy in case of projector failure, an increase in the projected area, and the possibility of increased content resolution. Multi-projector resolution enhancement is the process of using multiple projectors to achieve a resolution greater than that of any individual projector in the configuration. Current resolution enhancement techniques filter the sub-images produced by each projector using spatial or frequency-based filters. This kernel-based filtering adds significant overhead relative to the interpolation calculations. In addition, the learned filters are extremely sensitive to calibration. This work develops a method for multi-projector resolution enhancement that integrates the filtering into the interpolation process. A system is developed to jointly condition multiple low-resolution sub-images on each other to approximate high-resolution original content.
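As a crude numerical illustration of why superposing offset low-resolution sub-images can recover detail, the sketch below samples a high-resolution target on two half-pixel-offset projector grids and compares reconstruction error; the sampling scheme is a simple stand-in, not the paper's biased interpolation.

```python
import numpy as np

scale = 2                                  # high-res pixels per projector pixel
rng = np.random.default_rng(2)
target = rng.random((64, 64))              # desired high-resolution content

def render(sub, dy, dx):
    """Upsample one projector's sub-image to the canvas and apply its
    sub-pixel offset (wrap-around at the border, for simplicity)."""
    hi = np.kron(sub, np.ones((scale, scale)))
    return np.roll(hi, (dy, dx), axis=(0, 1))

sub_a = target[0::scale, 0::scale]         # projector A samples its own grid
sub_b = target[1::scale, 1::scale]         # projector B's grid is offset
composite = 0.5 * (render(sub_a, 0, 0) + render(sub_b, 1, 1))

err_one = np.abs(render(sub_a, 0, 0) - target).mean()
err_two = np.abs(composite - target).mean()
print(f"one projector: {err_one:.3f}, two projectors: {err_two:.3f}")
```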
Citations: 3
Visual Object Tracking: The Initialisation Problem
Pub Date: 2018-05-01 DOI: 10.1109/CRV.2018.00029
George De Ath, R. Everson
Model initialisation is an important component of object tracking. Tracking algorithms are generally provided with the first frame of a sequence and a bounding box (BB) indicating the location of the object. This BB may contain a large number of background pixels in addition to the object, which can lead parts-based tracking algorithms to initialise their object models in background regions of the BB. In this paper, we tackle this as a missing-labels problem, marking pixels sufficiently far from the BB as belonging to the background and learning the labels of the unknown pixels. Three techniques are adapted to the problem: One-Class SVM (OC-SVM), the Sampled-Based Background Model (SBBM, a novel background model based on pixel samples), and Learning Based Digital Matting (LBDM). These are evaluated with leave-one-video-out cross-validation on the VOT2016 tracking benchmark. Our evaluation shows that both OC-SVMs and SBBM can provide a good level of segmentation accuracy but are too parameter-dependent to be used in real-world scenarios. We show that LBDM achieves significantly higher performance with parameters selected by cross-validation and that it is robust to parameter variation.
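The OC-SVM variant lends itself to a direct sketch: train a one-class model on pixels safely outside the bounding box (confident background) and classify the pixels inside it. The raw RGB features, margin width, and nu value below are assumptions; real frames would need richer features.

```python
import numpy as np
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(3)
h, w = 80, 80
image = rng.normal(0.2, 0.05, (h, w, 3))                   # background colours
image[30:50, 30:50] = rng.normal(0.8, 0.05, (20, 20, 3))   # bright "object"

bb = (25, 25, 55, 55)   # provided BB (y0, x0, y1, x1), deliberately loose
margin = 10             # pixels this far outside the BB are surely background

ys, xs = np.mgrid[0:h, 0:w]
outside = ((ys < bb[0] - margin) | (ys >= bb[2] + margin) |
           (xs < bb[1] - margin) | (xs >= bb[3] + margin))

ocsvm = OneClassSVM(nu=0.1, gamma="scale").fit(image[outside])

inside = image[bb[0]:bb[2], bb[1]:bb[3]].reshape(-1, 3)
labels = ocsvm.predict(inside)   # +1 = background-like, -1 = object
print("object pixels inside BB:", int((labels == -1).sum()), "of", len(labels))
```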
Citations: 6