
2017 IEEE International Conference on Computer Vision Workshops (ICCVW): Latest Publications

Computer Vision for the Visually Impaired: the Sound of Vision System
Pub Date : 2017-10-01 DOI: 10.1109/ICCVW.2017.175
S. Caraiman, A. Morar, Mateusz Owczarek, A. Burlacu, D. Rzeszotarski, N. Botezatu, P. Herghelegiu, F. Moldoveanu, P. Strumiłło, A. Moldoveanu
This paper presents a computer vision based sensory substitution device for the visually impaired. Its main objective is to provide the users with a 3D representation of the environment around them, conveyed by means of the hearing and tactile senses. One of the biggest challenges for this system is to ensure pervasiveness, i.e., to be usable in any indoor or outdoor environment and under any illumination conditions. This work presents both the hardware (3D acquisition system) and software (3D processing pipeline) used for developing this sensory substitution device and provides insight into its use in various scenarios. Preliminary experiments with blind users revealed good usability results and provided valuable feedback for system improvement.
Citations: 70
Dynamic Mode Decomposition for Background Modeling
Pub Date : 2017-10-01 DOI: 10.1109/ICCVW.2017.220
Seth D. Pendergrass, S. Brunton, J. Kutz, N. Benjamin Erichson, T. Askham
The Dynamic Mode Decomposition (DMD) is a spatiotemporal matrix decomposition method capable of background modeling in video streams. DMD is a regression technique that integrates Fourier transforms and singular value decomposition. Innovations in compressed sensing allow for a scalable and rapid decomposition of video streams that scales with the intrinsic rank of the matrix, rather than the size of the actual video. Our results show that the quality of the resulting background model is competitive, quantified by the F-measure, recall and precision. A GPU (graphics processing unit) accelerated implementation is also possible, allowing the algorithm to operate efficiently on streaming data. In addition, it is possible to leverage the native compressed format of many data streams, such as HD video and computational physics codes that are represented sparsely in the Fourier domain, to massively reduce data transfer from CPU to GPU and to enable sparse matrix multiplications.
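To make the decomposition step concrete, below is a minimal NumPy sketch of DMD-based background/foreground separation. It is a plain (non-compressed, non-GPU) variant; the function name, truncation rank and frequency tolerance are illustrative choices rather than the paper's settings.

import numpy as np

def dmd_background(frames, rank=10, dt=1.0, tol=1e-2):
    """Estimate a static background from a video via Dynamic Mode Decomposition.

    frames : array of shape (n_pixels, n_frames), each column a vectorized frame.
    Modes with near-zero temporal frequency are treated as background.
    """
    X, Y = frames[:, :-1], frames[:, 1:]            # snapshot pairs x_k -> x_{k+1}
    U, s, Vh = np.linalg.svd(X, full_matrices=False)
    U, s, Vh = U[:, :rank], s[:rank], Vh[:rank]     # rank-truncated SVD
    Atilde = U.conj().T @ Y @ Vh.conj().T @ np.diag(1.0 / s)
    eigvals, W = np.linalg.eig(Atilde)
    Phi = Y @ Vh.conj().T @ np.diag(1.0 / s) @ W    # exact DMD modes
    omega = np.log(eigvals.astype(complex)) / dt    # continuous-time frequencies
    bg = np.abs(omega) < tol                        # near-zero frequency = background
    b0 = np.linalg.lstsq(Phi[:, bg], frames[:, 0].astype(complex), rcond=None)[0]
    background = np.real(Phi[:, bg] @ b0)           # static background estimate
    foreground = frames - background[:, None]       # residual = moving objects
    return background, foreground

Modes whose continuous-time frequency is essentially zero vary little across frames, so their reconstruction serves as the background model and the residual carries the moving foreground.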
Citations: 7
PVNN: A Neural Network Library for Photometric Vision
Pub Date : 2017-10-01 DOI: 10.1109/ICCVW.2017.69
Ye Yu, W. Smith
In this paper we show how a differentiable, physics-based renderer suitable for photometric vision tasks can be implemented as layers in a deep neural network. The layers include geometric operations for representation transformations, reflectance evaluations with arbitrary numbers of light sources and statistical bidirectional reflectance distribution function (BRDF) models. We make an implementation of these layers available as a neural network library (PVNN) for Theano. The layers can be incorporated into any neural network architecture, allowing parts of the photometric image formation process to be explicitly modelled in a network that is trained end to end via backpropagation. As an exemplar application, we show how to train a network with encoder-decoder architecture that learns to estimate BRDF parameters from a single image in an unsupervised manner.
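PVNN itself is implemented for Theano; as an illustration of what one such differentiable reflectance-evaluation layer looks like, here is a PyTorch-style sketch of the Lambertian case with an arbitrary number of light sources (tensor shapes and names are assumptions, not the library's API).

import torch

def lambertian_layer(normals, albedo, light_dirs, light_colors):
    """Differentiable Lambertian reflectance evaluation for multiple light sources.

    normals:      (B, H, W, 3) unit surface normals
    albedo:       (B, H, W, 3) diffuse albedo
    light_dirs:   (L, 3) unit directions towards each light
    light_colors: (L, 3) RGB intensity of each light
    Returns rendered radiance of shape (B, H, W, 3).
    """
    # cosine foreshortening term for every light, clamped at zero (no negative light)
    cosines = torch.einsum('bhwc,lc->bhwl', normals, light_dirs).clamp(min=0)
    # sum of per-light contributions, modulated by albedo
    shading = torch.einsum('bhwl,lc->bhwc', cosines, light_colors)
    return albedo * shading

Because every operation is a standard tensor op, such a layer can sit inside any network and backpropagate gradients to albedo, normals or lighting, which is what enables the unsupervised BRDF-estimation training described above.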
Citations: 13
Siamese Networks for Chromosome Classification
Pub Date : 2017-10-01 DOI: 10.1109/ICCVW.2017.17
Swati, Gaurav Gupta, Mohit Yadav, Monika Sharma, L. Vig
Karyotyping is the process of pairing and ordering the 23 pairs of human chromosomes from cell images on the basis of size, centromere position, and banding pattern. Karyotyping during metaphase is often used by clinical cytogeneticists to analyze human chromosomes for diagnostic purposes. It requires experience, domain expertise and considerable manual effort to efficiently perform karyotyping and diagnose various disorders. Therefore, automation, or even partial automation, is highly desirable to assist technicians and reduce the cognitive load necessary for karyotyping. With these motivations, in this paper, we attempt to develop methods for chromosome classification by borrowing the latest ideas from deep learning. More specifically, we straighten the chromosomes and feed them into Siamese Networks to push the embeddings of samples with similar labels closer together. Further, we propose balanced sampling of dissimilar training pairs from the pairwise dataset for the Siamese Networks, and an MLP-based prediction on top of the embeddings obtained from the trained Siamese Networks. We perform our experiments on a real-world dataset of healthy patients collected from a hospital and exhaustively compare the effect of different straightening techniques by applying them to chromosome images prior to classification. Results demonstrate that the proposed methods speed up training and prediction by factors of 83 and 3, respectively, while surpassing the performance of a very competitive baseline built with deep convolutional neural networks.
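A minimal PyTorch sketch of the Siamese objective described above, i.e. an embedding network trained with a contrastive loss on chromosome image pairs. The layer sizes, names and margin are illustrative assumptions; the paper additionally uses straightening, balanced pair sampling and an MLP classifier on top of the embeddings.

import torch
import torch.nn as nn
import torch.nn.functional as F

class ChromosomeEmbedder(nn.Module):
    """Small CNN that maps a (straightened) chromosome image to an embedding."""
    def __init__(self, embed_dim=64):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.AdaptiveAvgPool2d(4),
        )
        self.fc = nn.Linear(32 * 4 * 4, embed_dim)

    def forward(self, x):
        return self.fc(self.features(x).flatten(1))

def contrastive_loss(emb_a, emb_b, same_label, margin=1.0):
    """Pull embeddings of same-class pairs together, push different-class
    pairs at least `margin` apart (the standard Siamese objective)."""
    d = F.pairwise_distance(emb_a, emb_b)
    pos = same_label * d.pow(2)
    neg = (1 - same_label) * F.relu(margin - d).pow(2)
    return (pos + neg).mean()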
Citations: 57
Spatial Attention Improves Object Localization: A Biologically Plausible Neuro-Computational Model for Use in Virtual Reality
Pub Date : 2017-10-01 DOI: 10.1109/ICCVW.2017.320
A. Jamalian, Julia Bergelt, H. Dinkelbach
Visual attention is a smart mechanism performed by the brain to avoid unnecessary processing and to focus on the most relevant part of the visual scene. It can result in a remarkable reduction in the computational complexity of scene understanding. The two major kinds of top-down visual attention signals are spatial and feature-based attention. The former deals with the places in the scene that are worth attending to, while the latter is concerned with the basic features of objects, e.g., color, intensity and edges. In principle, there are two known sources of the spatial attention signal: the Frontal Eye Field (FEF) in the prefrontal cortex and the Lateral Intraparietal Cortex (LIP) in the parietal cortex. In this paper, first, a combined neuro-computational model of the ventral and dorsal streams is introduced, and then it is shown in Virtual Reality (VR) that the spatial attention provided by LIP acts as a transsaccadic memory pointer which accelerates object localization.
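The functional effect of such a spatial attention pointer can be illustrated with a toy NumPy sketch in which a Gaussian gain field, a common abstraction of an LIP/FEF spatial signal, boosts responses around the attended location before localization. The model in the paper is a full neuro-computational simulation of the ventral and dorsal streams, not this simplification.

import numpy as np

def apply_spatial_attention(feature_map, center, sigma=10.0, gain=2.0):
    """Multiplicative spatial attention: boost responses near an attended location.

    feature_map : (H, W) activation map from the ventral ('what') pathway
    center      : (row, col) attended location, e.g. a remembered target position
    Returns the modulated map and the resulting argmax localization.
    """
    H, W = feature_map.shape
    rows, cols = np.mgrid[0:H, 0:W]
    gauss = np.exp(-((rows - center[0]) ** 2 + (cols - center[1]) ** 2) / (2 * sigma ** 2))
    attended = feature_map * (1.0 + gain * gauss)   # multiplicative gain modulation
    return attended, np.unravel_index(np.argmax(attended), attended.shape)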
Citations: 4
ViTS: Video Tagging System from Massive Web Multimedia Collections
Pub Date : 2017-10-01 DOI: 10.1109/ICCVW.2017.48
Delia Fernandez, David Varas, Joan Espadaler, Issey Masuda, Jordi Ferreira, A. Woodward, David Rodriguez, Xavier Giró-i-Nieto, J. C. Riveiro, Elisenda Bou
The popularization of multimedia content on the Web has given rise to the need to automatically understand, index and retrieve it. In this paper we present ViTS, an automatic Video Tagging System which learns from videos, their web context and comments shared on social networks. ViTS analyses massive multimedia collections by Internet crawling, and maintains a knowledge base that updates in real time with no need of human supervision. As a result, each video is indexed with a rich set of labels and linked with other related contents. ViTS is an industrial product in commercial exploitation with a vocabulary of over 2.5M concepts, capable of indexing more than 150k videos per month. We compare the quality and completeness of our tags with respect to the ones in the YouTube-8M dataset, and we show how ViTS enhances the semantic annotation of the videos with a larger number of labels (10.04 tags/video), with an accuracy of 80.87%. Extracted tags and video summaries are publicly available.
Citations: 14
A Biophysical 3D Morphable Model of Face Appearance
Pub Date : 2017-10-01 DOI: 10.1109/ICCVW.2017.102
S. Alotaibi, W. Smith
Skin colour forms a curved manifold in RGB space. The variations in skin colour are largely caused by variations in the concentration of the pigments melanin and hemoglobin. Hence, linear statistical models of appearance or skin albedo are insufficiently constrained (they can produce implausible skin tones) and lack compactness (they require additional dimensions to linearly approximate a curved manifold). In this paper, we propose to use a biophysical model of skin colouration in order to transform skin colour into a parameter space where linear statistical modelling can take place. Hence, we propose a hybrid of biophysical and statistical modelling. We present a two-parameter spectral model of skin colouration and methods for fitting the model to data captured in a light stage, and then build our hybrid model on a sample of such registered data. We present face editing results and compare our model against a pure statistical model built directly on textures.
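As a rough illustration of what a two-pigment biophysical parameterization looks like, the sketch below maps melanin and hemoglobin concentrations to an RGB albedo with a Beer-Lambert style attenuation. The absorption numbers are invented purely for illustration; the paper fits a proper spectral model to light-stage data.

import numpy as np

# Illustrative per-channel absorption coefficients for the two pigments
# (made-up values, used only to show the structure of the mapping).
MELANIN_ABS = np.array([0.80, 1.20, 1.60])     # absorbs more strongly towards blue
HEMOGLOBIN_ABS = np.array([0.30, 1.00, 0.70])  # strong absorption in green

def skin_rgb(melanin, hemoglobin, base=np.array([0.95, 0.80, 0.70])):
    """Map pigment concentrations (the two model parameters) to an RGB albedo.

    Linear statistics are then done on (melanin, hemoglobin) rather than on RGB,
    which keeps samples on the curved skin-colour manifold.
    """
    attenuation = np.exp(-(melanin * MELANIN_ABS + hemoglobin * HEMOGLOBIN_ABS))
    return base * attenuation

# e.g. sampling the two parameters and converting back to colour:
pale = skin_rgb(melanin=0.1, hemoglobin=0.2)
dark = skin_rgb(melanin=1.5, hemoglobin=0.4)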
Citations: 14
Ladder-Style DenseNets for Semantic Segmentation of Large Natural Images
Pub Date : 2017-10-01 DOI: 10.1109/ICCVW.2017.37
Josip Krapac, Ivan Kreso, Sinisa Segvic
Recent progress in deep image classification models provides a large potential to improve state-of-the-art performance in related computer vision tasks. However, the transition to semantic segmentation is hampered by the strict memory limitations of contemporary GPUs. The extent of feature map caching required by convolutional backprop poses significant challenges even for moderately sized PASCAL images, while requiring careful architectural considerations when the source resolution is in the megapixel range. To address these concerns, we propose a DenseNet-based ladder-style architecture which is able to deliver high modelling power with very lean representations at the original resolution. The resulting fully convolutional models have few parameters, allow training at megapixel resolution on commodity hardware and display fair semantic segmentation performance even without ImageNet pre-training. We present experiments on the Cityscapes and Pascal VOC 2012 datasets and report competitive results.
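One upsampling step of such a ladder-style decoder can be sketched in PyTorch as below. The module name, channel handling and layer ordering are illustrative assumptions rather than the paper's exact blocks, but they show how a lean lateral connection restores resolution cheaply.

import torch
import torch.nn as nn
import torch.nn.functional as F

class LadderUpsample(nn.Module):
    """One ladder step: upsample deep features and fuse a lean lateral skip.

    The lateral 1x1 conv keeps the high-resolution path thin (few channels):
    modelling power comes from the deep path, while memory stays low at
    large spatial resolutions.
    """
    def __init__(self, deep_ch, lateral_ch, out_ch):
        super().__init__()
        self.lateral = nn.Conv2d(lateral_ch, out_ch, kernel_size=1)
        self.project = nn.Conv2d(deep_ch, out_ch, kernel_size=1)
        self.blend = nn.Sequential(
            nn.BatchNorm2d(out_ch), nn.ReLU(inplace=True),
            nn.Conv2d(out_ch, out_ch, kernel_size=3, padding=1),
        )

    def forward(self, deep, lateral):
        deep = F.interpolate(self.project(deep), size=lateral.shape[-2:],
                             mode='bilinear', align_corners=False)
        return self.blend(deep + self.lateral(lateral))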
Citations: 58
Double-Task Deep Q-Learning with Multiple Views
Pub Date : 2017-10-01 DOI: 10.1109/ICCVW.2017.128
Tingzhu Bai, Jianing Yang, Jun Chen, Xian Guo, Xiangsheng Huang, Yu-Ni Yao
Deep reinforcement learning enables autonomous robots to learn large repertoires of behavioral skills with minimal human intervention. However, the applications of direct deep reinforcement learning have been restricted. For complicated robotic systems, these limitations result from the high-dimensional action space, the high degree of freedom of the robotic system and the high correlation between images. In this paper we introduce a new definition of the action space and propose a double-task deep Q-Network with multiple views (DMDQN) based on double-DQN and dueling-DQN. As an extension, we define a multi-task model for more complex jobs. Moreover, a data augmentation policy is applied, which includes auto-sampling and action-overturn. The exploration policy is formed when DMDQN and data augmentation are combined. For the robotic system's steady exploration, we designed safety constraints according to the working conditions. Our experiments show that our double-task DQN with multiple views performs better than the single-task and single-view models. Combining our DMDQN and data augmentation, the robotic system can reach the object in an exploratory way.
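The "double" part of DMDQN follows the standard double-DQN recipe, in which the online network selects the greedy next action and the target network evaluates it. A minimal PyTorch sketch of that target computation (the network and argument names are assumptions, not the paper's code) is:

import torch

def double_dqn_targets(online_net, target_net, rewards, next_states, dones, gamma=0.99):
    """Compute double-DQN regression targets.

    The online network picks the greedy next action, the slowly updated
    target network evaluates it; this decoupling reduces the overestimation
    bias of vanilla Q-learning.
    """
    with torch.no_grad():
        next_q_online = online_net(next_states)              # (B, n_actions)
        best_actions = next_q_online.argmax(dim=1, keepdim=True)
        next_q_target = target_net(next_states).gather(1, best_actions).squeeze(1)
        return rewards + gamma * (1.0 - dones) * next_q_target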
Citations: 11
Margin Based Semi-Supervised Elastic Embedding for Face Image Analysis
Pub Date : 2017-10-01 DOI: 10.1109/ICCVW.2017.156
F. Dornaika, Y. E. Traboulsi
This paper introduces a graph-based semi-supervised elastic embedding method, as well as its kernelized version, for face image embedding and classification. The proposed framework combines Flexible Manifold Embedding and non-linear graph-based embedding for semi-supervised learning. In both proposed methods, the nonlinear manifold and the mapping (the linear transform for the linear method and the kernel multipliers for the kernelized method) are simultaneously estimated, which overcomes the shortcomings of a cascaded estimation. Unlike many state-of-the-art non-linear embedding approaches, which suffer from the out-of-sample problem, our proposed methods have a direct out-of-sample extension to novel samples. We conduct experiments on the face recognition and image-based face orientation problems on four public databases. These experiments show improvement over state-of-the-art algorithms based on label propagation or graph-based semi-supervised embedding.
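The graph-regularized label-fitting core on which Flexible Manifold Embedding builds can be sketched in a few lines of NumPy. The linear transform and kernel multipliers that the paper estimates jointly are omitted here; the function below is an illustrative simplification, not the proposed method.

import numpy as np

def graph_ssl_scores(W, Y, labeled_mask, reg=1.0):
    """Closed-form graph-regularized label estimation.

    W            : (n, n) symmetric affinity matrix over all samples
    Y            : (n, c) one-hot labels, zero rows for unlabeled samples
    labeled_mask : (n,) boolean, True where the label is known
    Returns soft label scores F for every sample; the embedding/mapping of the
    paper would be fitted jointly with such scores.
    """
    d = W.sum(axis=1)
    L = np.diag(d) - W                              # unnormalized graph Laplacian
    U = np.diag(reg * labeled_mask.astype(float))   # fit labels only where known
    F = np.linalg.solve(U + L + 1e-6 * np.eye(len(d)), U @ Y)
    return F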
Citations: 5