
Latest publications from the 2019 IEEE Winter Conference on Applications of Computer Vision (WACV)

Video-Based Face Alignment With Local Motion Modeling
Pub Date : 2019-01-01 DOI: 10.1109/WACV.2019.00228
Romain Belmonte, Nacim Ihaddadene, Pierre Tirilly, Ioan Marius Bilasco, C. Djeraba
Face alignment remains difficult under uncontrolled conditions due to the many variations that may considerably impact facial appearance. Recently, video-based approaches have been proposed, which take advantage of temporal coherence to improve robustness. These new approaches suffer from limited temporal connectivity. We show that early, direct pixel connectivity enables the detection of local motion patterns and the learning of a hierarchy of motion features. We integrate local motion into the two predominant models in the literature, coordinate regression networks and heatmap regression networks, and combine it with late connectivity based on recurrent neural networks. The experimental results on two datasets, 300VW and SNaP-2DFe, show that local motion improves video-based face alignment and is complementary to late temporal information. Despite the simplicity of the proposed architectures, our best model provides competitive performance with more complex models from the literature.
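As a rough illustration of combining early and late temporal connectivity, the following PyTorch-style sketch encodes short frame windows with 3D convolutions (early, pixel-level motion), regresses landmark coordinates, and fuses windows with an LSTM (late connectivity). The layer sizes, window length, and 68-landmark output are placeholder assumptions, not the authors' architecture.

```python
import torch
import torch.nn as nn

class LocalMotionCoordNet(nn.Module):
    """Illustrative sketch: early temporal connectivity via 3D convs over a
    short window of frames, coordinate regression, and an LSTM for late
    temporal fusion. All sizes are arbitrary placeholders."""
    def __init__(self, num_landmarks=68, hidden=256):
        super().__init__()
        # early connectivity: spatio-temporal convolution over (T, H, W)
        self.motion = nn.Sequential(
            nn.Conv3d(3, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv3d(32, 64, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool3d((1, 4, 4)),
        )
        self.coord = nn.Linear(64 * 4 * 4, hidden)
        # late connectivity: recurrent fusion across windows
        self.rnn = nn.LSTM(hidden, hidden, batch_first=True)
        self.head = nn.Linear(hidden, num_landmarks * 2)

    def forward(self, clips):                # clips: (B, S, C, T, H, W)
        b, s = clips.shape[:2]
        feats = []
        for i in range(s):                   # encode each short frame window
            f = self.motion(clips[:, i]).flatten(1)
            feats.append(torch.relu(self.coord(f)))
        out, _ = self.rnn(torch.stack(feats, dim=1))
        return self.head(out).view(b, s, -1, 2)   # (B, S, landmarks, 2)

# usage: 8 windows of 5 frames at 64x64 resolution
pred = LocalMotionCoordNet()(torch.randn(2, 8, 3, 5, 64, 64))
print(pred.shape)  # torch.Size([2, 8, 68, 2])
```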
Citations: 8
Good Choices for Deep Convolutional Feature Encoding
Pub Date : 2019-01-01 DOI: 10.1109/WACV.2019.00039
Yu Wang, Jien Kato
Deep convolutional neural networks can be used to produce discriminative image-level features. However, when they are used as the feature extractor in a feature encoding pipeline, there are many design choices that need to be made. In this work, we conduct a comprehensive study on deep convolutional feature encoding, paying special attention to the feature extraction aspect. We mainly evaluate the choice of encoding method, the choice of base DCNN model, and the choice of data augmentation method. We not only quantitatively confirm some known and previously unknown good choices for deep convolutional feature encoding, but also find that some known good choices turn out to be bad. Based on the observations in the experiments, we present a very simple deep feature encoding pipeline and confirm its state-of-the-art performance on multiple image recognition datasets.
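To make the kind of design choice under study concrete, the sketch below treats each spatial location of a convolutional feature map as a local descriptor and compares two encoding choices, global average pooling and a small VLAD encoding. The codebook size and feature dimensions are illustrative assumptions, not the authors' exact pipeline.

```python
import numpy as np

def conv_map_to_descriptors(fmap):
    """Treat each spatial location of a conv feature map (C, H, W) as a
    local descriptor, as is common in feature-encoding pipelines."""
    c, h, w = fmap.shape
    return fmap.reshape(c, h * w).T                      # (H*W, C)

def encode_avg(desc):
    """Encoding choice 1: simple global average pooling."""
    return desc.mean(axis=0)

def encode_vlad(desc, centers):
    """Encoding choice 2: VLAD over a small codebook (centers: (K, C))."""
    assign = np.argmin(((desc[:, None, :] - centers[None]) ** 2).sum(-1), axis=1)
    vlad = np.zeros_like(centers)
    for k in range(centers.shape[0]):
        if np.any(assign == k):
            vlad[k] = (desc[assign == k] - centers[k]).sum(axis=0)
    vlad = vlad.flatten()
    vlad = np.sign(vlad) * np.sqrt(np.abs(vlad))         # power normalization
    return vlad / (np.linalg.norm(vlad) + 1e-12)

# toy example with a random "conv5" map and a random codebook
fmap = np.random.rand(512, 7, 7).astype(np.float32)
desc = conv_map_to_descriptors(fmap)
print(encode_avg(desc).shape, encode_vlad(desc, np.random.rand(8, 512)).shape)
```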
Citations: 1
Hidden States Exploration for 3D Skeleton-Based Gesture Recognition
Pub Date : 2019-01-01 DOI: 10.1109/WACV.2019.00201
Xin Liu, Henglin Shi, Xiaopeng Hong, Haoyu Chen, D. Tao, Guoying Zhao
3D skeletal data has recently attracted wide attention in human behavior analysis for its robustness to scene variations, while accurate gesture recognition remains challenging. The main reason lies in the high intra-class variance caused by temporal dynamics. One solution is to resort to generative models, such as the hidden Markov model (HMM). However, existing methods commonly assume fixed anchors for each hidden state, which makes it hard to depict the explicit temporal structure of gestures. Based on the observation that a gesture is a time series with distinctly defined phases, we propose a new formulation that builds temporal compositions of gestures by low-rank matrix decomposition. The only assumption is that the gesture's "hold" phases with static poses are linearly correlated with each other. As such, a gesture sequence can be segmented into temporal states with semantically meaningful and discriminative concepts. Furthermore, unlike traditional HMMs, which tend to use a specific distance metric for clustering and ignore temporal contextual information when estimating the emission probability, a Long Short-Term Memory (LSTM) network is utilized to learn probability distributions over the states of the HMM. The proposed method is validated on two challenging datasets. Experiments demonstrate that our approach works effectively on a wide range of gestures and actions and achieves state-of-the-art performance.
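A minimal sketch of the emission idea: an LSTM maps a skeleton sequence to per-frame posteriors over hidden states, which can then be plugged into standard HMM decoding (Viterbi is shown with a placeholder uniform transition matrix). The joint dimension, number of states, and hidden size are assumptions for illustration only.

```python
import torch
import torch.nn as nn

class LSTMEmission(nn.Module):
    """Sketch: an LSTM maps a skeleton sequence to per-frame log-posteriors
    over K hidden states, standing in for HMM emission probabilities."""
    def __init__(self, joint_dim=75, num_states=5, hidden=128):
        super().__init__()
        self.lstm = nn.LSTM(joint_dim, hidden, batch_first=True)
        self.out = nn.Linear(hidden, num_states)

    def forward(self, seq):                          # seq: (B, T, joint_dim)
        h, _ = self.lstm(seq)
        return torch.log_softmax(self.out(h), dim=-1)   # log p(state | frame)

def viterbi(log_emis, log_trans):
    """Viterbi decoding for one sequence: log_emis (T, K), log_trans (K, K)."""
    T, K = log_emis.shape
    score, back = log_emis[0].clone(), []
    for t in range(1, T):
        cand = score[:, None] + log_trans            # (K_prev, K_next)
        best, idx = cand.max(dim=0)
        score = best + log_emis[t]
        back.append(idx)
    path = [int(score.argmax())]
    for idx in reversed(back):
        path.append(int(idx[path[-1]]))
    return list(reversed(path))

emis = LSTMEmission()(torch.randn(1, 40, 75))[0]     # (40 frames, 5 states)
trans = torch.log(torch.full((5, 5), 0.2))           # uniform transition matrix
print(viterbi(emis.detach(), trans)[:10])
```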
Citations: 11
Learning Generator Networks for Dynamic Patterns
Pub Date : 2019-01-01 DOI: 10.1109/WACV.2019.00091
Tian Han, Yang Lu, Jiawen Wu, X. Xing, Y. Wu
We address the problem of learning dynamic patterns from unlabeled video sequences, either by generating new video sequences or by recovering incomplete video sequences. This problem is challenging because the appearances and motions in the video sequences can be very complex. We propose to use the alternating back-propagation algorithm to learn a generator network with a spatial-temporal convolutional architecture. The proposed method is efficient and flexible. It can not only generate realistic video sequences, but also recover incomplete video sequences in the testing stage or even in the learning stage. The proposed algorithm can be further improved by using a learned initialization, which is useful for the recovery tasks. Furthermore, the proposed algorithm naturally helps to learn shared representations between different modalities. Our experiments show that our method is competitive with existing state-of-the-art methods both qualitatively and quantitatively.
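The alternating back-propagation algorithm itself can be sketched compactly: an inferential step runs Langevin dynamics on the latent code of each training example, and a learning step updates the generator on the inferred codes. The toy MLP generator, vector-valued data, and hyperparameters below are placeholders standing in for the paper's spatial-temporal convolutional generator.

```python
import torch
import torch.nn as nn

# Minimal sketch of alternating back-propagation (ABP) on toy vector data.
gen = nn.Sequential(nn.Linear(8, 64), nn.ReLU(), nn.Linear(64, 32))
opt = torch.optim.Adam(gen.parameters(), lr=1e-3)
data = torch.randn(16, 32)                    # toy "observations"
z = torch.zeros(16, 8, requires_grad=True)    # one latent code per example
sigma, step, n_langevin = 0.3, 0.01, 10

for it in range(100):
    # inferential back-propagation: Langevin updates of z given the data
    for _ in range(n_langevin):
        recon = gen(z)
        logp = -((data - recon) ** 2).sum() / (2 * sigma ** 2) - (z ** 2).sum() / 2
        grad = torch.autograd.grad(logp, z)[0]
        with torch.no_grad():
            z += 0.5 * step ** 2 * grad + step * torch.randn_like(z)
    # learning back-propagation: update generator weights given inferred z
    opt.zero_grad()
    loss = ((data - gen(z.detach())) ** 2).sum() / (2 * sigma ** 2)
    loss.backward()
    opt.step()

print(float(loss))
```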
Citations: 9
Region-based active learning for efficient labeling in semantic segmentation
Pub Date : 2019-01-01 DOI: 10.1109/WACV.2019.00123
Tejaswi Kasarla, G. Nagendar, Guruprasad M. Hegde, V. Balasubramanian, C. V. Jawahar
As vision-based autonomous systems, such as self-driving vehicles, become a reality, there is an increasing need for large annotated datasets for developing solutions to vision tasks. One important task that has seen significant interest in recent years is semantic segmentation. However, the cost of annotating every pixel for semantic segmentation is immense, and can be prohibitive when scaling to various settings and locations. In this paper, we propose a region-based active learning method for efficient labeling in semantic segmentation. Using the proposed active learning strategy, we show that we can judiciously select the regions for annotation such that we obtain 93.8% of the baseline performance (when all pixels are labeled) while labeling only 10% of the total number of pixels. Further, we show that this approach can be used to transfer annotations from a model trained on a given dataset (Cityscapes) to a different dataset (Mapillary), thus highlighting its promise and potential.
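A hedged sketch of region-based selection: score each cell of a grid partition by its mean per-pixel prediction entropy and pick the most uncertain regions across the unlabeled pool, up to an annotation budget. The grid partition, entropy criterion, and budget are illustrative assumptions rather than the paper's exact selection strategy.

```python
import numpy as np

def region_entropy_scores(prob_map, grid=4):
    """prob_map: (C, H, W) per-pixel class probabilities for one image.
    Returns mean per-pixel entropy for each cell of a grid x grid partition."""
    c, h, w = prob_map.shape
    entropy = -(prob_map * np.log(prob_map + 1e-12)).sum(axis=0)    # (H, W)
    hs, ws = h // grid, w // grid
    return {(i, j): float(entropy[i * hs:(i + 1) * hs, j * ws:(j + 1) * ws].mean())
            for i in range(grid) for j in range(grid)}

def select_regions(prob_maps, budget=5, grid=4):
    """Pick the `budget` most uncertain regions across the unlabeled pool."""
    ranked = []
    for img_id, pm in enumerate(prob_maps):
        for cell, s in region_entropy_scores(pm, grid).items():
            ranked.append((s, img_id, cell))
    ranked.sort(reverse=True)
    return [(img_id, cell) for _, img_id, cell in ranked[:budget]]

# toy pool of 3 images, 19 classes, 64x64 predictions
rng = np.random.default_rng(0)
pool = [rng.dirichlet(np.ones(19), size=(64, 64)).transpose(2, 0, 1) for _ in range(3)]
print(select_regions(pool, budget=5))
```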
Citations: 36
Video Summarization Via Actionness Ranking
Pub Date : 2019-01-01 DOI: 10.1109/WACV.2019.00085
Mohamed Elfeki, A. Borji
To automatically produce a brief yet expressive summary of a long video, an automatic algorithm should start by resembling the human process of summary generation. Prior work proposed supervised and unsupervised algorithms to train models that learn the underlying behavior of humans, either by increasing modeling complexity or by craft-designing better heuristics to simulate the human summary generation process. In this work, we take a different approach by analyzing a major cue that humans exploit for summary generation: the nature and intensity of actions. We empirically observed that a frame is more likely to be included in human-generated summaries if it contains a substantial amount of deliberate motion performed by an agent, which is referred to as actionness. Therefore, we hypothesize that learning to automatically generate summaries involves an implicit knowledge of actionness estimation and ranking. We validate our hypothesis by running a user study that explores the correlation between human-generated summaries and actionness ranks. We also run a consensus and behavioral analysis between human subjects to ensure reliable and consistent results. The analysis exhibits a considerable degree of agreement among subjects within the obtained data, verifying our initial hypothesis. Based on the study findings, we develop a method that incorporates actionness data to explicitly regulate a learning algorithm trained for summary generation. We assess the performance of our approach on 4 summarization benchmark datasets, and demonstrate an evident advantage compared to state-of-the-art summarization methods.
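As a simple illustration of ranking by actionness, the sketch below scores fixed-length segments by their mean per-frame actionness and keeps the top-ranked segments within a summary budget. The segment length, budget ratio, and the assumption that per-frame actionness scores are already available are placeholders, not the paper's learning-based method.

```python
import numpy as np

def summarize_by_actionness(actionness, seg_len=15, budget_ratio=0.15):
    """Rank fixed-length segments by mean actionness and keep the top ones
    until the summary budget is reached. `actionness` holds per-frame scores
    in [0, 1], e.g. from an actionness estimation network."""
    n = len(actionness)
    segs = [(s, min(s + seg_len, n)) for s in range(0, n, seg_len)]
    scored = sorted(segs, key=lambda se: -actionness[se[0]:se[1]].mean())
    budget = int(budget_ratio * n)
    picked, used = [], 0
    for s, e in scored:
        if used + (e - s) > budget:
            continue
        picked.append((s, e))
        used += e - s
    return sorted(picked)

# toy example: a 600-frame video with two "active" bursts
scores = np.zeros(600)
scores[100:180] = 0.9
scores[400:460] = 0.7
print(summarize_by_actionness(scores))
```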
Citations: 35
Visualizing Deep Similarity Networks
Pub Date : 2019-01-01 DOI: 10.1109/WACV.2019.00220
Abby Stylianou, Richard Souvenir, Robert Pless
For convolutional neural network models that optimize an image embedding, we propose a method to highlight the regions of images that contribute most to pairwise similarity. This work is a corollary to the visualization tools developed for classification networks, but applicable to problem domains better suited to similarity learning. The visualization shows how fine-tuned similarity networks learn to focus on different features. We also generalize our approach to embedding networks that use different pooling strategies and provide a simple mechanism to support image similarity searches on objects or sub-regions in the query image.
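For embeddings produced by global average pooling of convolutional feature maps, the dot-product similarity decomposes exactly into per-location contributions, which is the kind of decomposition such visualizations rely on. The sketch below computes that contribution map; it omits the embedding normalization and other details a trained similarity network would use.

```python
import torch

def similarity_contribution_map(feat_a, feat_b):
    """feat_a, feat_b: (C, H, W) conv feature maps of two images.
    With global-average-pooled embeddings, sim(A, B) = sum over locations of
    <feat_a[:, h, w], pooled(B)> / (H * W); return that (H, W) map for A."""
    c, h, w = feat_a.shape
    emb_b = feat_b.mean(dim=(1, 2))                        # pooled embedding of B
    return torch.einsum('chw,c->hw', feat_a, emb_b) / (h * w)

# sanity check: contributions sum to the embedding dot product
a, b = torch.rand(256, 7, 7), torch.rand(256, 7, 7)
sim = torch.dot(a.mean(dim=(1, 2)), b.mean(dim=(1, 2)))
print(torch.allclose(similarity_contribution_map(a, b).sum(), sim, atol=1e-5))
```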
Citations: 44
Multispectral Direct-Global Separation of Dynamic Scenes
Pub Date : 2019-01-01 DOI: 10.1109/WACV.2019.00209
M. Torii, Takahiro Okabe, Toshiyuki Amano
In this paper, we propose a method for separating the direct and global components of a dynamic scene per illumination color by using a projector-camera system; it exploits both the color switch and the temporal dithering of a DLP projector. Our proposed method is easy to implement because it does not require any self-built equipment or temporal synchronization between the projector and the camera. In addition, our method automatically calibrates the projector-camera correspondence in a dynamic scene on the basis of the consistency in pixel intensities, and optimizes the projection pattern on the basis of noise propagation analysis. We implemented the prototype setup and achieved multispectral direct-global separation of dynamic scenes at 60 Hz. Furthermore, we demonstrated that our method is effective for applications such as image-based material editing and multispectral relighting of dynamic scenes where wavelength-dependent phenomena such as fluorescence are observed.
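For context, direct-global separation with high-frequency illumination (Nayar et al.) reduces, per pixel and per illumination color, to a max/min computation over captures under shifted half-on patterns. The sketch below shows only that basic step on placeholder data, not the paper's color-switch and temporal-dithering pipeline.

```python
import numpy as np

def direct_global_from_shifts(images):
    """Given images captured under shifted half-on/half-off high-frequency
    patterns for one illumination color, the direct component is roughly
    max - min and the global component roughly 2 * min at every pixel.
    `images`: (N, H, W) grayscale captures."""
    lmax = images.max(axis=0)
    lmin = images.min(axis=0)
    return lmax - lmin, 2.0 * lmin        # (direct, global)

# toy example: 4 shifted-pattern captures of a 32x32 scene
caps = np.random.rand(4, 32, 32)
d, g = direct_global_from_shifts(caps)
print(d.shape, g.shape)
```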
Citations: 4
SPaSe - Multi-Label Page Segmentation for Presentation Slides
Pub Date : 2019-01-01 DOI: 10.1109/WACV.2019.00082
Monica Haurilet, Ziad Al-Halah, R. Stiefelhagen
We introduce the first benchmark dataset for slide-page segmentation. Presentation slides are one of the most prominent document types used to exchange ideas across the web, educational institutes and businesses. This document format is marked by a complex layout which contains a rich variety of graphical (e.g. diagram, logo), textual (e.g. heading, affiliation) and structural components (e.g. enumeration, legend). This vast and popular knowledge source is still unattainable by modern machine learning techniques due to the lack of annotated data. To tackle this issue, we introduce SPaSe (Slide Page Segmentation), a novel dataset containing in total 2000 slides with dense, pixel-wise annotations of 25 classes. We show that slide segmentation reveals some interesting properties that characterize this task. Unlike the common image segmentation problem, disjoint classes tend to have a high overlap of regions, thus posing this segmentation task as a multi-label problem. Furthermore, many of the frequently encountered classes in slides are location-sensitive (e.g. title, footnote). Hence, we believe our dataset represents a challenging and interesting benchmark for novel segmentation models. Finally, we evaluate state-of-the-art deep segmentation models on our dataset and show that it is suitable for developing deep learning models without any need for pre-training. Our dataset will be released to the public to foster further research on this interesting task.
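Because regions may carry several of the 25 labels at once, a multi-label segmentation head replaces the usual per-pixel softmax with independent sigmoids and a binary cross-entropy loss. The sketch below shows this formulation on placeholder backbone features; it is not the architecture of the models evaluated in the paper.

```python
import torch
import torch.nn as nn

# Multi-label per-pixel classification: independent sigmoid per class
# instead of a single softmax, trained with binary cross-entropy.
num_classes = 25
head = nn.Conv2d(64, num_classes, kernel_size=1)           # per-pixel logits
criterion = nn.BCEWithLogitsLoss()

features = torch.randn(2, 64, 32, 32)                      # placeholder backbone output
targets = torch.randint(0, 2, (2, num_classes, 32, 32)).float()  # multi-hot masks

logits = head(features)                                     # (2, 25, 32, 32)
loss = criterion(logits, targets)
pred = torch.sigmoid(logits) > 0.5                          # independent per-class masks
print(float(loss), pred.shape)
```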
Citations: 11
No-Reference Image Quality Assessment: An Attention Driven Approach
Pub Date : 2019-01-01 DOI: 10.1109/WACV.2019.00046
Diqi Chen, Yizhou Wang, Hongyu Ren, Wen Gao
In this paper, we tackle no-reference image quality assessment (NR-IQA), which aims to predict the perceptual quality of a test image without referencing its pristine-quality counterpart. The free-energy brain theory implies that the human visual system (HVS) tends to predict the pristine image while perceiving a distorted one. Besides, image quality assessment heavily depends on how human beings attend to distorted images. Motivated by this, the distorted image is restored first. Then, given the distorted-restored pair, we make the first attempt to formulate NR-IQA as a dynamic attentional process and implement it via reinforcement learning. The reward is derived from two tasks: classifying the distortion type and predicting the perceptual score of a test image. The model learns a policy to sample a sequence of fixation areas with the goal of maximizing the expectation of the accumulated rewards. The observations of the fixation areas are aggregated through a recurrent neural network (RNN) and a robust averaging strategy which assigns different weights to different fixation areas. Extensive experiments on TID2008, TID2013 and CSIQ demonstrate the superiority of our method.
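A rough sketch of the dynamic attentional process: a recurrent cell aggregates features of successive fixation crops while a small head proposes the next fixation, and a final head predicts the quality score. The crop size, glimpse encoder, Gaussian exploration noise, and the omission of the reward-driven policy update and robust averaging are all simplifying assumptions.

```python
import torch
import torch.nn as nn

class FixationAggregator(nn.Module):
    """Sketch of a recurrent fixation loop for NR-IQA: encode a crop around
    the current fixation, aggregate with a GRU, propose the next fixation,
    and predict a quality score from the final state. Sizes are placeholders;
    the REINFORCE-style reward update is omitted."""
    def __init__(self, crop=32, hidden=128):
        super().__init__()
        self.crop = crop
        self.enc = nn.Sequential(nn.Flatten(),
                                 nn.Linear(3 * crop * crop, hidden), nn.ReLU())
        self.rnn = nn.GRUCell(hidden, hidden)
        self.loc = nn.Linear(hidden, 2)       # next fixation (x, y) in [-1, 1]
        self.score = nn.Linear(hidden, 1)

    def glimpse(self, img, center):           # assumes batch size 1 for brevity
        _, _, h, w = img.shape
        cx = int((center[0, 0].item() + 1) / 2 * (w - self.crop))
        cy = int((center[0, 1].item() + 1) / 2 * (h - self.crop))
        return img[:, :, cy:cy + self.crop, cx:cx + self.crop]

    def forward(self, img, steps=4):
        h = torch.zeros(img.shape[0], self.rnn.hidden_size)
        center = torch.zeros(img.shape[0], 2)
        for _ in range(steps):
            g = self.enc(self.glimpse(img, center))
            h = self.rnn(g, h)
            center = torch.tanh(self.loc(h) + 0.1 * torch.randn_like(center))
        return self.score(h).squeeze(-1)       # predicted quality score

print(FixationAggregator()(torch.rand(1, 3, 224, 224)))
```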
Citations: 1