
2019 IEEE Winter Conference on Applications of Computer Vision (WACV): Latest Publications

[Title page i]
Pub Date: 2019-01-01 DOI: 10.1109/wacv.2019.00001
{"title":"[Title page i]","authors":"","doi":"10.1109/wacv.2019.00001","DOIUrl":"https://doi.org/10.1109/wacv.2019.00001","url":null,"abstract":"","PeriodicalId":436637,"journal":{"name":"2019 IEEE Winter Conference on Applications of Computer Vision (WACV)","volume":"154 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134061331","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Semi-Supervised Convolutional Neural Networks for In-Situ Video Monitoring of Selective Laser Melting
Pub Date: 2019-01-01 DOI: 10.1109/WACV.2019.00084
Bodi Yuan, B. Giera, G. Guss, Ibo Matthews, Sara McMains
Selective Laser Melting (SLM) is a metal additive manufacturing technique. The lack of SLM process repeatability is a barrier to industrial adoption. SLM product quality is hard to control, even when using fixed system settings. Thus SLM could benefit from a monitoring system that provides quality assessments in real time. Since there is no publicly available SLM dataset, we ran experiments to collect over one thousand videos, measured the physical output via height-map images, and applied our proposed image processing algorithm to them to produce a dataset for semi-supervised learning. We then trained convolutional neural networks (CNNs) to recognize the desired quality metrics from videos. Experimental results demonstrate the effectiveness of our proposed monitoring approach and also show that the semi-supervised model can mitigate the time and expense of labeling an entire SLM dataset.
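The abstract does not spell out the semi-supervised scheme, so the following is only a minimal pseudo-labeling sketch of the general idea: train on labeled frames while folding in confident predictions on unlabeled ones. The `model`, `optimizer`, and batch tensors are hypothetical placeholders supplied by the caller.

```python
# Minimal pseudo-labeling step (one common semi-supervised scheme;
# the paper's exact method is not specified in the abstract).
import torch
import torch.nn.functional as F

def train_step(model, optimizer, labeled_batch, unlabeled_batch,
               conf_threshold=0.95, unlabeled_weight=0.5):
    x_l, y_l = labeled_batch   # frames with quality labels
    x_u = unlabeled_batch      # frames without labels

    # Supervised loss on the labeled frames.
    loss = F.cross_entropy(model(x_l), y_l)

    # Pseudo-labels: keep only confident predictions on unlabeled frames.
    with torch.no_grad():
        probs = F.softmax(model(x_u), dim=1)
        conf, pseudo = probs.max(dim=1)
        mask = conf > conf_threshold
    if mask.any():
        loss = loss + unlabeled_weight * F.cross_entropy(
            model(x_u[mask]), pseudo[mask])

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```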
Citations: 32
Single Image Deblurring and Camera Motion Estimation With Depth Map
Pub Date: 2019-01-01 DOI: 10.1109/WACV.2019.00229
Liyuan Pan, Yuchao Dai, Miaomiao Liu
Camera shake during exposure is a major problem in hand-held photography, as it causes image blur that destroys details in the captured images. In the real world, such blur is caused by both the camera motion and the complex scene structure. While many existing approaches have been proposed based on various assumptions regarding the scene structure or the camera motion, few can handle real 6 DoF camera motion. In this paper, we propose to jointly estimate the 6 DoF camera motion and remove the non-uniform blur it causes by exploiting their underlying geometric relationships, with a single blurry image and its depth map (either direct depth measurements or a learned depth map) as input. We formulate joint deblurring and 6 DoF camera motion estimation as an energy minimization problem, which is solved in an alternating manner. Our model enables recovery of both the 6 DoF camera motion and the latent clean image, and can also generate a sharp image sequence from a single blurry image. Experiments on challenging real-world and synthetic datasets demonstrate that image blur from camera shake can be well addressed within our proposed framework.
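The energy is minimized in an alternating manner: fix the camera motion and solve for the latent sharp image, then fix the image and solve for the 6 DoF motion. A toy skeleton of that alternation follows; the lambdas are stand-in quadratic sub-solvers, not the paper's actual energy terms.

```python
# Skeleton of the alternating scheme; the lambdas below minimize a toy
# energy E(x, m) = (x - m)^2 + (m - 3)^2, not the paper's actual terms.
def alternating_minimize(update_image, update_motion, x0, m0, iters=20):
    x, m = x0, m0
    for _ in range(iters):
        x = update_image(m)   # argmin over the latent image, motion fixed
        m = update_motion(x)  # argmin over the 6 DoF motion, image fixed
    return x, m

x, m = alternating_minimize(update_image=lambda m: m,
                            update_motion=lambda x: (x + 3) / 2.0,
                            x0=0.0, m0=0.0)
print(x, m)  # both converge toward 3.0, the joint minimizer
```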
Citations: 17
Semantic Matching by Weakly Supervised 2D Point Set Registration
Pub Date: 2019-01-01 DOI: 10.1109/WACV.2019.00118
Zakaria Laskar, H. R. Tavakoli, Juho Kannala
In this paper we address the problem of establishing correspondences between different instances of the same object. The problem is posed as finding the geometric transformation that aligns a given image pair. We use a convolutional neural network (CNN) to directly regress the parameters of the transformation model. The alignment problem is defined in the setting where an unordered set of semantic key-points per image is available, but without correspondence information. To this end we propose a novel loss function based on cyclic consistency that solves this 2D point set registration problem by inferring the optimal geometric transformation model parameters. We train and test our approach on the standard benchmark dataset Proposal-Flow (PF-PASCAL). The proposed approach achieves state-of-the-art results, demonstrating the effectiveness of the method. In addition, we show that our approach further benefits from additional training samples in PF-PASCAL generated using category-level information.
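As a hedged illustration of the cyclic-consistency idea, the sketch below scores a forward (A→B) and backward (B→A) transform by how far the round trip moves each key-point; a plain affine model stands in for whatever transformation family the CNN regresses.

```python
# Cycle-consistency score for 2D point sets, with a plain affine
# transformation model standing in for the regressed parameters.
import numpy as np

def apply_affine(theta, pts):
    """theta: (6,) affine parameters; pts: (N, 2) key-points."""
    A, t = theta[:4].reshape(2, 2), theta[4:]
    return pts @ A.T + t

def cycle_loss(theta_ab, theta_ba, pts_a):
    # A -> B -> A: the round trip should return every point to itself.
    round_trip = apply_affine(theta_ba, apply_affine(theta_ab, pts_a))
    return float(np.mean(np.sum((round_trip - pts_a) ** 2, axis=1)))

pts = np.random.rand(10, 2)
identity = np.array([1., 0., 0., 1., 0., 0.])
print(cycle_loss(identity, identity, pts))  # 0.0 for identity transforms
```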
Citations: 7
Deep Representation Learning Characterized by Inter-Class Separation for Image Clustering
Pub Date: 2019-01-01 DOI: 10.1109/WACV.2019.00072
Dipanjan Das, Ratul Ghosh, B. Bhowmick
Despite significant advances in clustering methods in recent years, the outcome of clustering a natural image dataset is still unsatisfactory due to two important drawbacks. First, clustering of images needs a good feature representation of each image; second, we need a robust method that can discriminate among these features so that images belong to different clusters with low intra-class variance and high inter-class variance. Often these two aspects are dealt with independently, and thus the features are not sufficient to partition the data meaningfully. In this paper, we propose a method that discovers the features required for separating the images using a deep autoencoder. Our method learns image representation features automatically for the purpose of clustering, and also selects a coherent image and an incoherent image for each given image, so that the feature representation learning can learn more discriminative features for grouping similar images in a cluster while separating dissimilar images across clusters. Experimental results show that our method produces significantly better results than the state-of-the-art methods, and we also show that our method generalizes better across different datasets without using any pre-trained model, unlike other existing methods.
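One plausible reading of the coherent/incoherent selection is a triplet-style margin loss on the autoencoder's embeddings, sketched below; the exact loss form and selection rules here are illustrative assumptions, not the paper's formulation.

```python
# Triplet-style margin loss on embeddings: pull the coherent image's
# code toward the anchor, push the incoherent one away (assumed form).
import torch
import torch.nn.functional as F

def separation_loss(z_anchor, z_coherent, z_incoherent, margin=1.0):
    d_pos = F.pairwise_distance(z_anchor, z_coherent)
    d_neg = F.pairwise_distance(z_anchor, z_incoherent)
    return F.relu(d_pos - d_neg + margin).mean()

z = torch.randn(8, 32)  # hypothetical autoencoder codes for a batch
loss = separation_loss(z, z + 0.1 * torch.randn_like(z), torch.randn(8, 32))
print(loss.item())
```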
Citations: 5
A Hierarchical Grocery Store Image Dataset With Visual and Semantic Labels
Pub Date: 2019-01-01 DOI: 10.1109/WACV.2019.00058
Marcus Klasson, Cheng Zhang, H. Kjellström
Image classification models built into visual support systems and other assistive devices need to provide accurate predictions about their environment. We focus on an assistive-technology application for people with visual impairments, covering daily activities such as shopping and cooking. In this paper, we provide a new benchmark dataset for a challenging task in this application: classification of fruits, vegetables, and refrigerated products, e.g. milk packages and juice cartons, in grocery stores. To enable the learning process to utilize multiple sources of structured information, this dataset not only contains a large volume of natural images but also includes the corresponding product information from an online shopping website. Such information encompasses the hierarchical structure of the object classes, as well as an iconic image of each type of object. This dataset can be used to train and evaluate image classification models for helping visually impaired people in natural environments. Additionally, we provide benchmark results for pretrained convolutional neural networks often used for image understanding, as well as for a multi-view variational autoencoder, which is capable of utilizing the rich product information in the dataset.
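A minimal sketch of the kind of pretrained-CNN benchmark described, assuming a fine-tuning setup in PyTorch; the class count and training details are placeholders rather than the paper's actual configuration.

```python
# Hypothetical fine-tuning setup for the classification benchmark;
# the class count and training details are placeholders.
import torch.nn as nn
from torchvision import models

num_classes = 81  # placeholder for the number of grocery classes
model = models.resnet18(pretrained=True)            # ImageNet backbone
model.fc = nn.Linear(model.fc.in_features, num_classes)
# Train with cross-entropy on (image, label) pairs from the dataset,
# optionally freezing the backbone and fitting only the new head.
```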
Citations: 32
FuturePose - Mixed Reality Martial Arts Training Using Real-Time 3D Human Pose Forecasting With a RGB Camera
Pub Date: 2019-01-01 DOI: 10.1109/WACV.2019.00152
Erwin Wu, H. Koike
In this paper, we propose a novel mixed reality martial arts training system using deep-learning-based real-time human pose forecasting. Our training system is based on 3D pose estimation using a residual neural network with input from an RGB camera, which captures the motion of a trainer. A student wearing a head-mounted display can see the virtual model of the trainer and the trainer's forecasted future pose. The pose forecasting is based on recurrent networks; to improve how well the network learns the motion's temporal features, we use a special lattice optical flow method to estimate joint movement. We visualize the real-time human motion with a generated human model, while the forecasted pose is shown by a red skeleton model. In our experiments, we evaluated the performance of our system when predicting 15 frames ahead in a 30-fps video (0.5 s of forecasting); the accuracy was acceptable, matching or even outperforming some methods that use depth IR cameras or fabric-based technologies. User studies showed that our system helps beginners understand martial arts, and usability is comfortable since the motions are captured by an RGB camera.
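A minimal recurrent forecaster in the spirit of the abstract's setup (predict 15 future frames from an observed 3D pose sequence); all dimensions are illustrative assumptions, and the lattice optical flow input is omitted.

```python
# Autoregressive LSTM forecaster: encode the observed pose sequence,
# then roll the recurrent state forward for `horizon` future frames.
import torch
import torch.nn as nn

class PoseForecaster(nn.Module):
    def __init__(self, pose_dim=51, hidden=256, horizon=15):
        super().__init__()
        self.horizon = horizon
        self.lstm = nn.LSTM(pose_dim, hidden, batch_first=True)
        self.head = nn.Linear(hidden, pose_dim)

    def forward(self, poses):              # poses: (B, T, pose_dim)
        _, (h, c) = self.lstm(poses)       # encode the observed frames
        out, last = [], poses[:, -1:]
        for _ in range(self.horizon):      # feed predictions back in
            y, (h, c) = self.lstm(last, (h, c))
            last = self.head(y)
            out.append(last)
        return torch.cat(out, dim=1)       # (B, horizon, pose_dim)

pred = PoseForecaster()(torch.randn(2, 30, 51))  # e.g. 17 joints x 3 coords
print(pred.shape)  # torch.Size([2, 15, 51])
```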
Citations: 47
Demystifying Multi-Faceted Video Summarization: Tradeoff Between Diversity, Representation, Coverage and Importance
Pub Date: 2019-01-01 DOI: 10.1109/WACV.2019.00054
Vishal Kaushal, Rishabh K. Iyer, Khoshrav Doctor, Anurag Sahoo, P. Dubal, S. Kothawade, Rohan Mahadev, Kunal Dargan, Ganesh Ramakrishnan
This paper addresses automatic summarization of videos in a unified manner. In particular, we propose a framework for multi-faceted summarization covering extractive, query-based, and entity summarization (summarization at the level of entities such as objects, scenes, humans, and faces in the video). We investigate several summarization models that capture notions of diversity, coverage, representation, and importance, and argue for the utility of these different models depending on the application. While most prior work on submodular summarization approaches has focused on combining several models and learning weighted mixtures, we focus on the explainability of different models and featurizations, and how they apply to different domains. We also provide implementation details for the summarization systems and the different modalities involved. We hope that this study will give practitioners the insight to choose the right summarization models for the problems at hand.
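The submodular machinery referenced here is typically optimized greedily. The sketch below maximizes a facility-location objective (a standard "representation" model) under a budget; diversity, coverage, and importance models would slot into the same greedy loop with different objective functions. The similarity matrix is synthetic, not the paper's featurization.

```python
# Greedy maximization of a monotone submodular objective; facility
# location rewards selecting frames that "represent" all the others.
import numpy as np

def facility_location(sim, selected):
    # sim: (n, n) pairwise frame similarities; selected: chosen indices.
    if not selected:
        return 0.0
    return float(np.sum(sim[:, selected].max(axis=1)))

def greedy_summary(sim, budget):
    selected = []
    for _ in range(budget):
        # Pick the frame whose addition increases the objective most.
        gain, best = max((facility_location(sim, selected + [j]), j)
                         for j in range(sim.shape[0]) if j not in selected)
        selected.append(best)
    return selected

sim = np.random.rand(20, 20)
sim = (sim + sim.T) / 2          # toy symmetric similarity matrix
print(greedy_summary(sim, budget=5))
```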
Citations: 9
Where to Focus on for Human Action Recognition?
Pub Date: 2019-01-01 DOI: 10.1109/WACV.2019.00015
Srijan Das, Arpit Chaudhary, F. Brémond, M. Thonnat
In this paper, we present a new attention model for the recognition of human actions from RGB-D videos. We propose an attention mechanism based on 3D articulated pose. The objective is to focus on the most relevant body parts involved in the action. For action classification, we propose a classification network composed of spatio-temporal subnetworks modeling the appearance of human body parts and an RNN attention subnetwork implementing our attention mechanism. Furthermore, we train our proposed network end-to-end using a regularized cross-entropy loss, leading to joint training in which the RNN delivers attention globally over the whole set of spatio-temporal features extracted from 3D ConvNets. Our method outperforms state-of-the-art methods on the largest human activity recognition dataset available to date (NTU RGB+D Dataset), which is also multi-view, and on a human action recognition dataset with object interaction (Northwestern-UCLA Multiview Action 3D Dataset).
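A hedged sketch of pose-driven attention over body-part features: a recurrent network over the 3D pose emits weights that rescale per-part appearance features before classification. All dimensions and module choices below are illustrative, not the paper's exact architecture.

```python
# Pose-driven attention: a GRU over the 3D pose sequence scores each
# body part, and the weights rescale per-part appearance features.
import torch
import torch.nn as nn

class PartAttention(nn.Module):
    def __init__(self, pose_dim=51, n_parts=5, feat_dim=128, hidden=64):
        super().__init__()
        self.rnn = nn.GRU(pose_dim, hidden, batch_first=True)
        self.score = nn.Linear(hidden, n_parts)

    def forward(self, pose_seq, part_feats):
        # pose_seq: (B, T, pose_dim); part_feats: (B, n_parts, feat_dim)
        _, h = self.rnn(pose_seq)
        w = torch.softmax(self.score(h[-1]), dim=1)       # (B, n_parts)
        return (w.unsqueeze(-1) * part_feats).sum(dim=1)  # (B, feat_dim)

feats = PartAttention()(torch.randn(2, 16, 51), torch.randn(2, 5, 128))
print(feats.shape)  # torch.Size([2, 128])
```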
Citations: 32
Resultant Based Incremental Recovery of Camera Pose From Pairwise Matches
Pub Date: 2019-01-01 DOI: 10.1109/WACV.2019.00120
Y. Kasten, M. Galun, R. Basri
Incremental (online) structure-from-motion pipelines seek to recover the camera matrix associated with an image I_n given n-1 images, I_1,...,I_n-1, whose camera matrices have already been recovered. In this paper, we introduce a novel solution to the six-point online algorithm to recover the exterior parameters associated with I_n. Our algorithm uses just six corresponding pairs of 2D points, each extracted from I_n and from any of the preceding n-1 images, allowing recovery of the full six degrees of freedom of the n-th camera; unlike common methods, it does not require tracking feature points in three or more images. Our novel solution is based on constructing a Dixon resultant, yielding a solution method that is both efficient and accurate compared to existing solutions. We further use Bernstein's theorem to prove a tight bound on the number of complex solutions. Our experiments demonstrate the utility of our approach.
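The Dixon-resultant construction itself is beyond a short sketch, but candidate relative poses produced by any minimal solver can be screened with the standard epipolar constraint, shown below on synthetic correspondences; this check is generic scaffolding, not the paper's solver.

```python
# Epipolar screening of a candidate pose (R, t) between calibrated
# views: x2^T [t]_x R x1 should vanish for true correspondences.
import numpy as np

def skew(t):
    return np.array([[0, -t[2], t[1]],
                     [t[2], 0, -t[0]],
                     [-t[1], t[0], 0]])

def epipolar_residuals(R, t, x1, x2):
    """x1, x2: (N, 3) normalized homogeneous image points."""
    E = skew(t) @ R
    return np.einsum('ni,ij,nj->n', x2, E, x1)

# Synthetic check with P1 = [I | 0] and P2 = [R | t].
R, t = np.eye(3), np.array([1.0, 0.0, 0.0])
X = np.random.rand(6, 3) + np.array([0.0, 0.0, 2.0])  # points in front
x1 = X / X[:, 2:3]
X2 = X @ R.T + t                                      # camera-2 frame
x2 = X2 / X2[:, 2:3]
print(np.abs(epipolar_residuals(R, t, x1, x2)).max())  # ~0
```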
Citations: 13