
Latest publications from the 2017 IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW)

Measuring Energy Expenditure in Sports by Thermal Video Analysis
Rikke Gade, R. Larsen, T. Moeslund
Estimation of human energy expenditure in sports and exercise contributes to performance analyses and tracking of physical activity levels. The focus of this work is to develop a video-based method for estimation of energy expenditure in athletes. We propose a method using thermal video analysis to automatically extract the cyclic motion pattern, represented as steps in walking and running, and analyse its frequency. Experiments are performed with one subject in two different tests, each at 5, 8, 10, and 12 km/h. The results of our proposed video-based method are compared to concurrent measurements of oxygen uptake. These initial experiments indicate a correlation between estimated step frequency and oxygen uptake. Based on the preliminary results we conclude that the proposed method has potential as a future non-invasive approach to estimate energy expenditure during sports.
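The core of the method is a frequency analysis of a cyclic motion signal. A minimal sketch, assuming the thermal video has already been reduced to a per-frame motion measure (e.g. a foreground pixel count) and that step frequency maps to oxygen uptake through a fitted linear relation with placeholder coefficients:

```python
import numpy as np

def step_frequency(motion_signal, fps):
    """Estimate the dominant cyclic frequency (steps/s) of a 1-D motion signal.

    motion_signal: per-frame scalar motion measure from the thermal frames;
    fps: frame rate of the video.
    """
    x = np.asarray(motion_signal, dtype=float)
    x = x - x.mean()                       # remove the DC component
    spectrum = np.abs(np.fft.rfft(x))
    freqs = np.fft.rfftfreq(len(x), d=1.0 / fps)
    band = (freqs > 0.5) & (freqs < 5.0)   # plausible step-frequency band in Hz
    return freqs[band][np.argmax(spectrum[band])]

def estimate_vo2(step_freq_hz, a=12.0, b=3.5):
    # Hypothetical linear mapping from step frequency to oxygen uptake,
    # fitted against concurrent VO2 measurements; coefficients are placeholders.
    return a * step_freq_hz + b            # ml/kg/min, illustrative only

fps = 30
t = np.arange(0, 10, 1.0 / fps)
fake_signal = 100 + 20 * np.sin(2 * np.pi * 2.7 * t)   # ~2.7 steps/s
f = step_frequency(fake_signal, fps)
print(f, estimate_vo2(f))
```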
{"title":"Measuring Energy Expenditure in Sports by Thermal Video Analysis","authors":"Rikke Gade, R. Larsen, T. Moeslund","doi":"10.1109/CVPRW.2017.29","DOIUrl":"https://doi.org/10.1109/CVPRW.2017.29","url":null,"abstract":"Estimation of human energy expenditure in sports and exercise contributes to performance analyses and tracking of physical activity levels. The focus of this work is to develop a video-based method for estimation of energy expenditure in athletes. We propose a method using thermal video analysis to automatically extract the cyclic motion pattern, in walking and running represented as steps, and analyse the frequency. Experiments are performed with one subject in two different tests, each at 5, 8, 10, and 12 km/h. The results of our proposed video-based method is compared to concurrent measurements of oxygen uptake. These initial experiments indicate a correlation between estimated step frequency and oxygen uptake. Based on the preliminary results we conclude that the proposed method has potential as a future non-invasive approach to estimate energy expenditure during sports.","PeriodicalId":6668,"journal":{"name":"2017 IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW)","volume":"19 1","pages":"187-194"},"PeriodicalIF":0.0,"publicationDate":"2017-08-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"88407443","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 2
Generating 5D Light Fields in Scattering Media for Representing 3D Images
E. Yuasa, Fumihiko Sakaue, J. Sato
In this paper, we propose a novel method for displaying 3D images based on a 5D light field representation. In our method, the light fields emitted by a light field projector are projected into 3D scattering media such as fog. The intensity of light rays projected into the scattering media decreases because of the scattering effect of the media. As a result, 5D light fields are generated in the scattering media. The proposed method models the relationship between the 5D light fields and observed images, and uses this relationship to project light fields so that the observed image changes according to the viewpoint of the observer. In order to achieve accurate and efficient 3D image representation, we describe the relationship not with a parametric model, but with an observation-based model obtained from a point spread function (PSF) of the scattering media. The experimental results show the efficiency of the proposed method.
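A minimal discrete sketch of such an observation-based model: the observed image is treated as a linear mixture of projected ray intensities weighted by the medium's PSF, and the ray intensities that reproduce a target image are recovered by least squares. The matrix sizes and the stand-in PSF weights are assumptions, not the paper's calibration:

```python
import numpy as np

# Observed image pixels are modeled as a linear mixture of projected light-field
# ray intensities, with weights standing in for a PSF of the scattering medium.
rng = np.random.default_rng(0)

n_rays, n_pixels = 64, 48
A = np.exp(-0.5 * rng.normal(size=(n_pixels, n_rays)) ** 2)   # stand-in PSF weights
A /= A.sum(axis=1, keepdims=True)

target_image = rng.uniform(0.0, 1.0, size=n_pixels)   # image desired at one viewpoint

# Solve for ray intensities that reproduce the target image under the model:
L, *_ = np.linalg.lstsq(A, target_image, rcond=None)
L = np.clip(L, 0.0, None)                              # physical light cannot be negative

reproduced = A @ L
print("reconstruction error:", np.linalg.norm(reproduced - target_image))
```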
{"title":"Generating 5D Light Fields in Scattering Media for Representing 3D Images","authors":"E. Yuasa, Fumihiko Sakaue, J. Sato","doi":"10.1109/CVPRW.2017.169","DOIUrl":"https://doi.org/10.1109/CVPRW.2017.169","url":null,"abstract":"In this paper, we propose a novel method for displaying 3D images based on a 5D light field representation. In our method, the light fields emitted by a light field projector are projected into 3D scattering media such as fog. The intensity of light lays projected into the scattering media decreases because of the scattering effect of the media. As a result, 5D light fields are generated in the scattering media. The proposed method models the relationship between the 5D light fields and observed images, and uses the relationship for projecting light fields so that the observed image changes according to the viewpoint of observers. In order to achieve accurate and efficient 3D image representation, we describe the relationship not by using a parametric model, but by using an observation based model obtained from a point spread function (PSF) of scattering media. The experimental results show the efficiency of the proposed method.","PeriodicalId":6668,"journal":{"name":"2017 IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW)","volume":"18 1","pages":"1287-1294"},"PeriodicalIF":0.0,"publicationDate":"2017-07-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"73559430","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Trusting the Computer in Computer Vision: A Privacy-Affirming Framework
A. Chen, M. Biglari-Abhari, K. Wang
The use of surveillance cameras continues to increase, ranging from conventional applications such as law enforcement to newer scenarios with looser requirements such as gathering business intelligence. Humans still play an integral part in using and interpreting the footage from these systems, but are also a significant factor in causing unintentional privacy breaches. As computer vision methods continue to improve, we argue in this position paper that system designers should reconsider the role of machines in surveillance, and how automation can be used to help protect privacy. We explore this by discussing the impact of the human-in-the-loop, the potential for using abstraction and distributed computing to further privacy goals, and an approach for determining when video footage should be hidden from human users. We propose that in an ideal surveillance scenario, a privacy-affirming framework causes collected camera footage to be processed by computers directly, and never shown to humans. This implicitly requires humans to establish trust, to believe that computer vision systems can generate sufficiently accurate results without human supervision, so that if information about people must be gathered, unintentional data collection is mitigated as much as possible.
{"title":"Trusting the Computer in Computer Vision: A Privacy-Affirming Framework","authors":"A. Chen, M. Biglari-Abhari, K. Wang","doi":"10.1109/CVPRW.2017.178","DOIUrl":"https://doi.org/10.1109/CVPRW.2017.178","url":null,"abstract":"The use of surveillance cameras continues to increase, ranging from conventional applications such as law enforcement to newer scenarios with looser requirements such as gathering business intelligence. Humans still play an integral part in using and interpreting the footage from these systems, but are also a significant factor in causing unintentional privacy breaches. As computer vision methods continue to improve, we argue in this position paper that system designers should reconsider the role of machines in surveillance, and how automation can be used to help protect privacy. We explore this by discussing the impact of the human-in-the-loop, the potential for using abstraction and distributed computing to further privacy goals, and an approach for determining when video footage should be hidden from human users. We propose that in an ideal surveillance scenario, a privacy-affirming framework causes collected camera footage to be processed by computers directly, and never shown to humans. This implicitly requires humans to establish trust, to believe that computer vision systems can generate sufficiently accurate results without human supervision, so that if information about people must be gathered, unintentional data collection is mitigated as much as possible.","PeriodicalId":6668,"journal":{"name":"2017 IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW)","volume":"65 1","pages":"1360-1367"},"PeriodicalIF":0.0,"publicationDate":"2017-07-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"74401892","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 11
I Know That Person: Generative Full Body and Face De-identification of People in Images
K. Brkić, I. Sikirić, T. Hrkać, Z. Kalafatić
We propose a model for full body and face de-identification of humans in images. Given a segmentation of the human figure, our model generates a synthetic human image with an alternative appearance that looks natural and fits the segmentation outline. The model is usable with various levels of segmentation, from simple human figure blobs to complex garment-level segmentations. The level of detail in the de-identified output depends on the level of detail in the input segmentation. The model de-identifies not only primary biometric identifiers (e.g. the face), but also soft and non-biometric identifiers including clothing, hairstyle, etc. Quantitative and perceptual experiments indicate that our model produces de-identified outputs that thwart human and machine recognition, while preserving data utility and naturalness.
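A minimal sketch of the conditioning idea, not the authors' network: a toy generator takes a one-hot segmentation map plus a noise code and produces a synthetic appearance, which is composited back into the frame so only the segmented region is replaced. The layer sizes, class count, and noise dimension are assumptions:

```python
import torch
import torch.nn as nn

class SegConditionedGenerator(nn.Module):
    """Toy generator: segmentation map + noise code -> synthetic RGB appearance."""
    def __init__(self, n_classes=8, z_dim=16):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(n_classes + z_dim, 64, 3, padding=1), nn.ReLU(),
            nn.Conv2d(64, 64, 3, padding=1), nn.ReLU(),
            nn.Conv2d(64, 3, 3, padding=1), nn.Sigmoid(),   # RGB in [0, 1]
        )

    def forward(self, seg_onehot, z):
        # Broadcast the noise code across all spatial positions.
        z_map = z[:, :, None, None].expand(-1, -1, *seg_onehot.shape[2:])
        return self.net(torch.cat([seg_onehot, z_map], dim=1))

frame = torch.rand(1, 3, 128, 64)                             # original image crop
seg = torch.zeros(1, 8, 128, 64); seg[:, 1, 32:96, 16:48] = 1.0   # toy person class
person_mask = seg[:, 1:2]                                     # binary region to replace

gen = SegConditionedGenerator()
fake = gen(seg, torch.randn(1, 16))
deidentified = person_mask * fake + (1 - person_mask) * frame # composite into frame
print(deidentified.shape)
```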
{"title":"I Know That Person: Generative Full Body and Face De-identification of People in Images","authors":"K. Brkić, I. Sikirić, T. Hrkać, Z. Kalafatić","doi":"10.1109/CVPRW.2017.173","DOIUrl":"https://doi.org/10.1109/CVPRW.2017.173","url":null,"abstract":"We propose a model for full body and face deidentification of humans in images. Given a segmentation of the human figure, our model generates a synthetic human image with an alternative appearance that looks natural and fits the segmentation outline. The model is usable with various levels of segmentation, from simple human figure blobs to complex garment-level segmentations. The level of detail in the de-identified output depends on the level of detail in the input segmentation. The model de-identifies not only primary biometric identifiers (e.g. the face), but also soft and non-biometric identifiers including clothing, hairstyle, etc. Quantitative and perceptual experiments indicate that our model produces de-identified outputs that thwart human and machine recognition, while preserving data utility and naturalness.","PeriodicalId":6668,"journal":{"name":"2017 IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW)","volume":"10 1","pages":"1319-1328"},"PeriodicalIF":0.0,"publicationDate":"2017-07-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"82057116","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 77
Real-Time Hand Grasp Recognition Using Weakly Supervised Two-Stage Convolutional Neural Networks for Understanding Manipulation Actions
Ji Woong Kim, Sujeong You, S. Ji, Hong-Seok Kim
Understanding human hand usage is one of the richest information sources for recognizing human manipulation actions. Since humans use various tools during actions, grasp recognition gives important cues for figuring out a human's intentions and tasks. Earlier studies analyzed grasps using the positions of hand joints measured with attached sensors, but because such sensors prevent humans from performing actions naturally, visual approaches have become the focus in recent years. Convolutional neural networks require vast annotated datasets, but, to our knowledge, no human grasping dataset includes ground truth for hand regions. In this paper, we propose a grasp recognition method that uses only image-level labels within a weakly supervised learning framework. In addition, we split the grasp recognition process into two stages, hand localization and grasp classification, to speed it up. Experimental results demonstrate that the proposed method outperforms existing methods and can perform in real time.
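A sketch of the two-stage inference split described above, with toy networks whose sizes are assumptions; the weakly supervised training from image-level labels is not shown here:

```python
import torch
import torch.nn as nn

class HandLocalizer(nn.Module):
    """Stage 1: predict a normalized hand bounding box (x, y, w, h) from the frame."""
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
                                      nn.AdaptiveAvgPool2d(1))
        self.box = nn.Linear(16, 4)

    def forward(self, img):
        return torch.sigmoid(self.box(self.features(img).flatten(1)))

class GraspClassifier(nn.Module):
    """Stage 2: classify the grasp type from the cropped hand region."""
    def __init__(self, n_grasps=6):
        super().__init__()
        self.features = nn.Sequential(nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
                                      nn.AdaptiveAvgPool2d(1))
        self.head = nn.Linear(16, n_grasps)

    def forward(self, crop):
        return self.head(self.features(crop).flatten(1))

def crop_hand(img, box):
    # Convert the normalized box to pixel coordinates and take a non-empty crop.
    _, _, H, W = img.shape
    x, y, w, h = (box * torch.tensor([W, H, W, H], dtype=torch.float32)).int().tolist()
    x, y = min(max(x, 0), W - 8), min(max(y, 0), H - 8)
    return img[:, :, y:min(y + max(h, 8), H), x:min(x + max(w, 8), W)]

with torch.no_grad():
    frame = torch.rand(1, 3, 224, 224)
    box = HandLocalizer()(frame)[0]
    logits = GraspClassifier()(crop_hand(frame, box))
print(logits.argmax(dim=1))
```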
{"title":"Real-Time Hand Grasp Recognition Using Weakly Supervised Two-Stage Convolutional Neural Networks for Understanding Manipulation Actions","authors":"Ji Woong Kim, Sujeong You, S. Ji, Hong-Seok Kim","doi":"10.1109/CVPRW.2017.67","DOIUrl":"https://doi.org/10.1109/CVPRW.2017.67","url":null,"abstract":"Understanding human hand usage is one of the richest information source to recognize human manipulation actions. Since humans use various tools during actions, grasp recognition gives important cues to figure out humans' intention and tasks. Earlier studies analyzed grasps with positions of hand joints by attaching sensors, but since these types of sensors prevent humans from naturally conducting actions, visual approaches have been focused in recent years. Convolutional neural networks require a vast annotated dataset, but, to our knowledge, no human grasping dataset includes ground truth of hand regions. In this paper, we propose a grasp recognition method only with image-level labels by the weakly supervised learning framework. In addition, we split the grasp recognition process into two stages that are hand localization and grasp classification so as to speed up. Experimental results demonstrate that the proposed method outperforms existing methods and can perform in real-time.","PeriodicalId":6668,"journal":{"name":"2017 IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW)","volume":"41 1","pages":"481-483"},"PeriodicalIF":0.0,"publicationDate":"2017-07-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"88110755","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 2
What Will I Do Next? The Intention from Motion Experiment
Andrea Zunino, Jacopo Cavazza, A. Koul, A. Cavallo, C. Becchio, Vittorio Murino
In computer vision, video-based approaches have been widely explored for the early classification and prediction of actions or activities. However, it remains unclear whether this modality (as compared to 3D kinematics) can be reliable for the prediction of human intentions, defined as the overarching goal embedded in an action sequence. Since the same action can be performed with different intentions, this problem is more challenging, yet still feasible, as proved by quantitative cognitive studies that exploit 3D kinematics acquired through motion capture systems. In this paper, we bridge cognitive and computer vision studies by demonstrating the effectiveness of video-based approaches for the prediction of human intentions. Precisely, we propose Intention from Motion, a new paradigm where, without using any contextual information, we consider instantaneous grasping motor acts involving a bottle in order to forecast why the bottle has been reached (to pass it, to place it in a box, or to pour or drink the liquid inside). We process only the grasping onsets, casting intention prediction as a classification problem. Leveraging our multimodal acquisition (3D motion capture data and 2D optical videos), we compare the most commonly used 3D descriptors from cognitive studies with state-of-the-art video-based techniques. Since the two analyses achieve equivalent performance, we demonstrate that computer vision tools are effective in capturing the kinematics and addressing the cognitive problem of human intention prediction.
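A sketch of the classification framing, assuming each grasp onset has been summarized as a fixed-length descriptor (3D kinematic or video-based); the data below is random stand-in, and the SVM choice is an assumption rather than the paper's exact classifier:

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Each sample is one reach-to-grasp onset described by a fixed-length descriptor,
# labeled with one of four intentions (pass, place, pour, drink). Data is random.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 64))       # 200 grasp onsets, 64-D descriptors
y = rng.integers(0, 4, size=200)     # 4 intention classes

clf = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0))
scores = cross_val_score(clf, X, y, cv=5)
print("cross-validated accuracy on random data (chance level):", scores.mean())
```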
{"title":"What Will I Do Next? The Intention from Motion Experiment","authors":"Andrea Zunino, Jacopo Cavazza, A. Koul, A. Cavallo, C. Becchio, Vittorio Murino","doi":"10.1109/CVPRW.2017.7","DOIUrl":"https://doi.org/10.1109/CVPRW.2017.7","url":null,"abstract":"In computer vision, video-based approaches have been widely explored for the early classification and the prediction of actions or activities. However, it remains unclear whether this modality (as compared to 3D kinematics) can still be reliable for the prediction of human intentions, defined as the overarching goal embedded in an action sequence. Since the same action can be performed with different intentions, this problem is more challenging but yet affordable as proved by quantitative cognitive studies which exploit the 3D kinematics acquired through motion capture systems.In this paper, we bridge cognitive and computer vision studies, by demonstrating the effectiveness of video-based approaches for the prediction of human intentions. Precisely, we propose Intention from Motion, a new paradigm where, without using any contextual information, we consider instantaneous grasping motor acts involving a bottle in order to forecast why the bottle itself has been reached (to pass it or to place in a box, or to pour or to drink the liquid inside).We process only the grasping onsets casting intention prediction as a classification framework. Leveraging on our multimodal acquisition (3D motion capture data and 2D optical videos), we compare the most commonly used 3D descriptors from cognitive studies with state-of-the-art video-based techniques. Since the two analyses achieve an equivalent performance, we demonstrate that computer vision tools are effective in capturing the kinematics and facing the cognitive problem of human intention prediction.","PeriodicalId":6668,"journal":{"name":"2017 IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW)","volume":"19 1","pages":"1-8"},"PeriodicalIF":0.0,"publicationDate":"2017-07-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"79219674","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 3
Video Action Recognition Based on Deeper Convolution Networks with Pair-Wise Frame Motion Concatenation
Yamin Han, Peng Zhang, Tao Zhuo, Wei Huang, Yanning Zhang
Deep convolutional network based strategies have shown remarkable performance in different recognition tasks. Unfortunately, in a variety of realistic scenarios, accurate and robust recognition is hard, especially for videos. Challenges such as cluttered backgrounds or viewpoint changes may generate problems like large intrinsic and extrinsic class variations. In addition, data deficiency can make the designed model degrade during learning and updating. Therefore, effectively incorporating frame-wise motion into the learning model on the fly has become more and more attractive in contemporary video analysis studies. To overcome those limitations, we propose a deeper convolutional network based approach with pair-wise motion concatenation, named deep temporal convolutional networks. A temporal motion accumulation mechanism is introduced as an effective input for learning the convolutional networks. Specifically, to handle possible data deficiency, the beneficial practices of transferring ResNet-101 weights and augmenting data variation are also utilized for robust recognition. Experiments on the challenging UCF101 and ODAR datasets verify preferable performance compared with other state-of-the-art work.
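A sketch of the pair-wise frame concatenation input, assuming two consecutive RGB frames stacked channel-wise into a ResNet-101 with a widened first convolution; the model is randomly initialized here, whereas the paper transfers pretrained weights, and the 101-class head is only an example for UCF101:

```python
import torch
import torch.nn as nn
import torchvision

# Widen the first convolution of ResNet-101 to accept a 6-channel pair of frames.
# In practice the remaining layers would carry transferred ImageNet weights;
# here the network stays randomly initialized so the example runs offline.
model = torchvision.models.resnet101()
model.conv1 = nn.Conv2d(6, 64, kernel_size=7, stride=2, padding=3, bias=False)
model.fc = nn.Linear(model.fc.in_features, 101)   # e.g. 101 action classes (UCF101)

frame_t = torch.rand(2, 3, 224, 224)      # batch of frames at time t
frame_t1 = torch.rand(2, 3, 224, 224)     # frames at time t+1
pair_input = torch.cat([frame_t, frame_t1], dim=1)   # channel-wise concatenation

with torch.no_grad():
    logits = model(pair_input)
print(logits.shape)                        # torch.Size([2, 101])
```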
{"title":"Video Action Recognition Based on Deeper Convolution Networks with Pair-Wise Frame Motion Concatenation","authors":"Yamin Han, Peng Zhang, Tao Zhuo, Wei Huang, Yanning Zhang","doi":"10.1109/CVPRW.2017.162","DOIUrl":"https://doi.org/10.1109/CVPRW.2017.162","url":null,"abstract":"Deep convolution networks based strategies have shown a remarkable performance in different recognition tasks. Unfortunately, in a variety of realistic scenarios, accurate and robust recognition is hard especially for the videos. Different challenges such as cluttered backgrounds or viewpoint change etc. may generate the problem like large intrinsic and extrinsic class variations. In addition, the problem of data deficiency could also make the designed model degrade during learning and update. Therefore, an effective way by incorporating the frame-wise motion into the learning model on-the-fly has become more and more attractive in contemporary video analysis studies.,,,,,,To overcome those limitations, in this work, we proposed a deeper convolution networks based approach with pairwise motion concatenation, which is named deep temporal convolutional networks. In this work, a temporal motion accumulation mechanism has been introduced as an effective data entry for the learning of convolution networks. Specifically, to handle the possible data deficiency, beneficial practices of transferring ResNet-101 weights and data variation augmentation are also utilized for the purpose of robust recognition. Experiments on challenging dataset UCF101 and ODAR dataset have verified a preferable performance when compared with other state-of-art works.","PeriodicalId":6668,"journal":{"name":"2017 IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW)","volume":"64 1","pages":"1226-1235"},"PeriodicalIF":0.0,"publicationDate":"2017-07-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"91235967","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 8
DeepSpace: Mood-Based Image Texture Generation for Virtual Reality from Music
Misha Sra, Prashanth Vijayaraghavan, Ognjen Rudovic, P. Maes, D. Roy
Affective virtual spaces are of interest for many VR applications in areas of wellbeing, art, education, and entertainment. Creating content for virtual environments is a laborious task involving multiple skills like 3D modeling, texturing, animation, lighting, and programming. One way to facilitate content creation is to automate sub-processes like assignment of textures and materials within virtual environments. To this end, we introduce the DeepSpace approach that automatically creates and applies image textures to objects in procedurally created 3D scenes. The main novelty of our DeepSpace approach is that it uses music to automatically create kaleidoscopic textures for virtual environments designed to elicit emotional responses in users. Specifically, DeepSpace exploits the modeling power of deep neural networks, which have shown great performance in image generation tasks, to achieve mood-based image generation. Our study results indicate the virtual environments created by DeepSpace elicit positive emotions and achieve high presence scores.
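A toy procedural stand-in for the mood-to-texture step, not the paper's neural generator: a (valence, arousal) pair modulates contrast and color warmth of a noise tile that is mirrored into a kaleidoscopic pattern. All mappings are illustrative assumptions:

```python
import numpy as np

def kaleidoscope_texture(valence, arousal, size=128, seed=0):
    """Map a mood vector (valence, arousal in [-1, 1]) to an RGB texture."""
    rng = np.random.default_rng(seed)
    tile = rng.uniform(size=(size // 2, size // 2))
    tile = 0.5 + (0.2 + 0.3 * (arousal + 1) / 2) * (tile - 0.5)  # arousal -> contrast
    top = np.concatenate([tile, tile[:, ::-1]], axis=1)           # mirror horizontally
    gray = np.concatenate([top, top[::-1, :]], axis=0)            # mirror vertically
    warmth = (valence + 1) / 2                                    # valence -> warm/cool hue
    rgb = np.stack([gray * warmth, gray * 0.5, gray * (1 - warmth)], axis=-1)
    return np.clip(rgb, 0.0, 1.0)

tex = kaleidoscope_texture(valence=0.8, arousal=-0.2)
print(tex.shape, tex.min(), tex.max())
```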
{"title":"DeepSpace: Mood-Based Image Texture Generation for Virtual Reality from Music","authors":"Misha Sra, Prashanth Vijayaraghavan, Ognjen Rudovic, P. Maes, D. Roy","doi":"10.1109/CVPRW.2017.283","DOIUrl":"https://doi.org/10.1109/CVPRW.2017.283","url":null,"abstract":"Affective virtual spaces are of interest for many VR applications in areas of wellbeing, art, education, and entertainment. Creating content for virtual environments is a laborious task involving multiple skills like 3D modeling, texturing, animation, lighting, and programming. One way to facilitate content creation is to automate sub-processes like assignment of textures and materials within virtual environments. To this end, we introduce the DeepSpace approach that automatically creates and applies image textures to objects in procedurally created 3D scenes. The main novelty of our DeepSpace approach is that it uses music to automatically create kaleidoscopic textures for virtual environments designed to elicit emotional responses in users. Specifically, DeepSpace exploits the modeling power of deep neural networks, which have shown great performance in image generation tasks, to achieve mood-based image generation. Our study results indicate the virtual environments created by DeepSpace elicit positive emotions and achieve high presence scores.","PeriodicalId":6668,"journal":{"name":"2017 IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW)","volume":"33 1","pages":"2289-2298"},"PeriodicalIF":0.0,"publicationDate":"2017-07-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"88457622","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 5
Estimation of Affective Level in the Wild with Multiple Memory Networks
Jianshu Li, Yunpeng Chen, Shengtao Xiao, Jian Zhao, S. Roy, Jiashi Feng, Shuicheng Yan, T. Sim
This paper presents the proposed solution to the "affect in the wild" challenge, which aims to estimate the affective level, i.e. the valence and arousal values, of every frame in a video. A carefully designed deep convolutional neural network (a variation of residual network) for affective level estimation of facial expressions is first implemented as a baseline. Next we use multiple memory networks to model the temporal relations between the frames. Finally ensemble models are used to combine the predictions from multiple memory networks. Our proposed solution outperforms the baseline model by a factor of 10.62% in terms of mean square error (MSE).
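A sketch of the ensemble step, assuming LSTM-based memory models over per-frame CNN features; the feature dimension, hidden size, and number of models are assumptions rather than the paper's configuration:

```python
import torch
import torch.nn as nn

class RecurrentAffectRegressor(nn.Module):
    """Map a sequence of per-frame CNN features to per-frame (valence, arousal)."""
    def __init__(self, feat_dim=128, hidden=64):
        super().__init__()
        self.rnn = nn.LSTM(feat_dim, hidden, batch_first=True)
        self.head = nn.Linear(hidden, 2)

    def forward(self, feats):                  # feats: (batch, time, feat_dim)
        out, _ = self.rnn(feats)
        return torch.tanh(self.head(out))      # values constrained to [-1, 1]

features = torch.rand(1, 50, 128)              # 50 frames of CNN features
target = torch.zeros(1, 50, 2)                 # ground-truth valence/arousal

models = [RecurrentAffectRegressor() for _ in range(3)]
with torch.no_grad():
    # Ensemble by averaging the per-frame predictions of all memory models.
    ensemble_pred = torch.stack([m(features) for m in models]).mean(dim=0)

mse = nn.functional.mse_loss(ensemble_pred, target)
print(ensemble_pred.shape, mse.item())
```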
{"title":"Estimation of Affective Level in the Wild with Multiple Memory Networks","authors":"Jianshu Li, Yunpeng Chen, Shengtao Xiao, Jian Zhao, S. Roy, Jiashi Feng, Shuicheng Yan, T. Sim","doi":"10.1109/CVPRW.2017.244","DOIUrl":"https://doi.org/10.1109/CVPRW.2017.244","url":null,"abstract":"This paper presents the proposed solution to the \"affect in the wild\" challenge, which aims to estimate the affective level, i.e. the valence and arousal values, of every frame in a video. A carefully designed deep convolutional neural network (a variation of residual network) for affective level estimation of facial expressions is first implemented as a baseline. Next we use multiple memory networks to model the temporal relations between the frames. Finally ensemble models are used to combine the predictions from multiple memory networks. Our proposed solution outperforms the baseline model by a factor of 10.62% in terms of mean square error (MSE).","PeriodicalId":6668,"journal":{"name":"2017 IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW)","volume":"3 1","pages":"1947-1954"},"PeriodicalIF":0.0,"publicationDate":"2017-07-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"78432413","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 15
Rear-Stitched View Panorama: A Low-Power Embedded Implementation for Smart Rear-View Mirrors on Vehicles
Janice Pan, Vikram V. Appia, Jesse Villarreal, Lucas Weaver, Do-Kyoung Kwon
Automobiles are currently equipped with a three-mirror system for rear-view visualization. The two side-view mirrors show the nearby periphery on the left and right sides of the vehicle, and the center rear-view mirror is typically adjusted to allow the driver to see through the vehicle's rear windshield. This three-mirror system, however, raises safety concerns by requiring drivers to shift their attention and gaze to look in each mirror to obtain a full visualization of the rear-view surroundings, which takes attention off the scene in front of the vehicle. We present an alternative to the three-mirror rear-view system, which we call Rear-Stitched View Panorama (RSVP). The proposed system uses four rear-facing cameras, strategically placed to overcome the traditional blind spot problem, and stitches the feeds from each camera together to generate a single panoramic view, which can display the entire rear surroundings. We project individually captured frames onto a single virtual view using precomputed system calibration parameters. We then determine optimal seam lines, along which the images are fused together to form the single RSVP view presented to the driver. Furthermore, we highlight techniques that enable efficient embedded implementation of the system and showcase a real-time system utilizing under 2 W of power, making it suitable for in-cabin deployment in vehicles.
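A simplified sketch of the seam-based fusion step, assuming two views have already been warped into the common virtual view and overlap by a known number of columns; the full system uses four calibrated cameras and more elaborate seam selection:

```python
import numpy as np

def stitch_pair(left, right, overlap):
    """Fuse two pre-warped HxWx3 views along a seam column inside their overlap."""
    l_ov = left[:, -overlap:]                    # right edge of the left view
    r_ov = right[:, :overlap]                    # left edge of the right view
    col_cost = ((l_ov - r_ov) ** 2).sum(axis=(0, 2))
    seam = int(np.argmin(col_cost))              # column with the smallest appearance gap
    return np.concatenate([left[:, : left.shape[1] - overlap + seam],
                           right[:, seam:]], axis=1)

# Toy data: two overlapping views cut from one synthetic "virtual view" image.
rng = np.random.default_rng(0)
H, W, overlap = 120, 200, 40
scene = rng.uniform(size=(H, 2 * W - overlap, 3))
left_view, right_view = scene[:, :W], scene[:, W - overlap:]

pano = stitch_pair(left_view, right_view, overlap)
print(pano.shape, np.allclose(pano, scene))      # seam is invisible on consistent views
```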
{"title":"Rear-Stitched View Panorama: A Low-Power Embedded Implementation for Smart Rear-View Mirrors on Vehicles","authors":"Janice Pan, Vikram V. Appia, Jesse Villarreal, Lucas Weaver, Do-Kyoung Kwon","doi":"10.1109/CVPRW.2017.157","DOIUrl":"https://doi.org/10.1109/CVPRW.2017.157","url":null,"abstract":"Automobiles are currently equipped with a three-mirror system for rear-view visualization. The two side-view mirrors show close the periphery on the left and right sides of the vehicle, and the center rear-view mirror is typically adjusted to allow the driver to see through the vehicle's rear windshield. This three-mirror system, however, imposes safety concerns in requiring drivers to shift their attention and gaze to look in each mirror to obtain a full visualization of the rear-view surroundings, which takes attention off the scene in front of the vehicle. We present an alternative to the three-mirror rear-view system, which we call Rear-Stitched View Panorama (RSVP). The proposed system uses four rear-facing cameras, strategically placed to overcome the traditional blind spot problem, and stitches the feeds from each camera together to generate a single panoramic view, which can display the entire rear surroundings. We project individually captured frames onto a single virtual view using precomputed system calibration parameters. Then we determine optimal seam lines, along which the images are fused together to form the single RSVP view presented to the driver. Furthermore, we highlight techniques that enable efficient embedded implementation of the system and showcase a real-time system utilizing under 2W of power, making it suitable for in-cabin deployment in vehicles.","PeriodicalId":6668,"journal":{"name":"2017 IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW)","volume":"32 1","pages":"1184-1193"},"PeriodicalIF":0.0,"publicationDate":"2017-07-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"81517148","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 8