
Latest publications from the 2015 IEEE International Conference on Multimedia and Expo (ICME)

An architecture to assist multimedia application authors and presentation engine developers
Pub Date: 2015-08-06 DOI: 10.1109/ICME.2015.7177397
R. C. M. Santos, M. Moreno, L. Soares
This paper presents an architecture for monitoring the presentation of declarative multimedia applications, providing feedback about variable states, object properties, and media presentation times, among other data. Monitoring tools that follow the proposed architecture can detect whether visual problems are caused by programming errors or by player malfunction. The architecture defines a communication protocol designed to be independent of the declarative language used in the development of multimedia applications. The main goal is to provide an open and generic architecture that can assist multimedia application authors and presentation engine developers. As an example of the architecture in use, the paper also presents a monitoring tool integrated into a graphical user interface developed for the ITU-T reference implementation of the Ginga-NCL middleware.
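To make the idea of a language-independent monitoring protocol concrete, here is a minimal sketch of what a monitoring message might look like. The event names, field names, and JSON encoding are illustrative assumptions, not the paper's actual protocol.

```python
# Hypothetical monitoring event serializer; the schema below is an
# assumption for illustration, not the protocol defined in the paper.
import json
import time

def make_monitor_event(event_type, entity_id, payload):
    """Serialize a monitoring event (e.g. a variable change or a media
    presentation-time report) into a language-neutral JSON message."""
    return json.dumps({
        "type": event_type,        # e.g. "variable", "property", "time"
        "entity": entity_id,       # id of the declarative (e.g. NCL) object
        "payload": payload,        # current value(s) being reported
        "timestamp": time.time(),  # wall-clock time of the report
    })

# Example: report the presentation time of a media object to the monitor.
msg = make_monitor_event("time", "video1", {"presentationTime": 12.84})
print(msg)
```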
Citations: 3
Evaluating visual and textual features for predicting user ‘likes’
Pub Date: 2015-08-06 DOI: 10.1109/ICME.2015.7177381
Sharath Chandra Guntuku, S. Roy, Weisi Lin
Computationally modeling users' 'liking' for images requires understanding how to represent an image effectively so that the different factors influencing user 'likes' are considered. In this work, an evaluation of state-of-the-art visual features from multimedia understanding on the task of predicting user 'likes' is presented, based on a collection of images crawled from Flickr. Secondly, a probabilistic approach for modeling 'likes' based only on tags is proposed. Using both visual and text-based features is shown to improve state-of-the-art performance by 12%. Analysis of the results indicates that more human-interpretable and semantic representations are important for the task of predicting the very subtle response of 'likes'.
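The paper does not spell out its tag-based probabilistic model, but a naive-Bayes-style score P(like | tags) ∝ P(like) ∏ P(tag | like) illustrates the general idea. This is a minimal sketch with made-up training data, not the authors' method.

```python
# Naive-Bayes-style 'like' model over tags with Laplace smoothing;
# an illustrative assumption, not the model proposed in the paper.
from collections import Counter

def train(tagged_images):
    """tagged_images: list of (tags, liked) pairs with liked in {0, 1}."""
    counts = {0: Counter(), 1: Counter()}
    prior = Counter()
    for tags, liked in tagged_images:
        prior[liked] += 1
        counts[liked].update(tags)
    return prior, counts

def like_score(tags, prior, counts, alpha=1.0):
    """Return P(like=1 | tags) under the naive independence assumption."""
    scores = {}
    vocab = len(set(counts[0]) | set(counts[1]))
    for c in (0, 1):
        total = sum(counts[c].values())
        p = prior[c] / sum(prior.values())
        for t in tags:
            p *= (counts[c][t] + alpha) / (total + alpha * vocab)
        scores[c] = p
    return scores[1] / (scores[0] + scores[1])

prior, counts = train([({"sunset", "beach"}, 1), ({"receipt", "scan"}, 0)])
print(like_score({"sunset"}, prior, counts))   # > 0.5: leaning towards 'like'
```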
Citations: 10
A hybrid approach for retrieving diverse social images of landmarks
Pub Date: 2015-08-06 DOI: 10.1109/ICME.2015.7177486
Duc-Tien Dang-Nguyen, Luca Piras, G. Giacinto, G. Boato, F. D. Natale
In this paper, we present a novel method that produces a visual description of a landmark by choosing, from community-contributed datasets, the most diverse pictures that best describe all the details of the queried location. The main idea is to filter out non-relevant images in a first stage and then cluster the remaining images, first according to textual descriptors and then according to visual descriptors. Extracting images from different clusters according to a measure of the user's credibility yields a reliable set of diverse and relevant images. Experimental results on the MediaEval 2014 “Retrieving Diverse Social Images” dataset show that the proposed approach achieves very good performance, outperforming state-of-the-art techniques.
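A minimal sketch of this filter-cluster-select pipeline follows. Clustering itself is abstracted behind precomputed cluster ids, and all field names ('relevant', 'cluster', 'credibility') are illustrative assumptions; diversity comes from drawing round-robin across clusters, credibility from sorting within each cluster.

```python
# Sketch of the retrieval pipeline: filter, group by (text-then-visual)
# cluster, then pick the most credible image per cluster in round-robin.
def diversify(images, num_results):
    """images: list of dicts with keys 'relevant' (bool), 'cluster' (int),
    and 'credibility' (float in [0, 1])."""
    pool = [im for im in images if im["relevant"]]       # stage 1: filtering
    clusters = {}
    for im in pool:                                      # stage 2: grouping
        clusters.setdefault(im["cluster"], []).append(im)
    for members in clusters.values():                    # most credible first
        members.sort(key=lambda im: im["credibility"], reverse=True)
    result = []
    while len(result) < num_results and any(clusters.values()):
        for cid in list(clusters):                       # round-robin across
            if clusters[cid]:                            # clusters = diversity
                result.append(clusters[cid].pop(0))
                if len(result) == num_results:
                    break
    return result

images = [{"relevant": True, "cluster": c, "credibility": 0.5 + 0.1 * c}
          for c in (0, 1, 2) for _ in range(3)]
print([im["cluster"] for im in diversify(images, 4)])    # [0, 1, 2, 0]
```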
Citations: 24
Scene segmentation using temporal clustering for accessing and re-using broadcast video
Pub Date: 2015-08-06 DOI: 10.1109/ICME.2015.7177476
L. Baraldi, C. Grana, R. Cucchiara
Scene detection is a fundamental tool for effective video browsing and re-use. In this paper we present a model that automatically divides videos into coherent scenes, based on a novel combination of local image descriptors and temporal clustering techniques. Experiments demonstrate the effectiveness of our approach by comparing our algorithm against two recent proposals for automatic scene segmentation. We also propose improved performance measures that aim to reduce the gap between numerical evaluation and expected results.
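The simplest form of temporal clustering for this task merges temporally adjacent shots whose descriptors are similar enough. The sketch below assumes one descriptor per shot, cosine similarity, and an arbitrary threshold; the paper's actual algorithm and features differ.

```python
# Greedy temporal clustering sketch: a new scene starts whenever two
# consecutive shots are visually dissimilar. Threshold is an assumption.
import numpy as np

def segment_scenes(shot_descriptors, sim_threshold=0.8):
    """shot_descriptors: (num_shots, dim) array, one descriptor per shot.
    Returns a list of scene labels, one per shot."""
    labels = [0]
    for i in range(1, len(shot_descriptors)):
        a, b = shot_descriptors[i - 1], shot_descriptors[i]
        sim = a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12)
        labels.append(labels[-1] if sim >= sim_threshold else labels[-1] + 1)
    return labels

print(segment_scenes(np.array([[1, 0], [0.9, 0.1], [0, 1.0]])))  # [0, 0, 1]
```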
Citations: 14
Egocentric hand pose estimation and distance recovery in a single RGB image
Pub Date: 2015-08-06 DOI: 10.1109/ICME.2015.7177448
Hui Liang, Junsong Yuan, D. Thalmann
Articulated hand pose recovery in egocentric vision is useful for in-air interaction with wearable devices such as Google Glass. Despite the progress obtained with depth cameras, this task remains challenging with ordinary RGB cameras. In this paper we demonstrate the possibility of recovering both the articulated hand pose and its distance from the camera with a single RGB camera in an egocentric view. We address this problem by modeling the distance as a hidden variable and using a Conditional Regression Forest to infer the pose and distance jointly. In particular, we find that pose estimation accuracy can be further enhanced by incorporating hand part semantics. Experimental results show that the proposed method achieves good performance on both a synthesized dataset and several real-world color image sequences captured in different environments. In addition, our system runs in real time at more than 10 fps.
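One common way to realize "distance as a hidden variable" is to discretize distance into bins, attach a pose regressor to each bin, and weight predictions by the distance posterior. The sketch below uses dummy placeholder regressors; the paper's Conditional Regression Forest is more involved, so treat every name here as an assumption.

```python
# Hidden-variable sketch: pose = sum over distance bins of
# p(d | x) * pose_regressor_d(x); distance = MAP bin. Illustrative only.
import numpy as np

def infer_pose_and_distance(features, bin_centers, distance_posterior,
                            pose_regressors):
    """bin_centers: (B,) candidate distances. distance_posterior(features)
    -> (B,) probabilities. pose_regressors[b](features) -> pose vector."""
    p_d = distance_posterior(features)                  # p(d | x)
    poses = np.stack([reg(features) for reg in pose_regressors])
    pose = (p_d[:, None] * poses).sum(axis=0)           # E[pose | x]
    distance = float(bin_centers[np.argmax(p_d)])       # MAP distance
    return pose, distance

bins = np.array([0.3, 0.5, 0.7])                        # metres, illustrative
regs = [lambda x, d=d: np.full(3, d) for d in bins]     # dummy regressors
post = lambda x: np.array([0.1, 0.7, 0.2])              # dummy p(d | x)
print(infer_pose_and_distance(None, bins, post, regs))  # pose, 0.5 m
```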
Citations: 10
Estimating heart rate via depth video motion tracking
Pub Date: 2015-07-03 DOI: 10.1109/ICME.2015.7177517
Cheng Yang, Gene Cheung, V. Stanković
Depth sensors like Microsoft Kinect can acquire partial geometric information in a 3D scene via captured depth images, with potential application to non-contact health monitoring. However, captured depth videos typically suffer from low bit-depth representation and acquisition noise corruption, and hence using them to deduce health metrics that require tracking subtle 3D structural details is difficult. In this paper, we propose to capture depth video using Kinect 2.0 to estimate the heart rate of a human subject; as blood is pumped to circulate through the head, tiny oscillatory head motion can be detected for periodicity analysis. Specifically, we first perform a joint bit-depth enhancement / denoising procedure to improve the quality of the captured depth images, using a graph-signal smoothness prior for regularization. We then track an automatically detected nose region throughout the depth video to deduce 3D motion vectors. The deduced 3D vectors are then analyzed via principal component analysis to estimate heart rate. Experimental results show improved tracking accuracy using our proposed joint bit-depth enhancement / denoising procedure, and estimated heart rates are close to ground truth.
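The final stage of the pipeline (principal component analysis of tracked motion followed by periodicity analysis) can be sketched compactly: project the motion onto its first principal axis and read off the dominant in-band frequency. The sampling rate, frequency band, and synthetic input below are assumptions; the paper's denoising and tracking stages are omitted.

```python
# PCA + spectral-peak sketch of the heart-rate stage; illustrative only.
import numpy as np

def estimate_heart_rate(motion, fps=30.0, band=(0.7, 3.0)):
    """motion: (num_frames, 3) array of tracked nose-region motion vectors.
    Returns the dominant periodicity inside `band`, in beats per minute."""
    centered = motion - motion.mean(axis=0)
    # First principal component via SVD = axis of maximal oscillation.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    signal = centered @ vt[0]
    spectrum = np.abs(np.fft.rfft(signal))
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / fps)
    in_band = (freqs >= band[0]) & (freqs <= band[1])   # plausible HR range
    peak = freqs[in_band][np.argmax(spectrum[in_band])]
    return 60.0 * peak

t = np.arange(0, 10, 1 / 30.0)                          # 10 s of fake motion
fake = np.outer(np.sin(2 * np.pi * 1.2 * t), [0.5, 1.0, 0.2])
print(estimate_heart_rate(fake))                        # ~72 bpm
```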
Citations: 16
A DVFS based HEVC decoder for energy-efficient software implementation on embedded processors
Pub Date: 2015-06-29 DOI: 10.1109/ICME.2015.7177406
Erwan Nogues, Romain Berrada, M. Pelcat, D. Ménard, E. Raffin
Software video decoders for mobile devices are now a reality thanks to recent advances in Systems-on-Chip (SoC). The challenge has now moved to designing energy-efficient systems. In this paper, we propose lightweight Dynamic Voltage and Frequency Scaling (DVFS)-enabled software adapted to the highly varying processing load of real-time High Efficiency Video Coding (HEVC) decoding. We present a practical evaluation of an HEVC decoder using our proposal on a Samsung Exynos low-power SoC widely used in portable devices. Experimental results show more than 50% power savings for real-time decoding compared to the same software managed by the OnDemand Linux power manager. For mobile applications, the proposed method can achieve 720p HEVC video decoding at 60 frames per second, consuming approximately 1.1 W with pure software decoding on a general-purpose processor.
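The core DVFS idea is to request the lowest CPU frequency that still lets the next frame meet its display deadline. A minimal sketch follows; the cycle-estimation model and frequency table are assumptions, and the sysfs write assumes a Linux cpufreq "userspace" governor with root access, not the paper's actual mechanism.

```python
# DVFS sketch: choose the slowest frequency meeting the frame deadline.
FREQS_KHZ = [400_000, 800_000, 1_200_000, 1_600_000]   # platform-specific

def pick_frequency(estimated_cycles, deadline_s):
    """Lowest available frequency that finishes the frame before deadline."""
    for f in FREQS_KHZ:
        if estimated_cycles / (f * 1_000) <= deadline_s:
            return f
    return FREQS_KHZ[-1]

def apply_frequency(khz, cpu=0):
    # Assumes the Linux cpufreq "userspace" governor is active (root only).
    path = f"/sys/devices/system/cpu/cpu{cpu}/cpufreq/scaling_setspeed"
    with open(path, "w") as f:
        f.write(str(khz))

# A frame estimated at 12M cycles with a 60 fps deadline (~16.6 ms):
print(pick_frequency(12_000_000, 1 / 60))               # -> 800000 kHz
```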
Citations: 26
Perceiving user's intention-for-interaction: A probabilistic multimodal data fusion scheme
Pub Date: 2015-06-29 DOI: 10.1109/ICME.2015.7177514
C. Mollaret, Alhayat Ali Mekonnen, I. Ferrané, J. Pinquier, F. Lerasle
Understanding people's intention, be it action or thought, plays a fundamental role in establishing coherent communication among people, especially in non-proactive robotics, where the robot has to understand explicitly when to start an interaction in a natural way. In this work, a novel approach is presented for detecting people's intention-for-interaction. The proposed detector fuses multimodal cues, including estimated head pose, shoulder orientation, and vocal activity detection, using a probabilistic discrete-state Hidden Markov Model. The multimodal detector achieves up to 80% correct detection rates, improving on purely audio- and RGB-D-based variants.
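A two-state discrete HMM over fused, discretized cues shows the shape of such a detector. All probabilities below are illustrative assumptions, not the paper's learned parameters; the observation symbol packs three binary cues (facing, shoulders toward robot, speaking) into one index.

```python
# Two-state HMM (no-intention / intention) with the standard forward
# recursion over fused discrete observations. Parameters are made up.
import numpy as np

A = np.array([[0.9, 0.1],          # state transition probabilities
              [0.2, 0.8]])
pi = np.array([0.8, 0.2])          # initial state distribution
# Symbol = facing*4 + shoulders*2 + speaking; one emission row per state.
B = np.array([[0.30, 0.15, 0.15, 0.10, 0.10, 0.08, 0.07, 0.05],
              [0.02, 0.05, 0.05, 0.08, 0.10, 0.15, 0.20, 0.35]])

def intention_posterior(symbols):
    """Forward algorithm; returns P(intention | observations so far)."""
    alpha = pi * B[:, symbols[0]]
    alpha /= alpha.sum()
    for s in symbols[1:]:
        alpha = (alpha @ A) * B[:, s]
        alpha /= alpha.sum()       # normalize to avoid underflow
    return alpha[1]

print(intention_posterior([0, 6, 7, 7]))   # rising evidence of intention
```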
Citations: 13
Learning Deep Trajectory Descriptor for action recognition in videos using deep neural networks
Pub Date: 2015-06-01 DOI: 10.1109/ICME.2015.7177461
Yemin Shi, Wei Zeng, Tiejun Huang, Yaowei Wang
Human action recognition is widely recognized as a challenging task due to the difficulty of effectively characterizing human action in a complex scene. Recent studies have shown that dense-trajectory-based methods can achieve state-of-the-art recognition results on some challenging datasets. However, in these methods each dense trajectory is often represented as a vector of coordinates, consequently losing the structural relationship between different trajectories. To address this problem, this paper proposes a novel Deep Trajectory Descriptor (DTD) for action recognition. First, we extract dense trajectories from multiple consecutive frames and project them onto a canvas. This results in a “trajectory texture” image that can effectively characterize the relative motion in these frames. Based on these trajectory texture images, a deep neural network (DNN) is used to learn a more compact and powerful representation of dense trajectories. In the action recognition system, the DTD descriptor, together with other non-trajectory features such as HOG, HOF and MBH, provides an effective way to characterize human action from various aspects. Experimental results show that our system statistically outperforms several state-of-the-art approaches, with an average accuracy of 95.6% on KTH and an accuracy of 92.14% on UCF50.
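The "trajectory texture" construction can be sketched as rasterizing each trajectory onto a blank canvas so a network can consume relative motion as an image. Canvas size, coordinate normalization, and the time-based intensity encoding below are illustrative assumptions, not the paper's exact recipe.

```python
# Rasterize dense trajectories into a single-channel "trajectory texture".
import numpy as np

def trajectory_texture(trajectories, size=64):
    """trajectories: list of (T, 2) arrays of (x, y) points normalized to
    [0, 1]. Returns a (size, size) float image."""
    canvas = np.zeros((size, size), dtype=np.float32)
    for traj in trajectories:
        pts = np.clip((traj * (size - 1)).astype(int), 0, size - 1)
        for t, (x, y) in enumerate(pts):
            # Later points drawn brighter, encoding temporal direction.
            canvas[y, x] = max(canvas[y, x], (t + 1) / len(pts))
    return canvas

traj = np.stack([np.linspace(0.2, 0.8, 15), np.full(15, 0.5)], axis=1)
print(trajectory_texture([traj]).max())   # 1.0 at the trajectory's end point
```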
Citations: 36
Oscillation compensating Dynamic Adaptive Streaming over HTTP
Pub Date: 2015-06-01 DOI: 10.1109/ICME.2015.7177435
Christopher Müller, Stefan Lederer, Reinhard Grandl, C. Timmerer
Streaming multimedia over the Internet is omnipresent but still in its infancy, specifically when it comes to adaptation based on bandwidth/throughput measurements, clients competing for limited/shared bandwidth, and the presence of a caching infrastructure. In this paper we present a buffer-based adaptation logic combined with a toolset of client metrics that compensates for erroneous adaptation decisions. Such erroneous decisions are due to insufficient network information being available at the client, and to issues introduced when multiple clients compete for limited/shared bandwidth and/or when caches are deployed. Our metrics enable the detection of oscillations at the client (in contrast to server-based approaches) and provide an effective compensation mechanism. We evaluate the proposed adaptation logic, which incorporates the oscillation detection and compensation method, and compare it against a throughput-based adaptation logic in scenarios comprising competing clients with and without caching enabled. In presenting the results, we show how the proposed metrics detect oscillation periods and how such undesirable situations can be compensated while increasing the effective media throughput of the clients.
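A minimal sketch of a buffer-based adaptation step with a simple oscillation guard follows: the quality level is chosen from the buffer fill, and switching is frozen when recent decisions flip-flop too often. The thresholds, window size, and bitrate ladder are illustrative assumptions, not the paper's logic.

```python
# Buffer-based ABR sketch with an oscillation guard; parameters made up.
from collections import deque

BITRATES = [500, 1000, 2000, 4000]           # kbit/s, lowest to highest

class BufferBasedAdaptation:
    def __init__(self, window=6, max_switches=3):
        self.history = deque(maxlen=window)  # recently chosen quality levels
        self.max_switches = max_switches

    def next_quality(self, buffer_s):
        # Map buffer fill (seconds) to a quality level: more buffer, higher.
        level = min(int(buffer_s // 10), len(BITRATES) - 1)
        switches = sum(1 for a, b in zip(self.history, list(self.history)[1:])
                       if a != b)
        if switches >= self.max_switches and self.history:
            level = self.history[-1]         # oscillating: hold current level
        self.history.append(level)
        return BITRATES[level]

abr = BufferBasedAdaptation()
for buf in [5, 12, 8, 14, 9, 15]:            # a flapping buffer level
    print(abr.next_quality(buf))             # guard kicks in at step 5
```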
Citations: 15