Latest publications: 2012 IEEE International Conference on Multimedia and Expo Workshops
Automatic QOE Prediction in Stereoscopic Videos
Pub Date : 2012-07-09 DOI: 10.1109/ICMEW.2012.107
H. Malekmohamadi, W. Fernando, A. Kondoz
In this paper, we propose a method for automatic quality of experience (QoE) prediction in stereoscopic videos. QoE, though embodying the subjective measures of the end user's perceived quality, can be expressed in relation to certain quality of service (QoS) parameters. Incorporating information on content types when modelling QoE-QoS interactions is advantageous, as videos with the same QoS parameters may receive different subjective scores because of their differing content. Consequently, contents are clustered using spatio-temporal activity within depth layers, and a QoE predictor is designed for each content cluster using full reference (FR) and no reference (NR) metrics. Finally, the performance of the proposed QoE prediction algorithm is evaluated extensively, achieving an overall measure of success of 95.4% on the test sequences. This model can be applied to QoE control in video provisioning systems.
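The per-cluster modelling idea in this abstract can be illustrated with a minimal sketch: videos are first assigned to a content cluster, then a separate QoS-to-QoE regressor is fitted for each cluster. The feature layout and the linear least-squares model below are illustrative assumptions, not the paper's actual predictor design.

```python
import numpy as np

def assign_cluster(content_features, centroids):
    """Return the index of the nearest content-cluster centroid."""
    d = np.linalg.norm(centroids - content_features, axis=1)
    return int(np.argmin(d))

def fit_cluster_predictors(qos, mos, labels, n_clusters):
    """Fit one least-squares linear predictor (QoS -> subjective score) per cluster."""
    predictors = []
    for c in range(n_clusters):
        X = qos[labels == c]
        y = mos[labels == c]
        X1 = np.hstack([X, np.ones((len(X), 1))])  # add intercept column
        w, *_ = np.linalg.lstsq(X1, y, rcond=None)
        predictors.append(w)
    return predictors

def predict_qoe(qos_vector, cluster_idx, predictors):
    """Predict QoE for a QoS vector using the predictor of its content cluster."""
    w = predictors[cluster_idx]
    return float(np.append(qos_vector, 1.0) @ w)
```

The point of the design is that two videos with identical QoS vectors can land in different clusters and therefore get different predicted scores, matching the content-dependence argued in the abstract.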
Citations: 10
Advanced "Webble" Application Development Directly in the Browser by Utilizing the Full Power of Meme Media Customization and Event Management Capabilities
Pub Date : 2012-07-09 DOI: 10.1109/ICMEW.2012.43
M. Kuwahara, Yuzuru Tanaka
A meme media object, also known as a Webble, always comes with a set of familiar generic behaviors together with another set of behaviors specialized for that particular Webble. But what if there is a need for a custom behavior or interface that was not intended when the Webble was first created? With Webble technology, that need not be a problem. In this paper we show how simple it is, thanks to the design and construction of Webbles, to insert new customizable behaviors into any available Webble, or to control application-level events and actions, all through an intuitive, user-friendly interface. We claim that within a few hours of combining generic Webble building blocks and setting up configurable event handlers directly in the web browser, without traditional programming, one can create an arbitrary Silverlight-based web application, ready to be shared with the cloud and the world.
Citations: 8
Supervised, Geometry-Aware Segmentation of 3D Mesh Models
Pub Date : 2012-07-09 DOI: 10.1109/ICMEW.2012.16
Keisuke Bamba, Ryutarou Ohbuchi
Segmentation of 3D mesh models has applications in, e.g., mesh editing and 3D model retrieval. Unsupervised, automatic segmentation of 3D models can be useful; however, some applications require user-guided, interactive segmentation that captures user intention. This paper presents a supervised, local-geometry-aware segmentation algorithm for 3D mesh models. The algorithm segments manifold meshes based on interactive guidance from users. It casts user-guided mesh segmentation as a semi-supervised learning problem that propagates segmentation labels given to a subset of faces to the unlabeled faces of a 3D model. The proposed algorithm employs Zhou's Manifold Ranking algorithm [18], which exploits both local and global consistency in a high-dimensional feature space for label propagation. Evaluation on a 3D model segmentation benchmark dataset has shown that the method is effective, although achieving interactivity for large and complex meshes requires further work.
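The label-propagation step the abstract refers to can be sketched with the closed-form manifold-ranking solution F = (1 − α)(I − αS)⁻¹Y, where S is the symmetrically normalised affinity matrix and Y holds the seed labels. The Gaussian affinity on raw feature vectors below is an illustrative assumption; the paper works on per-face mesh features.

```python
import numpy as np

def manifold_ranking(features, seed_labels, alpha=0.9, sigma=1.0):
    """Propagate seed labels over a similarity graph (Zhou's manifold ranking)."""
    X = np.asarray(features, dtype=float)
    n = len(X)
    # Gaussian affinity with zeroed diagonal
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    W = np.exp(-d2 / (2 * sigma ** 2))
    np.fill_diagonal(W, 0.0)
    # Symmetric normalisation S = D^-1/2 W D^-1/2
    d_inv_sqrt = 1.0 / np.sqrt(W.sum(axis=1))
    S = W * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]
    # Closed form of the iterative diffusion F(t+1) = alpha*S*F(t) + (1-alpha)*Y
    Y = np.asarray(seed_labels, dtype=float)  # one column per segment label
    F = np.linalg.solve(np.eye(n) - alpha * S, (1 - alpha) * Y)
    return F.argmax(axis=1)  # hard label per face
```

Because S couples both neighbouring and globally consistent items, labels spread along the data manifold rather than only to immediate neighbours, which is the "local and global consistency" property the abstract mentions.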
Citations: 1
SVD Filter Based Multiscale Approach for Image Quality Assessment
Pub Date : 2012-07-09 DOI: 10.1109/ICMEW.2012.15
Ashirbani Saha, G. Bhatnagar, Q.M. Jonathan Wu
Automatic assessment of image quality in accordance with the human visual system (HVS) finds application in various image processing tasks. In the last decade, a substantial proliferation of image quality assessment (IQA) methods based on structural similarity has been observed. The structural information estimated includes statistical values (mean, variance, and correlation), gradient information, Harris response, and singular values. In this paper, we propose a multiscale image quality metric which exploits the properties of Singular Value Decomposition (SVD) to obtain an approximate pyramid structure for use in IQA. The proposed multiscale metric has been extensively evaluated on the LIVE and CSIQ databases. Experiments have been carried out on the effective number of scales as well as on the effective proportion of the different scales required by the metric. The proposed metric achieves performance competitive with state-of-the-art structural-similarity-based methods.
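The core SVD-as-filter idea can be sketched briefly: truncating an image's SVD to its largest singular values acts as a coarse approximation, and a sequence of truncation ranks yields a pyramid-like multiscale representation. This is only the representation step under assumed ranks; the quality metric built on top of it is not reproduced here.

```python
import numpy as np

def svd_approx(img, rank):
    """Rank-k approximation of a 2D array via truncated SVD."""
    U, s, Vt = np.linalg.svd(np.asarray(img, dtype=float), full_matrices=False)
    return (U[:, :rank] * s[:rank]) @ Vt[:rank, :]

def svd_pyramid(img, ranks=(1, 4, 16)):
    """Coarse-to-fine approximations for an increasing sequence of ranks."""
    return [svd_approx(img, r) for r in ranks]
```

By the Eckart-Young theorem each level is the best approximation of its rank in the Frobenius norm, so reconstruction error decreases monotonically from coarse to fine levels.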
Citations: 11
Social Attribute Annotation for Personal Photo Collection
Pub Date : 2012-07-09 DOI: 10.1109/ICMEW.2012.47
Zhipeng Wu, K. Aizawa
Social attributes of photos, which simply refer to a set of labels {Who, When, Where, What}, are intrinsic attributes of an image. For instance, given a scenery photo without human bodies or faces, we cannot say the photo has no relation to social individuals; in fact, it could have been taken while travelling with friends. To effectively annotate social attributes, we obtain training images from friends' SNS albums. Moreover, to cope with limited training data and to organize photos in a feature-effective way, we introduce a batch-based framework which pre-clusters photos by events. After graph-learning-based annotation, a post-processing step refines the annotation result. Experimental results show the effectiveness of the proposed batch-based social attribute annotation framework.
Citations: 1
A Rule-Based Virtual Director Enhancing Group Communication
Pub Date : 2012-07-09 DOI: 10.1109/ICMEW.2012.39
Rene Kaiser, Wolfgang Weiss, Manolis Falelakis, Spiros Michalakopoulos, M. Ursu
Audiovisual group communication systems deal with a large number of video streams and, unlike less advanced videoconferencing systems, require intelligence for selecting adequate views for each of the connected rooms, in order to best convey what is happening in the other locations. Such a decision-making component, called the Orchestration Engine (OE) in our implementation, acts as a Virtual Director. It processes low-level events, emitted by content analysis sensors, into editing commands. The OE has two main components: one that semantically lifts low-level events into communication events, and one that associates editing decisions with communication contexts. The former has to deal with uncertain and delayed information; the latter subsumes knowledge that reflects both conversational and narrative principles. Both components include contradicting bodies of knowledge. We investigate a rule-based event processing approach and discuss the scalability of our solution with respect to competing and contradicting rules.
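The rule-based mapping from sensor events to editing commands can be illustrated with a toy prioritised rule table; conflicting rules are resolved by firing the highest-priority match first. All event names, priorities, and commands below are invented for illustration — the actual Orchestration Engine is far richer.

```python
# Each rule: (priority, predicate over an event dict, command factory).
# Higher priority wins when several rules match the same event.
RULES = [
    (10, lambda e: e["type"] == "speaker_change", lambda e: f"cut_to:{e['room']}"),
    (5,  lambda e: e["type"] == "laughter",       lambda e: "show_group_view"),
    (1,  lambda e: True,                          lambda e: "hold_current_view"),
]

def decide(event):
    """Fire the highest-priority rule whose predicate matches the event."""
    for _, pred, action in sorted(RULES, key=lambda r: -r[0]):
        if pred(event):
            return action(event)
```

The catch-all lowest-priority rule guarantees every event yields a decision, which keeps the director's output stream continuous even for unrecognised sensor events.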
Citations: 17
Kinect-Like Depth Compression with 2D+T Prediction
Pub Date : 2012-07-09 DOI: 10.1109/ICMEW.2012.110
Jingjing Fu, Dan Miao, Weiren Yu, Shiqi Wang, Yan Lu, Shipeng Li
Kinect-like depth compression is becoming increasingly important due to the growing requirements of Kinect depth data transmission and storage. Considering the temporal inconsistency of Kinect depth introduced by random depth measurement error, we propose a 2D+T prediction algorithm that fully exploits temporal depth correlation to enhance Kinect depth compression efficiency. In our 2D+T prediction, each depth block is treated as a subsurface, and its motion trend is detected by comparison with a reliable 3D reconstruction surface integrated from accumulated depth information stored in a depth volume. The comparison is carried out under an error-tolerant rule derived from the depth error model. The experimental results demonstrate that our algorithm can remarkably reduce both bitrate cost and compression complexity, while the visual quality of the 3D reconstructions generated from our reconstructed depth is similar to that of traditional video compression algorithms.
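The error-tolerant comparison can be sketched as follows: a depth block is treated as temporally static if every pixel matches the accumulated reference surface within a per-pixel tolerance from a depth error model. Kinect-class sensors exhibit random depth error that grows roughly quadratically with distance; the quadratic form and its constant here are illustrative assumptions, not the paper's calibrated model.

```python
import numpy as np

def depth_tolerance(depth_mm, k=2.85e-5):
    """Per-pixel tolerance (mm), assuming error grows quadratically with depth."""
    return k * depth_mm ** 2

def block_is_static(block, reference, k=2.85e-5):
    """True if each pixel deviates from the reference within its tolerance."""
    ref = np.asarray(reference, dtype=float)
    diff = np.abs(np.asarray(block, dtype=float) - ref)
    return bool(np.all(diff <= depth_tolerance(ref, k)))
```

Blocks classified as static need no new residual data, which is where the bitrate saving in the abstract comes from.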
Citations: 7
Multiscale Browsing through Video Collections in Smartphones Using Scalable Storyboards
Pub Date : 2012-07-09 DOI: 10.1109/ICMEW.2012.54
Luis Herranz
This paper explores how multiscale browsing can be integrated with smartphone interfaces to provide enhanced navigation through video collections. We propose a system that allows the user to interactively change the scale of the storyboards, so the amount of information they provide can be adjusted easily. Three different methods to select key frames are studied, including an efficient method that analyzes the video once and creates a scalable description at very little computational cost. Storyboards of any length can then be retrieved on demand without further analysis, which is very convenient for fast multiscale navigation. Experimental evaluations show how this method improves the utility of the summaries and enhances the user experience.
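A "scalable description" in this sense can be sketched as a one-off ranking of candidate key frames: a storyboard of any requested length is then just the top-k ranked frames re-sorted into temporal order, with no further analysis. The greedy farthest-point ranking on frame features below is an illustrative assumption, not the paper's selection method.

```python
import numpy as np

def rank_key_frames(features):
    """Greedy farthest-point ordering of frames by visual novelty (one-off analysis)."""
    X = np.asarray(features, dtype=float)
    order = [0]                                   # start from the first frame
    d = np.linalg.norm(X - X[0], axis=1)
    d[0] = -1.0                                   # mark selected frames
    while len(order) < len(X):
        nxt = int(np.argmax(d))                   # most novel remaining frame
        order.append(nxt)
        d = np.minimum(d, np.linalg.norm(X - X[nxt], axis=1))
        d[nxt] = -1.0
    return order

def storyboard(order, k):
    """Storyboard of length k: top-k ranked frames in temporal order."""
    return sorted(order[:k])
```

Because the ranking is computed once, requesting a longer or shorter storyboard is a constant-time slice rather than a re-analysis — the property the abstract exploits for interactive scale changes.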
Citations: 1
Video Content Dependent Directional Transform for High Performance Video Coding
Pub Date : 2012-07-09 DOI: 10.1109/ICMEW.2012.21
Long Xu, K. Ngan
In Mode-Dependent Directional Transform (MDDT), the Karhunen-Loève Transform (KLT) was employed to better compress the directional edges of intra prediction residues. The transform bases of MDDT were derived from the Singular Value Decomposition (SVD) of intra prediction residues covering a diversity of video characteristics. MDDT was mode dependent, but not video content dependent; it was expected to be efficient for most video sequences, but it did not consider differences in video content when designing the transform bases. In this paper, a video content feature is first defined as a concatenation of the coefficient magnitude, dominant gradient, and spatial activity histograms of the residues. Second, each KLT basis obtained from off-line training is associated with a given feature. Third, a histogram-based feature matching algorithm is proposed to select the best transform basis from the provided candidates for encoding a frame. The experiments show that the proposed video Content Dependent Directional Transform (CDDT) achieves an average Rate-Distortion (R-D) improvement of 0.65 dB PSNR over the state-of-the-art MDDT for inter frame coding. Compared to the Rate-Distortion Optimized Transform (RDOT), CDDT also saves about 3% of bits with comparable PSNR improvement.
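Deriving a KLT basis from training residues, as in MDDT-style designs, can be sketched as follows: vectorised residual blocks are stacked into a matrix and the SVD (equivalently, the eigenvectors of the residue covariance) yields an orthonormal transform adapted to their directional statistics. The training data and block size here are illustrative, and this shows only the basis-training step, not CDDT's histogram-based basis selection.

```python
import numpy as np

def train_klt_basis(residual_blocks):
    """Return an orthonormal KLT basis from N flattened residual blocks."""
    X = np.asarray(residual_blocks, dtype=float)
    X = X - X.mean(axis=0)                 # centre the training residues
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return Vt                              # rows = basis vectors, ordered by energy

def klt_forward(block, basis):
    """Transform a flattened residual block into KLT coefficients."""
    return basis @ block

def klt_inverse(coeffs, basis):
    """Invert the transform; the basis is orthonormal, so inverse = transpose."""
    return basis.T @ coeffs
```

For residues with strong directional correlation, most signal energy lands in the first few coefficients, which is what makes the trained basis compress directional edges better than a fixed transform.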
Citations: 3
User Requirements Elicitation of Stereoscopic 3D Video Interaction
Pub Date : 2012-07-09 DOI: 10.1109/ICMEW.2012.13
Haiyue Yuan, J. Calic, W. Fernando, A. Kondoz
The recent development of three-dimensional (3D) display technologies has resulted in a proliferation of 3D video production and broadcasting, attracting a lot of research into the capture, compression, and delivery of stereoscopic content. However, the predominant design practice for interaction with 3D video content has failed to address its differences from, and possibilities beyond, existing 2D video interaction. This paper presents a study of user requirements related to interaction with stereoscopic 3D video. The study suggests that change of view, zoom in/out, dynamic video browsing, and textual information are the most relevant interactions with stereoscopic 3D video. In addition, we identified a strong demand for object selection, which resulted in a follow-up study of user preferences in 3D selection using virtual-hand and ray-casting metaphors. The results indicate that the interaction modality affects users' object selection in terms of the chosen location in 3D, while user attitudes have no significant impact. Furthermore, ray-casting-based interaction using a Wiimote can outperform the volume-based interaction technique using mouse and keyboard in object positioning accuracy.
Citations: 3