Although it has been studied extensively for many years, automatic image annotation remains a challenging problem. Recently, data-driven approaches have demonstrated great success in image auto-annotation. Such approaches leverage abundant partially annotated web images to annotate an uncaptioned image: they first retrieve a group of visually similar images using the uncaptioned image as a query, then mine meaningful phrases from the surrounding texts of the search results. Since the surrounding texts are generally noisy, effectively mining meaningful phrases is crucial to the success of such approaches. We propose a mixture modeling approach that assumes a tag is generated from a convex combination of topics. Unlike a typical topic modeling approach such as LDA, the topics in our approach are explicitly learned from a definitive catalog of the Web, the Open Directory Project (ODP). Compared with previous work, our approach has two advantages: first, it uses an open vocabulary rather than a limited one defined by a training set; second, it is efficient enough for real-time annotation. Experiments conducted on two billion web images show the efficiency and effectiveness of the proposed approach.
{"title":"Efficient Tag Mining via Mixture Modeling for Real-Time Search-Based Image Annotation","authors":"Lican Dai, Xin-Jing Wang, Lei Zhang, Nenghai Yu","doi":"10.1109/ICME.2012.104","DOIUrl":"https://doi.org/10.1109/ICME.2012.104","url":null,"abstract":"Although it has been extensively studied for many years, automatic image annotation is still a challenging problem. Recently, data-driven approaches have demonstrated their great success to image auto-annotation. Such approaches leverage abundant partially annotated web images to annotate an uncaptioned image. Specifically, they first retrieve a group of visually closely similar images given an uncaptioned image as a query, then figure out meaningful phrases from the surrounding texts of the image search results. Since the surrounding texts are generally noisy, how to effectively mine meaningful phrases is crucial for the success of such approaches. We propose a mixture modeling approach which assumes that a tag is generated from a convex combination of topics. Different from a typical topic modeling approach like LDA, topics in our approach are explicitly learnt from a definitive catalog of the Web, i.e. the Open Directory Project (ODP). Compared with previous works, it has two advantages: Firstly, it uses an open vocabulary rather than a limited one defined by a training set. Secondly, it is efficient for real-time annotation. Experimental results conducted on two billion web images show the efficiency and effectiveness of the proposed approach.","PeriodicalId":273567,"journal":{"name":"2012 IEEE International Conference on Multimedia and Expo","volume":"119 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-07-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127638159","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Recently, several Gaussian-like image representations have been proposed as alternatives to the bag-of-words representation over local features, aiming to overcome the quantization error inherent in bag-of-words. They have proven effective in different applications: Extended Hierarchical Gaussianization achieved excellent performance with a single feature in VOC2009, and the Vector of Locally Aggregated Descriptors and the Fisher Kernel achieved excellent performance on the Holiday dataset using only signature-like representations. Despite their success and similarity, no comparative study of these representations has been made. In this paper, we perform a systematic comparison of three emerging Gaussian-like representations: Extended Hierarchical Gaussianization, the Fisher Kernel, and the Vector of Locally Aggregated Descriptors. We evaluate their performance and the influence of features and parameters on the Holiday and CC_Web_Video datasets, and we report several important properties of these representations observed during our investigation. This study provides a better understanding of Gaussian-like image representations, which are believed to be promising in various applications.
{"title":"Evaluating Gaussian Like Image Representations over Local Features","authors":"Yu-Chuan Su, Guan-Long Wu, Tzu-Hsuan Chiu, Winston H. Hsu, Kuo-Wei Chang","doi":"10.1109/ICME.2012.23","DOIUrl":"https://doi.org/10.1109/ICME.2012.23","url":null,"abstract":"Recently, several Gaussian like image representations are proposed as an alternative of the bag-of-word representation over local features. These representations are proposed to overcome the quantization error problem faced in bag-of-word representation. They are shown to be effective in different applications, the Extended Hierarchical Gaussianization reached excellent performance using single feature in VOC2009, Vector of Locally Aggregated Descriptors and Fisher Kernel reached excellent performance using only signature like representation on Holiday dataset. Despite their success and similarity, no comparative study about these representations has been made. In this paper, we perform a systematic comparison about three emerging different gaussian like representations: Extended Hierarchical Gaussianization, Fisher Kernel and Vector of Locally Aggregated Descriptors. We evaluate the performance and the influence of feature and parameters of these representations on Holiday and CC_Web_Video datasets, and several important properties about these representations have been observed during our investigation. This study provides better understanding about these gaussian like image representations that are believed to be promising in various applications.","PeriodicalId":273567,"journal":{"name":"2012 IEEE International Conference on Multimedia and Expo","volume":"82 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-07-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126264007","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
With the proliferation of cameras in public areas, it becomes increasingly desirable to develop fully automated surveillance and monitoring systems. In this paper, we propose a novel unsupervised approach that automatically discovers the motion patterns occurring in dynamic scenes under an improved sparse topical coding (STC) framework. Given an input video from a fixed camera, we first segment the video into a sequence of non-overlapping clips (documents). Optical flow features are extracted from each pair of consecutive frames and quantized into discrete visual words. The video is then represented by a word-document hierarchical topic model through a generative process. Finally, an improved sparse topical coding approach is proposed for model learning. The semantic motion patterns (latent topics) are learned automatically, and each video clip is represented as a weighted sum of these patterns with only a few nonzero coefficients. The proposed approach is purely data-driven and scene-independent (not object-class specific), which makes it suitable for a very wide range of scenarios. Experiments demonstrate that our approach outperforms state-of-the-art techniques in dynamic scene analysis.
{"title":"Learning Semantic Motion Patterns for Dynamic Scenes by Improved Sparse Topical Coding","authors":"Wei Fu, Jinqiao Wang, Zechao Li, Hanqing Lu, Songde Ma","doi":"10.1109/ICME.2012.133","DOIUrl":"https://doi.org/10.1109/ICME.2012.133","url":null,"abstract":"With the proliferation of cameras in public areas, it becomes increasingly desirable to develop fully automated surveillance and monitoring systems. In this paper, we propose a novel unsupervised approach to automatically explore motion patterns occurring in dynamic scenes under an improved sparse topical coding (STC) framework. Given an input video with a fixed camera, we first segment the whole video into a sequence of clips (documents) without overlapping. Optical flow features are extracted from each pair of consecutive frames, and quantized into discrete visual words. Then the video is represented by a word-document hierarchical topic model through a generative process. Finally, an improved sparse topical coding approach is proposed for model learning. The semantic motion patterns (latent topics) are learned automatically and each video clip is represented as a weighted summation of these patterns with only a few nonzero coefficients. The proposed approach is purely data-driven and scene independent (not an object-class specific), which make it suitable for very large range of scenarios. Experiments demonstrate that our approach outperforms the state-of-the art technologies in dynamic scene analysis.","PeriodicalId":273567,"journal":{"name":"2012 IEEE International Conference on Multimedia and Expo","volume":"42 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-07-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130137220","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Popularity prediction is a key problem in analyzing information diffusion over networks, especially in social media communities. Recently, custom-built prediction models have been developed for Digg and YouTube. However, these models are hard to transplant to an incomplete social network site (e.g., Flickr) because of their site-specific parameters. In addition, because of the large scale of the Flickr network, it is difficult to obtain all of the photos and the whole network, so we seek a method that works on such an incomplete network. Inspired by a collaborative filtering method, Network-Based Inference (NBI), we devise a weighted bipartite graph with undetected users and items to represent the resource-allocation process in an incomplete network. Instead of relying on image analysis, we propose a modified interdisciplinary model, called Incomplete Network-Based Inference (INI). Using 30 months of Flickr data, we show that the proposed INI increases prediction accuracy by over 58.1% compared with traditional NBI. We apply INI to a personalized advertising application and show that it is more attractive than traditional Flickr advertising.
{"title":"Predicting Image Popularity in an Incomplete Social Media Community by a Weighted Bi-partite Graph","authors":"Xiang Niu, Lusong Li, Tao Mei, Jialie Shen, Ke Xu","doi":"10.1109/ICME.2012.43","DOIUrl":"https://doi.org/10.1109/ICME.2012.43","url":null,"abstract":"Popularity prediction is a key problem in networks to analyze the information diffusion, especially in social media communities. Recently, there have been some custom-build prediction models in Digg and YouTube. However, these models are hardly transplant to an incomplete social network site (e.g., Flickr) by their unique parameters. In addition, because of the large scale of the network in Flickr, it is difficult to get all of the photos and the whole network. Thus, we are seeking for a method which can be used in such incomplete network. Inspired by a collaborative filtering method-Network-based Inference (NBI), we devise a weighted bipartite graph with undetected users and items to represent the resource allocation process in an incomplete network. Instead of image analysis, we propose a modified interdisciplinary models, called Incomplete Network-based Inference (INI). Using the data from 30 months in Flickr, we show the proposed INI is able to increase prediction accuracy by over 58.1%, compared with traditional NBI. We apply our proposed INI approach to personalized advertising application and show that it is more attractive than traditional Flickr advertising.","PeriodicalId":273567,"journal":{"name":"2012 IEEE International Conference on Multimedia and Expo","volume":"63 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-07-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134052487","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
This paper introduces a method for the efficient comparison and retrieval of near-duplicates of a query video from a video database. The method generates video signatures from histograms of the orientations of optical flow at feature points, computed from uniformly sampled video frames and concatenated over time to produce time series, which are then aligned and matched. Major incline matching, a data-reduction and peak-alignment method for time series, is adapted for faster performance. The resulting signature is compact and robust against a number of common transformations, including flipping, cropping, picture-in-picture, photometric changes, and the addition of noise and other artifacts. We evaluate on the MUSCLE VCD 2007 dataset and a dataset derived from TRECVID 2009, and we show good precision (88.8% on average) at significantly higher speeds than results reported in the literature (on average, 45 seconds for signature generation plus 92 seconds for a linear search with an 81-second query video over a 300-hour dataset).
{"title":"Fast Near-Duplicate Video Retrieval via Motion Time Series Matching","authors":"John R. Zhang, J. Ren, Fangzhe Chang, Thomas L. Wood, J. Kender","doi":"10.1109/ICME.2012.111","DOIUrl":"https://doi.org/10.1109/ICME.2012.111","url":null,"abstract":"This paper introduces a method for the efficient comparison and retrieval of near duplicates of a query video from a video database. The method generates video signatures from histograms of orientations of optical flow of feature points computed from uniformly sampled video frames concatenated over time to produce time series, which are then aligned and matched. Major incline matching, a data reduction and peak alignment method for time series, is adapted for faster performance. The resultant method is compact and robust against a number of common transformations including: flipping, cropping, picture-in-picture, photometric, addition of noise and other artifacts. We evaluate on the MUSCLE VCD 2007 dataset and a dataset derived from TRECVID 2009. Good precision (average 88.8%) at significantly higher speeds (average durations: 45 seconds for signature generation plus 92 seconds for a linear search of 81-second query video in a 300 hour dataset) than results reported in the literature are shown.","PeriodicalId":273567,"journal":{"name":"2012 IEEE International Conference on Multimedia and Expo","volume":"108 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-07-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133555628","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
This paper presents an edge-directed, non-iterative image interpolation algorithm. In the proposed algorithm, gradient directions are explicitly estimated with a statistical approach: the local dominant gradient direction is obtained by applying principal component analysis (PCA) to the four nearest gradients. The angles of the gradient plane are divided into four sectors, and each gradient direction falls into one of them. We then interpolate with one-dimensional (1-D) cubic convolution perpendicular to the gradient direction. Simulation results show that, compared with state-of-the-art interpolation methods, the proposed PCA-based edge-directed interpolation preserves edges well while maintaining a high PSNR.
{"title":"Principal Components Analysis-Based Edge-Directed Image Interpolation","authors":"Bing Yang, Zhiyong Gao, Xiaoyun Zhang","doi":"10.1109/ICME.2012.153","DOIUrl":"https://doi.org/10.1109/ICME.2012.153","url":null,"abstract":"This paper presents an edge-directed, noniterative image interpolation algorithm. In the proposed algorithm, the gradient directions are explicitly estimated with a statistical-based approach. The local dominant gradient directions are obtained by using principal components analysis (PCA) on the four nearest gradients. The angles of the whole gradient plane are divided into four parts, and each gradient direction falls into one part. Then we implement the interpolation with one-dimention (1-D) cubic convolution interpolation perpendicular to the gradient direction. Compared to the state of-the-art interpolation methods, simulation results show that the proposed PCA-based edge-directed interpolation method preserves edges well while maintaining a high PSNR value.","PeriodicalId":273567,"journal":{"name":"2012 IEEE International Conference on Multimedia and Expo","volume":"7 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-07-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115549765","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
There has been much interest in offering multimedia location-based services (LBS) to indoor users (e.g., sending video/audio streams according to user location). Offering good LBS depends largely on accurate indoor localization of mobile stations (MSs). To achieve this, we first model and analyze the error characteristics of important indoor localization schemes based on Radio Frequency Identification (RFID) and Wi-Fi. Our models are simple to use; they capture important system parameters and measurement noise and quantify how these affect localization accuracy. Given that many indoor localization techniques are already deployed, an MS may simultaneously receive multiple co-existing estimates of its location. Equipped with this understanding of location errors, we then investigate how to optimally combine, or fuse, all co-existing estimates of an MS's location, and we present computationally efficient closed-form expressions for fusing the estimators' outputs. Simulation and experimental results show that our fusion technique achieves higher location accuracy despite the errors of the individual estimators.
{"title":"Error Modeling and Estimation Fusion for Indoor Localization","authors":"Weipeng Zhuo, Bo Zhang, S. Chan, E. Chang","doi":"10.1109/ICME.2012.106","DOIUrl":"https://doi.org/10.1109/ICME.2012.106","url":null,"abstract":"There has been much interest in offering multimedia location-based service (LBS) to indoor users (e.g., sending video/audio streams according to user locations). Offering good LBS largely depends on accurate indoor localization of mobile stations (MSs). To achieve that, in this paper we first model and analyze the error characteristics of important indoor localization schemes, using Radio Frequency Identification (RFID) and Wi-Fi. Our models are simple to use, capturing important system parameters and measurement noises, and quantifying how they affect the accuracies of the localization. Given that there have been many indoor localization techniques deployed, an MS may receive simultaneously multiple co-existing estimations on its location. Equipped with the understanding of location errors, we then investigate how to optimally combine, or fuse, all the co-existing estimations of an MS's location. We present computationally-efficient closed-form expressions to fuse the outputs of the estimators. Simulation and experimental results show that our fusion technique achieves higher location accuracy in spite of location errors in the estimators.","PeriodicalId":273567,"journal":{"name":"2012 IEEE International Conference on Multimedia and Expo","volume":"76 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-07-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114446645","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Multi-view video consists of multiple video sequences captured simultaneously from different angles by closely spaced cameras. It enables users to freely change their viewpoint by switching among the video sequences. Transmitting multi-view video requires more bandwidth than conventional multimedia. To reduce the bandwidth, UDMVT (User-Dependent Multi-view Video Transmission), based on MVC (Multi-view Video Coding), has been proposed for a single user. With multiple users, however, UDMVT encodes the same frames into different versions for each user, which increases redundant transmission. To address this problem, this paper proposes UMSM (User-dependent Multi-view video Streaming for Multi-users). UMSM has two characteristics. First, overlapping frames required by multiple users are transmitted only once by multicast, avoiding unnecessary duplication. Second, the time lag between video requests from multiple users is adjusted so that they coincide with the next request. Simulation results using benchmark test sequences provided by MERL show that UMSM decreases the transmission bit-rate by 55.3% on average for 5 users watching the same multi-view video, compared with UDMVT.
{"title":"Traffic Reduction for Multiple Users in Multi-view Video Streaming","authors":"T. Fujihashi, Ziyuan Pan, Takashi Watanabe","doi":"10.1109/ICME.2012.185","DOIUrl":"https://doi.org/10.1109/ICME.2012.185","url":null,"abstract":"Multi-view video consists of multiple video sequences captured simultaneously from different angles by multiple closely spaced cameras. It enables the users to freely change their viewpoints by playing different video sequences. Transmission of multi-view video requires more bandwidth than conventional multimedia. To reduce the bandwidth, UDMVT (User Dependent Multi-view Video Transmission) based on MVC (Multi-view Video Coding) has been proposed for single user. In UDMVT, for multiple users the same frames are encoded into different versions for each user, which increases the redundant transmission. For this problem, this paper proposes UMSM (User dependent Multi-view video Streaming for Multi-users). UMSM possesses two characteristics. The first characteristic is that the overlapped frames that are required by multiple users are transmitted only once using the multicast to avoid unnecessary duplication of transmission. The second characteristic is that a time lag of the video request by multiple users is adjusted to coincide with the next request. Simulation results using benchmark test sequences provided by MERL show that UMSM decreases the transmission bit-rate 55.3% on average for 5 users watching the same multi-view video as compared with UDMVT.","PeriodicalId":273567,"journal":{"name":"2012 IEEE International Conference on Multimedia and Expo","volume":"50 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-07-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114505067","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
In this paper, we propose an event-driven black-box surveillance camera that reduces energy consumption by waking the system up only when an event is detected and by dynamically adjusting the video encoding, and hence the resulting image distortion, according to the criticality of the captured frames, called the significance level. To achieve this, we find an encoding bit-rate that minimizes the camera's energy consumption while satisfying the limited memory-space constraint and the distortion requirement at each significance level, by judiciously allocating bit-rate across significance levels. In doing so, we consider the trade-off between total energy consumption and encoding bit-rate at each significance level. For further energy savings, we also propose a low-complexity solution that adjusts the energy-minimal encoding bit-rate based on dynamically changing event behavior, i.e., the timing and duration of events. Experimental results show that the proposed method yields up to 67.49% (49.19% on average) energy savings compared with conventional bit-rate allocation methods.
{"title":"Energy-Aware Operation of Black Box Surveillance Cameras under Event Uncertainty and Memory Constraint","authors":"Giwon Kim, Jungsoo Kim, Jongpil Jung, C. Kyung","doi":"10.1109/ICME.2012.21","DOIUrl":"https://doi.org/10.1109/ICME.2012.21","url":null,"abstract":"In this paper, we propose an event-driven black box surveillance camera which reduces energy consumption by waking up the system only when an event is detected and dynamically adjusting the video encoding and the resultant image distortion according to the criticality of captured frames called significance level. To achieve this goal, we find an encoding bitrate minimizing the energy consumption of the camera while satisfying the limited memory space constraint and distortion requirement at each significance level by judiciously allocating bit-rate to each significance level. To do that, we considered the trade-off relations between the total energy consumption vs. encoding bit-rate according to the significance level. For further energy savings, we also proposed a low complexity solution which adjusts the energy-minimal encoding bit-rate based on the dynamically changing event behavior, i.e., timing and duration of events. Experimental results show that the proposed method yields up to 67.49% (49.19% on average) energy savings compared to the conventional bitrate allocation methods.","PeriodicalId":273567,"journal":{"name":"2012 IEEE International Conference on Multimedia and Expo","volume":"114 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-07-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114708896","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A novel scalable video coding (SVC) scheme is proposed for video transmission over lossy networks. It builds on an estimation-theoretic (ET) framework for optimal prediction and error concealment given all available information from both the current base layer and prior enhancement-layer frames. It incorporates a recursive end-to-end distortion estimation technique, the spectral coefficient-wise optimal recursive estimate (SCORE), which accounts for all ET operations and tracks the first and second moments of the decoder-reconstructed transform coefficients. The overall framework enables optimization of ET-SVC systems for transmission over lossy networks while accounting for all relevant conditions, including the effects of quantization, channel loss, concealment, and error propagation. It thus resolves longstanding difficulties in combining truly optimal prediction and concealment with optimal end-to-end distortion estimation in error-resilient SVC coding decisions. Experiments demonstrate that the proposed scheme offers substantial performance gains over existing error-resilient SVC systems across a wide range of packet-loss rates and bit rates.
{"title":"A Unified Estimation-Theoretic Framework for Error-Resilient Scalable Video Coding","authors":"Jingning Han, Vinay Melkote, K. Rose","doi":"10.1109/ICME.2012.76","DOIUrl":"https://doi.org/10.1109/ICME.2012.76","url":null,"abstract":"A novel scalable video coding (SVC) scheme is proposed for video transmission over loss networks, which builds on an estimation-theoretic (ET) framework for optimal prediction and error concealment, given all available information from both the current base layer and prior enhancement layer frames. It incorporates a recursive end-to-end distortion estimation technique, namely, the spectral coefficient-wise optimal recursive estimate (SCORE), which accounts for all ET operations and tracks the first and second moments of decoder reconstructed transform coefficients. The overall framework enables optimization of ET-SVC systems for transmission over lossy networks, while accounting for all relevant conditions including the effects of quantization, channel loss, concealment, and error propagation. It thus resolves longstanding difficulties in combining truly optimal prediction and concealment with optimal end-to-end distortion and error-resilient SVC coding decisions. Experiments demonstrate that the proposed scheme offers substantial performance gains over existing error-resilient SVC systems, under a wide range of packet loss and bit rates.","PeriodicalId":273567,"journal":{"name":"2012 IEEE International Conference on Multimedia and Expo","volume":"21 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-07-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116981544","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}