
Latest publications from the 2010 IEEE International Workshop on Multimedia Signal Processing

Motion vector coding algorithm based on adaptive template matching
Pub Date : 2010-12-10 DOI: 10.1109/MMSP.2010.5662023
Wen Yang, O. Au, Jingjing Dai, Feng Zou, Chao Pang, Yu Liu
Motion estimation, together with the corresponding motion compensation, is a core part of modern video coding standards and greatly improves compression efficiency. On the other hand, motion information takes up a considerable portion of the compressed bit stream, especially in low-bit-rate situations. In this paper, an efficient motion vector prediction algorithm is proposed to minimize the bits used for coding the motion information. First, a candidate set (CS) of possible motion vector predictors (MVPs), including several scaled spatial and temporal predictors, is defined. To increase the diversity of the predictors, the spatial predictor is adaptively changed based on the current distribution of neighboring motion vectors. After that, an adaptive template matching technique is applied to remove non-effective predictors from the CS, so that the bits used for the MVP index can be significantly reduced. As the final MVP is chosen based on a minimum motion vector difference criterion, a guessing strategy is further introduced so that, in some situations, the bits consumed by signaling the MVP index to the decoder can be omitted entirely. Experimental results indicate that the proposed method achieves an average bit rate reduction of 5.9% compared with the H.264 standard.
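A minimal sketch of the template-matching idea appears below. It is not the authors' implementation: the L-shaped template and block sizes, the SAD cost, the candidate motion vectors, and the number of survivors are all illustrative assumptions. The sketch only shows how each MVP candidate can be scored against already-reconstructed pixels so that weak candidates can be dropped before the index is coded.

```python
import numpy as np

def template_sad(cur, ref, x, y, mv, block=8, t=4):
    """SAD cost of one MVP candidate over an L-shaped template (t rows above and
    t columns left of the block at (x, y)), compared with the template displaced
    by the candidate vector in the reference frame. The template size, block
    size, and SAD criterion are illustrative choices, not the paper's."""
    dx, dy = mv
    top_cur  = cur[y - t:y, x - t:x + block]
    left_cur = cur[y:y + block, x - t:x]
    top_ref  = ref[y - t + dy:y + dy, x - t + dx:x + block + dx]
    left_ref = ref[y + dy:y + block + dy, x - t + dx:x + dx]
    return np.abs(top_cur - top_ref).sum() + np.abs(left_cur - left_ref).sum()

def prune_candidates(cur, ref, x, y, candidates, keep=2):
    """Rank MVP candidates by template-matching cost and keep only the best few,
    so fewer bits are needed to signal the chosen predictor index."""
    return sorted(candidates, key=lambda mv: template_sad(cur, ref, x, y, mv))[:keep]

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    ref = rng.integers(0, 256, size=(64, 64)).astype(np.int32)
    cur = np.roll(ref, shift=(-1, -2), axis=(0, 1))   # global motion, true mv = (2, 1)
    cands = [(0, 0), (2, 1), (-3, 4), (1, -2)]        # hypothetical spatial/temporal MVPs
    print(prune_candidates(cur, ref, x=16, y=16, candidates=cands))
```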
Citations: 2
Controlling virtual world by the real world devices with an MPEG-V framework
Pub Date : 2010-12-10 DOI: 10.1109/MMSP.2010.5662028
Seungju Han, Jae-Joon Han, Youngkyoo Hwang, Jungbae Kim, Won-Chul Bang, J. D. Kim, Chang-Yeong Kim
Online networked virtual worlds such as SecondLife, World of Warcraft and Lineage have become increasingly popular. A life-scale presentation of the virtual world and intuitive interaction between users and virtual worlds would provide a more natural and immersive experience. Emerging interaction technologies, such as sensing users' facial expressions and motion as well as the real-world environment, can provide a strong connection between the two worlds. For virtual worlds to be widely accepted and used, the various types of novel interaction devices should share a unified interaction format between the real world and the virtual world, as well as interoperability among virtual worlds. Thus, MPEG-V Media Context and Control (ISO/IEC 23005) standardizes such connecting information. This paper provides an overview and usage examples of the MPEG-V real-world-to-virtual-world (R2V) interfaces for controlling avatars and virtual objects in the virtual world with real-world devices. In particular, we investigate how the MPEG-V framework can be applied to the facial animation of an avatar in various types of virtual worlds.
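As a rough illustration of the R2V adaptation idea, the sketch below maps a hypothetical face-tracker reading onto hypothetical avatar facial-animation parameters. The type names, fields, and value ranges are invented for illustration and are not taken from the MPEG-V (ISO/IEC 23005) schema; the point is only that device-specific readings must be normalized into device-independent control values before they can drive an avatar.

```python
from dataclasses import dataclass

@dataclass
class SensedFace:
    """Hypothetical output of a real-world face tracker (not an MPEG-V type)."""
    mouth_open: float      # raw value reported by the device, e.g. in millimetres
    smile: float           # raw value in device-specific units

@dataclass
class AvatarFaceControl:
    """Hypothetical avatar facial-animation parameters, normalized to [0, 1]."""
    jaw_open: float
    mouth_corner_up: float

def r2v_adapt(face: SensedFace,
              mouth_range=(0.0, 30.0),
              smile_range=(0.0, 10.0)) -> AvatarFaceControl:
    """Normalize device-specific readings into virtual-world control values.
    The ranges are illustrative device capabilities; a standardized framework
    would carry such capability descriptions alongside the sensed data."""
    def norm(v, lo, hi):
        return min(max((v - lo) / (hi - lo), 0.0), 1.0)
    return AvatarFaceControl(
        jaw_open=norm(face.mouth_open, *mouth_range),
        mouth_corner_up=norm(face.smile, *smile_range),
    )

if __name__ == "__main__":
    print(r2v_adapt(SensedFace(mouth_open=12.0, smile=7.5)))
```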
Citations: 6
Overcoming asynchrony in Audio-Visual Speech Recognition
Pub Date : 2010-12-10 DOI: 10.1109/MMSP.2010.5662066
V. Estellers, J. Thiran
In this paper we propose two alternatives to overcome the natural asynchrony of modalities in Audio-Visual Speech Recognition. We first investigate the use of asynchronous statistical models based on Dynamic Bayesian Networks with different levels of asynchrony. We show that audio-visual models should consider asynchrony within word boundaries and not at the phoneme level. The second approach includes additional processing of the features before they are used for recognition. The proposed technique aligns the temporal evolution of the audio and video streams in terms of a speech-recognition system and enables the use of simpler statistical models for classification. In both cases we report experiments with the CUAVE database, showing the improvements obtained with the proposed asynchronous model and feature processing technique compared to traditional systems.
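To make the asynchrony concrete, the sketch below simply resamples a slower video feature stream to the audio feature rate before early fusion. This frame-rate matching is only a stand-in for illustration; it is not the recognizer-based alignment the paper proposes, and the feature dimensions and rates are assumptions.

```python
import numpy as np

def resample_features(feats, src_rate, dst_rate):
    """Linearly interpolate a (T, D) feature matrix from src_rate to dst_rate (Hz).
    This is only a frame-rate-matching stand-in, not the recognizer-based
    alignment proposed in the paper."""
    feats = np.asarray(feats, dtype=float)
    t_src = np.arange(feats.shape[0]) / src_rate
    t_dst = np.arange(int(feats.shape[0] * dst_rate / src_rate)) / dst_rate
    return np.column_stack(
        [np.interp(t_dst, t_src, feats[:, d]) for d in range(feats.shape[1])]
    )

if __name__ == "__main__":
    audio = np.random.randn(100, 13)   # e.g. 1 s of MFCCs at 100 Hz (illustrative)
    video = np.random.randn(25, 20)    # e.g. 1 s of lip features at 25 Hz (illustrative)
    video_up = resample_features(video, src_rate=25, dst_rate=100)
    fused = np.hstack([audio, video_up[:audio.shape[0]]])  # early feature fusion
    print(fused.shape)                 # (100, 33)
```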
Citations: 3
Hybrid Compressed Sensing of images
Pub Date : 2010-12-10 DOI: 10.1109/MMSP.2010.5662001
A. A. Moghadam, H. Radha
We consider the problem of recovering a signal/image (x) with a k-sparse representation from hybrid (complex and real), noiseless linear samples (y), using a mixture of complex-valued sparse and real-valued dense projections within a single matrix. The proposed Hybrid Compressed Sensing (HCS) employs the complex-sparse part of the projection matrix to divide the n-dimensional signal (x) into subsets. In turn, each subset of the signal (coefficients) is mapped onto a complex sample of the measurement vector (y). Under a worst-case scenario of such sparsity-induced mapping, when the number of complex sparse measurements is sufficiently large, this mapping leads to the isolation of a significant fraction of the k non-zero coefficients into different complex measurement samples of y. Using a simple property of complex numbers (namely complex phases), one can identify the isolated non-zeros of x. After reducing the effect of the identified non-zero coefficients from the compressive samples, we utilize the real-valued dense submatrix to form a full-rank system of equations that recovers the signal values at the remaining indices (those not recovered by the sparse complex projection part). We show that the proposed hybrid approach can recover a k-sparse signal (with high probability) while requiring only m ≈ 3√n/2k real measurements (where each complex sample is counted as two real measurements). We also derive expressions for the optimal mix of complex-sparse and real-dense rows within an HCS projection matrix. Further, in a practical range of sparsity ratios (k/n) suitable for images, the hybrid approach outperforms even the most complex compressed sensing frameworks (namely basis pursuit with dense Gaussian matrices). The theoretical complexity of HCS is less than the complexity of solving a full-rank system of m linear equations; in practice, the complexity can be lower than this bound.
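The sketch below only builds such a hybrid measurement matrix and takes the two kinds of samples of a k-sparse vector. The row counts, the per-row sparsity, and the unit-modulus random phases are illustrative assumptions; the recovery step described in the abstract is not implemented, and the optimal mix of rows derived in the paper is not reproduced.

```python
import numpy as np

def hybrid_projection(n, m_sparse, m_dense, d=3, seed=0):
    """Build an illustrative hybrid sensing matrix: m_sparse complex-valued rows
    with d non-zeros each (unit-modulus entries with random phases) stacked on
    m_dense real-valued dense Gaussian rows. The row counts and sparsity d are
    arbitrary here, not the optimal mix derived in the paper."""
    rng = np.random.default_rng(seed)
    A_sparse = np.zeros((m_sparse, n), dtype=complex)
    for i in range(m_sparse):
        cols = rng.choice(n, size=d, replace=False)
        A_sparse[i, cols] = np.exp(1j * rng.uniform(0, 2 * np.pi, size=d))
    A_dense = rng.standard_normal((m_dense, n)) / np.sqrt(n)
    return A_sparse, A_dense

if __name__ == "__main__":
    n, k = 256, 8
    x = np.zeros(n)
    x[np.random.default_rng(1).choice(n, k, replace=False)] = 1.0   # k-sparse signal
    A_sparse, A_dense = hybrid_projection(n, m_sparse=32, m_dense=16)
    y_complex = A_sparse @ x    # each complex sample mixes a small subset of x
    y_real = A_dense @ x        # dense real measurements for the remaining indices
    print(y_complex.shape, y_real.shape)
```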
Citations: 6
Data hiding of motion information in chroma and luma samples for video compression
Pub Date : 2010-12-10 DOI: 10.1109/MMSP.2010.5662022
Jean-Marc Thiesse, Joël Jung, M. Antonini
2010 appears to be the launching date for new compression activities intended to challenge the current video compression standard H.264/AVC. Several improvements to this standard are already known, such as competition-based motion vector prediction. However, the targeted 50% bitrate saving at equivalent quality has not yet been achieved. In this context, this paper proposes to reduce the signaling information resulting from this vector competition by using data hiding techniques. As data hiding and video compression traditionally have contradictory goals, a study of data hiding is first performed. Then, an efficient way of using data hiding for video compression is proposed. The main idea is to hide the indices in appropriately selected chroma and luma transform coefficients. To minimize the prediction errors, the modification is performed via a rate-distortion optimization. Objective improvements (up to a 2.3% bitrate saving) and a subjective assessment of chroma loss are reported and analyzed for several sequences.
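The sketch below shows one generic way to hide an index bit in quantized transform coefficients by forcing a parity condition. This is a simple stand-in for illustration: the paper chooses which coefficients to modify through a rate-distortion optimization, whereas here the embedding rule, block contents, and bit assignment are all hypothetical.

```python
import numpy as np

def embed_index_bit(coeffs, bit):
    """Hide one bit in a block of quantized transform coefficients by forcing
    the parity of their absolute sum. This is a generic parity scheme for
    illustration; the paper selects the coefficient to modify through a
    rate-distortion optimization rather than this fixed rule."""
    coeffs = coeffs.copy()
    if int(np.abs(coeffs).sum()) % 2 != bit:
        # nudge the last non-zero coefficient by one quantization step,
        # or create one if the block is entirely zero
        nz = np.flatnonzero(coeffs)
        idx = nz[-1] if nz.size else coeffs.size - 1
        coeffs[idx] += 1 if coeffs[idx] >= 0 else -1
    return coeffs

def extract_index_bit(coeffs):
    """The decoder recovers the hidden bit from the parity alone."""
    return int(np.abs(coeffs).sum()) % 2

if __name__ == "__main__":
    block = np.array([12, -5, 3, 0, 1, 0, 0, 0])   # hypothetical quantized coefficients
    for bit in (0, 1):
        marked = embed_index_bit(block, bit)
        assert extract_index_bit(marked) == bit
    print("parity embedding round-trips for both bit values")
```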
Citations: 10
A subjective experiment for 3D-mesh segmentation evaluation
Pub Date : 2010-12-10 DOI: 10.1109/MMSP.2010.5662046
H. Benhabiles, G. Lavoué, Jean-Philippe Vandeborre, M. Daoudi
In this paper we present a subjective quality assessment experiment for 3D-mesh segmentation. To this end, we carefully designed a protocol with respect to several factors, namely the rendering conditions, the possible interactions, the rating range, and the number of human subjects. To carry out the subjective experiment, more than 40 human observers rated a set of 250 segmentation results produced by various algorithms. The resulting Mean Opinion Scores, which represent the human subjects' point of view on the quality of each segmentation, were then used to evaluate both the quality of automatic segmentation algorithms and the quality of the similarity metrics used in recent mesh segmentation benchmarking systems.
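For reference, a Mean Opinion Score is simply the average of the observers' ratings for one item, often reported with a confidence interval. The sketch below computes it from a synthetic rating matrix; the 1-5 scale, the number of observers, and the normal-approximation confidence interval are assumptions, not details taken from the paper's protocol.

```python
import numpy as np

def mean_opinion_scores(ratings):
    """ratings: (observers, segmentations) matrix of subjective scores.
    Returns the per-segmentation MOS and a 95% confidence interval,
    computed with the usual normal approximation (an illustrative choice)."""
    ratings = np.asarray(ratings, dtype=float)
    mos = ratings.mean(axis=0)
    ci = 1.96 * ratings.std(axis=0, ddof=1) / np.sqrt(ratings.shape[0])
    return mos, ci

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # 40 hypothetical observers rating 5 segmentations on a 1-5 scale
    ratings = rng.integers(1, 6, size=(40, 5))
    mos, ci = mean_opinion_scores(ratings)
    print(np.round(mos, 2), np.round(ci, 2))
```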
Citations: 6
A hierarchical statistical model for object classification
Pub Date : 2010-12-10 DOI: 10.1109/MMSP.2010.5662071
A. Bakhtiari, N. Bouguila
In many applications it is necessary to classify the images in a database accurately and with acceptable speed. The main problem is to assign different images to the right categories, which becomes even more challenging when dealing with large databases with many categories and subcategories. In this paper we propose a novel classification method based on a hierarchical Dirichlet generative model adopted from document corpus classification. In order to adapt the model to work with image data, we use the bag of visual words model. We show that, properly applied, the model can achieve adequate results for hierarchical image classification. Experimental results are presented and discussed to show the merits of the proposed approach.
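The bag-of-visual-words representation that feeds such a classifier can be sketched as below: local descriptors are quantized against a learned visual vocabulary and each image becomes a normalized word histogram. The descriptor dimensionality, the vocabulary size, and the use of k-means are illustrative assumptions, and the hierarchical Dirichlet classifier itself is not implemented here.

```python
import numpy as np
from sklearn.cluster import KMeans

def build_vocabulary(descriptors, k=50, seed=0):
    """Cluster pooled local descriptors (e.g. SIFT-like vectors) into k visual
    words. Descriptor type and vocabulary size are illustrative choices."""
    return KMeans(n_clusters=k, n_init=10, random_state=seed).fit(descriptors)

def bow_histogram(vocab, descriptors):
    """Represent one image as a normalized histogram of visual-word counts,
    the kind of representation fed to a generative classifier."""
    words = vocab.predict(descriptors)
    hist = np.bincount(words, minlength=vocab.n_clusters).astype(float)
    return hist / hist.sum()

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    train_desc = rng.standard_normal((2000, 128))   # pooled training descriptors
    vocab = build_vocabulary(train_desc, k=50)
    image_desc = rng.standard_normal((300, 128))    # descriptors from one image
    print(bow_histogram(vocab, image_desc).shape)   # (50,)
```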
Citations: 10
Color transfer for complex content images based on intrinsic component
Pub Date : 2010-12-10 DOI: 10.1109/MMSP.2010.5662011
Wan-Chien Chiou, Yi-Lei Chen, Chiou-Ting Hsu
This paper proposes an automatic color transfer method, based on intrinsic components, for processing images with complex content. Although several automatic color transfer methods have been proposed that include region information and/or use multiple references, these methods tend to become ineffective when processing images with complex content and lighting variation. In this paper, our goal is to incorporate the idea of intrinsic components to better characterize the local organization within an image and to reduce color-bleeding artifacts across complex regions. Using intrinsic information, we first represent each image at the region level and determine the best-matched reference region for each target region. Next, we conduct color transfer between the best-matched region pairs and perform weighted color transfer for pixels across complex regions in a de-correlated color space. Both subjective and objective evaluations of our experiments demonstrate that the proposed method outperforms existing methods.
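A simplified sketch of region-wise color transfer in a de-correlated color space is given below. It uses classic per-channel mean/standard-deviation matching (Reinhard-style) plus a soft blend between region pairs as a stand-in; the intrinsic-component decomposition, the region matching step, and the paper's actual weighting scheme are not implemented, and the pixel data and memberships are synthetic.

```python
import numpy as np

def transfer_statistics(target_region, reference_region):
    """Match per-channel mean and standard deviation of a target region to its
    best-matched reference region. Both inputs are (N, 3) pixel arrays assumed
    to already be expressed in a de-correlated color space; this Reinhard-style
    transfer is a simplified stand-in for the paper's intrinsic-component-guided
    transfer."""
    t_mean, t_std = target_region.mean(axis=0), target_region.std(axis=0) + 1e-8
    r_mean, r_std = reference_region.mean(axis=0), reference_region.std(axis=0)
    return (target_region - t_mean) / t_std * r_std + r_mean

def weighted_transfer(transfers, weights):
    """Blend several region-wise transfer results for pixels near region
    boundaries; the soft memberships used as weights are hypothetical."""
    weights = np.asarray(weights, dtype=float)
    weights = weights / weights.sum(axis=1, keepdims=True)
    return sum(w[:, None] * t for w, t in zip(weights.T, transfers))

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    target = rng.standard_normal((500, 3))             # pixels of one target region
    ref_a = rng.standard_normal((400, 3)) * 2.0 + 1.0  # two candidate reference regions
    ref_b = rng.standard_normal((400, 3)) * 0.5 - 1.0
    results = [transfer_statistics(target, ref_a), transfer_statistics(target, ref_b)]
    soft = rng.uniform(size=(500, 2))                  # per-pixel soft memberships
    print(weighted_transfer(results, soft).shape)      # (500, 3)
```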
Citations: 7
Real-time particle filtering with heuristics for 3D motion capture by monocular vision
Pub Date : 2010-12-10 DOI: 10.1109/MMSP.2010.5662008
David Antonio Gómez Jáuregui, P. Horain, Manoj Kumar Rajagopal, S. S. Karri
Particle filtering is known as a robust approach for motion tracking by vision, at the cost of heavy computation in a high-dimensional pose space. In this work, we describe a number of heuristics that jointly improve robustness and real-time performance for motion capture. Markerless 3D human motion capture by monocular vision can be achieved in real time by registering a 3D articulated model on a video. First, we search the high-dimensional space of 3D poses by generating new hypotheses (or particles) with equivalent 2D projections through kinematic flipping. Second, we use a semi-deterministic particle prediction based on local optimization. Third, we deterministically resample the probability distribution for a more efficient selection of particles. Particles (or poses) are evaluated using a match cost function and penalized with a Gaussian probability pose distribution learned off-line. To achieve real-time performance, the measurement step is parallelized on the GPU using the OpenCL API. We present experimental results demonstrating robust real-time 3D motion capture with a consumer computer and webcam.
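The predict / weight / deterministic-resample loop can be reduced to a few lines for a toy scalar state, as sketched below. The Gaussian likelihood, the noise levels, and the one-dimensional state are assumptions made for readability; the paper applies this loop to a high-dimensional articulated pose with a GPU-parallelized match cost and additional heuristics.

```python
import numpy as np

def systematic_resample(weights, rng):
    """Deterministic (systematic) resampling: one stratified position per particle."""
    n = len(weights)
    positions = (rng.uniform() + np.arange(n)) / n
    return np.searchsorted(np.cumsum(weights), positions)

def particle_filter(observations, n_particles=500, obs_noise=0.5, proc_noise=0.2, seed=0):
    """Minimal SIR particle filter tracking a scalar state from noisy observations."""
    rng = np.random.default_rng(seed)
    particles = rng.standard_normal(n_particles)
    estimates = []
    for z in observations:
        particles += proc_noise * rng.standard_normal(n_particles)   # predict
        weights = np.exp(-0.5 * ((z - particles) / obs_noise) ** 2)  # match cost as likelihood
        weights /= weights.sum()
        estimates.append(np.dot(weights, particles))                 # posterior-mean estimate
        particles = particles[systematic_resample(weights, rng)]     # deterministic resampling
    return np.array(estimates)

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    truth = np.cumsum(0.1 * rng.standard_normal(100))                # hidden trajectory
    obs = truth + 0.5 * rng.standard_normal(100)
    est = particle_filter(obs)
    print("RMSE:", np.sqrt(np.mean((est - truth) ** 2)))
```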
Citations: 10
An improved foresighted resource reciprocation strategy for multimedia streaming applications
Pub Date : 2010-12-10 DOI: 10.1109/MMSP.2010.5662055
Ester Gutiérrez, Hyunggon Park, P. Frossard
In this paper, we present a solution for efficient multimedia streaming applications over P2P networks, based on the foresighted resource reciprocation strategy. We study several priority functions that explicitly consider the timing constraints and the importance of each data segment in terms of multimedia quality, and successfully incorporate them into the foresighted resource reciprocation strategy. This enables peers to enhance their multimedia streaming capability. The simulation results confirm that the proposed approach outperforms existing algorithms such as the tit-for-tat strategy of BitTorrent and the BiToS solution.
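A minimal sketch of such a priority function is given below: each missing segment is scored by how close its playback deadline is and how much it contributes to decoded quality, and the highest-scoring segment is requested first. The exponential urgency term, the unit weights, and the segment attributes are hypothetical; the paper evaluates several priority functions inside the foresighted reciprocation framework rather than this particular one.

```python
import math
from dataclasses import dataclass

@dataclass
class Segment:
    index: int
    deadline: float        # seconds until this segment must be played out
    importance: float      # contribution to decoded quality (e.g. base vs. enhancement layer)

def priority(seg: Segment, urgency_weight: float = 1.0, quality_weight: float = 1.0) -> float:
    """Score a missing segment by how soon it is due and how much it matters.
    The exponential urgency term and the unit weights are hypothetical choices."""
    urgency = math.exp(-max(seg.deadline, 0.0))          # due sooner => larger urgency
    return urgency_weight * urgency + quality_weight * seg.importance

def next_request(missing):
    """Pick the missing segment with the highest combined priority."""
    return max(missing, key=priority)

if __name__ == "__main__":
    missing = [
        Segment(index=10, deadline=1.5, importance=1.0),   # base layer, due soon
        Segment(index=11, deadline=4.0, importance=1.0),   # base layer, later
        Segment(index=12, deadline=9.0, importance=0.3),   # enhancement layer, far off
    ]
    print(next_request(missing).index)                     # -> 10
```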
Citations: 4