
Latest Publications: 2010 IEEE International Workshop on Multimedia Signal Processing

Movement recognition exploiting multi-view information
Pub Date: 2010-12-10 DOI: 10.1109/MMSP.2010.5662059
Alexandros Iosifidis, N. Nikolaidis, I. Pitas
In this paper, a novel view-invariant movement recognition method is presented. A multi-camera setup is used to capture the movement from different observation angles. The position of each camera with respect to the subject's body is identified by a procedure based on morphological operations and the proportions of the human body. Binary body masks from the frames of all cameras, consistently arranged by this procedure, are concatenated to produce a so-called multi-view binary mask. These masks are rescaled and vectorized to create feature vectors in the input space. Fuzzy vector quantization is performed to associate input feature vectors with movement representations, and linear discriminant analysis is used to map movements into a low-dimensional discriminant feature space. Experimental results show that the method achieves very satisfactory recognition rates.
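A minimal sketch of the described pipeline, in Python: vectorized multi-view masks are softly assigned to codewords (fuzzy vector quantization) and the resulting representations are mapped by LDA. The K-means codebook, the fuzzifier m, and all sizes are illustrative assumptions, not the paper's exact configuration.

```python
# Sketch: fuzzy VQ of vectorized multi-view binary masks, followed by LDA.
# KMeans-learned codebook, fuzzifier m, and all dimensions are assumptions.
import numpy as np
from scipy.spatial.distance import cdist
from sklearn.cluster import KMeans
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

def fuzzy_vq(X, codebook, m=2.0, eps=1e-9):
    """FCM-style soft memberships of each row of X to each codeword."""
    d = cdist(X, codebook) + eps
    u = d ** (-2.0 / (m - 1.0))
    return u / u.sum(axis=1, keepdims=True)

rng = np.random.default_rng(0)
X = (rng.random((300, 4 * 32 * 32)) > 0.5).astype(float)  # 4 views of 32x32 masks
y = rng.integers(0, 5, size=300)                          # movement labels

codebook = KMeans(n_clusters=16, n_init=5, random_state=0).fit(X).cluster_centers_
U = fuzzy_vq(X, codebook)                                 # soft movement representations
Z = LinearDiscriminantAnalysis().fit_transform(U, y)      # low-dim discriminant space
print(Z.shape)  # (300, 4): at most n_classes - 1 discriminant dimensions
```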
Citations: 28
Multimodal speech recognition of a person with articulation disorders using AAM and MAF
Pub Date: 2010-12-10 DOI: 10.1109/MMSP.2010.5662075
Chikoto Miyamoto, Yuto Komai, T. Takiguchi, Y. Ariki, I. Li
We investigated the speech recognition of a person with articulation disorders resulting from athetoid cerebral palsy. The articulation of speech tends to become unstable due to strain on speech-related muscles, which degrades speech recognition performance. We therefore use multiple acoustic frames (MAF) as an acoustic feature to address this problem. Furthermore, in real environments, current speech recognition systems do not perform well enough because of noise. In addition to acoustic features, visual features are used to increase noise robustness in a real environment. However, recognition problems arise from the tendency of those suffering from cerebral palsy to move their head erratically. We investigate a pose-robust audio-visual speech recognition method using an Active Appearance Model (AAM) to solve this problem for people with articulation disorders resulting from athetoid cerebral palsy. AAMs are used for face tracking to extract pose-robust facial feature points. The method's effectiveness is confirmed by word recognition experiments on noisy speech of a person with articulation disorders.
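A minimal sketch of the MAF feature, assuming MFCC-like frame features: each frame is concatenated with its k left and right neighbours, giving the recognizer temporal context that helps with unstable articulation. The frame dimension and k are illustrative choices.

```python
# Sketch: multiple acoustic frames (MAF) by stacking temporal context.
import numpy as np

def stack_frames(feats, k=2):
    """Concatenate each frame with k frames of left/right context (edge-padded)."""
    padded = np.pad(feats, ((k, k), (0, 0)), mode="edge")
    return np.hstack([padded[i:i + len(feats)] for i in range(2 * k + 1)])

mfcc = np.random.randn(100, 13)   # 100 frames of 13-dim features (placeholder)
maf = stack_frames(mfcc, k=2)     # -> (100, 65): 5 stacked frames per vector
print(maf.shape)
```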
Citations: 29
The Iteration-Tuned Dictionary for sparse representations
Pub Date: 2010-12-10 DOI: 10.1109/MMSP.2010.5662000
J. Zepeda, C. Guillemot, Ewa Kijak
We introduce a new dictionary structure for sparse representations that is better adapted to the pursuit algorithms used in practical scenarios. The new structure, which we call an Iteration-Tuned Dictionary (ITD), consists of a set of dictionaries, each associated with a single iteration index of a pursuit algorithm. In this work we first adapt pursuit decompositions to the case of ITD structures and then introduce a training algorithm used to construct ITDs. The training algorithm applies K-means to the (i−1)-th residuals of the training set to produce the i-th dictionary of the ITD structure. In the results section we compare our algorithm against the state-of-the-art dictionary training scheme and show that our method produces sparse representations yielding better signal approximations at the same sparsity level.
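A minimal sketch of the ITD training loop: the abstract specifies the K-means-on-residuals step, while the one-atom matching-pursuit update used here to generate the next layer's residuals is a simplifying assumption.

```python
# Sketch: train one ITD dictionary per pursuit iteration via K-means on residuals.
import numpy as np
from sklearn.cluster import KMeans

def train_itd(X, n_layers=3, n_atoms=32, seed=0):
    residual, layers = X.copy(), []
    for _ in range(n_layers):
        km = KMeans(n_clusters=n_atoms, n_init=5, random_state=seed).fit(residual)
        D = km.cluster_centers_
        D /= np.linalg.norm(D, axis=1, keepdims=True) + 1e-12   # unit-norm atoms
        corr = residual @ D.T
        best = np.argmax(np.abs(corr), axis=1)                  # one atom per signal
        coef = corr[np.arange(len(X)), best]
        residual = residual - coef[:, None] * D[best]           # residuals for layer i+1
        layers.append(D)
    return layers

X = np.random.randn(500, 64)          # training signals (placeholder)
print([D.shape for D in train_itd(X)])
```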
Citations: 15
Considering security and robustness constraints for watermark-based Tardos fingerprinting
Pub Date: 2010-12-10 DOI: 10.1109/MMSP.2010.5661992
B. Mathon, P. Bas, François Cayre, B. Macq
This article is a theoretical study of binary Tardos fingerprinting codes embedded using watermarking schemes. Our approach is derived from [1] and encompasses both security and robustness constraints. We assume here that the coalition has estimated the symbols of the fingerprinting code by means of a security attack, the quality of the estimation depending on the security of the watermarking scheme. Taking into account the fact that the coalition can make estimation errors, we update the Worst Case Attack, which minimises the mutual information between the sequence of one colluder and the pirated sequence forged by the coalition. After comparing the achievable rates of the previous and proposed Worst Case Attack as a function of the estimation error, we conclude this analysis by comparing the robustness of non-secure embedding schemes versus secure ones. We show that, for low probabilities of error during the decoding stage (e.g. highly robust watermarking schemes), security makes it possible to increase the achievable rate of the fingerprinting scheme.
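For context, a minimal sketch of generating a binary Tardos code, the fingerprinting primitive this study builds on: per-position biases follow the arcsine density on [t, 1−t], and each user's codeword is drawn Bernoulli(p_i) per position. The code length and cutoff t are illustrative; the paper's Worst Case Attack analysis is not reproduced here.

```python
# Sketch: binary Tardos code generation with arcsine-distributed biases.
import numpy as np

def tardos_code(n_users, length, t=0.01, seed=0):
    rng = np.random.default_rng(seed)
    lo, hi = np.arcsin(np.sqrt(t)), np.arcsin(np.sqrt(1 - t))
    p = np.sin(rng.uniform(lo, hi, size=length)) ** 2   # biases p_i in [t, 1-t]
    codes = (rng.random((n_users, length)) < p).astype(np.uint8)
    return codes, p

codes, p = tardos_code(n_users=20, length=1024)
print(codes.shape, float(p.min()), float(p.max()))
```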
Citations: 1
Generation of see-through baseball movie from multi-camera views
Pub Date: 2010-12-10 DOI: 10.1109/MMSP.2010.5662060
Takanori Hashimoto, Yuko Uematsu, H. Saito
This paper presents a method for generating a new-viewpoint movie of a baseball game. One of the most interesting viewpoints in a baseball game is from behind the catcher. If only one camera is placed behind the catcher, however, the view is occluded by the umpire and catcher. In this paper, we propose a method for generating a see-through movie, captured from behind the catcher, by recovering the pitcher's appearance with multiple cameras, so that we can virtually remove the obstacles (catcher and umpire) from the movie. Our method consists of three processes: recovering the pitcher's appearance by homography, detecting obstacles by graph cut, and projecting the ball's trajectory. To demonstrate the effectiveness of our method, we generate a see-through movie by applying it to multi-camera footage taken in a real baseball stadium. In the see-through movie, the pitcher can be seen through the catcher and umpire.
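A minimal sketch of the homography step, with placeholder correspondences and frames: a side camera's view of the pitcher's plane is warped into the rear camera's coordinates and pasted over the occluded region. In the paper the obstacle mask comes from graph cut; here it is a placeholder array.

```python
# Sketch: warp a second view via homography and composite over the occlusion.
import numpy as np
import cv2

# Corresponding points on the pitcher's plane in each view (placeholders).
pts_side = np.float32([[100, 120], [400, 110], [420, 380], [90, 390]])
pts_rear = np.float32([[150, 140], [380, 150], [390, 360], [140, 350]])
H, _ = cv2.findHomography(pts_side, pts_rear)

side = np.zeros((480, 640, 3), np.uint8)            # frame from the side camera
rear = np.zeros((480, 640, 3), np.uint8)            # frame from the rear camera
warped = cv2.warpPerspective(side, H, (640, 480))   # side view in rear coordinates

mask = np.zeros((480, 640), np.uint8)               # 255 where catcher/umpire occlude
see_through = np.where(mask[..., None] == 255, warped, rear)
```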
Citations: 5
Bit allocation and encoded view selection for optimal multiview image representation
Pub Date: 2010-12-10 DOI: 10.1109/MMSP.2010.5662025
Gene Cheung, V. Velisavljevic
Novel coding tools have recently been proposed to encode texture and depth maps of multiview images, exploiting inter-view correlations, for depth-image-based rendering (DIBR). However, the important associated bit allocation problem for DIBR remains open: for chosen view coding and synthesis tools, how should bits be allocated among texture and depth maps across encoded views so that the fidelity of a set of V views reconstructed at the decoder is maximized for a fixed bitrate budget? In this paper, we present an optimization strategy that selects a subset of texture and depth maps of the original V views for encoding at appropriate quantization levels, so that at the decoder the combined quality of decoded views (using encoded texture maps) and synthesized views (using encoded texture and depth maps of neighboring views) is maximized. We show that, using a monotonicity property, the complexity of our strategy can be greatly reduced. Experiments show that our strategy achieves up to 0.83 dB PSNR improvement over a heuristic scheme that encodes only the texture maps of all V views at constant quantization levels. Further, computation can be reduced by up to 66% compared to a full parameter search.
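To make the search concrete, here is a toy sketch of the underlying allocation problem: pick one quantization level per encoded map so that total rate stays within budget and summed distortion is minimized. The exhaustive enumeration and the rate/distortion numbers are illustrative stand-ins; the paper's contribution is pruning this kind of search via the monotonicity property.

```python
# Sketch: brute-force rate-constrained allocation over per-map quantization levels.
import itertools

# Toy (rate, distortion) options at three quantization levels per map.
options = {
    "texture_v0": [(40, 2.0), (25, 3.5), (12, 6.0)],
    "depth_v0":   [(20, 1.0), (12, 1.8), (6, 3.2)],
    "texture_v1": [(40, 2.1), (25, 3.6), (12, 6.2)],
}
budget = 70

best = None
for choice in itertools.product(*options.values()):
    rate = sum(r for r, _ in choice)
    dist = sum(d for _, d in choice)
    if rate <= budget and (best is None or dist < best[0]):
        best = (dist, rate, choice)
print(best)   # lowest total distortion meeting the rate budget
```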
Citations: 1
Efficient MV prediction for zonal search in video transcoding
Pub Date: 2010-12-10 DOI: 10.1109/MMSP.2010.5662024
S. Marcelino, S. Faria, P. Assunção, S. Moiron, M. Ghanbari
This paper proposes a method to efficiently find motion vector predictions for zonal-search motion re-estimation in fast video transcoders. The motion information extracted from the incoming video stream is processed to generate accurate motion vector predictions for transcoding with reduced complexity. Our results demonstrate that motion vector predictions computed by the proposed method outperform those generated by the highly efficient EPZS (Enhanced Predictive Zonal Search) algorithm in H.264/AVC transcoders. The computational complexity is reduced by up to 59.6% at negligible cost in R-D performance. The proposed method can be useful in multimedia systems and applications using any type of transcoder, such as transrating and/or spatial resolution downsizing.
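A minimal sketch of the prediction idea, under simplifying assumptions (integer motion, SAD cost, a single small-diamond refinement): the incoming stream's motion vector joins the usual spatial-median predictor as a zonal-search seed. This is illustrative, not the proposed transcoder itself.

```python
# Sketch: seed zonal search with the spatial median MV and the incoming-stream MV.
import numpy as np

def sad(cur, ref, bx, by, mv, bs=16):
    """Sum of absolute differences for a bs x bs block at (bx, by) under mv=(dx, dy)."""
    dx, dy = mv
    patch = ref[by + dy: by + dy + bs, bx + dx: bx + dx + bs]
    return int(np.abs(cur[by: by + bs, bx: bx + bs].astype(int) - patch.astype(int)).sum())

def predict_and_refine(cur, ref, bx, by, left, top, topright, incoming):
    median = tuple(int(v) for v in np.median([left, top, topright], axis=0))
    cands = {(0, 0), median, tuple(incoming)}
    best = min(cands, key=lambda mv: sad(cur, ref, bx, by, mv))
    diamond = [(0, 0), (1, 0), (-1, 0), (0, 1), (0, -1)]   # one refinement step
    return min(((best[0] + dx, best[1] + dy) for dx, dy in diamond),
               key=lambda mv: sad(cur, ref, bx, by, mv))

cur = np.random.randint(0, 255, (64, 64)).astype(np.uint8)
ref = np.roll(cur, (2, 1), axis=(0, 1))   # cur shifted down 2 rows, right 1 column
print(predict_and_refine(cur, ref, 16, 16, (1, 2), (2, 1), (1, 1), (1, 2)))  # (1, 2)
```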
Citations: 4
Generalized multiscale seam carving
Pub Date: 2010-12-10 DOI: 10.1109/MMSP.2010.5662048
David D. Conger, Mrityunjay Kumar, H. Radha
With the abundance and variety of display devices, novel image resizing techniques have become more desirable. Content-aware image resizing (retargeting) techniques have been proposed that improve on traditional techniques such as cropping and resampling. In particular, seam carving has gained attention as an effective solution, using simple filters to detect and preserve the high-energy areas of an image. Yet it could be made more robust to a variety of image types. To facilitate such improvement, we recast seam carving in a more general framework, in the context of filter banks. This enables improved filter design and leads to a multiscale model that addresses the problem of the scale of image features. We have found that our generalized multiscale model improves on the existing seam carving method for a variety of images.
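A minimal sketch of the seam-carving core that the generalization builds on: an energy map (here a plain gradient magnitude, standing in for the paper's filter-bank, multiscale energy) is traversed by dynamic programming to find the minimum-energy vertical seam.

```python
# Sketch: dynamic-programming search for the minimum-energy vertical seam.
import numpy as np

def min_vertical_seam(energy):
    h, w = energy.shape
    cost = energy.astype(float).copy()
    for y in range(1, h):
        left = np.r_[np.inf, cost[y - 1, :-1]]    # predecessor at x-1
        right = np.r_[cost[y - 1, 1:], np.inf]    # predecessor at x+1
        cost[y] += np.minimum(np.minimum(left, cost[y - 1]), right)
    seam = [int(np.argmin(cost[-1]))]
    for y in range(h - 2, -1, -1):                # backtrack from the bottom row
        x = seam[-1]
        x0 = max(x - 1, 0)
        seam.append(x0 + int(np.argmin(cost[y, x0: min(x + 2, w)])))
    return seam[::-1]                             # one column index per row

img = np.random.rand(6, 8)
gy, gx = np.gradient(img)
print(min_vertical_seam(np.abs(gx) + np.abs(gy)))
```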
Citations: 11
Optimal mode switching for multi-hypothesis motion compensated prediction
Pub Date: 2010-12-10 DOI: 10.1109/MMSP.2010.5662021
Ramdas Satyan, F. Labeau, K. Rose
Transmission of compressed video over unreliable networks is vulnerable to errors and error propagation. Multi-hypothesis motion compensated prediction (MHMCP), which was originally developed to improve compression efficiency, has been shown to have good error resilience properties. In this paper, we improve the overall performance of MHMCP in packet-loss scenarios by performing optimal mode switching within a rate-distortion framework. The approach builds on the recursive optimal per-pixel estimate (ROPE), which is extended by re-deriving the recursion formulas for the more complex MHMCP setting, so as to achieve an accurate estimate of the end-to-end distortion. Simulation results show significant performance gains over the standard MHMCP scheme and demonstrate the importance of effective mode decisions. We also show results in comparison with conventional ROPE.
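A minimal sketch of the ROPE recursion the paper builds on, for a single inter-coded pixel under loss probability p, assuming integer motion and copy-previous concealment; the paper's re-derivation for multi-hypothesis prediction is not reproduced here.

```python
# Sketch: ROPE-style first/second moment tracking of a decoder-side pixel.
def rope_update_inter(m_prev, s_prev, residual, p):
    """Pixel received w.p. (1 - p); otherwise concealed by copying the reference pixel."""
    m_rec = residual + m_prev                              # E[pixel | received]
    s_rec = residual**2 + 2 * residual * m_prev + s_prev   # E[pixel^2 | received]
    m = (1 - p) * m_rec + p * m_prev                       # mix in concealment branch
    s = (1 - p) * s_rec + p * s_prev
    return m, s

def expected_distortion(f_enc, m, s):
    """E[(f_enc - f_dec)^2] from the decoder pixel's first two moments."""
    return f_enc**2 - 2 * f_enc * m + s

m, s = 100.0, 100.0**2      # moments of the (error-free) reference pixel
m, s = rope_update_inter(m, s, residual=5.0, p=0.1)
print(expected_distortion(105.0, m, s))
```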
Citations: 0
Unsupervised detection of multimodal clusters in edited recordings
Pub Date: 2010-12-10 DOI: 10.1109/MMSP.2010.5662015
Alfred Dielmann
Edited video recordings, such as talk shows and sitcoms, often include Audio-Visual clusters: frequent repetitions of closely related acoustic and visual content. For example, during a political debate, every time a given participant holds the conversational floor, her/his voice tends to co-occur with camera views (i.e. shots) showing her/his portrait. Unlike previous Audio-Visual clustering works, this paper proposes an unsupervised approach that detects Audio-Visual clusters while avoiding assumptions about the recording content, such as the presence of specific participant voices or faces. Sequences of audio and shot clusters are automatically identified using unsupervised audio diarization and shot segmentation techniques. Audio-Visual clusters are then formed by ranking the co-occurrences between these two segmentations and selecting those that go significantly beyond chance. Numerical experiments performed on a collection of 70 political debates, comprising more than 43 hours of live edited recordings, showed that the automatically extracted Audio-Visual clusters match the ground-truth annotation well, achieving high purity.
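A minimal sketch of the co-occurrence ranking, on toy per-second label sequences: overlap counts between audio clusters and shot clusters are compared against the counts expected under independence, and pairs well beyond chance are kept. The lift threshold and the toy segmentations are illustrative assumptions.

```python
# Sketch: rank audio/shot cluster co-occurrences against chance (independence).
import numpy as np

audio = np.array([0, 0, 1, 1, 0, 2, 2, 1, 0, 0])   # audio-cluster label per second
shots = np.array([0, 0, 0, 1, 1, 2, 2, 1, 0, 0])   # shot-cluster label per second

n = len(audio)
co = np.zeros((audio.max() + 1, shots.max() + 1))
for a, s in zip(audio, shots):
    co[a, s] += 1                                  # observed overlap counts

expected = co.sum(1, keepdims=True) * co.sum(0, keepdims=True) / n
lift = co / np.maximum(expected, 1e-9)
print(np.argwhere(lift > 2.0))                     # pairs well beyond chance
```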
Citations: 9