
2010 IEEE International Workshop on Multimedia Signal Processing: Latest Publications

Bilateral depth-discontinuity filter for novel view synthesis
Pub Date : 2010-12-10 DOI: 10.1109/MMSP.2010.5662009
Ismaël Daribo, H. Saito
In this paper, a new filtering technique addresses the disocclusion problem arising from the depth-image-based rendering (DIBR) technique within the 3DTV framework. An inherent problem with DIBR is filling in the newly exposed areas (holes) caused by the image warping process. In contrast to multiview video (MVV) systems, such as free viewpoint television (FTV), where multiple reference views are used to recover the disocclusions, we consider in this paper a 3DTV system based on a video-plus-depth sequence, which provides only one reference view of the scene. To overcome this issue, disocclusion removal can be achieved by pre-processing the depth video and/or post-processing the warped image through hole-filling techniques. Specifically, we propose a pre-processing of the depth video based on bilateral filtering adapted to the strength of the depth discontinuity. Experimental results illustrate the efficiency of the proposed method compared to traditional methods.
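
The pre-processing step described above smooths the depth map with a bilateral filter whose behaviour adapts to the strength of the depth discontinuities, so that warping produces fewer and smaller holes. The Python/NumPy sketch below is only a minimal illustration of that idea; the window size, the two sigmas, and the gradient-based discontinuity weighting are assumptions of this sketch, not parameters taken from the paper.

```python
import numpy as np

def discontinuity_adaptive_bilateral(depth, win=5, sigma_s=2.0, sigma_r=8.0):
    """Smooth a depth map with a bilateral filter whose range weight is
    relaxed near strong depth discontinuities (illustrative sketch only)."""
    h, w = depth.shape
    r = win // 2
    # Discontinuity strength: magnitude of the local depth gradient.
    gy, gx = np.gradient(depth.astype(np.float64))
    strength = np.hypot(gx, gy)
    strength /= strength.max() + 1e-12          # normalize to [0, 1]

    pad = np.pad(depth.astype(np.float64), r, mode='edge')
    ys, xs = np.mgrid[-r:r + 1, -r:r + 1]
    spatial = np.exp(-(xs**2 + ys**2) / (2 * sigma_s**2))

    out = np.empty_like(depth, dtype=np.float64)
    for y in range(h):
        for x in range(w):
            patch = pad[y:y + win, x:x + win]
            center = pad[y + r, x + r]
            # Stronger discontinuity -> wider range kernel -> more smoothing
            # across the depth edge, which shrinks the disocclusion holes.
            sr = sigma_r * (1.0 + 4.0 * strength[y, x])
            rng = np.exp(-((patch - center) ** 2) / (2 * sr**2))
            wgt = spatial * rng
            out[y, x] = (wgt * patch).sum() / wgt.sum()
    return out

if __name__ == "__main__":
    demo = np.zeros((64, 64)); demo[:, 32:] = 100.0   # synthetic depth step
    smoothed = discontinuity_adaptive_bilateral(demo)
    print(smoothed[32, 28:36].round(1))               # softened transition
```
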
Citations: 13
Rate-distortion optimized low-delay 3D video communications
Pub Date : 2010-12-10 DOI: 10.1109/MMSP.2010.5661998
E. Masala
This paper focuses on the rate-distortion optimization of low-delay 3D video communications based on the latest H.264/MVC video coding standard. The first part of the work proposes a new low-complexity model for distortion estimation suited to low-delay stereoscopic video communication scenarios such as 3D videoconferencing. The distortion introduced by the loss of a given frame is investigated, and a model is designed to accurately estimate the impact that the loss of each frame would have on future frames. The model is then employed in a rate-distortion optimized framework for video communications over a generic QoS-enabled network. Simulation results show consistent performance gains, up to 1.7 dB PSNR, with respect to a traditional a priori technique based on frame dependency information only. Moreover, the performance is consistently close to that of a prescient technique with perfect knowledge of the distortion characteristics of future frames.
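
The distortion model above estimates how the loss of a single frame propagates to the frames that predict from it. A deliberately simplified, hypothetical version of such an estimate, with the error decaying geometrically along the prediction chain (the decay factor and base distortion below are made-up illustrative values, not the model of the paper), can be written as follows.

```python
# Toy frame-loss distortion estimate for a low-delay IPPP... prediction chain.
# The per-frame base distortion and the attenuation factor are illustrative
# numbers only; they show how a loss's impact can be accumulated over
# dependent frames when making rate-distortion optimized decisions.

def propagated_distortion(base_distortion, num_following, attenuation=0.85):
    """Total distortion caused by losing one frame, summed over the lost
    frame and every later frame that predicts (directly or transitively)
    from it, with concealment/leakage modeled as a geometric decay."""
    total = 0.0
    d = base_distortion
    for _ in range(num_following + 1):   # the lost frame itself + successors
        total += d
        d *= attenuation
    return total

if __name__ == "__main__":
    # A loss early in the GOP hurts more because more frames inherit the error.
    for name, remaining in [("frame 1", 14), ("frame 8", 7), ("frame 15", 0)]:
        print(name, "->", round(propagated_distortion(100.0, remaining), 1))
```
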
Citations: 2
Challenging the security of Content-Based Image Retrieval systems
Pub Date : 2010-12-10 DOI: 10.1109/MMSP.2010.5661993
Thanh-Toan Do, Ewa Kijak, T. Furon, L. Amsaleg
Content-Based Image Retrieval (CBIR) has recently been used as a filtering mechanism against the piracy of multimedia content. Many publications in the last few years have proposed very robust schemes in which pirated content is detected despite severe modifications. As none of these systems has addressed the piracy problem from a security perspective, it is time to check whether they are secure: can pirates mount effective attacks against CBIR systems by carefully studying the technology they use? This paper is an initial analysis of the security flaws of the typical technology blocks used in state-of-the-art CBIR systems. It is still too early to draw any definitive conclusion about their inherent security, but this analysis motivates and encourages further studies on the topic.
Citations: 14
Robust background subtraction method based on 3D model projections with likelihood
Pub Date : 2010-12-10 DOI: 10.1109/MMSP.2010.5662014
Hiroshi Sankoh, A. Ishikawa, S. Naito, S. Sakazawa
We propose a robust background subtraction method for multi-view images, which is essential for realizing free viewpoint video, where an accurate 3D model is required. Most conventional methods determine the background using only visual information from a single camera image, so a precise silhouette cannot be obtained. Our method integrates multi-view images taken by multiple cameras: the background region is determined using a 3D model generated from the multi-view images. We apply a background likelihood to each pixel of the camera images and derive an integrated likelihood for each voxel of the 3D model. The background region is then determined by minimizing energy functions of the voxel likelihood. Furthermore, the proposed method applies a robust refining process, in which a foreground region obtained by projecting the 3D model is improved according to geometric as well as visual information. A 3D model is finally reconstructed using the improved foreground silhouettes. Experimental results show the effectiveness of the proposed method compared with conventional works.
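
The central fusion step above combines per-pixel background likelihoods from several calibrated views into one likelihood per voxel. The sketch below illustrates that fusion with toy orthographic projections and a simple threshold; the projection functions, the averaging of log-likelihoods, and the threshold are assumptions of the sketch, and the paper's energy-minimization step is not reproduced here.

```python
import numpy as np

def integrate_voxel_likelihood(voxels, cameras, likelihood_maps):
    """Average the per-pixel background log-likelihoods of every camera that
    sees a voxel. `cameras` maps each voxel (x, y, z) to a pixel (u, v) per
    view; trivial projection functions stand in for real calibration here."""
    acc = np.zeros(len(voxels))
    for project, lmap in zip(cameras, likelihood_maps):
        for i, v in enumerate(voxels):
            u, w = project(v)                      # voxel -> pixel in this view
            acc[i] += np.log(lmap[w, u] + 1e-9)    # accumulate log-likelihood
    return acc / len(cameras)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    voxels = [(x, y, z) for x in range(4) for y in range(4) for z in range(4)]
    # Two toy "cameras": orthographic drops of one coordinate each.
    cams = [lambda v: (v[0], v[1]), lambda v: (v[2], v[1])]
    maps = [rng.uniform(0.05, 0.95, size=(4, 4)) for _ in cams]
    scores = integrate_voxel_likelihood(voxels, cams, maps)
    background = scores > np.log(0.5)              # simple threshold, not the
    print(int(background.sum()), "of", len(voxels), "voxels labeled background")
```
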
Citations: 11
Fast environment extraction for lighting and occlusion of virtual objects in real scenes
Pub Date : 2010-12-10 DOI: 10.1109/MMSP.2010.5662007
François Fouquet, Jean-Philippe Farrugia, Brice Michoud, S. Brandel
Augmented reality aims to insert virtual objects into real scenes. In order to obtain a coherent and realistic integration, these objects have to be relit according to their positions and the real lighting conditions. They also have to handle occlusion by the nearest parts of the real scene. To achieve this, we have to extract photometry and geometry from the real scene. In this paper, we adapt high dynamic range reconstruction and depth estimation methods to deal with real-time constraints and consumer devices. We present their limitations along with the significant parameters influencing computing time and image quality. We tune these parameters to accelerate computation and evaluate their impact on the resulting quality. To fit the augmented reality context, we propose a real-time extraction of this information from video streams, in a single pass.
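
One of the quantities extracted above is a high dynamic range estimate of scene radiance built from differently exposed frames. The sketch below shows a minimal weighted-average HDR merge assuming a linear sensor response and known exposure times, which skips the response-curve recovery a real pipeline would need; all values are illustrative.

```python
import numpy as np

def merge_hdr(frames, exposures):
    """Merge differently exposed 8-bit frames into a radiance map by a
    weighted average of (pixel / exposure), trusting mid-range pixels most.
    Assumes a linear camera response; real pipelines first calibrate it."""
    acc = np.zeros(frames[0].shape, dtype=np.float64)
    wsum = np.zeros_like(acc)
    for img, t in zip(frames, exposures):
        x = img.astype(np.float64)
        w = 1.0 - np.abs(x / 255.0 - 0.5) * 2.0    # hat weight: mid-tones count
        acc += w * (x / t)
        wsum += w
    return acc / np.maximum(wsum, 1e-6)

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    radiance = rng.uniform(0.0, 4000.0, size=(8, 8))
    exposures = [1 / 500, 1 / 60, 1 / 8]           # short, medium, long
    frames = [np.clip(radiance * t, 0, 255).astype(np.uint8) for t in exposures]
    est = merge_hdr(frames, exposures)
    print("max relative error:", float(np.abs(est - radiance).max() / 4000.0))
```
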
Citations: 1
Gaussian mixture vector quantization-based video summarization using independent component analysis
Pub Date : 2010-12-10 DOI: 10.1109/MMSP.2010.5662062
Junfeng Jiang, Xiao-Ping Zhang
In this paper, we propose a new Gaussian mixture vector quantization (GMVQ)-based method to summarize video content. In particular, in order to explore the semantic characteristics of video data, we first present a new feature extraction method using independent component analysis (ICA) and color histogram differences to build a compact 3D feature space. A new GMVQ method is then developed to find the optimized quantization codebook, whose optimal size is determined by the Bayes information criterion (BIC). The video frames that are the nearest neighbours of the quanta in the GMVQ codebook are sampled to summarize the video content, and a kD-tree-based nearest-neighbour search strategy is employed to accelerate the search. Experimental results show that our method is computationally efficient and practically effective for building a content-based video summarization system.
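
The selection pipeline above can be prototyped with off-the-shelf tools: fit Gaussian mixtures of increasing size, keep the size with the lowest BIC, and return the frames nearest to the mixture means via a kD-tree. The sketch below does exactly that on stand-in random features using scikit-learn and SciPy; the ICA and colour-histogram-difference feature extraction is omitted and the per-frame feature matrix is assumed to be given.

```python
import numpy as np
from scipy.spatial import cKDTree
from sklearn.mixture import GaussianMixture

def summarize(features, max_codebook=10, seed=0):
    """Pick key frames as the nearest neighbours of a BIC-selected
    Gaussian-mixture codebook fitted to per-frame feature vectors."""
    best_gmm, best_bic = None, np.inf
    for k in range(1, max_codebook + 1):
        gmm = GaussianMixture(n_components=k, covariance_type='full',
                              random_state=seed).fit(features)
        bic = gmm.bic(features)
        if bic < best_bic:
            best_gmm, best_bic = gmm, bic
    # kD-tree accelerated nearest-neighbour search from quanta to frames.
    tree = cKDTree(features)
    _, idx = tree.query(best_gmm.means_)
    return sorted(set(int(i) for i in idx))

if __name__ == "__main__":
    rng = np.random.default_rng(2)
    # Stand-in for ICA + colour-histogram-difference features of 300 frames.
    feats = np.vstack([rng.normal(c, 0.3, size=(100, 3)) for c in (0, 3, 6)])
    print("key frames:", summarize(feats, max_codebook=6))
```
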
Citations: 7
A resilient and low-delay P2P streaming system based on network coding with random multicast trees
Pub Date : 2010-12-10 DOI: 10.1109/MMSP.2010.5662054
Marco Toldo, E. Magli
Network coding is known to provide increased throughput and reduced delay for communications over networks. In this paper we propose a peer-to-peer video streaming system that exploits network coding in order to achieve low start-up delay, a high streaming rate, and high resiliency to peer dynamics. In particular, we introduce the concept of random multicast trees as the overlay topology. This topology offers all the benefits of tree-based overlays, notably a short start-up delay, but is much more efficient at distributing data and recovering from ungraceful peer departures. We develop a push-based streaming system that leverages network coding to efficiently distribute the information in the overlay without using buffer maps. We show performance results of the proposed system and compare it with an optimized pull system based on Coolstreaming, showing significant improvement.
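
At the core of the system is random linear network coding: nodes forward random linear combinations of the packets they hold, and a receiver decodes once it has collected enough linearly independent combinations. The toy sketch below works over the prime field GF(257) purely for readability (practical systems typically use GF(2^8)); the packet contents and the amount of redundancy are illustrative assumptions.

```python
import numpy as np

P = 257  # toy prime field; practical systems usually work in GF(2^8)

def encode(packets, n_coded, rng):
    """Produce n_coded random linear combinations of the source packets."""
    coeffs = rng.integers(0, P, size=(n_coded, len(packets)))
    coded = coeffs @ np.array(packets) % P
    return coeffs, coded

def decode(coeffs, coded):
    """Gauss-Jordan elimination mod P; recovers the sources once the
    coefficient matrix has full column rank."""
    a = np.hstack([coeffs, coded]).astype(np.int64) % P
    n = coeffs.shape[1]
    for col in range(n):
        piv = next(r for r in range(col, a.shape[0]) if a[r, col] != 0)
        a[[col, piv]] = a[[piv, col]]                           # pivot row up
        a[col] = a[col] * pow(int(a[col, col]), P - 2, P) % P   # normalize
        for r in range(a.shape[0]):
            if r != col and a[r, col]:
                a[r] = (a[r] - a[r, col] * a[col]) % P          # eliminate
    return a[:n, n:]

if __name__ == "__main__":
    rng = np.random.default_rng(3)
    sources = [[10, 20, 30, 40], [1, 2, 3, 4], [7, 7, 7, 7]]   # 3 source packets
    c, x = encode(sources, n_coded=4, rng=rng)   # one redundant combination
    print(decode(c, x).tolist()[:3] == sources)  # any 3 independent rows decode
```
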
Citations: 15
Time-space acoustical feature for fast video copy detection
Pub Date : 2010-12-10 DOI: 10.1109/MMSP.2010.5662070
Y. Itoh, Masahiro Erokuumae, K. Kojima, M. Ishigame, Kazuyo Tanaka
We propose a new time-space acoustical feature for fast video copy detection, used to search a number of video streams for a given video segment in order to find illegal video copies on Internet video sites and the like. We extract a small number of feature vectors from acoustically peculiar points, namely the local maxima and minima in the time sequence of the acoustic power envelope of the video data. Because the volume of a video stream differs across recording environments, the relative values of the feature points are extracted; we call this the time-space acoustical feature. The features can be obtained quickly compared with representative features such as MFCC, and they require a short matching time because both the number and the dimension of the feature vectors are small. The accuracy and computation time of the proposed method are evaluated using recorded TV movie programs as input data and 30-second to 3-minute segments from DVDs as reference data, assuming the copyright holder of a movie searches the video streams for illegal copies. We confirmed that the proposed method completed all processing within the computation time of the former feature extraction, with an F-measure of 93.2% in 3-minute video segment detection.
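
The feature above is built from the local maxima and minima of the short-term acoustic power envelope, stored as relative values so that the overall recording volume does not matter. A minimal extraction of such peak points might look as follows; the frame length, hop size, and the use of ratios between consecutive extrema as the "relative values" are assumptions of this sketch.

```python
import numpy as np

def power_envelope(signal, frame=1024, hop=512):
    """Short-term power of an audio signal, one value per frame."""
    n = max((len(signal) - frame) // hop + 1, 0)
    return np.array([np.mean(signal[i * hop:i * hop + frame] ** 2)
                     for i in range(n)])

def extrema_feature(env):
    """Indices of local max/min of the envelope and volume-independent
    relative values (ratio of each extremum to the previous one)."""
    idx = [i for i in range(1, len(env) - 1)
           if (env[i] - env[i - 1]) * (env[i] - env[i + 1]) > 0]
    vals = env[idx]
    rel = vals[1:] / np.maximum(vals[:-1], 1e-12)   # invariant to global gain
    return np.array(idx), rel

if __name__ == "__main__":
    t = np.linspace(0, 4, 4 * 16000)
    sig = np.sin(2 * np.pi * 440 * t) * (0.5 + 0.5 * np.sin(2 * np.pi * 1.5 * t))
    idx, rel = extrema_feature(power_envelope(sig))
    # Scaling the recording volume leaves the relative feature unchanged.
    idx2, rel2 = extrema_feature(power_envelope(3.0 * sig))
    print(np.allclose(rel, rel2))
```
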
Citations: 5
Measuring errors for massive triangle meshes
Pub Date : 2010-12-10 DOI: 10.1109/MMSP.2010.5662050
Anis Meftah, Arnaud Roquel, F. Payan, M. Antonini
We propose a method for computing the distance between two surfaces modeled by massive triangle meshes that cannot both be loaded entirely in memory. The method consists in loading, at each step, a small part of the two meshes and computing the symmetric distance for these areas. The areas are chosen such that the orthogonal projections used to compute this distance fall within them. For this, one of the two meshes is simplified, and a correspondence between the simplified mesh and the triangles of the input meshes is established. Experiments show that the proposed method is very efficient in terms of memory cost while producing results comparable to existing tools on small and medium-size meshes. Moreover, the proposed method enables us to compute the distance for massive meshes.
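
The computation above evaluates a symmetric distance without ever holding both meshes fully in memory, by streaming corresponding local regions. The sketch below captures only the memory-bounded, chunked flavour of that idea, and it substitutes point samples and a kD-tree for true point-to-triangle projections, which is a strong simplification of the method described.

```python
import numpy as np
from scipy.spatial import cKDTree

def chunked_symmetric_distance(points_a, points_b, chunk=10_000):
    """One-sided max/RMS distances in both directions, queried chunk by
    chunk so that only a slice of the query set is processed at a time.
    Point samples stand in for mesh surfaces in this simplified sketch."""
    def one_sided(src, tree):
        worst, sq_sum, count = 0.0, 0.0, 0
        for start in range(0, len(src), chunk):
            d, _ = tree.query(src[start:start + chunk])
            worst = max(worst, float(d.max()))
            sq_sum += float((d ** 2).sum())
            count += len(d)
        return worst, np.sqrt(sq_sum / count)
    tree_a, tree_b = cKDTree(points_a), cKDTree(points_b)
    h_ab, rms_ab = one_sided(points_a, tree_b)
    h_ba, rms_ba = one_sided(points_b, tree_a)
    return max(h_ab, h_ba), max(rms_ab, rms_ba)   # symmetric Hausdorff / RMS

if __name__ == "__main__":
    rng = np.random.default_rng(4)
    a = rng.uniform(size=(50_000, 3))
    b = a + rng.normal(scale=0.001, size=a.shape)   # slightly perturbed copy
    print(chunked_symmetric_distance(a, b))
```
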
Citations: 2
Video super-resolution for dual-mode digital cameras via scene-matched learning
Pub Date : 2010-12-10 DOI: 10.1109/MMSP.2010.5662061
Guangtao Zhai, Xiaolin Wu
Many consumer digital cameras support a dual shooting mode of both low-resolution (LR) video and high-resolution (HR) images. By periodically switching between the video and image modes, this type of camera makes it possible to super-resolve the LR video with the assistance of neighboring HR still images. We propose a model-based video super-resolution (VSR) technique for such dual-mode cameras. An HR video frame is modeled as a 2D piecewise autoregressive (PAR) process. The PAR model parameters are learnt from the HR still images inserted between LR video frames. By registering the LR video frames and the HR still images, we base the learning on sample statistics that match the scene to be constructed. The resulting PAR model is more accurate and robust than if the model parameters were estimated from the LR video frames without referring to the HR images, or from a training set. Aided by the powerful scene-matched model, the LR video frame is upsampled to the resolution of the HR image via adaptive interpolation. As such, the proposed VSR technique requires neither explicit motion estimation of subpixel precision nor the solution of a large-scale inverse problem. The new VSR technique is competitive in visual quality against existing techniques at a fraction of the computational cost.
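
The learning step above fits a piecewise autoregressive model to the HR still and reuses its parameters to steer the interpolation of the LR frame. The sketch below shows only the parameter-learning half, a least-squares fit of a 4-neighbour autoregressive model on an HR patch; the neighbourhood shape and the regression window are assumptions of this illustration, not the paper's exact model.

```python
import numpy as np

def fit_par_coefficients(patch):
    """Least-squares fit of a 4-neighbour (left/right/up/down) autoregressive
    model x[i,j] ~ a1*x[i,j-1] + a2*x[i,j+1] + a3*x[i-1,j] + a4*x[i+1,j]
    on an HR image patch; the coefficients can then steer interpolation."""
    h, w = patch.shape
    rows, targets = [], []
    for i in range(1, h - 1):
        for j in range(1, w - 1):
            rows.append([patch[i, j - 1], patch[i, j + 1],
                         patch[i - 1, j], patch[i + 1, j]])
            targets.append(patch[i, j])
    A, b = np.asarray(rows, float), np.asarray(targets, float)
    coeffs, *_ = np.linalg.lstsq(A, b, rcond=None)
    return coeffs

if __name__ == "__main__":
    # Smooth synthetic HR patch: a tilted plane, perfectly autoregressive.
    y, x = np.mgrid[0:16, 0:16]
    patch = 2.0 * x + 3.0 * y + 10.0
    print(fit_par_coefficients(patch).round(3))   # the four weights sum to ~1
```
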
Citations: 3