
Latest publications: 2011 IEEE 13th International Workshop on Multimedia Signal Processing

Learning-based video super-resolution reconstruction using particle swarm optimization
Pub Date : 2011-12-01 DOI: 10.1109/MMSP.2011.6093780
Hsuan-Ying Chen, Jin-Jang Leou
In this study, a learning-based video super-resolution (SR) reconstruction approach using particle swarm optimization (PSO) is proposed. First, for each pixel in the "central" reference low-resolution (LR) video frame, a 5×5×5 motion-compensated volume containing five 5×5 motion-compensated patches is extracted and the orientation of the volume is determined. Then, the pixel values of the "central" reference high-resolution (HR) video frame are reconstructed using the corresponding SR reconstruction filtering masks, selected according to the orientation of the volume and the coordinates of the pixels to be reconstructed. To simplify the PSO learning process that determines the weights in the SR reconstruction filtering masks, simple mask flips are employed. Experimental results show that the SR reconstruction quality of the proposed approach is better than that of three comparison approaches, while its computational complexity is higher than that of the two "simple" comparison approaches, NN and bicubic interpolation, and lower than that of the recent modified NLM approach.
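As a concrete illustration of the learning stage, below is a minimal sketch of PSO fitting the weights of one reconstruction filtering mask by minimizing reconstruction error over training pairs. The synthetic data, the flattening of the 5×5×5 volume into a 125-tap weight vector, and the PSO constants are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in training data: each row is a flattened 5x5x5 motion-compensated
# volume (125 taps); y holds the HR pixel each volume should reconstruct.
X = rng.standard_normal((500, 125))
true_w = rng.standard_normal(125) / 125
y = X @ true_w + 0.01 * rng.standard_normal(500)

def mse(w):
    """Reconstruction error of a candidate filtering-mask weight vector."""
    return np.mean((X @ w - y) ** 2)

# Plain global-best PSO over the 125 mask weights.
n_particles, dim, iters = 30, 125, 200
pos = 0.1 * rng.standard_normal((n_particles, dim))
vel = np.zeros_like(pos)
pbest = pos.copy()
pbest_f = np.array([mse(p) for p in pos])
gbest = pbest[np.argmin(pbest_f)].copy()

inertia, c1, c2 = 0.72, 1.49, 1.49   # common PSO constants (assumed)
for _ in range(iters):
    r1 = rng.random((n_particles, dim))
    r2 = rng.random((n_particles, dim))
    vel = inertia * vel + c1 * r1 * (pbest - pos) + c2 * r2 * (gbest - pos)
    pos += vel
    f = np.array([mse(p) for p in pos])
    better = f < pbest_f
    pbest[better], pbest_f[better] = pos[better], f[better]
    gbest = pbest[np.argmin(pbest_f)].copy()

print(f"learned-mask MSE: {mse(gbest):.5f}")
```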
Citations: 0
Image super-resolution via feature-based affine transform
Pub Date : 2011-12-01 DOI: 10.1109/MMSP.2011.6093845
Chih-Chung Hsu, Chia-Wen Lin
State-of-the-art image super-resolution methods usually rely on searching a comprehensive dataset for appropriate high-resolution (HR) patch candidates to achieve good visual quality in the reconstructed image. Exploiting different scales and orientations in images can effectively enrich such a dataset. A large dataset, however, usually leads to high computational complexity and memory requirements, which makes implementation impractical. This paper proposes a universal framework for enriching the dataset of search-based super-resolution schemes at reasonable computation and memory cost. Toward this end, the proposed method first extracts important features at multiple scales and orientations of patches based on SIFT (scale-invariant feature transform) descriptors, and then uses the extracted features to search the dataset for the best-matching HR patch(es). Once matching patch features are found, the retrieved HR patch is aligned with the low-resolution (LR) patch using homography estimation. Experimental results demonstrate that the proposed method achieves significant subjective and objective improvements when integrated with several state-of-the-art image super-resolution methods, without significantly increasing the cost.
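The match-then-align step can be sketched with OpenCV's SIFT and homography routines. This is a hedged illustration of the general technique rather than the authors' code; the 0.75 ratio-test threshold and the RANSAC reprojection tolerance are conventional defaults assumed here.

```python
import cv2
import numpy as np

def align_hr_patch(lr_patch, hr_candidate):
    """Match SIFT features between an upsampled LR patch and a candidate HR
    patch, then warp the candidate onto the LR patch geometry via a
    homography, mirroring the alignment step described above."""
    # Work at the HR scale so both inputs carry comparable detail.
    lr_up = cv2.resize(lr_patch, hr_candidate.shape[1::-1],
                       interpolation=cv2.INTER_CUBIC)
    sift = cv2.SIFT_create()
    k1, d1 = sift.detectAndCompute(lr_up, None)
    k2, d2 = sift.detectAndCompute(hr_candidate, None)
    if d1 is None or d2 is None:
        return None
    # Lowe ratio test on 2-NN descriptor matches.
    matches = cv2.BFMatcher().knnMatch(d2, d1, k=2)
    good = [m[0] for m in matches
            if len(m) == 2 and m[0].distance < 0.75 * m[1].distance]
    if len(good) < 4:
        return None          # a homography needs at least 4 correspondences
    src = np.float32([k2[m.queryIdx].pt for m in good]).reshape(-1, 1, 2)
    dst = np.float32([k1[m.trainIdx].pt for m in good]).reshape(-1, 1, 2)
    H, _ = cv2.findHomography(src, dst, cv2.RANSAC, 5.0)
    if H is None:
        return None
    return cv2.warpPerspective(hr_candidate, H, lr_up.shape[1::-1])
```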
Citations: 8
Tap-to-search: Interactive and contextual visual search on mobile devices
Pub Date : 2011-12-01 DOI: 10.1109/MMSP.2011.6093802
Ning Zhang, Tao Mei, Xiansheng Hua, L. Guan, Shipeng Li
Mobile visual search has been an emerging topic for both the research and industrial communities. Among various methods, visual search has the merit of providing an alternative solution where text and voice searches are not applicable. This paper proposes an interactive "tap-to-search" approach that combines the user's intention, expressed by selecting regions of interest via "tap" actions on the mobile touch screen, with visual recognition by a search mechanism over a large-scale image database. An automatic image segmentation technique is applied to provide region candidates. Visual-vocabulary-tree-based search is adopted, incorporating rich contextual information collected from mobile sensors. The proposed approach has been evaluated on an image dataset of two million images. We demonstrate that, using GPS contextual information, such an approach achieves satisfactory results under standard information retrieval evaluation.
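A minimal sketch of the retrieval side: the tapped region's bag-of-visual-words vector is scored against a precomputed index, and the GPS context prunes the candidates. The index layout, vocabulary size, and the equirectangular distance approximation are illustrative assumptions; the paper's vocabulary-tree scoring is more elaborate.

```python
import numpy as np

rng = np.random.default_rng(1)
VOCAB = 1000  # assumed visual-vocabulary size

# Hypothetical pre-built index: one L2-normalized TF-IDF bag-of-visual-words
# vector and one GPS tag (lat, lon in degrees) per database image.
db_bow = rng.random((5000, VOCAB))
db_bow /= np.linalg.norm(db_bow, axis=1, keepdims=True)
db_gps = rng.uniform([47.5, -122.4], [47.7, -122.2], size=(5000, 2))

def tap_to_search(query_bow, query_gps, radius_km=2.0, top_k=10):
    """Rank index images by visual similarity to the tapped region, keeping
    only those whose GPS tag lies near the query location."""
    q = query_bow / np.linalg.norm(query_bow)
    visual = db_bow @ q                       # cosine similarity
    # Equirectangular distance is accurate enough at city scale.
    dlat = np.radians(db_gps[:, 0] - query_gps[0])
    dlon = np.radians(db_gps[:, 1] - query_gps[1]) * np.cos(np.radians(query_gps[0]))
    dist_km = 6371.0 * np.hypot(dlat, dlon)
    visual[dist_km > radius_km] = -np.inf     # contextual GPS filter
    return np.argsort(visual)[::-1][:top_k]

print(tap_to_search(rng.random(VOCAB), (47.6, -122.3)))
```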
Citations: 3
Smoke detection in videos using Non-Redundant Local Binary Pattern-based features
Pub Date : 2011-12-01 DOI: 10.1109/MMSP.2011.6093844
Hongda Tian, W. Li, P. Ogunbona, D. Nguyen, Ce Zhan
This paper presents a novel, low-complexity method for real-time video-based smoke detection. As a local texture operator, the Non-Redundant Local Binary Pattern (NRLBP) is more discriminative and more robust to illumination changes than the original Local Binary Pattern (LBP), and is thus employed to encode the appearance of smoke. The Non-Redundant Local Motion Binary Pattern (NRLMBP), computed on the difference image of consecutive frames, is introduced to capture the motion of smoke. Experimental results show that NRLBP outperforms the original LBP on the smoke detection task. Furthermore, the combination of NRLBP and NRLMBP, which can be regarded as a spatial-temporal descriptor of smoke, leads to a remarkable improvement in detection performance.
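NRLBP folds each LBP code together with its bitwise complement, which is what makes it insensitive to inverted local contrast. Below is a minimal NumPy sketch of the two descriptors, assuming the common 8-neighbour, radius-1 configuration:

```python
import numpy as np

def nrlbp_histogram(gray):
    """Non-Redundant LBP histogram of a grayscale block (8 neighbours,
    radius 1). NRLBP treats a code and its bitwise complement as the same
    pattern, code = min(lbp, 255 - lbp), halving the histogram to 128 bins."""
    g = gray.astype(np.int32)
    c = g[1:-1, 1:-1]
    # 8 neighbours, clockwise from the top-left pixel.
    shifts = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
              (1, 1), (1, 0), (1, -1), (0, -1)]
    lbp = np.zeros_like(c)
    for bit, (dy, dx) in enumerate(shifts):
        nb = g[1 + dy:g.shape[0] - 1 + dy, 1 + dx:g.shape[1] - 1 + dx]
        lbp |= (nb >= c).astype(np.int32) << bit
    nr = np.minimum(lbp, 255 - lbp)        # fold complementary codes together
    hist = np.bincount(nr.ravel(), minlength=128)[:128]
    return hist / hist.sum()

def nrlmbp_histogram(frame_prev, frame_next):
    """Motion variant: the same operator on the inter-frame difference."""
    diff = np.abs(frame_next.astype(np.int32) - frame_prev.astype(np.int32))
    return nrlbp_histogram(diff)
```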
Citations: 46
OpenCL implementation of motion estimation for cloud video processing
Pub Date : 2011-12-01 DOI: 10.1109/MMSP.2011.6093846
R. Gaetano, B. Pesquet-Popescu
With the rise of cloud computing infrastructures on one side, and the increased accessibility of parallel computational devices such as GPUs and multi-core CPUs on the other, parallel programming has recently gained renewed interest. This is particularly true in the domain of video coding, where the complexity and time consumption of the algorithms tend to limit access to the core technology. In this work, we focus on the motion estimation problem, well known to be the most time-consuming step of the majority of video coding techniques. Relying on the OpenCL standard, which provides a cross-platform framework for parallel programming, we propose a scalable CPU/GPU implementation of the full-search block-matching (FSBM) motion estimation algorithm, and study its performance, including the issues raised by the use of OpenCL.
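For reference, the computation being parallelized is an exhaustive SAD minimization whose per-block independence maps naturally onto OpenCL work-items. A serial NumPy sketch of FSBM for one block follows; the 16×16 block size and ±8 search range are assumptions:

```python
import numpy as np

def full_search(ref, cur, bx, by, bsize=16, srange=8):
    """Full-search block matching: test every integer displacement within
    +/- srange and keep the one minimising the sum of absolute differences
    (SAD). Each block's search is independent, which is what lets the
    algorithm map onto parallel OpenCL work-items."""
    block = cur[by:by + bsize, bx:bx + bsize].astype(np.int32)
    best = (0, 0, np.inf)
    for dy in range(-srange, srange + 1):
        for dx in range(-srange, srange + 1):
            y, x = by + dy, bx + dx
            if y < 0 or x < 0 or y + bsize > ref.shape[0] or x + bsize > ref.shape[1]:
                continue
            sad = np.abs(ref[y:y + bsize, x:x + bsize].astype(np.int32) - block).sum()
            if sad < best[2]:
                best = (dx, dy, sad)
    return best

rng = np.random.default_rng(2)
ref = rng.integers(0, 256, (64, 64), dtype=np.uint8)
cur = np.roll(ref, (2, 3), axis=(0, 1))   # content moved down 2, right 3
print(full_search(ref, cur, 16, 16))      # -> (-3, -2, 0): ref block lies up-left
```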
Citations: 9
Optimal resource allocation for video streaming over cognitive radio networks
Pub Date : 2011-12-01 DOI: 10.1109/MMSP.2011.6093822
Bo Guan, Yifeng He
Cognitive Radio (CR) is a new paradigm in wireless communications for enhancing the utilization of limited spectrum resources. In this paper, we investigate the resource allocation problem for video streaming over spectrum-underlay cognitive radio networks, where secondary users and primary users transmit data simultaneously in a common frequency band. We formulate resource allocation as an optimization problem that jointly optimizes the source rate, the transmission rate, and the transmission power of each secondary session to provide QoS guarantees to the video streaming sessions. The optimization problem is cast as a Geometric Programming (GP) problem, which can be solved efficiently. In simulations, we demonstrate that the proposed scheme achieves a lower Packet Loss Rate (PLR) and queuing delay than a uniform allocation scheme, thus leading to higher video quality for the video streaming sessions.
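The GP structure can be illustrated with a toy underlay problem: minimize total secondary transmit power subject to a minimum SINR per video session, solved with cvxpy's geometric-programming mode. The channel gains, noise power, and SINR targets are stand-in numbers, and this is only a slice of the paper's joint source-rate/transmission-rate/power formulation.

```python
import numpy as np
import cvxpy as cp

# Toy spectrum-underlay setting: 3 secondary video sessions share one band.
# G[i, j] is an assumed channel gain from transmitter j to receiver i.
rng = np.random.default_rng(3)
G = rng.uniform(0.05, 0.2, (3, 3))
np.fill_diagonal(G, 1.0)
sigma = 0.1                          # receiver noise power (assumed)
gamma = np.array([2.0, 2.0, 2.0])    # minimum SINR per session (assumed)
p_max = 5.0

p = cp.Variable(3, pos=True)
constraints = [p <= p_max]
for i in range(3):
    interference = sigma + sum(G[i, j] * p[j] for j in range(3) if j != i)
    # SINR_i >= gamma_i, rewritten as posynomial/monomial <= 1: a valid GP form.
    constraints.append(gamma[i] * interference / (G[i, i] * p[i]) <= 1)

prob = cp.Problem(cp.Minimize(cp.sum(p)), constraints)
prob.solve(gp=True)                  # log-log convex (geometric) programming
print("transmit powers:", np.round(p.value, 3))
```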
Citations: 13
Depth map coding using graph based transform and transform domain sparsification
Pub Date : 2011-12-01 DOI: 10.1109/MMSP.2011.6093810
Gene Cheung, Woo-Shik Kim, Antonio Ortega, Junichi Ishida, Akira Kubota
Depth map compression is important for the compact "texture-plus-depth" representation of a 3D scene, where texture and depth maps captured from multiple camera viewpoints are coded into the same format. Having received such a format, the decoder can synthesize any novel intermediate view from the texture and depth maps of two neighboring captured views via depth-image-based rendering (DIBR). In this paper, we combine two previously proposed depth map compression techniques that promote sparsity in the transform domain for coding gain, the graph-based transform (GBT) and transform domain sparsification (TDS), under one unified optimization framework. The key to combining GBT and TDS is to adaptively select, per block, the simplest transform that leads to a sparse representation. For blocks without detected prominent edges, the synthesized view's distortion sensitivity to depth map errors is low, and TDS can effectively identify a sparse depth signal in the fixed DCT domain within a large search space of good signals with small synthesized-view distortion. For blocks with detected prominent edges, the synthesized view's distortion sensitivity to depth map errors is high, and the search space of good depth signals in which TDS can find sparse DCT-domain representations is small. In this case, GBT is first performed on a graph defined by all detected edges, so that filtering across edges is avoided, resulting in a sparsity count ρ in the GBT domain. We then incrementally add the most important edge to an initial edge-free graph, each time performing TDS in the resulting GBT domain, until the same sparsity count ρ is achieved. Experiments on two sets of multiview images showed gains of up to 0.7 dB in PSNR of synthesized view quality compared to previous techniques that employ either GBT or TDS alone.
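A minimal sketch of the GBT itself: build the Laplacian of a 4-connected pixel graph with the links crossing a detected depth edge removed, and use its eigenvectors as the block transform. The tiny block, the set of cut links, and the combinatorial Laplacian are illustrative assumptions.

```python
import numpy as np

def gbt_basis(h, w, cut_links):
    """Eigenbasis of the graph Laplacian of an h x w 4-connected pixel grid
    in which the links in `cut_links` (pairs of flat pixel indices that
    straddle a detected depth edge) are removed, so filtering never
    crosses the edge."""
    n = h * w
    A = np.zeros((n, n))
    for y in range(h):
        for x in range(w):
            i = y * w + x
            right = i + 1 if x + 1 < w else None
            down = i + w if y + 1 < h else None
            for j in (right, down):
                if j is not None and (i, j) not in cut_links:
                    A[i, j] = A[j, i] = 1.0
    L = np.diag(A.sum(axis=1)) - A        # combinatorial graph Laplacian
    _, basis = np.linalg.eigh(L)          # columns are the GBT basis vectors
    return basis

# 4x4 depth block with a sharp edge between columns 1 and 2.
block = np.kron([[10.0, 80.0]], np.ones((4, 2)))
h, w = block.shape
cut = {(y * w + 1, y * w + 2) for y in range(h)}  # the 4 links crossing the edge
U = gbt_basis(h, w, cut)
coeffs = U.T @ block.ravel()
print(int(np.sum(np.abs(coeffs) > 1e-9)), "nonzero GBT coefficients")
```

Because the cut links split the graph into two components on which this depth signal is constant, the whole block collapses to at most two nonzero coefficients, the kind of sparsity the scheme exploits; a DCT of the same block would spread energy over many coefficients.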
Citations: 38
Reducing complexity in H.264/AVC motion estimation by using a GPU
Pub Date : 2011-12-01 DOI: 10.1109/MMSP.2011.6093785
Rafael Rodríguez-Sánchez, José Luis Martínez, G. Fernández-Escribano, J. M. Claver, J. L. Sánchez
H.264/AVC applies a complex mode decision technique with high computational complexity in order to reduce the temporal redundancy of video sequences. Several algorithms have been proposed in the literature in recent years with the aim of accelerating this part of the encoding process. Recently, with the emergence of many-core processors and accelerators, a new approach can be adopted for reducing the complexity of the H.264/AVC encoding algorithm. This paper focuses on reducing the inter-prediction complexity of H.264/AVC and proposes a GPU-based implementation using CUDA. Experimental results show that the proposed approach reduces the complexity by as much as 99% (a 100x speedup) while maintaining the coding efficiency.
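One common device in such GPU ports is to compute many small SADs in one data-parallel pass and then reuse them for the larger H.264 partitions during mode decision. A vectorized NumPy sketch of that decomposition follows; whether the paper uses exactly this 4×4-first scheme is an assumption:

```python
import numpy as np

def sad_4x4_grid(ref, cur, dx, dy):
    """SAD of every aligned 4x4 sub-block of `cur` against `ref` displaced
    by (dx, dy). In a CUDA port, each 4x4 SAD is one thread's work; SADs of
    the larger H.264 partitions are then built by summing these values
    instead of re-reading pixels."""
    h, w = cur.shape
    shifted = np.roll(ref, (dy, dx), axis=(0, 1)).astype(np.int32)
    diff = np.abs(shifted - cur.astype(np.int32))
    return diff.reshape(h // 4, 4, w // 4, 4).sum(axis=(1, 3))

rng = np.random.default_rng(4)
ref = rng.integers(0, 256, (64, 64), dtype=np.uint8)
cur = np.roll(ref, (1, 2), axis=(0, 1))             # content moved by (dx=2, dy=1)

sad44 = sad_4x4_grid(ref, cur, 2, 1)                # test the true displacement
sad88 = sad44.reshape(8, 2, 8, 2).sum(axis=(1, 3))  # 8x8 SADs reuse the 4x4 SADs
print(sad44.min(), sad88.min())                     # both 0 at the true match
```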
Citations: 15
Optimized reference frame selection for video coding by cloud
Pub Date : 2011-12-01 DOI: 10.1109/MMSP.2011.6093770
Bin Li, Jizheng Xu, Houqiang Li, Feng Wu
We investigate how to improve video coding efficiency via optimized reference frame selection using large-scale computation resources, e.g., a cloud. We first formulate the optimization problem for reference frame selection in video coding, which can be simplified to a manageable level. Given the maximum number of reference frames for encoding one frame, we give the upper bound of the coding efficiency on the High Efficiency Video Coding (HEVC) platform, which, although ideal, may require a huge amount of reference frame buffering at the decoder. We then give a solution, and the corresponding performance, for the case where the reference frame buffer size at the decoder is constrained. Experimental results show that when the number of reference frames is four, the proposed encoding scheme achieves bit savings of up to 16.9% compared to HEVC, the state-of-the-art video coding system. The proposed encoding scheme is standard-compliant and can also be applied to H.264/AVC to improve coding efficiency.
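As a drastically simplified stand-in for the formulation, the sketch below picks, for each frame, the cheapest reference among the last B decoded frames under a synthetic bit-cost model. The cost model and the sliding-window buffer rule are assumptions; the paper's actual problem, which also decides which frames remain buffered, is substantially richer.

```python
import numpy as np

rng = np.random.default_rng(5)
N, B = 10, 2                         # frames in the GOP; reference-buffer size

# Hypothetical bit-cost model: cost[j, i] = bits to code frame i when frame j
# is its reference; temporally closer references are made cheaper.
cost = rng.uniform(100, 1000, (N, N))
for i in range(N):
    for j in range(i):
        cost[j, i] += 50 * (i - j)

best_ref = np.full(N, -1)
total = 800.0                        # frame 0 coded intra (assumed cost)
for i in range(1, N):
    # With a sliding-window buffer of size B, frame i may only reference
    # the B most recently decoded frames.
    window = range(max(0, i - B), i)
    j_best = min(window, key=lambda r: cost[r, i])
    best_ref[i] = j_best
    total += cost[j_best, i]
print("references:", best_ref[1:], f"total bits ~ {total:.0f}")
```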
Citations: 11
Strategies for orca call retrieval to support collaborative annotation of a large archive
Pub Date : 2011-12-01 DOI: 10.1109/MMSP.2011.6093798
S. Ness, Alexander Lerch, G. Tzanetakis
The Orchive is a large audio archive of hydrophone recordings of killer whale (Orcinus orca) vocalizations. Researchers and users from around the world can interact with the archive using a collaborative web-based annotation, visualization, and retrieval interface. In addition, a mobile client has been written to crowdsource orca call annotation. In this paper, we describe and compare different strategies for the retrieval of discrete orca calls. The results of the automatic analysis are integrated into the user interface, facilitating annotation as well as leveraging the existing annotations for supervised learning. The best strategy achieves a mean average precision of 0.77, with the first retrieved item being relevant 95% of the time, on a dataset of 185 calls belonging to 4 types.
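One plausible baseline among such retrieval strategies (an assumption, not necessarily one the paper tested): describe each annotated call clip by pooled MFCCs, rank by cosine similarity, and score with mean average precision. The librosa feature pipeline and the placeholder file list are illustrative.

```python
import numpy as np
import librosa

def call_embedding(path):
    """Fixed-length descriptor of one call: mean- and std-pooled MFCCs."""
    y, sr = librosa.load(path, sr=22050)
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)
    return np.concatenate([mfcc.mean(axis=1), mfcc.std(axis=1)])

def average_precision(ranked_labels, query_label):
    """Mean of the precision values at each relevant rank."""
    hits, precisions = 0, []
    for rank, lab in enumerate(ranked_labels, start=1):
        if lab == query_label:
            hits += 1
            precisions.append(hits / rank)
    return float(np.mean(precisions)) if precisions else 0.0

# Placeholders for the 185 annotated clips and their 4 call-type labels.
paths = ["call_000.wav"]             # ...
labels = ["N04"]                     # ... hypothetical call-type labels

emb = np.stack([call_embedding(p) for p in paths])
emb /= np.linalg.norm(emb, axis=1, keepdims=True)

aps = []
for q in range(len(paths)):
    order = np.argsort(-(emb @ emb[q]))
    order = order[order != q]        # leave-one-out: drop the query itself
    aps.append(average_precision([labels[i] for i in order], labels[q]))
print("mean average precision:", np.mean(aps))
```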
Citations: 0