
2020 IEEE 22nd International Workshop on Multimedia Signal Processing (MMSP): Latest Publications

V-PCC Component Synchronization for Point Cloud Reconstruction
Pub Date: 2020-09-21 | DOI: 10.1109/MMSP48831.2020.9287092
D. Graziosi, A. Tabatabai, Vladyslav Zakharchenko, A. Zaghetto
For a V-PCC system to reconstruct a single instance of the point cloud, one V-PCC unit must be transferred to the 3D point cloud reconstruction module. This requires, however, that all the V-PCC components, i.e., the occupancy map, geometry, atlas, and attribute, be temporally aligned. This could, in principle, pose a challenge, since the temporal structures of the decoded sub-bitstreams are not coherent across V-PCC sub-bitstreams. In this paper, we propose an output delay adjustment mechanism for the decoded V-PCC sub-bitstreams that provides synchronized V-PCC component input to the point cloud reconstruction module.
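To illustrate the kind of alignment such a mechanism has to guarantee, the sketch below buffers each decoded component stream and releases a reconstruction unit only once the occupancy map, geometry, atlas, and attribute for the same output time are all present. The class, its interfaces, and the timing model are illustrative assumptions, not the mechanism proposed in the paper.

    from collections import defaultdict

    class VPCCSynchronizer:
        """Toy buffer that aligns decoded V-PCC components by output time
        (illustrative assumption, not the paper's actual mechanism)."""
        COMPONENTS = {"occupancy", "geometry", "atlas", "attribute"}

        def __init__(self):
            self.pending = defaultdict(dict)  # output_time -> {component: frame}

        def push(self, component, output_time, frame):
            assert component in self.COMPONENTS
            self.pending[output_time][component] = frame
            # Release a unit only when all four components have arrived.
            if self.COMPONENTS.issubset(self.pending[output_time]):
                return self.pending.pop(output_time)
            return None

A real decoder would derive output_time from each sub-bitstream's timing information and apply the proposed output delay offsets before this alignment step.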
Citations: 1
Towards Fast and Efficient VVC Encoding
Pub Date: 2020-09-21 | DOI: 10.1109/MMSP48831.2020.9287093
J. Brandenburg, A. Wieckowski, Tobias Hinz, Anastasia Henkel, Valeri George, Ivan Zupancic, C. Stoffers, B. Bross, H. Schwarz, D. Marpe
Versatile Video Coding (VVC) is a new international video coding standard to be finalized in July 2020. It is designed to provide around 50% bit-rate savings at the same subjective visual quality compared to its predecessor, High Efficiency Video Coding (H.265/HEVC). During the standard's development, objective bit-rate savings of around 40% have been reported for the VVC reference software (VTM) compared to the HEVC reference software (HM). The unoptimized VTM encoder is around 9x slower than HM, and the decoder around 2x slower. This paper discusses VVC encoder complexity in terms of software runtime. The modular design of the standard allows a VVC encoder to trade off bit-rate savings against encoder runtime. Based on a detailed trade-off analysis, results for different operating points are reported. Additionally, initial work on software and algorithm optimization is presented. With the optimized software algorithms, an operating point with an over 22x faster single-threaded encoder runtime than VTM can be achieved, i.e., around 2.5x faster than HM, while still providing more than 30% bit-rate savings over HM. Finally, our experiments demonstrate the flexibility of VVC and its potential for optimized software encoder implementations.
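As a quick arithmetic check of the quoted operating point, assuming the speed factors compose multiplicatively relative to HM: if the VTM encoder is about 9x slower than HM, then an encoder a bit over 22x faster than VTM runs at roughly 22.5 / 9 ≈ 2.5x the speed of HM, which matches the figures above.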
Citations: 14
Joint Cross-Component Linear Model For Chroma Intra Prediction
Pub Date: 2020-09-21 | DOI: 10.1109/MMSP48831.2020.9287167
R. G. Youvalari, J. Lainema
The Cross-Component Linear Model (CCLM) is an intra prediction technique adopted into the upcoming Versatile Video Coding (VVC) standard. CCLM reduces inter-channel redundancy by predicting chroma samples with a linear model whose parameters are calculated from the reconstructed samples in the luma channel as well as the neighboring samples of the chroma coding block. In this paper, we propose a new method, called the Joint Cross-Component Linear Model (J-CCLM), to improve the prediction efficiency of the tool. The proposed J-CCLM technique predicts the samples of the coding block with a multi-hypothesis approach that combines two intra prediction modes: the final prediction of the block is obtained by combining the conventional CCLM mode with an angular mode derived from the co-located luma block. Experiments in the VVC test model VTM-8.0 show that the proposed method provides, on average, more than 1.0% BD-rate gain in the chroma channels. Furthermore, weighted YCbCr bitrate savings of 0.24% and 0.54% are achieved in the 4:2:0 and 4:4:4 color formats, respectively.
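For context, the CCLM prediction underlying this work has the form pred_C(i, j) = α · rec_L′(i, j) + β, where rec_L′ is the (downsampled) reconstructed luma block and α, β are derived from neighboring reconstructed samples. The sketch below uses a least-squares fit and a fixed 50/50 blend purely for illustration; VVC derives the parameters from selected minimum/maximum neighbor pairs, and the paper's actual J-CCLM combination weights are not reproduced here.

    import numpy as np

    def cclm_params(neigh_luma, neigh_chroma):
        """Fit alpha, beta of pred_C = alpha * rec_L + beta from neighbor samples.
        Least-squares shown for clarity; VVC uses a min/max-based derivation."""
        A = np.stack([neigh_luma, np.ones_like(neigh_luma)], axis=1)
        alpha, beta = np.linalg.lstsq(A, neigh_chroma, rcond=None)[0]
        return alpha, beta

    def jcclm_predict(rec_luma, neigh_luma, neigh_chroma, angular_pred, w=0.5):
        """Illustrative J-CCLM-style multi-hypothesis prediction: blend the
        CCLM prediction with an angular intra prediction of the same block."""
        alpha, beta = cclm_params(neigh_luma, neigh_chroma)
        return w * (alpha * rec_luma + beta) + (1.0 - w) * angular_pred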
Citations: 5
The Suitability of Texture Vibrations Based on Visually Perceived Virtual Textures in Bimodal and Trimodal Conditions
Pub Date: 2020-09-21 | DOI: 10.1109/MMSP48831.2020.9287066
U. A. Alma, M. Altinsoy
In this study, the suitability of recorded and simplified texture vibrations is evaluated against visual textures displayed on a screen. The tested vibrations are 1) recorded vibrations, 2) single sinusoids, and 3) band-limited white noise, as used in previous work. In the earlier study, the suitability of texture vibrations was evaluated against real textures explored by touch. Nevertheless, texture vibrations should also be tested against texture images, considering that users interact only with virtual (visual) objects on touch devices. Thus, the aim of this study is to assess the congruence between vibrotactile feedback and texture images in the absence and the presence of auditory feedback. Two types of auditory feedback were used for the trimodal test, and they were tested at different loudness levels. From this, the most plausible combination of vibrotactile and audio stimuli when exploring the visual textures can be determined. Based on the psychophysical tests, and in contrast to the earlier study, the similarity ratings of the texture vibrations did not differ significantly from each other in the bimodal condition. In the trimodal judgments, synthesized sound influenced the similarity ratings significantly, while touch sound did not affect the perceived similarity.
Citations: 2
Scalable Mesh Representation for Depth from Breakpoint-Adaptive Wavelet Coding
Pub Date: 2020-09-21 | DOI: 10.1109/MMSP48831.2020.9287145
Yue Li, R. Mathew, D. Taubman
A highly scalable and compact representation of depth data is required in many applications, and it is especially critical for plenoptic multiview image compression frameworks that use depth information for novel view synthesis and inter-view prediction. Efficiently coding depth data can be difficult, as it contains sharp discontinuities. Breakpoint-adaptive discrete wavelet transforms (BPA-DWT), currently being standardized as part of the JPEG 2000 Part 17 extensions, have been found suitable for coding spatial media with hard discontinuities. In this paper, we explore a modification to the original BPA-DWT, replacing the traditional constant extrapolation strategy with the newly proposed affine extrapolation for reconstructing depth data in the vicinity of discontinuities. We also present a depth reconstruction scheme that can directly decode the BPA-DWT coefficients and breakpoints onto a compact and scalable mesh-based representation, which has many potential benefits over a sample-based description. For depth-compensated view prediction, our proposed triangular mesh representation of the depth data is a natural fit for modern graphics architectures.
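A toy 1D illustration of the extrapolation change described above: near a breakpoint, samples are extended from the near side of the discontinuity only. Constant extrapolation repeats the last valid sample, while affine extrapolation continues its local linear trend, which suits sloped depth surfaces. This is a simplification of the 2D breakpoint-adaptive lifting actually used, not the paper's implementation.

    import numpy as np

    def extrapolate(samples, n, mode="affine"):
        """Extend a 1D signal by n samples up to a breakpoint. 'constant'
        repeats the last sample; 'affine' continues the local slope."""
        last = samples[-1]
        if mode == "constant" or len(samples) < 2:
            return np.full(n, last, dtype=float)
        slope = samples[-1] - samples[-2]   # local linear trend
        return last + slope * np.arange(1, n + 1)

    ramp = np.array([10.0, 12.0, 14.0, 16.0])   # sloped depth surface
    print(extrapolate(ramp, 3, "constant"))     # [16. 16. 16.]
    print(extrapolate(ramp, 3, "affine"))       # [18. 20. 22.]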
Citations: 6
Video Coding for Machines with Feature-Based Rate-Distortion Optimization
Pub Date: 2020-09-21 | DOI: 10.1109/MMSP48831.2020.9287136
Kristian Fischer, Fabian Brand, Christian Herglotz, A. Kaup
Common state-of-the-art video codecs are optimized to deliver a low bitrate at a certain quality for the final human observer, which is achieved by rate-distortion optimization (RDO). But with the steady improvement of neural networks in solving computer vision tasks, more and more multimedia data is no longer viewed by humans, but directly analyzed by neural networks. In this paper, we propose a standard-compliant feature-based RDO (FRDO) that is designed to increase coding performance when the decoded frame is analyzed by a neural network in a video coding for machines scenario. To this end, we replace the pixel-based distortion metrics in the conventional RDO of VTM-8.0 with distortion metrics calculated in the feature space created by the first layers of a neural network. In several tests with the segmentation network Mask R-CNN and single images from the Cityscapes dataset, we compare the proposed FRDO and its hybrid version HFRDO, with different distortion measures in the feature space, against the conventional RDO. With HFRDO, up to 5.49% bitrate can be saved compared to the VTM-8.0 implementation in terms of Bjøntegaard Delta rate, using the weighted average precision as quality metric. Additionally, allowing the encoder to vary the quantization parameter results in coding gains for the proposed HFRDO of up to 9.95% compared to conventional VTM.
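The core substitution can be stated compactly: conventional RDO selects the coding option minimizing J = D_pixel + λ·R, while FRDO minimizes J = D_feature + λ·R, with the distortion measured in the feature space of a network's first layers. The sketch below is schematic; feature_extractor stands in for such truncated first layers (e.g., a Mask R-CNN backbone stem), and the candidate/rate interface is a hypothetical placeholder, not the VTM integration described in the paper.

    import numpy as np

    def feature_rd_cost(orig, recon, rate, lam, feature_extractor):
        """Schematic feature-based RD cost: feature-space SSE plus lambda * rate.
        feature_extractor is assumed to map an image block to a feature array."""
        d_feature = float(np.sum((feature_extractor(orig)
                                  - feature_extractor(recon)) ** 2))
        return d_feature + lam * rate

    def select_mode(orig, candidates, lam, feature_extractor):
        """Pick the candidate dict {'recon': ..., 'rate': ...} with lowest cost."""
        return min(candidates, key=lambda c: feature_rd_cost(
            orig, c["recon"], c["rate"], lam, feature_extractor))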
Citations: 23
Motion JPEG Decoding via Iterative Thresholding and Motion-Compensated Deflickering
Pub Date: 2020-09-21 | DOI: 10.1109/MMSP48831.2020.9287147
E. Belyaev, Linlin Bie, J. Korhonen
This paper studies the problem of decoding video sequences compressed with Motion JPEG (M-JPEG) at the best possible perceived video quality. We treat the decoding of M-JPEG video as signal recovery from incomplete measurements, as known from compressive sensing. We take all quantized nonzero Discrete Cosine Transform (DCT) coefficients as measurements and the remaining zero coefficients as data to be recovered. The output video is reconstructed via an iterative thresholding algorithm, where Video Block Matching and 4-D filtering (VBM4D) is used as the thresholding operator. To reduce non-linearities in the measurements caused by the quantization in JPEG, we propose applying spatio-temporal pre-filtering before measurement calculation and recovery. Since temporal inconsistencies of the residual coding artifacts lead to strong flickering in the recovered video, we also propose applying a motion-compensated deflickering filter as a post-filter. Experimental results show that the proposed approach provides a 0.44–0.51 dB average improvement in Peak Signal-to-Noise Ratio (PSNR), as well as a lower flickering level, compared to the state-of-the-art method based on Coefficient Graph Laplacians (COGL). We have also conducted a subjective comparison study, indicating that the proposed approach outperforms state-of-the-art methods in terms of subjective video quality.
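The sketch below shows the iteration structure this describes: denoise the current estimate, then re-impose the known (quantized nonzero) DCT coefficients as measurement constraints. The per-frame global DCT and the generic denoise placeholder, standing in for VBM4D, are simplifications for illustration; they are not the authors' implementation.

    import numpy as np
    from scipy.fft import dctn, idctn

    def iterative_thresholding(frame0, known_mask, known_coeffs, denoise, iters=30):
        """Recover a frame from its known DCT coefficients. known_mask marks the
        measured coefficients; denoise acts as the thresholding operator."""
        x = frame0.copy()
        for _ in range(iters):
            x = denoise(x)                             # regularization step
            c = dctn(x, norm="ortho")
            c[known_mask] = known_coeffs[known_mask]   # enforce measurements
            x = idctn(c, norm="ortho")
        return x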
Citations: 3
Few-Shot Object Detection in Real Life: Case Study on Auto-Harvest
Pub Date: 2020-09-21 | DOI: 10.1109/MMSP48831.2020.9287053
Kévin Riou, Jingwen Zhu, Suiyi Ling, Mathis Piquet, V. Truffault, P. Callet
Confinement during COVID-19 has had serious effects on agriculture all over the world. As one of the most efficient solutions, mechanical or automated harvesting based on object detection and robotic harvesters has become an urgent need. Within an auto-harvest system, a robust few-shot object detection model is one of the bottlenecks, since the system is required to deal with new vegetable/fruit categories, and collecting large-scale annotated datasets for all the novel categories is expensive. Many few-shot object detection models have been developed by the community. Yet whether they can be employed directly in real-life agricultural applications is still questionable, as there is a context gap between the commonly used training datasets and the images collected in real-life agricultural scenarios. To this end, in this study, we present a novel cucumber dataset and propose two data augmentation strategies that help to bridge the context gap. Experimental results show that 1) the state-of-the-art few-shot object detection model performs poorly on the novel ‘cucumber’ category; and 2) the proposed augmentation strategies outperform the commonly used ones.
Citations: 3
Evaluating the Performance of Apple’s Low-Latency HLS
Pub Date: 2020-09-21 | DOI: 10.1109/MMSP48831.2020.9287117
Kerem Durak, Mehmet N. Akcay, Yigit K. Erinc, Boran Pekel, A. Begen
At its annual developers conference in June 2019, Apple announced a backwards-compatible extension to its popular HTTP Live Streaming (HLS) protocol to enable low-latency live streaming. This extension offers new features such as the ability to generate partial segments, use playlist delta updates, block playlist reloads, and provide rendition reports. Compared to traditional HLS, these features require new capabilities on the origin servers and the caches inside a content delivery network. While HLS has been known to perform well at scale, its low-latency extension is likely to consume considerable server and network resources, and this may raise concerns about its scalability. In this paper, we make a first attempt to understand how this new extension works and performs. We also provide a 1:1 comparison against the low-latency DASH approach, the competing low-latency solution developed as an open standard.
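To make the listed features concrete, here is a hand-written fragment of what a Low-Latency HLS media playlist can look like. The tag names (EXT-X-SERVER-CONTROL, EXT-X-PART-INF, EXT-X-PART, EXT-X-RENDITION-REPORT) come from Apple's LL-HLS specification, while the URIs, durations, and sequence numbers are invented for illustration.

    #EXTM3U
    #EXT-X-TARGETDURATION:4
    #EXT-X-SERVER-CONTROL:CAN-BLOCK-RELOAD=YES,PART-HOLD-BACK=1.0
    #EXT-X-PART-INF:PART-TARGET=0.333
    #EXT-X-MEDIA-SEQUENCE:266
    #EXTINF:4.0,
    fileSequence266.mp4
    #EXT-X-PART:DURATION=0.333,URI="filePart267.0.mp4",INDEPENDENT=YES
    #EXT-X-PART:DURATION=0.333,URI="filePart267.1.mp4"
    #EXT-X-RENDITION-REPORT:URI="../720p/playlist.m3u8",LAST-MSN=266,LAST-PART=1

CAN-BLOCK-RELOAD=YES advertises blocking playlist reload, the EXT-X-PART lines expose partial segments, and the rendition report lets a client switch renditions without first fetching the other playlist.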
Citations: 17
ABR Prediction Using Supervised Learning Algorithms
Pub Date: 2020-09-21 | DOI: 10.1109/MMSP48831.2020.9287123
Hiba Yousef, J. L. Feuvre, Alexandre Storelli
With the massive increase of video traffic over the Internet, HTTP adaptive streaming has become the main technique for infotainment content delivery. In this context, many bandwidth adaptation algorithms have emerged, each aiming to improve the user QoE using different session information, e.g., TCP throughput, buffer occupancy, download time... Notwithstanding the differences in their implementations, they mostly use the same inputs to adapt to the varying conditions of the media session. In this paper, we show that it is possible to predict the bitrate decision of any ABR algorithm thanks to machine learning techniques, and supervised classification in particular. This approach has the benefit of being generic: it does not require any knowledge of the player's ABR algorithm itself, but assumes that, whatever the logic behind it, it uses a common set of input features. Then, using machine learning feature selection, it is possible to identify the relevant features and train the model on real observations. We test our approach using simulations of well-known ABR algorithms, then verify the results on commercial closed-source players, using different VoD and live realistic data sets. The results show that both Random Forest and Gradient Boosting achieve very high prediction accuracy compared with the other evaluated ML classifiers.
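A minimal sketch of the supervised setup this implies, assuming per-chunk session features and the player's observed bitrate-level choice as the label. The feature names, data shapes, and synthetic data are illustrative, not the paper's exact feature set; real training data would come from recorded streaming sessions.

    import numpy as np
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import train_test_split

    # Illustrative per-chunk features: [throughput_kbps, buffer_s, download_time_s]
    rng = np.random.default_rng(0)
    X = rng.random((1000, 3)) * [8000.0, 30.0, 4.0]
    y = rng.integers(0, 5, size=1000)   # observed bitrate-level decisions (labels)

    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)
    clf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_tr, y_tr)
    print("accuracy:", clf.score(X_te, y_te))
    print("importances:", clf.feature_importances_)  # crude feature selection

On real traces, the feature importances give a first cut at the feature selection step the paper describes, before training the final classifier.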
Citations: 5