
2020 IEEE 22nd International Workshop on Multimedia Signal Processing (MMSP): Latest Publications

Time Difference of Arrival Estimation with Deep Learning – From Acoustic Simulations to Recorded Data
Pub Date: 2020-09-21 DOI: 10.1109/MMSP48831.2020.9287131
Pasi Pertilä, Mikko Parviainen, V. Myllylä, A. Huttunen, P. Jarske
The spatial information about a sound source is carried by acoustic waves to a microphone array and can be observed through estimation of phase and amplitude differences between microphones. Time difference of arrival (TDoA) captures the propagation delay of the wavefront between microphones and can be used to steer a beamformer or to localize the source. However, reverberation and interference can deteriorate the TDoA estimate. Deep neural networks (DNNs) trained with supervised learning can extract speech-related TDoAs in more adverse conditions than traditional correlation-based methods. Acoustic simulations provide large amounts of data with annotations, while real recordings require manual annotation or the use of reference sensors with proper calibration procedures. The distributions of these two data sources can differ. When a DNN model trained on simulated data is presented with real data from a different distribution, its performance decreases if the mismatch is not properly addressed. To reduce DNN-based TDoA estimation error, this work investigates the role of different input normalization techniques, the mixing of simulated and real data for training, and the application of an adversarial domain adaptation technique. Results quantify the reduction in TDoA error for real data using the different approaches. It is evident that the use of normalization methods, domain adaptation, and real data during training can reduce the TDoA error.
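For context on the correlation-based baseline the abstract contrasts with, a minimal GCC-PHAT TDoA estimator might look like the following sketch. The function name and parameters are illustrative; this is the classical baseline, not the authors' DNN approach.

```python
import numpy as np

def gcc_phat_tdoa(x, y, fs, max_tau=None):
    """Classical GCC-PHAT estimate of the delay of y relative to x (seconds)."""
    n = len(x) + len(y)
    X, Y = np.fft.rfft(x, n=n), np.fft.rfft(y, n=n)
    cross = X * np.conj(Y)
    cross /= np.abs(cross) + 1e-12            # PHAT: keep phase, discard magnitude
    cc = np.fft.irfft(cross, n=n)
    max_shift = n // 2
    if max_tau is not None:                   # optionally bound by array geometry
        max_shift = min(int(fs * max_tau), max_shift)
    cc = np.concatenate((cc[-max_shift:], cc[:max_shift + 1]))
    lag = np.argmax(np.abs(cc)) - max_shift   # location of the correlation peak
    return lag / fs
```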
Citations: 2
Multispectral Image Compression Based on HEVC Using Pel-Recursive Inter-Band Prediction
Pub Date: 2020-09-21 DOI: 10.1109/MMSP48831.2020.9287132
Anna Meyer, Nils Genser, A. Kaup
Recent developments in optical sensors enable a wide range of applications for multispectral imaging, e.g., in surveillance, optical sorting, and life-science instrumentation. Increasing spatial and spectral resolution allows creating higher-quality products; however, it poses challenges in handling such large amounts of data. Consequently, specialized compression techniques for multispectral images are required. High Efficiency Video Coding (HEVC) is known to be the state of the art in efficiency for both video coding and still-image coding. In this paper, we propose a cross-spectral compression scheme for efficiently coding multispectral data based on HEVC. By extending intra-picture prediction with a novel inter-band predictor, spectral as well as spatial redundancies can be effectively exploited. Dependencies among the current band and further spectral references are considered jointly by adaptive linear regression modeling. The proposed backward prediction scheme does not require additional side information for decoding. We show that our novel approach is able to outperform state-of-the-art lossy compression techniques in terms of rate-distortion performance. On different data sets, average Bjøntegaard delta rate savings of 82% and 55% are achieved compared to HEVC and a reference method from the literature, respectively.
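To make the idea of backward-adaptive inter-band prediction concrete, the sketch below refits per-pixel regression weights on an already-decoded causal window, so a decoder could repeat the same fit without side information. It is a minimal illustration of the principle under assumed window shape and fallback behavior, not the paper's predictor.

```python
import numpy as np

def interband_predict(cur, refs, i, j, win=8):
    """Backward-adaptive linear prediction of cur[i, j] from co-located
    samples of the reference bands `refs` (list of 2-D arrays)."""
    ys, Xs = [], []
    for di in range(-win, 1):
        for dj in range(-win, win + 1):
            if di == 0 and dj >= 0:           # not yet decoded in raster order
                continue
            r, c = i + di, j + dj
            if 0 <= r < cur.shape[0] and 0 <= c < cur.shape[1]:
                ys.append(cur[r, c])
                Xs.append([ref[r, c] for ref in refs] + [1.0])  # affine term
    if len(ys) <= len(refs) + 1:              # too little context: simple fallback
        return float(np.mean([ref[i, j] for ref in refs]))
    w, *_ = np.linalg.lstsq(np.asarray(Xs), np.asarray(ys), rcond=None)
    feat = np.array([ref[i, j] for ref in refs] + [1.0])
    return float(feat @ w)                    # prediction for the current band
```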
Citations: 2
Convolution Autoencoder-Based Sparse Representation Wavelet for Image Classification
Pub Date: 2020-09-21 DOI: 10.1109/MMSP48831.2020.9287107
Tan-Sy Nguyen, Long H. Ngo, M. Luong, M. Kaaniche, Azeddine Beghdadi
In this paper, we propose an effective Convolutional Autoencoder (AE) model for Sparse Representation (SR) in the Wavelet Domain for Classification (SRWC). The proposed approach involves an autoencoder with a sparse latent layer for learning sparse codes of wavelet features. The estimated sparse codes are then used to assign classes to test samples via a residual-based probabilistic criterion. Extensive experiments carried out on various datasets reveal that the proposed method yields better classification accuracy while exhibiting a significant reduction in the number of network parameters, compared to several recent deep learning-based methods.
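The residual-based criterion is in the spirit of sparse-representation classification: reconstruct the sample using only the atoms tied to one class and pick the class with the smallest residual. Below is a sketch under that reading; the dictionary, the atom-to-class mapping, and the softmax scoring are assumptions, not the paper's exact formulation.

```python
import numpy as np

def classify_by_residual(x, code, dictionary, atom_class):
    """Assign x to the class whose dictionary atoms reconstruct it best.
    x: (d,) sample, code: (k,) sparse code, dictionary: (d, k),
    atom_class: (k,) class label of each atom."""
    classes = np.unique(atom_class)
    residuals = np.array([
        np.linalg.norm(x - dictionary @ np.where(atom_class == c, code, 0.0))
        for c in classes
    ])
    scores = np.exp(-residuals)               # smaller residual -> higher score
    probs = scores / scores.sum()             # probabilistic class scores
    return classes[int(np.argmin(residuals))], probs
```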
Citations: 4
Key Point Agnostic Frequency-Selective Mesh-to-Grid Image Resampling using Spectral Weighting
Pub Date: 2020-09-21 DOI: 10.1109/MMSP48831.2020.9287096
Viktoria Heimann, Nils Genser, A. Kaup
Many applications in image processing require resampling of arbitrarily located samples onto regular grid positions. This is important in frame-rate up-conversion, super-resolution, and image warping, among others. A state-of-the-art, high-quality model-based resampling technique is frequency-selective mesh-to-grid resampling, which requires pre-estimation of key points. In this paper, we propose a new key point agnostic frequency-selective mesh-to-grid resampling (AFSMR) that does not depend on pre-estimated key points. Hence, the number of data points that are included is reduced drastically and the run time decreases significantly. To compensate for the key points, a spectral weighting function is introduced that models the optical transfer function in order to favor low frequencies more than high ones. Thereby, resampling artifacts like ringing are suppressed reliably and the resampling quality increases. On average, the new AFSMR is conceptually simpler and gains up to 1.2 dB in terms of PSNR compared to the original mesh-to-grid resampling while being approximately 14.5 times faster.
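The spectral weighting idea can be pictured as a simple low-pass weight over normalized frequency. The exact optical-transfer-function model used by AFSMR is defined in the paper, so the Gaussian decay and its width below are assumptions for illustration only.

```python
import numpy as np

def spectral_weights(n_freqs, sigma=0.3):
    """Weights that favor low frequencies, loosely modeling an optical
    transfer function: ~1 at DC, decaying towards Nyquist (illustrative)."""
    f = np.linspace(0.0, 1.0, n_freqs)        # normalized frequency axis
    return np.exp(-0.5 * (f / sigma) ** 2)
```

In a frequency-selective reconstruction, weights of this kind bias the iterative basis-function selection toward low-frequency components, which is what suppresses ringing artifacts.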
Citations: 5
Successive Refinement of Bounding Volumes for Point Cloud Coding
Pub Date: 2020-09-21 DOI: 10.1109/MMSP48831.2020.9287106
I. Tabus, E. C. Kaya, S. Schwarz
The paper proposes a new lossy way of encoding the geometry of point clouds. The proposed scheme reconstructs the geometry from only the two depth maps associated with a single projection direction and then refines the reconstruction progressively using suitably defined anchor points. The reconstruction from the two depth images follows several analysis and encoding primitives, several of which are optional. The resulting bitstream is embedded and can be truncated at various levels of reconstruction of the bounding volume. The tools for encoding the needed entities are extremely simple and can be combined flexibly. The scheme can also be combined with G-PCC coding for reconstructing the sparse point clouds in a lossless way. The experiments show an improvement in rate-distortion performance when the proposed method is combined with the G-PCC codec, compared to the G-PCC codec alone.
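The scheme's starting point, two depth maps for one projection direction, can be pictured as the nearest and farthest surface seen through each pixel. A toy sketch of extracting such a pair from a voxelized cloud follows; integer coordinates in [0, res) are assumed, and the paper's progressive anchor-point refinement is not shown.

```python
import numpy as np

def depth_map_pair(points, axis=2, res=256):
    """Near/far depth maps of a point cloud along one projection axis.
    points: (N, 3) integer array with coordinates in [0, res).
    Pixels that see no point keep their +/-inf initial values."""
    near = np.full((res, res), np.inf)
    far = np.full((res, res), -np.inf)
    cols = [a for a in range(3) if a != axis]
    for p in points:
        u, v, d = int(p[cols[0]]), int(p[cols[1]]), p[axis]
        near[u, v] = min(near[u, v], d)       # closest surface at this pixel
        far[u, v] = max(far[u, v], d)         # farthest surface at this pixel
    return near, far
```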
Citations: 2
Surface Lightfield Support in Video-based Point Cloud Coding
Pub Date: 2020-09-21 DOI: 10.1109/MMSP48831.2020.9287115
Deepa Naik, S. Schwarz, V. Vadakital, Kimmo Roimela
A surface light-field (SLF) maps a set of color vectors to a set of ray vectors that originate at a point on a surface. It enables rendering photo-realistic viewpoints in extended-reality applications. However, the amount of data required to represent an SLF is significantly larger. Therefore, storing and distributing SLFs requires an efficient compressed representation. The Moving Picture Experts Group (MPEG) has an ongoing standardization activity for the compression of point clouds. Until recently, this activity targeted compression of single-texture information, but it is now investigating view-dependent textures. In this paper, we propose methods to optimize the coding of view-dependent color without compromising visual quality. Our results show that the optimizations provided in this paper reduce the coded HEVC bit rate by 64% for the all-intra configuration and 52% for the random-access configuration, compared to coding all textures independently.
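As a mental model of what an SLF stores, shading a query ray can be as simple as returning the color of the stored view direction nearest to it. The sketch below is illustrative only: the paper is about compressing this data, and real renderers typically blend several views rather than picking one.

```python
import numpy as np

def slf_color(ray_dir, view_dirs, view_colors):
    """Nearest-view lookup of a surface light-field sample.
    view_dirs: (V, 3) unit vectors, view_colors: (V, 3) RGB rows."""
    ray = np.asarray(ray_dir) / np.linalg.norm(ray_dir)
    best = int(np.argmax(view_dirs @ ray))    # highest cosine similarity
    return view_colors[best]
```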
Citations: 2
Smart caching for live 360° video streaming in mobile networks
Pub Date: 2020-09-21 DOI: 10.1109/MMSP48831.2020.9287059
P. Maniotis, N. Thomos
Despite the advances of 5G systems, the delivery of 360° video content in mobile networks remains challenging because of the size of 360° video files. Recently, edge caching has been shown to bring large performance gains to 360° Video on Demand (VoD) delivery systems; however, existing systems cannot be straightforwardly applied to live 360° video streaming. To address this issue, we investigate edge-cache-assisted live 360° video streaming. As videos’ and tiles’ popularities vary with time, our framework employs a Long Short-Term Memory (LSTM) network to determine the optimal cache placement/eviction strategies that optimize the quality of the videos rendered by the users. To further enhance the delivered video quality, users located in the overlap of the coverage areas of multiple small base stations (SBSs) are allowed to receive their data from any of these SBSs. We evaluate and compare the performance of our method with that of state-of-the-art systems. The results show the superiority of the proposed method over its counterparts, and make clear the benefits, in terms of delivered quality, of accurate tile-popularity prediction by the LSTM network and of associating users with multiple SBSs.
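A minimal stand-in for the popularity predictor could be a single-layer LSTM that maps a history of per-tile request counts to next-interval counts, which a cache controller would then rank placements and evictions by. This is a sketch under that assumption, not the authors' architecture; the layer sizes are arbitrary.

```python
import torch
import torch.nn as nn

class TilePopularityLSTM(nn.Module):
    """Predict next-interval per-tile request counts from their history."""
    def __init__(self, n_tiles, hidden=64):
        super().__init__()
        self.lstm = nn.LSTM(input_size=n_tiles, hidden_size=hidden,
                            batch_first=True)
        self.head = nn.Linear(hidden, n_tiles)

    def forward(self, history):               # history: (batch, time, n_tiles)
        out, _ = self.lstm(history)
        return self.head(out[:, -1])          # predicted counts: (batch, n_tiles)
```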
Citations: 1
Subjective Test Dataset and Meta-data-based Models for 360° Streaming Video Quality
Pub Date: 2020-09-21 DOI: 10.1109/MMSP48831.2020.9287065
S. Fremerey, Steve Göring, Rakesh Rao Ramachandra Rao, Rachel Huang, A. Raake
During the last years, the number of 360° videos available for streaming has rapidly increased, leading to the need for 360° streaming video quality assessment. In this paper, we report and publish the results of three subjective 360° video quality tests, with conditions chosen to reflect real-world bitrates and resolutions including 4K, 6K and 8K, resulting in 64 stimuli each for the first two tests and 63 for the third. As the playout device, we used the HTC Vive for the first test and the HTC Vive Pro for the remaining two. Video-quality ratings were collected using the 5-point Absolute Category Rating scale. The 360° dataset provided with the paper contains the links to the used source videos, the raw subjective scores, video-related meta-data, head-rotation data and Simulator Sickness Questionnaire results per stimulus and per subject, to enable reproducibility of the provided results. Moreover, we use our dataset to compare the performance of state-of-the-art full-reference quality metrics such as VMAF, PSNR, SSIM, ADM2, WS-PSNR and WS-SSIM. Of all the metrics, VMAF showed the highest correlation with the subjective scores. Further, we evaluated a center-cropped version of VMAF ("VMAF-cc") that was shown to provide performance similar to the full VMAF. In addition to the dataset and the objective metric evaluation, we propose two new video-quality prediction models: a bitstream meta-data-based model and a hybrid no-reference model using the bitrate, resolution and pixel information of the video as input. The new lightweight models provide performance similar to the full-reference models while enabling fast calculations.
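Ranking objective metrics such as VMAF against subjective scores typically comes down to correlation. A sketch of that comparison follows; note the paper may additionally fit a logistic mapping before computing the linear correlation, which is omitted here.

```python
from scipy.stats import pearsonr, spearmanr

def metric_vs_mos(metric_scores, mos):
    """Linear (PLCC) and rank-order (SROCC) correlation of an objective
    metric's scores with subjective mean opinion scores (MOS)."""
    plcc, _ = pearsonr(metric_scores, mos)
    srocc, _ = spearmanr(metric_scores, mos)
    return plcc, srocc
```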
Citations: 10
Efficient Low Bit-Rate Intra-Frame Coding using Common Information for 360-degree Video
Pub Date: 2020-09-21 DOI: 10.1109/MMSP48831.2020.9287050
Fariha Afsana, M. Paul, M. Murshed, D. Taubman
With the growth of video technologies, super-resolution video, including 360-degree immersive video, has become a reality thanks to applications such as augmented/virtual/mixed reality, which offer better interaction and a wide-angle viewing experience of a scene compared to traditional video with a narrow viewing angle. The new generation of video content is bandwidth-intensive in nature due to its high resolution and demands high bit rates as well as low-latency delivery, posing challenges for transmission and storage. Traditional video coding schemes leave limited optimization space for improving intra-frame coding efficiency due to the fixed size of the processing block. This paper presents a new approach for improving intra-frame coding, especially for low-bit-rate transmission of 360-degree video in the lossy mode of HEVC. Prior to applying traditional HEVC intra-prediction, the approach exploits the global redundancy of the entire frame by extracting common important information using a multi-level discrete wavelet transform. This paper demonstrates that encoding only the low-frequency information of a frame in this way can outperform the HEVC standard at low bit rates. The experimental results indicate that the proposed intra-frame coding strategy achieves an average BD-rate reduction of 54.07% and a BD-PSNR gain of 2.84 dB in the low-bit-rate scenario compared to HEVC. It also reduces encoding time by about 66.84% on average. Moreover, since HEVC is applied in the wavelet frequency domain, this finding demonstrates that the existing HEVC block partitioning can be applied in the transform domain to better exploit information concentration.
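The pre-processing step, keeping only the coarse wavelet approximation of a frame before handing it to HEVC, can be sketched with PyWavelets. The wavelet family, level count and exact subband handling below are assumptions, not the paper's configuration.

```python
import numpy as np
import pywt

def lowband_frame(frame, wavelet="db2", levels=3):
    """Zero all detail subbands of a multi-level 2-D DWT and reconstruct,
    leaving only the frame's common low-frequency information.
    (waverec2 may pad the output by one pixel for odd-sized frames.)"""
    coeffs = pywt.wavedec2(frame, wavelet, level=levels)
    kept = [coeffs[0]] + [
        tuple(np.zeros_like(d) for d in detail) for detail in coeffs[1:]
    ]
    return pywt.waverec2(kept, wavelet)
```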
Citations: 2
MMSP 2020 Cover Page
Pub Date: 2020-09-21 DOI: 10.1109/mmsp48831.2020.9287079
{"title":"MMSP 2020 Cover Page","authors":"","doi":"10.1109/mmsp48831.2020.9287079","DOIUrl":"https://doi.org/10.1109/mmsp48831.2020.9287079","url":null,"abstract":"","PeriodicalId":188283,"journal":{"name":"2020 IEEE 22nd International Workshop on Multimedia Signal Processing (MMSP)","volume":"21 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-09-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133101945","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0