
2020 IEEE 22nd International Workshop on Multimedia Signal Processing (MMSP): Latest Publications

Study on viewing completion ratio of video streaming
Pub Date: 2020-09-21 DOI: 10.1109/MMSP48831.2020.9287091
Pierre R. Lebreton, Kazuhisa Yamagishi
In this paper, a model is investigated for optimizing the encoding of adaptive bitrate video streaming. To this end, the relationship between quality, content duration, and acceptability, measured using the completion ratio, is studied. This work is based on intensive subjective testing performed in a laboratory environment and shows the importance of stimulus duration in acceptance studies. A model to predict the completion ratio of videos is provided and shows good accuracy. By using this model, quality requirements can be derived from the target abandonment rate and content duration. This work will help video streaming providers define suitable coding conditions that maintain user engagement when preparing content for broadcast on their platforms.
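As a rough illustration of how such a model could link quality, duration, and abandonment (the abstract does not give the functional form, so the logistic shape and all coefficients below are assumptions, not the paper's values), a minimal sketch:

```python
# Hedged sketch: assumes a logistic dependence of completion ratio on quality
# (MOS) and duration. Coefficients b0, b_q, b_d are hypothetical placeholders.
import numpy as np

def completion_ratio(quality_mos, duration_s, b0=-2.0, b_q=1.2, b_d=-0.002):
    """Predicted fraction of viewers who watch the video to completion."""
    z = b0 + b_q * quality_mos + b_d * duration_s
    return 1.0 / (1.0 + np.exp(-z))

def required_quality(target_abandonment, duration_s,
                     b0=-2.0, b_q=1.2, b_d=-0.002):
    """Invert the model: minimum MOS keeping abandonment below the target."""
    target_completion = 1.0 - target_abandonment
    z = np.log(target_completion / (1.0 - target_completion))  # logit
    return (z - b0 - b_d * duration_s) / b_q

# Example: quality needed to keep abandonment under 30% for a 10-minute video.
print(required_quality(0.30, 600.0))
```

Inverting the model, as in `required_quality`, is the step that lets a provider derive an encoding quality target from an abandonment goal and a content duration.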
Citations: 0
RoSTAR: ROS-based Telerobotic Control via Augmented Reality
Pub Date: 2020-09-21 DOI: 10.1109/MMSP48831.2020.9287100
Chung Xue Er Shamaine, Yuansong Qiao, John Henry, Ken McNevin, Niall Murray
Communication and interaction between the real world and virtual worlds will be a cornerstone of future intelligent manufacturing ecosystems. Human-robot interaction is considered a basic element of the factories of the future. Despite advances in technologies such as wearables and Augmented Reality (AR), human-robot interaction (HRI) remains extremely challenging. Whilst progress has been made in the development of different mechanisms to support HRI, there are issues with cost, naturalistic and intuitive interaction, and communication across heterogeneous systems. To mitigate these limitations, RoSTAR is proposed. RoSTAR is a novel open-source HRI system based on the Robot Operating System (ROS) and Augmented Reality. An AR Head Mounted Display (HMD) is deployed, enabling the user to interact and communicate through a ROS-powered robotic arm. A model of the robot arm is imported directly into the Unity game engine, and any interactions with this virtual robotic arm are communicated to the ROS robotic arm. This system has the potential to be used for different process tasks, such as robotic gluing, dispensing, and arc welding, as part of an interoperable, low-cost, portable, and naturalistically interactive experience.
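A minimal sketch of the Unity-to-ROS leg of such a system, assuming a rospy node that forwards joint targets to an arm controller; the node name, topic name, message type, and update rate are all assumptions, since the abstract does not specify RoSTAR's actual message flow (e.g., whether it goes through rosbridge):

```python
# Hedged sketch: streams joint targets (as if read from the virtual arm
# manipulated in the AR HMD) to a robot arm driver over a ROS topic.
import rospy
from sensor_msgs.msg import JointState

def stream_joint_target(positions):
    # Node and topic names below are hypothetical placeholders.
    rospy.init_node('rostar_ar_bridge', anonymous=True)
    pub = rospy.Publisher('/arm_controller/command', JointState, queue_size=10)
    rate = rospy.Rate(30)  # roughly an AR interaction update rate
    msg = JointState()
    msg.name = ['joint_%d' % i for i in range(len(positions))]
    msg.position = list(positions)
    while not rospy.is_shutdown():
        msg.header.stamp = rospy.Time.now()
        pub.publish(msg)  # the physical arm's driver consumes these targets
        rate.sleep()

if __name__ == '__main__':
    # A fixed pose, standing in for live input from the HMD.
    stream_joint_target([0.0, 0.5, -0.3, 0.0, 1.0, 0.0])
```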
Citations: 7
Object-Oriented Motion Estimation using Edge-Based Image Registration
Pub Date: 2020-09-21 DOI: 10.1109/MMSP48831.2020.9287129
Md. Asikuzzaman, Deepak Rajamohan, M. Pickering
Video data storage and transmission costs can be reduced by minimizing the temporally redundant information among frames using an appropriate motion-compensated prediction technique. In the current video coding standard, neighbouring frames are exploited to predict the motion of the current frame using global motion estimation-based approaches. However, the global motion estimate for a frame may not capture the actual motion of individual objects in the frame, as each object usually has its own motion. In this paper, an edge-based motion estimation technique is presented that finds the motion of each object in the frame rather than the global motion of the frame. In the proposed method, edge position difference (EPD) similarity measure-based image registration between two frames is applied to register each object in the frame. A superpixel search is then applied to segment the registered object. Finally, the proposed edge-based image registration technique and the Demons algorithm are applied to predict the objects in the current frame. Our experimental analysis demonstrates that the proposed algorithm estimates the motions of individual objects in the current frame more accurately than existing global motion estimation-based approaches.
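A minimal sketch of one plausible reading of an EPD-style similarity score; the paper's exact definition is not given in the abstract, so this version simply scores how closely the edges of one frame fall to the edges of another via a distance transform:

```python
# Hedged sketch: edge-position-difference-style similarity between two
# grayscale uint8 frames. Canny thresholds are illustrative assumptions.
import cv2
import numpy as np

def epd_similarity(frame_a, frame_b, low=100, high=200):
    edges_a = cv2.Canny(frame_a, low, high)   # 255 at edge pixels, else 0
    edges_b = cv2.Canny(frame_b, low, high)
    # Distance from every pixel to the nearest edge pixel of frame B
    # (distanceTransform measures distance to the nearest zero pixel).
    dist_to_b = cv2.distanceTransform(255 - edges_b, cv2.DIST_L2, 3)
    ya, xa = np.nonzero(edges_a)
    if len(ya) == 0:
        return 0.0
    # Higher when frame A's edges sit on or near frame B's edges.
    return float(np.exp(-dist_to_b[ya, xa]).mean())
```

Maximizing such a score over candidate per-object transforms would register each object independently, which is the step the global motion estimate cannot provide.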
Citations: 0
Variable-Rate Multi-Frequency Image Compression using Modulated Generalized Octave Convolution
Pub Date: 2020-09-21 DOI: 10.1109/MMSP48831.2020.9287082
Jianping Lin, Mohammad Akbari, H. Fu, Qian Zhang, Shang Wang, Jie Liang, Dong Liu, F. Liang, Guohe Zhang, Chengjie Tu
In this proposal, we design a learned multi-frequency image compression approach that uses generalized octave convolutions to factorize the latent representations into high-frequency (HF) and low-frequency (LF) components, where the LF components have lower resolution than the HF components. This improves the rate-distortion performance, similarly to a wavelet transform. Moreover, compared to the original octave convolution, the proposed generalized octave convolution (GoConv) and octave transposed-convolution (GoTConv) with internal activation layers preserve more of the spatial structure of the information and enable more effective filtering between the HF and LF components, which further improves performance. In addition, we develop a variable-rate scheme that uses the Lagrangian parameter to modulate all the internal feature maps in the autoencoder, which allows the scheme to cover the large bitrate range of JPEG AI with only three models. Experiments show that the proposed scheme achieves much better Y MS-SSIM than VVC. In terms of YUV PSNR, our scheme is very similar to HEVC.
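A minimal PyTorch sketch of an octave-convolution-style layer with the four inter-frequency paths and an internal activation; the channel split, the ReLU choice, and the pooling/upsampling operators are assumptions, not the paper's exact GoConv definition:

```python
# Hedged sketch: HF->HF, HF->LF, LF->HF, LF->LF paths, LF at half resolution.
import torch
import torch.nn as nn
import torch.nn.functional as F

class OctaveConvSketch(nn.Module):
    def __init__(self, in_hf, in_lf, out_hf, out_lf, k=3):
        super().__init__()
        p = k // 2
        self.hf_hf = nn.Conv2d(in_hf, out_hf, k, padding=p)
        self.hf_lf = nn.Conv2d(in_hf, out_lf, k, padding=p)
        self.lf_lf = nn.Conv2d(in_lf, out_lf, k, padding=p)
        self.lf_hf = nn.Conv2d(in_lf, out_hf, k, padding=p)

    def forward(self, x_hf, x_lf):
        # HF output: same-resolution path plus upsampled LF contribution.
        hf = self.hf_hf(x_hf) + F.interpolate(self.lf_hf(x_lf), scale_factor=2)
        # LF output: same-resolution path plus pooled HF contribution.
        lf = self.lf_lf(x_lf) + self.hf_lf(F.avg_pool2d(x_hf, 2))
        return F.relu(hf), F.relu(lf)  # internal activation, as in GoConv

# Example: 64 HF + 32 LF channels, LF branch at half spatial resolution.
x_hf, x_lf = torch.randn(1, 64, 128, 128), torch.randn(1, 32, 64, 64)
y_hf, y_lf = OctaveConvSketch(64, 32, 64, 32)(x_hf, x_lf)
```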
Citations: 17
Haze-robust image understanding via context-aware deep feature refinement
Pub Date: 2020-09-21 DOI: 10.1109/MMSP48831.2020.9287089
Hui Li, Q. Wu, Haoran Wei, K. Ngan, Hongliang Li, Fanman Meng, Linfeng Xu
Image understanding in foggy scenes is greatly challenging due to inhomogeneous visibility deterioration. Although various image dehazing methods have been proposed, they usually aim to improve image visibility (e.g., PSNR/SSIM) in the pixel space rather than the feature space, which is what matters for computer vision. Due to this mismatch, existing dehazing methods are of limited benefit, or even counterproductive, for foggy scene understanding. In this paper, we propose a generalized deep feature refinement module that minimizes the difference between clear images and hazy images in the feature space. It is consistent with computer perception and can be embedded into existing detection or segmentation backbones for joint optimization. Our feature refinement module is built upon a graph convolutional network, which is favorable for capturing contextual information and beneficial for distinguishing different semantic objects. We validate our method on detection and segmentation tasks in foggy scenes. Extensive experimental results show that our method outperforms state-of-the-art dehazing-based pre-processing as well as fine-tuning on hazy images.
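A minimal sketch of GCN-style feature refinement, treating spatial positions of a backbone feature map as graph nodes and building an affinity graph from feature similarity; this is a generic reading of the abstract, not the authors' exact architecture:

```python
# Hedged sketch: one graph-convolution-style refinement pass with a residual
# connection, so it can be dropped between a backbone and a detection head.
import torch
import torch.nn as nn
import torch.nn.functional as F

class GraphRefineSketch(nn.Module):
    def __init__(self, channels):
        super().__init__()
        self.weight = nn.Linear(channels, channels)

    def forward(self, feat):                     # feat: (B, C, H, W)
        b, c, h, w = feat.shape
        x = feat.flatten(2).transpose(1, 2)      # (B, N, C), N = H*W nodes
        # Row-normalized affinity between all node pairs (context graph).
        affinity = torch.softmax(x @ x.transpose(1, 2) / c ** 0.5, dim=-1)
        refined = F.relu(self.weight(affinity @ x))   # propagate + transform
        return feat + refined.transpose(1, 2).reshape(b, c, h, w)

# Example: refine a 256-channel feature map before the task head.
out = GraphRefineSketch(256)(torch.randn(2, 256, 32, 32))
```

Note the N-by-N affinity makes this quadratic in the number of spatial positions, so in practice such a module is applied at a coarse feature resolution.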
Citations: 4
Mobile-Edge Cooperative Multi-User 360° Video Computing and Streaming
Pub Date: 2020-09-21 DOI: 10.1109/MMSP48831.2020.9287148
Jacob Chakareski, Nicholas Mastronarde
We investigate a novel communications system that integrates scalable multi-layer 360° video tiling, viewport-adaptive rate-distortion optimal resource allocation, and VR-centric edge computing and caching to enable future high-quality untethered VR streaming. Our system comprises a collection of 5G small cells that can pool their communication, computing, and storage resources to collectively deliver scalable 360° video content to mobile VR clients at much higher quality. Our major contributions are a rigorous design of multi-layer 360° tiling and related models of statistical user navigation, and the analysis and optimization of edge-based multi-user VR streaming that integrates viewport adaptation and server cooperation. We also explore the possibility of network-coded data operation and its implications for the analysis, optimization, and system performance we pursue here. We demonstrate considerable gains in delivered immersion fidelity, featuring much higher 360° viewport peak signal-to-noise ratio (PSNR) and VR video frame rates and spatial resolutions.
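A minimal sketch of viewport-adaptive rate allocation over tiles, assuming per-tile viewing probabilities from a navigation model and a greedy gain-per-bit rule; the greedy rule and all numbers are illustrative assumptions, not the paper's optimization:

```python
# Hedged sketch: spend the bit budget on the scalable layer with the highest
# expected quality gain per bit, weighting each tile by viewing probability.
import heapq

def allocate(tile_probs, layer_rates, layer_gains, budget):
    """Return the number of scalable layers to send per tile."""
    n = len(tile_probs)
    chosen = [0] * n
    # Max-heap (negated keys) over each tile's next affordable layer.
    heap = [(-tile_probs[t] * layer_gains[0] / layer_rates[0], t)
            for t in range(n)]
    heapq.heapify(heap)
    while heap and budget > 0:
        _, t = heapq.heappop(heap)
        layer = chosen[t]
        if layer_rates[layer] > budget:
            continue  # unaffordable now; budget only shrinks, so drop tile
        budget -= layer_rates[layer]
        chosen[t] = layer + 1
        if chosen[t] < len(layer_rates):
            nxt = chosen[t]
            heapq.heappush(
                heap, (-tile_probs[t] * layer_gains[nxt] / layer_rates[nxt], t))
    return chosen

# Example: 4 tiles, base + one enhancement layer, 400 units of budget.
print(allocate([0.6, 0.25, 0.1, 0.05], [100, 150], [10.0, 4.0], 400))
# -> [2, 1, 0, 0]: most bits go to the likely viewport
```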
Citations: 0
Compressing Head-Related Transfer Function databases by Eigen decomposition
Pub Date: 2020-09-21 DOI: 10.1109/MMSP48831.2020.9287134
Camilo Arévalo, J. Villegas
A method to reduce the memory footprint of Head-Related Transfer Functions (HRTFs) is introduced. Based on an Eigen decomposition of HRTFs, the proposed method is capable of reducing a database comprising 6,344 measurements from 36.30 MB to 2.41 MB (about a 15:1 compression ratio). Synthetic HRTFs in the compressed database were set to have less than 1 dB of spectral distortion between 0.1 and 16 kHz. The differences between the compressed measurements and those in the original database do not seem to translate into degradation of perceptual localization accuracy. The high degree of compression obtained with this method allows the inclusion of interpolated HRTFs in databases, easing real-time audio spatialization in Virtual Reality (VR).
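A minimal sketch of the compression idea: stack the HRTF responses as rows, take a principal component decomposition (an eigendecomposition of the covariance, computed here via SVD), and store only the leading basis vectors plus per-measurement weights. The component count, the magnitude-only simplification, and the matrix dimensions are assumptions:

```python
# Hedged sketch: PCA-style HRTF database compression with NumPy.
import numpy as np

def compress_hrtfs(hrtf_matrix, n_components):
    """hrtf_matrix: (n_measurements, n_freq_bins) magnitude responses."""
    mean = hrtf_matrix.mean(axis=0)
    centered = hrtf_matrix - mean
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    basis = vt[:n_components]          # leading eigenvectors (principal axes)
    weights = centered @ basis.T       # small per-measurement coefficients
    return mean, basis, weights        # this triple is all that is stored

def reconstruct(mean, basis, weights):
    return weights @ basis + mean

# Example with hypothetical sizes: 6,344 measurements, 256 frequency bins,
# kept as ~40 components plus weights.
h = np.random.rand(6344, 256)
mean, basis, w = compress_hrtfs(h, 40)
h_hat = reconstruct(mean, basis, w)
print(np.abs(h - h_hat).max())
```

Because the weights live in a low-dimensional space, interpolating between the weight vectors of neighbouring directions is a cheap way to synthesize HRTFs for unmeasured directions, which is what enables the interpolated entries mentioned above.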
Citations: 3
Translation of Perceived Video Quality Across Displays
Pub Date: 2020-09-21 DOI: 10.1109/MMSP48831.2020.9287143
Jessie Lin, N. Birkbeck, Balu Adsumilli
Display devices can significantly affect the perceived quality of a video. In this paper, we focus on the scenario where video resolution does not exceed screen resolution, and investigate the relationship of perceived video quality across mobile, laptop, and TV displays. A novel transformation of Mean Opinion Scores (MOS) among different devices is proposed and shown to be effective at normalizing ratings across user devices for in-lab and crowd-sourced subjective studies. The model allows us to perform more focused in-lab subjective studies, as we can reduce the number of test devices, and helps reduce noise during crowd-sourced subjective video quality tests. It is also more effective than utilizing existing device-dependent objective metrics for translating MOS ratings across devices.
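A minimal sketch of fitting such a cross-device mapping, assuming an affine form estimated by least squares over clips rated on both devices; the affine shape is an assumption, and the paper's transformation may be richer:

```python
# Hedged sketch: learn mos_dst ~ a * mos_src + b from paired ratings.
import numpy as np

def fit_mos_mapping(mos_src, mos_dst):
    a, b = np.polyfit(mos_src, mos_dst, deg=1)   # least-squares affine fit
    return a, b

def translate(mos, a, b, lo=1.0, hi=5.0):
    # Clip so translated scores stay on the 5-point MOS scale.
    return np.clip(a * np.asarray(mos) + b, lo, hi)

# Example: hypothetical per-sequence MOS for the same clips on two devices.
mobile = np.array([4.2, 3.8, 3.1, 2.5, 4.6])
tv     = np.array([3.9, 3.4, 2.6, 2.0, 4.4])
a, b = fit_mos_mapping(mobile, tv)
print(translate([3.5], a, b))   # mobile rating expressed on the TV scale
```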
Citations: 6
Non-Line-of-Sight Time-Difference-of-Arrival Localization with Explicit Inclusion of Geometry Information in a Simple Diffraction Scenario
Pub Date: 2020-09-21 DOI: 10.1109/MMSP48831.2020.9287166
Sönke Südbeck, Thomas C. Krause, J. Ostermann
Time-difference-of-arrival (TDOA) localization is a technique for finding the position of a wave-emitting object, e.g., a car horn. Many algorithms have been proposed for TDOA localization under line-of-sight (LOS) conditions. In the non-line-of-sight (NLOS) case, the performance of these algorithms usually deteriorates. There are techniques to reduce the error introduced by the NLOS condition, which, however, do not directly take into account information on the geometry of the surroundings. In this paper, an NLOS TDOA localization approach for a simple diffraction scenario is described, which incorporates information on the surroundings into the equation system. An experiment with three different loudspeaker positions was conducted to validate the proposed method. The localization error was less than 6.2% of the distance from the source to the closest microphone position. Simulations show that the proposed method attains the Cramér-Rao lower bound for sufficiently low TDOA noise levels.
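A minimal sketch of the core idea, assuming the diffraction edge position and the set of shadowed microphones are known: the straight-line distance in each TDOA residual is replaced by the around-the-corner path via the edge, and the source is found by nonlinear least squares. The geometry and noise-free TDOAs below are hypothetical:

```python
# Hedged sketch: NLOS-aware TDOA localization with a diffracted path model.
import numpy as np
from scipy.optimize import least_squares

C = 343.0  # speed of sound, m/s

def path_len(src, mic, edge):
    if edge is None:                              # line-of-sight microphone
        return np.linalg.norm(src - mic)
    # Diffracted path: source -> diffraction edge -> microphone.
    return np.linalg.norm(src - edge) + np.linalg.norm(edge - mic)

def residuals(src, mics, edges, tdoas):
    d0 = path_len(src, mics[0], edges[0])         # reference microphone
    return [path_len(src, mics[i], edges[i]) - d0 - C * tdoas[i - 1]
            for i in range(1, len(mics))]

mics = [np.array([0.0, 0.0]), np.array([2.0, 0.0]), np.array([0.0, 2.0])]
edges = [None, None, np.array([1.0, 1.5])]        # mic 2 is NLOS, behind a corner
true_src = np.array([3.0, 1.0])
tdoas = [(path_len(true_src, mics[i], edges[i]) -
          path_len(true_src, mics[0], edges[0])) / C for i in (1, 2)]
sol = least_squares(residuals, x0=np.array([1.0, 0.5]),
                    args=(mics, edges, tdoas))
print(sol.x)   # should recover approximately (3.0, 1.0)
```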
Citations: 0
Deep Learning-based Point Cloud Geometry Coding with Resolution Scalability
Pub Date: 2020-09-21 DOI: 10.1109/MMSP48831.2020.9287060
André F. R. Guarda, Nuno M. M. Rodrigues, F. Pereira
Point clouds are a 3D visual representation format that has recently become fundamentally important for immersive and interactive multimedia applications. Considering the high number of points in practically relevant point clouds, and their increasing market demand, efficient point cloud coding has become a vital research topic. In addition, scalability is an important feature for point cloud coding, especially for real-time applications, where fast and rate-efficient access to a decoded point cloud is important; however, this issue is still rather unexplored in the literature. In this context, this paper proposes a novel deep learning-based point cloud geometry coding solution with resolution scalability via interlaced sub-sampling. As additional layers are decoded, the number of points in the reconstructed point cloud increases, along with the overall quality. Experimental results show that the proposed scalable point cloud geometry coding solution outperforms the recent MPEG Geometry-based Point Cloud Compression standard, which is much less scalable.
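A minimal sketch of how interlaced sub-sampling yields resolution scalability, assuming a parity-based interleaving of voxel coordinates; this is one simple way to interlace, and the paper's scheme may differ in detail:

```python
# Hedged sketch: split a voxelized point cloud into 8 interleaved subsets so
# that each additional decoded layer adds points and raises density.
import numpy as np

def interlaced_layers(points):
    """points: (N, 3) integer voxel coordinates -> list of 8 parity subsets."""
    parity = (points[:, 0] % 2) * 4 + (points[:, 1] % 2) * 2 + (points[:, 2] % 2)
    return [points[parity == k] for k in range(8)]

def progressive_decode(layers, n_layers):
    """Reconstruction after decoding the first n_layers of 8."""
    return np.vstack(layers[:n_layers])

pts = np.random.randint(0, 1024, size=(100000, 3))
layers = interlaced_layers(pts)
coarse = progressive_decode(layers, 1)   # base layer: roughly 1/8 of the points
full = progressive_decode(layers, 8)     # all layers: the complete point set
print(len(coarse), len(full))
```

Because every subset covers the whole volume at a coarser density, a decoder can stop after any layer and still have a spatially complete, lower-resolution point cloud.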
Citations: 15