
Latest publications: 2020 IEEE 22nd International Workshop on Multimedia Signal Processing (MMSP)

RoSTAR: ROS-based Telerobotic Control via Augmented Reality
Pub Date: 2020-09-21 DOI: 10.1109/MMSP48831.2020.9287100
Chung Xue Er Shamaine, Yuansong Qiao, John Henry, Ken McNevin, Niall Murray
Real-world to virtual-world communication and interaction will be a cornerstone of future intelligent manufacturing ecosystems. Human-robot interaction is considered a basic element of the factories of the future. Despite advances in technologies such as wearables and Augmented Reality (AR), human-robot interaction (HRI) remains extremely challenging. Whilst progress has been made in developing mechanisms to support HRI, issues remain with cost, naturalistic and intuitive interaction, and communication across heterogeneous systems. To mitigate these limitations, RoSTAR is proposed: a novel open-source HRI system based on the Robot Operating System (ROS) and Augmented Reality. An AR Head Mounted Display (HMD) is deployed, enabling the user to interact and communicate with a ROS-powered robotic arm. A model of the robot arm is imported directly into the Unity game engine, and any interactions with this virtual robotic arm are communicated to the physical ROS robotic arm. This system has the potential to be used for different process tasks, such as robotic gluing, dispensing and arc welding, as part of an interoperable, low-cost, portable and naturalistically interactive experience.
Citations: 7
Variable-Rate Multi-Frequency Image Compression using Modulated Generalized Octave Convolution
Pub Date: 2020-09-21 DOI: 10.1109/MMSP48831.2020.9287082
Jianping Lin, Mohammad Akbari, H. Fu, Qian Zhang, Shang Wang, Jie Liang, Dong Liu, F. Liang, Guohe Zhang, Chengjie Tu
In this proposal, we design a learned multi-frequency image compression approach that uses generalized octave convolutions to factorize the latent representations into high-frequency (HF) and low-frequency (LF) components, where the LF components have lower resolution than the HF components; similar to a wavelet transform, this improves the rate-distortion performance. Moreover, compared to the original octave convolution, the proposed generalized octave convolution (GoConv) and octave transposed-convolution (GoTConv) with internal activation layers preserve more of the spatial structure of the information and enable more effective filtering between the HF and LF components, further improving the performance. In addition, we develop a variable-rate scheme that uses the Lagrangian parameter to modulate all the internal feature maps in the autoencoder, which allows the scheme to cover the large bitrate range of JPEG AI with only three models. Experiments show that the proposed scheme achieves much better Y MS-SSIM than VVC. In terms of YUV PSNR, our scheme is very similar to HEVC.
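The resolution factorization the abstract describes can be illustrated with a toy example. This is a minimal sketch, not the paper's learned GoConv: it only shows how a feature tensor can be split into full-resolution HF channels and half-resolution LF channels, octave-style (the function name, channel split ratio and sizes are invented for illustration).

```python
import numpy as np

def factorize_octave(x, alpha=0.5):
    """Split a (channels, H, W) feature map into an HF part at full
    resolution and an LF part at half resolution.

    alpha is the fraction of channels routed to the LF branch; the LF
    branch is produced here by 2x2 average pooling. A toy stand-in for
    the multi-frequency factorization, not the learned convolution.
    """
    c = x.shape[0]
    c_lf = int(alpha * c)
    hf = x[c_lf:]                      # high-frequency channels, full resolution
    lf_full = x[:c_lf]
    h, w = lf_full.shape[1] // 2, lf_full.shape[2] // 2
    # 2x2 average pooling -> half spatial resolution for the LF branch
    lf = lf_full.reshape(c_lf, h, 2, w, 2).mean(axis=(2, 4))
    return hf, lf

x = np.random.rand(8, 16, 16)
hf, lf = factorize_octave(x, alpha=0.5)
print(hf.shape, lf.shape)  # (4, 16, 16) (4, 8, 8)
```

Because the LF branch carries half the spatial samples, its share of the latent representation is cheaper to code, which is where the rate-distortion gain comes from.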
Citations: 17
Learned BRIEF – transferring the knowledge from hand-crafted to learning-based descriptors
Pub Date: 2020-09-21 DOI: 10.1109/MMSP48831.2020.9287159
Nina Žižakić, A. Pižurica
In this paper, we present a novel approach for designing local image descriptors that learn from data and from hand-crafted descriptors. In particular, we construct a learning model that first mimics the behaviour of a hand-crafted descriptor and then learns to improve upon it in an unsupervised manner. We demonstrate the use of this knowledge-transfer framework by constructing the learned BRIEF descriptor based on the well-known hand-crafted descriptor BRIEF. We implement our learned BRIEF with a convolutional autoencoder architecture. Evaluation on the HPatches benchmark for local image descriptors shows the effectiveness of the proposed approach in the tasks of patch retrieval, patch verification, and image matching.
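For context, the hand-crafted teacher that the learned descriptor first mimics works by comparing pixel intensities at a fixed pattern of point pairs. Below is a minimal BRIEF-style sketch (the sampling pattern, patch size and descriptor length are illustrative; the original BRIEF also smooths the patch before sampling).

```python
import numpy as np

def brief_descriptor(patch, n_bits=256, seed=42):
    """BRIEF-style binary descriptor of a square patch: each bit is an
    intensity comparison between one fixed random pair of pixels.

    The pair pattern is fixed by the seed, mirroring BRIEF's fixed test
    pattern. A simplified sketch of the hand-crafted teacher, not the
    paper's learned descriptor.
    """
    rng = np.random.default_rng(seed)
    h, w = patch.shape
    rows = rng.integers(0, h, size=(n_bits, 2))   # row coords of both test points
    cols = rng.integers(0, w, size=(n_bits, 2))   # column coords of both test points
    bits = patch[rows[:, 0], cols[:, 0]] < patch[rows[:, 1], cols[:, 1]]
    return bits.astype(np.uint8)

patch = np.random.rand(32, 32)
d = brief_descriptor(patch)
print(d.shape)  # (256,)
```

Descriptors like this are matched with the Hamming distance, which is what makes binary descriptors attractive for fast patch retrieval and matching.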
Citations: 1
Bi-directional intra prediction based measurement coding for compressive sensing images
Pub Date: 2020-09-21 DOI: 10.1109/MMSP48831.2020.9287074
Thuy Thi Thu Tran, Jirayu Peetakul, Chi Do-Kim Pham, Jinjia Zhou
This work proposes a bi-directional intra prediction-based measurement coding algorithm for compressive sensing images. Compressive sensing is capable of reducing the size of sparse signals, in which high-dimensional signals are represented by under-determined linear measurements. In order to exploit the spatial redundancy in measurements, the corresponding pixel-domain information is extracted using the structure of the measurement matrix. Firstly, the mono-directional prediction modes (i.e. horizontal mode and vertical mode), which refer to the nearest information of neighboring pixel blocks, are obtained from the structure of the measurement matrix. Secondly, we design bi-directional intra prediction modes (i.e. Diagonal + Horizontal, Diagonal + Vertical) based on the already obtained mono-directional prediction modes. Experimental results show that this work achieves a 0.01 to 0.02 dB PSNR improvement and bitrate reductions of 19% on average, up to 36%, compared to the state-of-the-art.
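The measurement-domain prediction idea can be sketched as follows. This toy example uses invented block and measurement sizes and only the horizontal mono-directional mode (not the paper's bi-directional modes): neighboring blocks' measurement vectors are correlated, so coding the residual between them is cheaper, and the decoder undoes the prediction losslessly.

```python
import numpy as np

rng = np.random.default_rng(0)
# Block-based compressive sensing: each 8x8 block (64 pixels) is projected
# to 16 under-determined linear measurements y = Phi @ x.
phi = rng.standard_normal((16, 64))

img = rng.random((8, 24))                     # one row of three 8x8 blocks
blocks = [img[:, i:i + 8].ravel() for i in (0, 8, 16)]
meas = [phi @ b for b in blocks]

# Horizontal (mono-directional) prediction in the measurement domain:
# predict each block's measurements from its left neighbor and code only
# the residual, which has lower energy when neighboring blocks correlate.
residuals = [meas[0]] + [meas[i] - meas[i - 1] for i in (1, 2)]

# Decoder side: a cumulative sum over the residuals recovers the
# measurements exactly.
rec = np.cumsum(residuals, axis=0)
print(np.allclose(rec[2], meas[2]))  # True
```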
Citations: 4
Haze-robust image understanding via context-aware deep feature refinement
Pub Date: 2020-09-21 DOI: 10.1109/MMSP48831.2020.9287089
Hui Li, Q. Wu, Haoran Wei, K. Ngan, Hongliang Li, Fanman Meng, Linfeng Xu
Image understanding under foggy scenes is greatly challenging due to inhomogeneous visibility deterioration. Although various image dehazing methods have been proposed, they usually aim to improve image visibility (e.g., PSNR/SSIM) in the pixel space rather than the feature space, which is critical for computer vision perception. Due to this mismatch, existing dehazing methods are limited, or even adverse, in facilitating foggy scene understanding. In this paper, we propose a generalized deep feature refinement module to minimize the difference between clear images and hazy images in the feature space. It is consistent with computer perception and can be embedded into existing detection or segmentation backbones for joint optimization. Our feature refinement module is built upon the graph convolutional network, which is favorable for capturing contextual information and beneficial for distinguishing different semantic objects. We validate our method on detection and segmentation tasks under foggy scenes. Extensive experimental results show that our method outperforms the state-of-the-art dehazing-based pretreatments and the fine-tuning results on hazy images.
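The contextual propagation a graph convolutional network provides can be sketched with one generic Kipf-Welling-style layer. The graph, features and weights below are invented placeholders, not the paper's refinement module or its graph construction.

```python
import numpy as np

def gcn_layer(x, adj, w):
    """One graph-convolutional step: X' = ReLU(A_hat @ X @ W), with the
    symmetrically normalized adjacency A_hat = D^-1/2 (A + I) D^-1/2.

    Each node's new feature mixes its neighbors' features, which is how
    contextual information spreads across the graph.
    """
    a = adj + np.eye(adj.shape[0])          # add self-loops
    d = 1.0 / np.sqrt(a.sum(axis=1))
    a_hat = a * d[:, None] * d[None, :]     # symmetric normalization
    return np.maximum(a_hat @ x @ w, 0.0)   # ReLU

# 4 nodes (e.g. region features), 3-dim features, a small ring graph
adj = np.array([[0, 1, 0, 1],
                [1, 0, 1, 0],
                [0, 1, 0, 1],
                [1, 0, 1, 0]], dtype=float)
x = np.random.rand(4, 3)
w = np.random.rand(3, 5)
out = gcn_layer(x, adj, w)
print(out.shape)  # (4, 5)
```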
Citations: 4
Translation of Perceived Video Quality Across Displays
Pub Date: 2020-09-21 DOI: 10.1109/MMSP48831.2020.9287143
Jessie Lin, N. Birkbeck, Balu Adsumilli
Display devices can affect the perceived quality of a video significantly. In this paper, we focus on the scenario where video resolution does not exceed screen resolution, and investigate the relationship of perceived video quality on mobile, laptop and TV. A novel transformation of Mean Opinion Scores (MOS) among different devices is proposed and is shown to be effective at normalizing ratings across user devices for in-lab and crowd-sourced subjective studies. The model allows us to perform more focused in-lab subjective studies, as we can reduce the number of test devices, and helps us reduce noise during crowd-sourced subjective video quality tests. It is also more effective than utilizing existing device-dependent objective metrics for translating MOS ratings across devices.
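The kind of cross-device normalization described above can be sketched as a simple affine fit between ratings of the same clips on two devices. The ratings below are invented, and the paper's actual transformation is more elaborate than a single least-squares line.

```python
import numpy as np

# Toy sketch: given MOS for the same clips on two devices, fit an affine
# map mos_tv ~ a * mos_mobile + b, then use it to translate ratings.
mos_mobile = np.array([2.1, 3.0, 3.8, 4.5, 1.5])
mos_tv     = np.array([1.8, 2.6, 3.5, 4.4, 1.1])

# Least-squares fit of slope a and intercept b
A = np.vstack([mos_mobile, np.ones_like(mos_mobile)]).T
(a, b), *_ = np.linalg.lstsq(A, mos_tv, rcond=None)

predicted_tv = a * mos_mobile + b   # mobile ratings translated to the TV scale
print(round(float(a), 2), round(float(b), 2))
```

A shared scale like this lets lab and crowd-sourced ratings collected on different devices be pooled into one study.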
Citations: 6
Compressing Head-Related Transfer Function databases by Eigen decomposition
Pub Date: 2020-09-21 DOI: 10.1109/MMSP48831.2020.9287134
Camilo Arévalo, J. Villegas
A method to reduce the memory footprint of Head-Related Transfer Functions (HRTFs) is introduced. Based on an Eigen decomposition of HRTFs, the proposed method is capable of reducing a database comprising 6,344 measurements from 36.30 MB to 2.41 MB (about a 15:1 compression ratio). Synthetic HRTFs in the compressed database were set to have less than 1 dB spectral distortion between 0.1 and 16 kHz. The differences between the compressed measurements and those in the original database do not seem to translate into degradation of perceptual localization accuracy. The high degree of compression obtained with this method allows the inclusion of interpolated HRTFs in databases, easing real-time audio spatialization in Virtual Reality (VR).
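The compression idea (store one shared eigenbasis plus a few weights per measurement) can be sketched with PCA on a toy matrix. Sizes, data and the number of retained components are invented, not the paper's 6,344-entry database; real HRTFs are far more correlated than random data, which is why few components suffice in practice.

```python
import numpy as np

rng = np.random.default_rng(1)
hrtfs = rng.random((200, 128))                 # 200 measurements x 128 freq bins

mean = hrtfs.mean(axis=0)
centered = hrtfs - mean
# Eigen decomposition of the (scaled) covariance; eigh returns eigenvalues
# in ascending order, so reverse to take the top-k eigenvectors.
evals, evecs = np.linalg.eigh(centered.T @ centered)
basis = evecs[:, ::-1][:, :20]                 # top-20 eigenvectors (shared)

weights = centered @ basis                     # 20 numbers stored per HRTF
reconstructed = mean + weights @ basis.T       # decoder side

stored = weights.size + basis.size + mean.size
print(round(hrtfs.size / stored, 1))  # → 3.8  (compression ratio for this toy)
```

Only `weights`, `basis` and `mean` need to be stored; the full responses are regenerated on demand, which also makes interpolating new HRTFs in the weight space cheap.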
Citations: 3
Mobile-Edge Cooperative Multi-User 360° Video Computing and Streaming
Pub Date: 2020-09-21 DOI: 10.1109/MMSP48831.2020.9287148
Jacob Chakareski, Nicholas Mastronarde
We investigate a novel communications system that integrates scalable multi-layer 360° video tiling, viewport-adaptive rate-distortion optimal resource allocation, and VR-centric edge computing and caching, to enable future high-quality untethered VR streaming. Our system comprises a collection of 5G small cells that can pool their communication, computing, and storage resources to collectively deliver scalable 360° video content to mobile VR clients at much higher quality. Our major contributions are rigorous design of multi-layer 360° tiling and related models of statistical user navigation, and analysis and optimization of edge-based multi-user VR streaming that integrates viewport adaptation and server cooperation. We also explore the possibility of network coded data operation and its implications for the analysis, optimization, and system performance we pursue here. We demonstrate considerable gains in delivered immersion fidelity, featuring much higher 360° viewport peak signal to noise ratio (PSNR) and VR video frame rates and spatial resolutions.
Citations: 0
Deep Learning for Individual Listening Zone
Pub Date: 2020-09-21 DOI: 10.1109/MMSP48831.2020.9287161
Giovanni Pepe, L. Gabrielli, S. Squartini, L. Cattani, Carlo Tripodi
A recent trend in car audio systems is the generation of Individual Listening Zones (ILZ), which improve phone-call privacy and reduce disturbance to other passengers without requiring headphones or earpieces. This is generally achieved with loudspeaker arrays. In this paper, we describe an approach that achieves ILZ by exploiting general-purpose car loudspeakers and processing the signal through carefully designed Finite Impulse Response (FIR) filters. We propose a deep neural network approach for the design of the filter coefficients in order to obtain a so-called bright zone, where the signal is clearly heard, and a dark zone, where the signal is attenuated. Additionally, the frequency response in the bright zone is constrained to be as flat as possible. Numerical experiments were performed using impulse responses measured with either one binaural pair or three binaural pairs per passenger. The results in terms of attenuation and flatness prove the viability of the approach.
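A classic non-learned baseline for bright/dark-zone filter design is least-squares pressure matching, sketched below for a single frequency bin: choose loudspeaker weights so the response in the bright zone is near 1 and in the dark zone near 0. The transfer functions here are random placeholders, and the paper instead learns time-domain FIR coefficients with a deep network.

```python
import numpy as np

rng = np.random.default_rng(2)
n_spk, n_bright, n_dark = 4, 3, 3
# Complex acoustic transfer functions speaker -> control points (placeholders)
H_b = rng.standard_normal((n_bright, n_spk)) + 1j * rng.standard_normal((n_bright, n_spk))
H_d = rng.standard_normal((n_dark, n_spk)) + 1j * rng.standard_normal((n_dark, n_spk))

# Pressure matching: solve min_w || [H_b; H_d] w - [1; 0] ||^2
H = np.vstack([H_b, H_d])
target = np.concatenate([np.ones(n_bright), np.zeros(n_dark)])
w, *_ = np.linalg.lstsq(H, target, rcond=None)

bright_level = np.abs(H_b @ w).mean()
dark_level = np.abs(H_d @ w).mean()
print(round(float(bright_level), 2), round(float(dark_level), 2))
```

Repeating this per frequency bin (or formulating it over FIR taps in the time domain) yields the filters that create the two zones; a learned approach can additionally trade attenuation against the bright-zone flatness constraint.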
Citations: 4
V-PCC Component Synchronization for Point Cloud Reconstruction
Pub Date: 2020-09-21 DOI: 10.1109/MMSP48831.2020.9287092
D. Graziosi, A. Tabatabai, Vladyslav Zakharchenko, A. Zaghetto
For a V-PCC (Video-based Point Cloud Compression) system to be able to reconstruct a single instance of the point cloud, one V-PCC unit must be transferred to the 3D point cloud reconstruction module. It is, however, required that all the V-PCC components, i.e. occupancy map, geometry, atlas and attribute, be temporally aligned. This could, in principle, pose a challenge, since the temporal structures of the decoded sub-bitstreams are not coherent across V-PCC sub-bitstreams. In this paper we propose an output delay adjustment mechanism for the decoded V-PCC sub-bitstreams that provides synchronized V-PCC component input to the point cloud reconstruction module.
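The output-delay adjustment can be sketched as buffering each component stream up to the largest decoder delay, so that frame n of every component reaches reconstruction at the same time. The delay values below are invented numbers for illustration, not values from the V-PCC specification.

```python
# Each decoded V-PCC component becomes available after its own decoder
# delay (in frames). Buffering every stream up to the maximum delay
# yields temporally aligned inputs for point cloud reconstruction.
decoder_delay = {"occupancy": 1, "geometry": 3, "attribute": 2, "atlas": 1}

sync_delay = max(decoder_delay.values())
extra_buffer = {c: sync_delay - d for c, d in decoder_delay.items()}

# Frame n of every component is handed to reconstruction at time n + sync_delay.
print(extra_buffer)  # {'occupancy': 2, 'geometry': 0, 'attribute': 1, 'atlas': 2}
```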
Citations: 1