2020 IEEE 22nd International Workshop on Multimedia Signal Processing (MMSP): Latest Publications

Local Luminance Patterns for Point Cloud Quality Assessment
Pub Date: 2020-09-21 DOI: 10.1109/MMSP48831.2020.9287154
Rafael Diniz, P. Freitas, Mylène C. Q. Farias
In recent years, Point Clouds (PCs) have grown in popularity as the preferred data structure for representing 3D visual contents. Examples of PC applications range from 3D representations of small objects up to large maps. The growing adoption of PCs has triggered the development of new coding, transmission, and presentation methodologies, and, along with these, novel methods for evaluating the visual quality of PC contents. This paper presents a new objective full-reference visual quality metric for PC contents, which uses a proposed descriptor entitled Local Luminance Patterns (LLP). It extracts the statistics of the luminance information of the reference and test PCs and compares these statistics to assess the perceived quality of the test PC. The proposed PC quality assessment method can be applied to both large- and small-scale PCs. Using publicly available PC quality datasets, we compared the proposed method with current state-of-the-art PC quality metrics, obtaining competitive results.
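The paper's exact LLP construction is not reproduced here, but the idea of comparing luminance statistics between a reference and a test point cloud can be illustrated with a short sketch: for every point, binarize the luminance of its k nearest neighbors against the center point's luminance (an LBP-style pattern), histogram the resulting codes, and compare the reference and test histograms. The choice of k, the pattern coding, and the histogram-intersection comparison below are illustrative assumptions, not the published metric.

```python
# Sketch of a luminance-statistics comparison for point clouds (illustrative,
# not the published LLP metric). xyz: (N, 3) coordinates, luma: (N,) luminance.
import numpy as np
from scipy.spatial import cKDTree

def llp_histogram(xyz, luma, k=8):
    """Histogram of LBP-style local luminance patterns over one point cloud."""
    tree = cKDTree(xyz)
    _, idx = tree.query(xyz, k=k + 1)           # idx[:, 0] is the point itself
    neigh = luma[idx[:, 1:]]                    # (N, k) neighbor luminances
    bits = (neigh >= luma[:, None]).astype(np.uint32)
    codes = (bits << np.arange(k)).sum(axis=1)  # one pattern code per point
    hist = np.bincount(codes, minlength=2 ** k).astype(np.float64)
    return hist / hist.sum()

def llp_similarity(ref_xyz, ref_luma, test_xyz, test_luma, k=8):
    """Histogram intersection of the two pattern statistics (higher is better)."""
    return np.minimum(llp_histogram(ref_xyz, ref_luma, k),
                      llp_histogram(test_xyz, test_luma, k)).sum()

rng = np.random.default_rng(0)
xyz = rng.random((500, 3))
luma = rng.random(500)
# same geometry, slightly distorted luminance -> similarity just below 1.0
print(llp_similarity(xyz, luma, xyz, luma + 0.1 * rng.standard_normal(500)))
```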
Citations: 22
Graph-based skeleton data compression
Pub Date: 2020-09-21 DOI: 10.1109/MMSP48831.2020.9287103
Pratyusha Das, Antonio Ortega
With the advancement of reliable, fast, portable acquisition systems, human motion capture data is becoming widely used in many industrial, medical, and surveillance applications. These systems can track multiple people simultaneously, providing full-body skeletal keypoints as well as more detailed landmarks on the face, hands, and feet. This produces a huge amount of skeleton data to be transmitted or stored. In this paper, we introduce Graph-based Skeleton Compression (GSC), an efficient graph-based method for nearly lossless compression. We use a separable spatio-temporal graph transform along with non-uniform quantization, coefficient scanning, and entropy coding with run-length codes. We evaluate the compression performance of the proposed method on the large NTU-RGB activity dataset. Our method outperforms a 1D discrete cosine transform applied along the temporal direction. In near-lossless mode, the proposed compression does not affect action recognition performance.
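A toy sketch of the separable spatio-temporal transform step may help. Below, the spatial transform is a graph Fourier transform (eigenvectors of the skeleton graph Laplacian) applied over the joints, and a DCT along time stands in for the temporal transform; the paper's non-uniform quantization and entropy coding are omitted, and the 3-joint chain graph is purely illustrative.

```python
# Sketch of a separable spatio-temporal graph transform for skeleton data.
# Shapes: data is (T frames, J joints, 3 coordinates).
import numpy as np
from scipy.fft import dct, idct

def spatial_gft_basis(adjacency):
    """Eigenvectors of the combinatorial graph Laplacian (spatial GFT)."""
    degree = np.diag(adjacency.sum(axis=1))
    laplacian = degree - adjacency
    _, eigvecs = np.linalg.eigh(laplacian)
    return eigvecs

def forward_transform(data, gft):
    # spatial GFT over the joint axis, then DCT over the time axis
    coeffs = np.einsum('ji,tjc->tic', gft, data)
    return dct(coeffs, axis=0, norm='ortho')

def inverse_transform(coeffs, gft):
    data = idct(coeffs, axis=0, norm='ortho')
    return np.einsum('ij,tjc->tic', gft, data)

# toy example: a 3-joint chain observed over 16 frames
A = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]], float)
gft = spatial_gft_basis(A)
motion = np.random.randn(16, 3, 3)
rec = inverse_transform(forward_transform(motion, gft), gft)
assert np.allclose(rec, motion)   # the transform itself is perfectly invertible
```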
Citations: 7
High Frame-Rate Virtual View Synthesis Based on Low Frame-Rate Input
Pub Date: 2020-09-21 DOI: 10.1109/MMSP48831.2020.9287076
K. Wegner, J. Stankowski, O. Stankiewicz, Hubert Żabiński, K. Klimaszewski, T. Grajek
In this paper, we investigate methods for obtaining high-resolution, high frame-rate virtual views from low frame-rate cameras for applications in high-performance multiview systems. We demonstrate how to set up synchronization for multiview acquisition systems to record the required data, and then how to process the data to create virtual views at a higher frame rate while preserving the high resolution of the views. We analyze various ways to combine temporal frame interpolation with an alternative side-view synthesis technique, which allows us to create the required high frame-rate video of a virtual viewpoint. The results prove that the proposed methods are capable of delivering the expected high-quality, high-resolution, and high frame-rate virtual views.
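As a toy illustration of the frame-rate upsampling part of such a pipeline, the sketch below inserts interpolated frames between captured ones; plain linear blending stands in for a real motion-compensated interpolator or the side-view synthesis discussed in the paper, and all parameters are illustrative.

```python
# Sketch: turn a low frame-rate sequence into a higher-rate one by inserting
# interpolated frames between captured ones.
import numpy as np

def interpolate(frame_a, frame_b, alpha):
    """Placeholder temporal interpolation (real systems use motion compensation)."""
    return (1.0 - alpha) * frame_a + alpha * frame_b

def high_rate_sequence(frames, upsample=2):
    """Insert (upsample - 1) interpolated frames between each captured pair."""
    out = []
    for a, b in zip(frames[:-1], frames[1:]):
        out.append(a)
        for s in range(1, upsample):
            out.append(interpolate(a, b, s / upsample))
    out.append(frames[-1])
    return out

frames = [np.full((4, 4), t, float) for t in range(3)]   # toy 3-frame "video"
print(len(high_rate_sequence(frames, upsample=2)))        # 5 frames from 3
```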
Citations: 0
A Multi-Criteria Contrast Enhancement Evaluation Measure using Wavelet Decomposition
Pub Date: 2020-09-21 DOI: 10.1109/MMSP48831.2020.9287051
Zohaib Amjad Khan, Azeddine Beghdadi, F. A. Cheikh, M. Kaaniche, Muhammad Ali Qureshi
An effective contrast enhancement method should not only improve the perceptual quality of an image but should also avoid adding artifacts or affecting the naturalness of the image. This makes Contrast Enhancement Evaluation (CEE) a challenging task, in the sense that both the improvement in image quality and unwanted side-effects need to be checked for. Currently, there is no single CEE metric that works well for all kinds of enhancement criteria. In this paper, we propose a new Multi-Criteria CEE (MCCEE) measure which combines different metrics effectively to give a single quality score. To fully exploit the potential of these metrics, we further propose applying them to the image decomposed with the wavelet transform. The new metric has been tested on two natural-image contrast enhancement databases as well as on medical Computed Tomography (CT) images. The results show a substantial improvement compared to existing evaluation metrics. The code for the metric is available at: https://github.com/zakopz/MCCEE-Contrast-Enhancement-Metric
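A rough sketch of the wavelet-domain evaluation idea follows: decompose both the original and enhanced images, collect per-subband statistics, and fuse them into a single score. The per-subband standard deviations and geometric-mean fusion below are stand-ins for the paper's actual criteria; the authors' released code is at the GitHub link above.

```python
# Sketch: score contrast enhancement on a wavelet decomposition (illustrative
# statistics and fusion, not the published MCCEE formulation).
import numpy as np
import pywt

def subband_stats(image, wavelet='haar', level=2):
    coeffs = pywt.wavedec2(image, wavelet, level=level)
    stats = [np.std(coeffs[0])]                    # approximation band
    for detail in coeffs[1:]:                      # (cH, cV, cD) per level
        stats.extend(np.std(band) for band in detail)
    return np.array(stats)

def mccee_like_score(original, enhanced):
    s_orig = subband_stats(original) + 1e-8
    s_enh = subband_stats(enhanced) + 1e-8
    ratios = s_enh / s_orig                        # >1 means more local contrast
    return float(np.exp(np.mean(np.log(ratios))))  # geometric-mean fusion

rng = np.random.default_rng(0)
img = rng.random((64, 64))
stretched = np.clip(1.5 * (img - 0.5) + 0.5, 0.0, 1.0)   # toy enhancement
print(mccee_like_score(img, stretched))                   # score > 1 expected
```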
Citations: 4
Time Difference of Arrival Estimation with Deep Learning – From Acoustic Simulations to Recorded Data
Pub Date: 2020-09-21 DOI: 10.1109/MMSP48831.2020.9287131
Pasi Pertilä, Mikko Parviainen, V. Myllylä, A. Huttunen, P. Jarske
The spatial information about a sound source is carried by acoustic waves to a microphone array and can be observed through estimation of phase and amplitude differences between microphones. Time difference of arrival (TDoA) captures the propagation delay of the wavefront between microphones and can be used to steer a beamformer or to localize the source. However, reverberation and interference can deteriorate the TDoA estimate. Deep neural networks (DNNs), through supervised learning, can extract speech-related TDoAs in more adverse conditions than traditional correlation-based methods. Acoustic simulations provide large amounts of data with annotations, while real recordings require manual annotations or the use of reference sensors with proper calibration procedures. The distributions of these two data sources can differ, and when a DNN model trained on simulated data is presented with real data from a different distribution, its performance decreases if the mismatch is not properly addressed. To reduce DNN-based TDoA estimation error, this work investigates the role of different input normalization techniques, the mixing of simulated and real data for training, and the application of an adversarial domain adaptation technique. Results quantify the reduction in TDoA error for real data using the different approaches. It is evident that the use of normalization methods, domain adaptation, and real data during training can reduce the TDoA error.
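For reference, the traditional correlation-based baseline mentioned above fits in a few lines: GCC-PHAT whitens the cross-power spectrum of two microphone signals and picks the peak of the resulting cross-correlation. The DNN models and domain adaptation of the paper are beyond a short sketch; this is the classical estimator only.

```python
# Classical GCC-PHAT TDoA estimator between two microphone signals.
import numpy as np

def gcc_phat(sig, ref, fs, max_tau=None):
    """Estimate the delay of `sig` relative to `ref`, in seconds."""
    n = len(sig) + len(ref)
    SIG = np.fft.rfft(sig, n=n)
    REF = np.fft.rfft(ref, n=n)
    cross = SIG * np.conj(REF)
    cc = np.fft.irfft(cross / (np.abs(cross) + 1e-12), n=n)  # PHAT weighting
    max_shift = n // 2 if max_tau is None else min(int(fs * max_tau), n // 2)
    cc = np.concatenate((cc[-max_shift:], cc[:max_shift + 1]))
    return (np.argmax(np.abs(cc)) - max_shift) / fs

fs = 16000
x = np.random.randn(fs)                          # 1 s of noise at the reference mic
delay = 25                                       # true delay of 25 samples
y = np.concatenate((np.zeros(delay), x))[:fs]
print(gcc_phat(y, x, fs) * fs)                   # prints approximately 25.0
```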
Citations: 2
Deep Learning Off-the-shelf Holistic Feature Descriptors for Visual Place Recognition in Challenging Conditions
Pub Date: 2020-09-21 DOI: 10.1109/MMSP48831.2020.9287063
Farid Aliajni, Esa Rahtu
In this paper, we present a comprehensive study of the utility of deep learning feature extraction methods for the visual place recognition task under three challenging conditions: appearance variation, viewpoint variation, and the combination of both. We extensively compare the performance of convolutional neural network architectures with batch normalization layers in terms of the fraction of correct matches. These architectures are primarily trained for image classification and object detection problems and are used here as holistic feature descriptors for the visual place recognition task. To verify the effectiveness of our results, we utilize four real-world place recognition datasets. Our investigation demonstrates that convolutional neural network architectures coupled with batch normalization and trained for other computer vision tasks outperform architectures that are specifically designed for place recognition.
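A minimal sketch of the evaluated pipeline: take a network pretrained for image classification, drop its classifier head, use the pooled activations as a holistic descriptor, and match places by cosine similarity. The backbone choice, the torchvision >= 0.13 `weights=` API (which downloads pretrained weights on first use), and the matching rule are assumptions for illustration, not the paper's exact setup.

```python
# Off-the-shelf holistic descriptors for place recognition (sketch).
import torch
import torch.nn.functional as F
import torchvision.models as models

backbone = models.resnet18(weights="IMAGENET1K_V1")  # classification-trained CNN
backbone.fc = torch.nn.Identity()                    # keep the 512-d pooled feature
backbone.eval()

@torch.no_grad()
def describe(batch):
    """batch: (N, 3, 224, 224), ImageNet-normalized images -> unit-norm descriptors."""
    return F.normalize(backbone(batch), dim=1)

query = describe(torch.randn(1, 3, 224, 224))        # placeholder images
database = describe(torch.randn(10, 3, 224, 224))
best = torch.argmax(database @ query.T)              # nearest place by cosine similarity
print(int(best))
```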
Citations: 2
Towards Criminal Sketching with Generative Adversarial Network
Pub Date: 2020-09-21 DOI: 10.1109/MMSP48831.2020.9287084
Hanzhou Wu, Yuwei Yao, Xinpeng Zhang, Jiangfeng Wang
Criminal sketching aims to draw an approximate portrait of a criminal suspect from the details that an observer can remember. However, even a professional artist needs considerable time to complete a sketch and draw a good portrait. This motivates us to study forensic sketching with a generative adversarial network based architecture, which allows us to synthesize a realistic portrait of the criminal suspect described by an eyewitness. The proposed work consists of two steps: sketch generation and portrait generation. In the former, a facial outline is sketched based on the descriptive details. In the latter, the facial details are completed to generate a portrait. To make the portrait more realistic, we use a portrait discriminator, which not only learns the features that discriminate between faces synthesized by the generator and real faces, but also recognizes face attributes. Experiments show that this work achieves promising performance for criminal sketching.
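The discriminator role described above (real/fake scoring plus attribute recognition) resembles an auxiliary-classifier design; the sketch below shows one plausible shape for such a network. The layer sizes and attribute count are illustrative, not the paper's architecture.

```python
# Sketch of a portrait discriminator with an auxiliary attribute head.
import torch
import torch.nn as nn

class PortraitDiscriminator(nn.Module):
    def __init__(self, n_attributes=10):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 64, 4, stride=2, padding=1), nn.LeakyReLU(0.2),
            nn.Conv2d(64, 128, 4, stride=2, padding=1), nn.LeakyReLU(0.2),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.real_fake = nn.Linear(128, 1)               # adversarial score
        self.attributes = nn.Linear(128, n_attributes)   # attribute logits

    def forward(self, portrait):
        h = self.features(portrait)
        return self.real_fake(h), self.attributes(h)

d = PortraitDiscriminator()
score, attrs = d(torch.randn(2, 3, 128, 128))
print(score.shape, attrs.shape)   # torch.Size([2, 1]) torch.Size([2, 10])
```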
Citations: 0
Efficient Adaptive Inference Leveraging Bag-of-Features-based Early Exits
Pub Date: 2020-09-21 DOI: 10.1109/MMSP48831.2020.9287150
N. Passalis, Jenni Raitoharju, M. Gabbouj, A. Tefas
Early exits provide an effective way of implementing adaptive computational graphs over deep learning models. In this way, models can be adapted on-the-fly to the available computational resources or even to the difficulty of each input sample, reducing the energy and computational power requirements of many embedded and mobile applications. However, performing this kind of adaptive inference also comes with several challenges, since the difficulty of each sample must be estimated and the most appropriate early exit must be selected. It is worth noting that existing approaches often lead to highly unbalanced distributions over the selected early exits, reducing the efficiency of the adaptive inference process. At the same time, only a few resources can be devoted to this selection process, in order to ensure that an adequate speedup is obtained. The main contribution of this work is an adaptive inference approach for early exits that is easy to use and tune and can overcome some of these limitations. The proposed method allows for a) obtaining a more balanced inference distribution among the early exits, b) relying on a single, interpretable hyperparameter for tuning its behavior (ranging from faster inference to higher accuracy), and c) improving the performance of the networks (increasing the accuracy and reducing the time needed for inference). The effectiveness of the proposed method over existing approaches is demonstrated using four different image datasets.
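A minimal sketch of threshold-based early-exit inference may clarify the mechanism: run the backbone stage by stage and stop at the first exit whose softmax confidence clears a single threshold, the kind of one-knob speed/accuracy trade-off described above. The bag-of-features exit branches of the paper are not reproduced; the exits here are plain linear heads on toy features.

```python
# Sketch: adaptive inference with confidence-thresholded early exits.
import torch
import torch.nn as nn

stages = nn.ModuleList(
    [nn.Sequential(nn.Linear(32, 32), nn.ReLU()) for _ in range(3)])
exits = nn.ModuleList([nn.Linear(32, 10) for _ in range(3)])  # one head per stage

@torch.no_grad()
def adaptive_predict(x, threshold=0.8):
    """Return (predicted class, exit index); the last exit always answers."""
    for depth, (stage, head) in enumerate(zip(stages, exits)):
        x = stage(x)
        probs = head(x).softmax(dim=-1)
        confidence, label = probs.max(dim=-1)
        if confidence.item() >= threshold or depth == len(stages) - 1:
            return int(label), depth

print(adaptive_predict(torch.randn(1, 32)))
```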
Citations: 1
Multispectral Image Compression Based on HEVC Using Pel-Recursive Inter-Band Prediction
Pub Date: 2020-09-21 DOI: 10.1109/MMSP48831.2020.9287132
Anna Meyer, Nils Genser, A. Kaup
Recent developments in optical sensors enable a wide range of applications for multispectral imaging, e.g., in surveillance, optical sorting, and life-science instrumentation. Increasing spatial and spectral resolution allows creating higher-quality products; however, it poses challenges in handling such large amounts of data. Consequently, specialized compression techniques for multispectral images are required. High Efficiency Video Coding (HEVC) is known to be the state of the art in efficiency for both video coding and still image coding. In this paper, we propose a cross-spectral compression scheme for efficiently coding multispectral data based on HEVC. By extending intra-picture prediction with a novel inter-band predictor, spectral as well as spatial redundancies can be effectively exploited. Dependencies between the current band and further spectral references are considered jointly by adaptive linear regression modeling. The proposed backward prediction scheme does not require additional side information for decoding. We show that our novel approach outperforms state-of-the-art lossy compression techniques in terms of rate-distortion performance. On different data sets, average Bjøntegaard delta rate savings of 82% and 55% are achieved compared to HEVC and a reference method from the literature, respectively.
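The backward-adaptive regression idea can be illustrated compactly: predict each pixel of the current band from the co-located pixels of previously coded bands, with weights fitted by least squares on a causal window of already-reconstructed pixels, so no coefficients need to be signaled. The window size and the plain least-squares fit are illustrative simplifications of the paper's pel-recursive scheme.

```python
# Sketch: backward-adaptive linear inter-band prediction for one pixel.
import numpy as np

def predict_band(current, references, y, x, win=8):
    """Predict current[y, x] from reference bands via a causally trained fit."""
    y0, x0 = max(0, y - win), max(0, x - win)
    mask = np.zeros_like(current, bool)        # causal training window
    mask[y0:y + 1, x0:x + 1] = True
    mask[y, x] = False                         # exclude the pixel being predicted
    X = np.stack([ref[mask] for ref in references], axis=1)  # (n, n_bands)
    t = current[mask]                                         # regression targets
    w, *_ = np.linalg.lstsq(X, t, rcond=None)                 # fitted weights
    return float(np.array([ref[y, x] for ref in references]) @ w)

rng = np.random.default_rng(1)
b1 = rng.random((32, 32))
b2 = rng.random((32, 32))
b3 = 0.6 * b1 + 0.4 * b2 + 0.01 * rng.standard_normal((32, 32))
print(predict_band(b3, [b1, b2], 16, 16), b3[16, 16])   # prediction is close
```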
Citations: 2
A Large-scale Evaluation of the bitstream-based video-quality model ITU-T P.1204.3 on Gaming Content
Pub Date: 2020-09-21 DOI: 10.1109/MMSP48831.2020.9287055
Rakesh Rao Ramachandra Rao, Steve Göring, Robert Steger, Saman Zadtootaghaj, Nabajeet Barman, S. Fremerey, S. Möller, A. Raake
The streaming of gaming content, both passive and interactive, has increased manifold in recent years. Gaming content brings with it some peculiarities which are normally not seen in traditional 2D videos, such as the artificial and synthetic nature of the content or the repetition of objects in a game. In addition, the perception of gaming content by the user differs from that of traditional 2D videos due to these peculiarities and the fact that users may not often watch such content. Hence, it becomes imperative to evaluate whether existing video quality models, usually designed for traditional 2D videos, are applicable to gaming content. In this paper, we evaluate the applicability of the recently standardized bitstream-based video-quality model ITU-T P.1204.3 on gaming content. To analyze the performance of this model, we used 4 different gaming datasets (3 publicly available + 1 internal) not previously used for model training, and compared it with existing state-of-the-art models. We found that the ITU-T P.1204.3 model performs well out of the box on these unseen datasets, with an RMSE between 0.38 and 0.45 on the 5-point absolute category rating scale and a Pearson correlation between 0.85 and 0.93 across all 4 databases. We further propose a full-HD variant of the P.1204.3 model, since the original model was trained and validated targeting a resolution of 4K/UHD-1. A 50:50 split across all databases is used to train and validate this variant, so as to make sure that the proposed model is applicable to various conditions.
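The reported evaluation reduces to comparing model predictions against subjective MOS with RMSE and Pearson correlation; a small sketch with placeholder numbers shows the computation on the 5-point scale.

```python
# Sketch: the RMSE / Pearson-correlation evaluation used above.
# The MOS and prediction arrays are placeholders, not data from the paper.
import numpy as np
from scipy.stats import pearsonr

mos = np.array([1.8, 2.5, 3.1, 3.9, 4.6])    # subjective ratings (ACR, 1-5)
pred = np.array([2.0, 2.4, 3.4, 3.7, 4.5])   # model outputs, e.g. from P.1204.3

rmse = float(np.sqrt(np.mean((pred - mos) ** 2)))
pcc, _ = pearsonr(pred, mos)
print(f"RMSE={rmse:.3f}  PCC={pcc:.3f}")
```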
Citations: 5