
2020 IEEE 22nd International Workshop on Multimedia Signal Processing (MMSP): Latest Publications

An occlusion compensation model for improving the reconstruction quality of light field
Pub Date : 2020-09-21 DOI: 10.1109/MMSP48831.2020.9287094
Jinjie Bi, Weiyan Chen, Changjian Zhu, Hong Zhang, Min Tan
Occlusion lack compensation (OLC) is a data acquisition and novel view rendering strategy that optimizes multiplexing gain for light field rendering (LFR). While the achievable OLC is much higher than previously thought possible, the improvement comes at the cost of requiring more scene information: learning and training methods must capture more detailed scene information, including geometry, texture, and depth. In this paper, we develop an occlusion compensation (OCC) model based on a restricted Boltzmann machine (RBM) to compensate for the scene information lost to occlusion. We show that occlusion causes a loss of captured scene information, which in turn degrades view rendering quality. The OCC model learns to estimate and compensate for the missing information at occlusion edges. We present experimental results that demonstrate the performance of the OCC model with simulated training, verify our theoretical analysis, and extend our conclusions on the optimal rendering quality of light fields.
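To make the compensation idea concrete, here is a minimal, hypothetical sketch of how an RBM can fill in occluded pixels: train it on unoccluded patches with contrastive divergence, then run clamped Gibbs sampling so observed pixels stay fixed while occluded ones are resampled from the model. All names, dimensions, and the mask convention are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch (not the authors' code) of using a binary RBM to fill in
# missing scene information at occluded pixels.
import numpy as np

rng = np.random.default_rng(0)
N_VISIBLE, N_HIDDEN = 64, 32          # e.g., an 8x8 patch around an occlusion edge
W = rng.normal(0, 0.01, (N_VISIBLE, N_HIDDEN))
b_v = np.zeros(N_VISIBLE)             # visible bias
b_h = np.zeros(N_HIDDEN)              # hidden bias

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def cd1_step(v0, lr=0.05):
    """One contrastive-divergence (CD-1) update on a batch of patches."""
    global W, b_v, b_h
    h0 = sigmoid(v0 @ W + b_h)
    h0_s = (rng.random(h0.shape) < h0).astype(float)   # sample hidden units
    v1 = sigmoid(h0_s @ W.T + b_v)                     # reconstruction
    h1 = sigmoid(v1 @ W + b_h)
    W += lr * (v0.T @ h0 - v1.T @ h1) / len(v0)
    b_v += lr * (v0 - v1).mean(axis=0)
    b_h += lr * (h0 - h1).mean(axis=0)

def compensate(v_obs, mask, n_iters=20):
    """Estimate occluded entries (mask == 0) by clamped Gibbs sampling:
    observed pixels stay fixed, occluded ones come from the model."""
    v = v_obs.copy()
    for _ in range(n_iters):
        h = sigmoid(v @ W + b_h)
        v_model = sigmoid(h @ W.T + b_v)
        v = mask * v_obs + (1 - mask) * v_model        # clamp observed pixels
    return v

# Train on unoccluded patches, then compensate an occluded one.
patches = rng.random((256, N_VISIBLE))                 # stand-in training data
for _ in range(100):
    cd1_step(patches)
mask = np.ones(N_VISIBLE); mask[20:28] = 0             # 8 occluded pixels
filled = compensate(patches[0] * mask, mask)
```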
Citations: 0
Auto-Encoder based Structured Dictionary Learning
Pub Date : 2020-09-21 DOI: 10.1109/MMSP48831.2020.9287153
Deyin Liu, Yuan Wu, Liangchen Liu, Qichang Hu, Lin Qi
Dictionary learning and deep learning are two popular representation learning paradigms that can be combined to boost classification. However, existing combination methods often learn multiple dictionaries embedded in a cascade of layers, together with a specialized classifier. This may inadvertently lead to overfitting and high computational cost. In this paper, we present a novel deep auto-encoding architecture that learns only a single dictionary for classification. To empower the dictionary with discrimination, we construct it from class-specific sub-dictionaries and introduce supervision by imposing category constraints. The proposed framework is inspired by a sparse optimization method, the Iterative Shrinkage Thresholding Algorithm, which characterizes the learning process as forward-propagation-based optimization with respect to the dictionary only, dramatically reducing the number of parameters to learn and the computational cost. Extensive experiments demonstrate the effectiveness of our method in image classification.
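The Iterative Shrinkage Thresholding Algorithm the abstract references is compact enough to sketch directly; each ISTA iteration (a gradient step on the data term followed by soft thresholding) corresponds to one forward layer of an unrolled architecture like the one described. The dictionary, signal sizes, and regularization weight below are illustrative assumptions.

```python
# A minimal ISTA sketch for sparse coding x ~ D a over a dictionary D.
import numpy as np

def soft_threshold(z, t):
    """Proximal operator of t * ||a||_1 (the 'shrinkage' step)."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def ista(x, D, lam=0.1, n_iters=100):
    """Minimize 0.5 * ||x - D a||_2^2 + lam * ||a||_1 over the code a."""
    L = np.linalg.norm(D, 2) ** 2        # Lipschitz constant of the gradient
    a = np.zeros(D.shape[1])
    for _ in range(n_iters):
        grad = D.T @ (D @ a - x)         # gradient of the data-fidelity term
        a = soft_threshold(a - grad / L, lam / L)
    return a

rng = np.random.default_rng(0)
D = rng.normal(size=(64, 128))           # overcomplete dictionary (e.g., class-wise blocks)
D /= np.linalg.norm(D, axis=0)           # unit-norm atoms
x = D[:, :5] @ rng.normal(size=5)        # signal with a 5-atom sparse code
a = ista(x, D)
print("nonzeros:", np.count_nonzero(np.abs(a) > 1e-6))
```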
Citations: 0
Variational Bound of Mutual Information for Fairness in Classification
Pub Date : 2020-09-21 DOI: 10.1109/MMSP48831.2020.9287139
Zahir Alsulaimawi
Machine learning applications have emerged in many aspects of our lives, such as credit lending, insurance rates, and employment applications. Consequently, such systems are required to be nondiscriminatory and fair with respect to sensitive user features, e.g., race, sexual orientation, and religion. To address this issue, this paper develops a minimax adversarial framework, called the features protector (FP) framework, to achieve an information-theoretic trade-off between minimizing the distortion of target data and ensuring that sensitive features have similar distributions. We evaluate the performance of the proposed framework on two real-world datasets. Preliminary empirical evaluation shows that our framework provides both accurate and fair decisions.
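As a rough illustration of the minimax idea, the hypothetical sketch below trains an encoder whose features support the target task while an adversary tries to recover the sensitive attribute from them; the encoder is additionally trained to defeat that adversary. Network sizes, the trade-off weight beta, and the alternating schedule are assumptions, not the paper's FP implementation.

```python
# Hedged sketch of minimax adversarial training for fair representations.
import torch
import torch.nn as nn

d_in, d_feat = 16, 8
encoder   = nn.Sequential(nn.Linear(d_in, d_feat), nn.ReLU())
task_head = nn.Linear(d_feat, 2)      # predicts the target label y
adversary = nn.Linear(d_feat, 2)      # predicts the sensitive attribute s
opt_main = torch.optim.Adam(list(encoder.parameters()) + list(task_head.parameters()), lr=1e-3)
opt_adv  = torch.optim.Adam(adversary.parameters(), lr=1e-3)
ce = nn.CrossEntropyLoss()
beta = 1.0                             # fairness/utility trade-off weight (assumed)

x = torch.randn(64, d_in)              # stand-in batch
y = torch.randint(0, 2, (64,))
s = torch.randint(0, 2, (64,))

for step in range(200):
    # 1) Adversary step: maximize its ability to infer s from the features.
    z = encoder(x).detach()
    opt_adv.zero_grad()
    ce(adversary(z), s).backward()
    opt_adv.step()
    # 2) Main step: good target prediction, bad sensitive-attribute prediction.
    z = encoder(x)
    loss = ce(task_head(z), y) - beta * ce(adversary(z), s)
    opt_main.zero_grad()
    loss.backward()
    opt_main.step()
```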
Citations: 2
Leveraging Active Perception for Improving Embedding-based Deep Face Recognition
Pub Date : 2020-09-21 DOI: 10.1109/MMSP48831.2020.9287085
N. Passalis, A. Tefas
Even though recent advances in deep learning (DL) have led to tremendous improvements in various computer and robotic vision tasks, existing DL approaches suffer from a significant limitation: they typically ignore that robots and cyber-physical systems are capable of interacting with the environment in order to better sense their surroundings. In this work we argue that perceiving the world through physical interaction, i.e., employing active perception, allows both for increasing the accuracy of DL models and for deploying smaller and faster ones. To this end, we propose an active perception-based face recognition approach that simultaneously extracts discriminative embeddings and predicts the direction in which the robot must move to obtain a more discriminative view. To the best of our knowledge, this is the first embedding-based active perception method for deep face recognition. As we experimentally demonstrate, the proposed method leads to significant improvements, increasing face recognition accuracy by up to 9% while allowing overall smaller and faster models that reduce the number of parameters by over an order of magnitude.
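A minimal sketch of such a two-head design, under assumed (not the authors') architecture choices, is a shared backbone with one head producing the unit-norm recognition embedding and another scoring a small set of candidate movement directions:

```python
# Hypothetical two-head network: embedding for recognition plus a
# movement-direction predictor for active perception.
import torch
import torch.nn as nn

class ActivePerceptionNet(nn.Module):
    def __init__(self, emb_dim=128, n_directions=4):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten())
        self.embed = nn.Linear(32, emb_dim)        # recognition embedding head
        self.move  = nn.Linear(32, n_directions)   # e.g., left/right/up/down logits

    def forward(self, x):
        f = self.backbone(x)
        z = nn.functional.normalize(self.embed(f), dim=1)  # unit-norm embedding
        return z, self.move(f)

net = ActivePerceptionNet()
z, move_logits = net(torch.randn(1, 3, 112, 112))
direction = move_logits.argmax(dim=1)  # robot moves this way, re-captures, re-embeds
```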
Citations: 7
A Triangulation-Based Backward Adaptive Motion Field Subsampling Scheme
Pub Date : 2020-09-21 DOI: 10.1109/MMSP48831.2020.9287064
Fabian Brand, Jürgen Seiler, E. Alshina, A. Kaup
Optical flow procedures generate dense motion fields that approximate true motion. Such fields contain a large amount of data, and if we need to transmit such a field, its raw size usually exceeds that of the two images it was computed from. In many scenarios, however, it is of interest to transmit a dense motion field efficiently; the most prominent case is inter prediction in video coding. In this paper we propose a transmission scheme based on subsampling the motion field. Since a field subsampled with a regularly spaced pattern usually yields suboptimal results, we propose an adaptive subsampling algorithm that preferentially samples vectors at positions where the motion changes. The subsampling pattern is fully reconstructable without signaling any position information. We show an average gain of 2.95 dB in average end point error compared to regular subsampling. Furthermore, we show that an additional prediction stage improves the results by another 0.43 dB, for a total gain of 3.38 dB.
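As a rough illustration only: the hypothetical sketch below samples a coarse regular grid plus extra vectors where the flow changes fastest, then rebuilds the dense field with Delaunay-based linear interpolation (scipy's LinearNDInterpolator triangulates the sample positions). Note that the paper's scheme is backward-adaptive, deriving the pattern without signaling positions; this sketch ignores that aspect and only shows activity-driven sampling plus triangulation-based reconstruction.

```python
# Illustrative adaptive subsampling and triangulation-based reconstruction
# of a dense motion field (not the paper's backward-adaptive scheme).
import numpy as np
from scipy.interpolate import LinearNDInterpolator

def adaptive_subsample(flow, coarse=16, budget=200):
    """flow: (H, W, 2) dense motion field -> sample positions and vectors."""
    H, W, _ = flow.shape
    gy, gx = np.gradient(flow[..., 0])          # how fast the motion changes
    activity = np.hypot(gx, gy)
    ys, xs = np.mgrid[0:H:coarse, 0:W:coarse]   # regular base pattern
    pts = np.stack([ys.ravel(), xs.ravel()], axis=1)
    extra = np.argsort(activity.ravel())[-budget:]      # highest-activity pixels
    pts = np.vstack([pts, np.stack(np.unravel_index(extra, (H, W)), axis=1)])
    return pts, flow[pts[:, 0], pts[:, 1]]

def reconstruct(pts, vecs, shape):
    """Delaunay-triangulate the samples and interpolate each flow component."""
    H, W = shape
    yy, xx = np.mgrid[0:H, 0:W]
    dense = np.zeros((H, W, 2))
    for c in range(2):
        interp = LinearNDInterpolator(pts, vecs[:, c], fill_value=0.0)
        dense[..., c] = interp(yy, xx)
    return dense

flow = np.zeros((128, 128, 2)); flow[40:, :, 0] = 2.0   # toy field with a motion edge
pts, vecs = adaptive_subsample(flow)
rec = reconstruct(pts, vecs, flow.shape[:2])
```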
Citations: 1
MMSP 2020 Author Information Page
Pub Date : 2020-09-21 DOI: 10.1109/mmsp48831.2020.9287111
{"title":"MMSP 2020 Author Information Page","authors":"","doi":"10.1109/mmsp48831.2020.9287111","DOIUrl":"https://doi.org/10.1109/mmsp48831.2020.9287111","url":null,"abstract":"","PeriodicalId":188283,"journal":{"name":"2020 IEEE 22nd International Workshop on Multimedia Signal Processing (MMSP)","volume":"259 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-09-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134515267","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Hybrid Motion Magnification based on Same-Frame Optical Flow Computations
Pub Date : 2020-09-21 DOI: 10.1109/MMSP48831.2020.9287152
J. A. Lima, C. Miosso, Mylène C. Q. Farias
Motion magnification refers to amplifying small movements in a video in order to reveal important information about the observed scene. Several motion magnification methods have been proposed in the past, but most have the disadvantage of introducing annoying visual artifacts into the video. In this paper, we propose a method that analyses the optical flow between each original frame and the corresponding motion-magnified frame and then synthesizes a new motion-magnified video by remapping the original video using the generated optical flow map. The proposed approach eliminates the artifacts that appear in Eulerian methods. It can also amplify the motion by an extra factor of 2 and invert the motion direction.
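A hedged sketch of the remapping step is shown below: given the flow between an original frame and its motion-magnified counterpart, the original is warped by a scaled version of that flow, so alpha greater than 1 amplifies the motion further and negative alpha inverts it. The Farneback flow call merely stands in for whichever optical flow method the paper actually uses, and the backward-mapping approximation assumes small displacements.

```python
# Illustrative flow-based remapping for motion magnification.
import cv2
import numpy as np

def magnify(frame, magnified_frame, alpha=2.0):
    g0 = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    g1 = cv2.cvtColor(magnified_frame, cv2.COLOR_BGR2GRAY)
    # Flow from the original frame to its motion-magnified counterpart.
    flow = cv2.calcOpticalFlowFarneback(g0, g1, None, 0.5, 3, 15, 3, 5, 1.2, 0)
    h, w = g0.shape
    xx, yy = np.meshgrid(np.arange(w), np.arange(h))
    # Backward map: sample the original at positions displaced by -alpha * flow,
    # which (for small motions) pushes content forward by alpha * flow.
    map_x = (xx - alpha * flow[..., 0]).astype(np.float32)
    map_y = (yy - alpha * flow[..., 1]).astype(np.float32)
    return cv2.remap(frame, map_x, map_y, cv2.INTER_LINEAR)
```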
Citations: 0
A Comparative Analysis of the Time and Energy Demand of Versatile Video Coding and High Efficiency Video Coding Reference Decoders
Pub Date : 2020-09-21 DOI: 10.1109/MMSP48831.2020.9287098
Matthias Kränzler, Christian Herglotz, A. Kaup
This paper investigates the decoding energy and decoding time demand of VTM-7.0 relative to HM-16.20. We present the first detailed comparison of the two video codecs in terms of software decoder energy consumption. The evaluation shows that the energy demand of the VTM decoder is significantly higher than that of HM and that the increase depends on the coding configuration. For the random access configuration, we find that decoding energy increases by over 80% while decoding time increases by over 70%. Furthermore, results indicate that the energy demand rises by up to 207% when Single Instruction Multiple Data (SIMD) instructions are disabled, which corresponds to the HM implementation style. Measurements reveal that the coding tools MIP, AMVR, TPM, LFNST, and MTS increase the energy efficiency of the decoder. Based on this analysis, we propose a new coding configuration that reduces the energy demand of the VTM decoder by over 17% on average.
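As an illustration of the measurement setup, the sketch below times the two reference decoders on the same content; the binary names and the -b flag follow the HM/VTM reference software conventions, but the paths and bitstream names are placeholders, and actual energy measurement requires an external power meter or processor energy counters that are not shown here.

```python
# Minimal timing harness sketch for comparing reference decoders.
import subprocess
import time

def decode_time(decoder_cmd, bitstream, runs=5):
    """Median wall-clock decode time over several runs."""
    times = []
    for _ in range(runs):
        t0 = time.perf_counter()
        subprocess.run(decoder_cmd + ["-b", bitstream], check=True,
                       stdout=subprocess.DEVNULL)
        times.append(time.perf_counter() - t0)
    return sorted(times)[len(times) // 2]

t_hm  = decode_time(["./TAppDecoder"], "seq_ra.hevc")   # HM-16.20 reference decoder
t_vtm = decode_time(["./DecoderApp"],  "seq_ra.vvc")    # VTM-7.0 reference decoder
print(f"VTM/HM decode-time ratio: {t_vtm / t_hm:.2f}")
```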
Citations: 6
VMAF Based Rate-Distortion Optimization for Video Coding
Pub Date : 2020-09-21 DOI: 10.1109/MMSP48831.2020.9287114
Sai Deng, Jingning Han, Yaowu Xu
Video Multi-method Assessment Fusion (VMAF) is a machine-learning based video quality metric. It has been experimentally shown to correlate better with the human visual system than conventional metrics such as peak signal-to-noise ratio (PSNR) and structural similarity index (SSIM) in many scenarios, and it has drawn considerable interest as an alternative metric for evaluating perceptual quality. This work proposes a systematic approach to improving video compression performance under VMAF. It is composed of multiple components, including a pre-processing stage with complementary automatic filter parameter selection and a modified rate-distortion optimization framework tailored to the VMAF metric. The proposed scheme achieves an average 37% BD-rate reduction in VMAF compared to a conventional video codec optimized for PSNR.
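To illustrate how a quality metric plugs into rate-distortion optimization, the schematic sketch below selects the coding mode minimizing J = D + lambda * R with a VMAF-derived distortion term (VMAF is a 0-100 quality score, so 100 - VMAF serves as a distortion proxy). All candidate numbers are made up, and this is not the paper's actual RDO formulation; real scores would come from a VMAF implementation such as libvmaf.

```python
# Schematic metric-driven RD mode decision with a VMAF-based distortion proxy.

def rd_cost(vmaf_score, bits, lam):
    distortion = 100.0 - vmaf_score     # higher VMAF = better, so invert it
    return distortion + lam * bits

candidates = [                          # (mode, VMAF of reconstruction, bits)
    ("intra",      91.5, 5200),
    ("inter",      90.8, 2100),
    ("inter_skip", 88.0,  300),
]
lam = 0.002                             # Lagrange multiplier (rate trade-off)
best = min(candidates, key=lambda m: rd_cost(m[1], m[2], lam))
print("chosen mode:", best[0])
```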
Citations: 15
Use of a deep convolutional neural network to diagnose disease in the rose by means of a photographic image
Pub Date : 2020-09-21 DOI: 10.1109/MMSP48831.2020.9287081
O. A. Miloserdov, N. S. Ovcharenko, A. Makarenko
The article presents the particulars of developing a plant disease detection system based on the analysis of photographic images by deep convolutional neural networks. An original lightweight neural network architecture is used (only 13,480 trained parameters) that is tens to hundreds of times more compact than typical solutions. Real-life field data is used for training and testing, with photographs taken in adverse conditions: varying hardware quality, angles, lighting conditions, and scales (from macro shots of individual leaf and stem fragments to several rose bushes in one picture), as well as complex, disorienting backgrounds. An adaptive decision-making rule based on Bayes' theorem and Wald's sequential probability ratio test is used to improve the reliability of the results. The following example is provided: detecting disease on the leaves and stems of roses from images taken in the visible spectrum. The authors attained a quality of 90.6% on real-life data (F1 score, one input image, test dataset).
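A small sketch of Wald's sequential probability ratio test used as an adaptive decision rule follows: per-image "diseased" probabilities from the CNN are accumulated as a log-likelihood ratio until the evidence crosses a confidence threshold. Treating each network output directly as posterior odds, and the error rates alpha and beta, are simplifying assumptions of this sketch, not details from the article.

```python
# Illustrative SPRT-style decision rule over a stream of CNN outputs.
import math

def sprt(probs, alpha=0.01, beta=0.01):
    """probs: stream of per-image P(diseased) from the CNN.
    Returns 'diseased', 'healthy', or 'undecided'."""
    upper = math.log((1 - beta) / alpha)      # threshold to accept H1 (diseased)
    lower = math.log(beta / (1 - alpha))      # threshold to accept H0 (healthy)
    llr = 0.0
    for p in probs:
        p = min(max(p, 1e-6), 1 - 1e-6)       # keep the log finite
        llr += math.log(p / (1 - p))          # accumulate evidence
        if llr >= upper:
            return "diseased"
        if llr <= lower:
            return "healthy"
    return "undecided"                        # take another photograph

print(sprt([0.7, 0.8, 0.9]))                  # more shots -> firmer decision
```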
Citations: 1