Tunable VVC Frame Partitioning based on Lightweight Machine Learning.

IEEE Transactions on Image Processing, published 2019-09-06. DOI: 10.1109/TIP.2019.2938670. Impact factor 13.7; CAS Zone 1 (Computer Science); JCR Q1 (Computer Science, Artificial Intelligence).

Thomas Amestoy, Alexandre Mercat, Wassim Hamidouche, Daniel Menard, Cyril Bergeron


Abstract

Block partitioning is a critical module of a video coding scheme and is key to achieving significant compression gains. In the exploration of the future video coding standard, named Versatile Video Coding (VVC), a new Quad Tree Binary Tree (QTBT) block partition structure has been introduced. In addition to the QT block partitioning defined in the High Efficiency Video Coding (HEVC) standard, new horizontal and vertical BT partitions are enabled, which drastically increases the encoding time compared to HEVC. In this paper, we propose a lightweight and tunable QTBT partitioning scheme based on a Machine Learning (ML) approach. The proposed solution uses Random Forest classifiers to determine the most probable partition modes for each coding block. To minimize the coding loss induced by misclassification, risk intervals are introduced for the classifier decisions. By varying the size of the risk intervals, a tunable trade-off between encoding complexity reduction and coding loss is achieved. Implemented in the JEM-7.0 software, the proposed solution offers encoding complexity reductions ranging from 30% to … on average for only a 0.7% to 3.0% Bjøntegaard Delta Rate (BD-BR) increase in the Random Access (RA) coding configuration, with a very slight overhead induced by the Random Forest classifiers. The proposed solution is also effective at reducing the complexity of the Multi-Type Tree (MTT) partitioning scheme in the VTM-5.0 software, with complexity reductions ranging from 25% to 61% on average for only a 0.4% to 2.2% BD-BR increase.
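The abstract states that risk intervals around the classifier decisions make the complexity/loss trade-off tunable, but it does not spell out the mechanics. The sketch below shows one plausible reading under explicit assumptions that are not taken from the paper: a classifier (for example, a Random Forest's averaged class probability) yields `p_split`, the estimated probability that a block should be split; the risk interval is the symmetric band [0.5 - delta, 0.5 + delta]; outside the band one option is pruned, and inside it both options go to the full rate-distortion (RD) search. The names `gate`, `Decision` and `fraction_pruned` are hypothetical, and the paper's actual classifiers, features and interval definitions may differ.

```python
"""Hypothetical sketch: gating one binary split decision with a risk interval.

Assumptions (not taken from the paper): a classifier returns p_split, the
estimated probability that the current block should be split; the risk
interval is the symmetric band [0.5 - delta, 0.5 + delta]; inside that band
the encoder keeps both options for the full rate-distortion (RD) search.
"""
from enum import Enum
from typing import Sequence


class Decision(Enum):
    NO_SPLIT_ONLY = "test the unsplit block only"
    SPLIT_ONLY = "test the split only"
    BOTH = "inside the risk interval: let the RD search decide"


def gate(p_split: float, delta: float) -> Decision:
    """Map a classifier probability to the options the encoder will evaluate."""
    if not 0.0 <= p_split <= 1.0:
        raise ValueError("p_split must be a probability in [0, 1]")
    if not 0.0 <= delta <= 0.5:
        raise ValueError("delta must lie in [0, 0.5]")
    if p_split < 0.5 - delta:
        return Decision.NO_SPLIT_ONLY  # confident "no split": skip testing the split
    if p_split > 0.5 + delta:
        return Decision.SPLIT_ONLY  # confident "split": skip the unsplit option
    return Decision.BOTH  # too risky to prune: fall back to the exhaustive RD search


def fraction_pruned(probs: Sequence[float], delta: float) -> float:
    """Share of decisions where one option is skipped (a rough proxy for time saved)."""
    if not probs:
        raise ValueError("probs must not be empty")
    pruned = sum(gate(p, delta) is not Decision.BOTH for p in probs)
    return pruned / len(probs)


if __name__ == "__main__":
    # Made-up classifier outputs, for illustration only.
    probs = [0.05, 0.25, 0.45, 0.55, 0.75, 0.95]
    for delta in (0.0, 0.2, 0.5):
        print(f"delta={delta}: pruned {fraction_pruned(probs, delta):.2f}")
```

With these made-up probabilities, widening `delta` from 0.0 to 0.2 to 0.5 lowers the pruned share from 1.00 to 0.67 to 0.00. A wider interval skips fewer RD evaluations but also exposes fewer blocks to misclassification loss, which mirrors the tunable trade-off described in the abstract.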


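Source journal

IEEE Transactions on Image Processing (Engineering & Technology: Electrical & Electronic Engineering)

CiteScore: 20.90

Self-citation rate: 6.60%

Articles published: 774

Review time: 7.6 months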

Journal description: The IEEE Transactions on Image Processing covers theories, algorithms, and structures concerning the generation, acquisition, manipulation, transmission, analysis, and display of images, video, and multidimensional signals across diverse applications. Topics span mathematical, statistical, and perceptual aspects, encompassing modeling, representation, formation, coding, filtering, enhancement, restoration, rendering, halftoning, search, and analysis of images, video, and multidimensional signals. Pertinent applications range from image and video communications to electronic imaging, biomedical imaging, image and video systems, and remote sensing.

Latest articles in this journal

Robust Source-Free Domain Adaptation From Non-Robust Source Models

Partially Supervised Compositional Zero-Shot Learning by Class-Balanced Distribution Alignment

MTRAG: Multi-Target Referring and Grounding via Hybrid Semantic-Spatial Integration

Interactive Spatial-Frequency Fusion Mamba for Multi-Modal Image Fusion

A Greedy Strategy for Graph Cut