Tunable VVC Frame Partitioning based on Lightweight Machine Learning.

IF 10.8 1区 计算机科学 Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE IEEE Transactions on Image Processing Pub Date : 2019-09-06 DOI:10.1109/TIP.2019.2938670
Thomas Amestoy, Alexandre Mercat, Wassim Hamidouche, Daniel Menard, Cyril Bergeron
{"title":"Tunable VVC Frame Partitioning based on Lightweight Machine Learning.","authors":"Thomas Amestoy, Alexandre Mercat, Wassim Hamidouche, Daniel Menard, Cyril Bergeron","doi":"10.1109/TIP.2019.2938670","DOIUrl":null,"url":null,"abstract":"<p><p>Block partition structure is a critical module in video coding scheme to achieve significant gap of compression performance. Under the exploration of the future video coding standard, named Versatile Video Coding (VVC), a new Quad Tree Binary Tree (QTBT) block partition structure has been introduced. In addition to the QT block partitioning defined in High Efficiency Video Coding (HEVC) standard, new horizontal and vertical BT partitions are enabled, which drastically increases the encoding time compared to HEVC. In this paper, we propose a lightweight and tunable QTBT partitioning scheme based on a Machine Learning (ML) approach. The proposed solution uses Random Forest classifiers to determine for each coding block the most probable partition modes. To minimize the encoding loss induced by misclassification, risk intervals for classifier decisions are introduced in the proposed solution. By varying the size of risk intervals, tunable trade-off between encoding complexity reduction and coding loss is achieved. The proposed solution implemented in the JEM-7.0 software offers encoding complexity reductions ranging from 30average for only 0.7% to 3.0% Bjxntegaard Delta Rate (BDBR) increase in Random Access (RA) coding configuration, with very slight overhead induced by Random Forest. The proposed solution based on Random Forest classifiers is also efficient to reduce the complexity of the Multi-Type Tree (MTT) partitioning scheme under the VTM-5.0 software, with complexity reductions ranging from 25% to 61% in average for only 0.4% to 2.2% BD-BR increase.</p>","PeriodicalId":13217,"journal":{"name":"IEEE Transactions on Image Processing","volume":"29 1","pages":""},"PeriodicalIF":10.8000,"publicationDate":"2019-09-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE Transactions on Image Processing","FirstCategoryId":"94","ListUrlMain":"https://doi.org/10.1109/TIP.2019.2938670","RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}
引用次数: 0

Abstract

Block partition structure is a critical module in video coding scheme to achieve significant gap of compression performance. Under the exploration of the future video coding standard, named Versatile Video Coding (VVC), a new Quad Tree Binary Tree (QTBT) block partition structure has been introduced. In addition to the QT block partitioning defined in High Efficiency Video Coding (HEVC) standard, new horizontal and vertical BT partitions are enabled, which drastically increases the encoding time compared to HEVC. In this paper, we propose a lightweight and tunable QTBT partitioning scheme based on a Machine Learning (ML) approach. The proposed solution uses Random Forest classifiers to determine for each coding block the most probable partition modes. To minimize the encoding loss induced by misclassification, risk intervals for classifier decisions are introduced in the proposed solution. By varying the size of risk intervals, tunable trade-off between encoding complexity reduction and coding loss is achieved. The proposed solution implemented in the JEM-7.0 software offers encoding complexity reductions ranging from 30average for only 0.7% to 3.0% Bjxntegaard Delta Rate (BDBR) increase in Random Access (RA) coding configuration, with very slight overhead induced by Random Forest. The proposed solution based on Random Forest classifiers is also efficient to reduce the complexity of the Multi-Type Tree (MTT) partitioning scheme under the VTM-5.0 software, with complexity reductions ranging from 25% to 61% in average for only 0.4% to 2.2% BD-BR increase.

查看原文
分享 分享
微信好友 朋友圈 QQ好友 复制链接
本刊更多论文
基于轻量级机器学习的可调 VVC 帧分区。
块分割结构是视频编码方案中的一个关键模块,它能使压缩性能达到明显的差距。在对未来视频编码标准--多功能视频编码(VVC)--的探索中,引入了一种新的四叉树二叉树(QTBT)块分割结构。除了在高效视频编码(HEVC)标准中定义的 QT 块分区外,还启用了新的水平和垂直 BT 分区,这与 HEVC 相比大大增加了编码时间。在本文中,我们提出了一种基于机器学习 (ML) 方法的轻量级可调 QTBT 分区方案。建议的解决方案使用随机森林分类器为每个编码块确定最可能的分区模式。为了尽量减少错误分类造成的编码损失,建议的解决方案中引入了分类器决策的风险区间。通过改变风险区间的大小,可在降低编码复杂度和编码损失之间实现可调整的权衡。在 JEM-7.0 软件中实施的拟议解决方案在随机存取(RA)编码配置中仅增加了 0.7% 到 3.0% 的 Bjxntegaard Delta Rate (BDBR),编码复杂度平均降低了 30%,而随机森林造成的开销非常小。在 VTM-5.0 软件下,基于随机森林分类器的拟议解决方案还能有效降低多类型树(MTT)分区方案的复杂度,在 BD-BR 仅增加 0.4% 至 2.2% 的情况下,复杂度平均降低 25% 至 61%。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 去求助
来源期刊
IEEE Transactions on Image Processing
IEEE Transactions on Image Processing 工程技术-工程:电子与电气
CiteScore
20.90
自引率
6.60%
发文量
774
审稿时长
7.6 months
期刊介绍: The IEEE Transactions on Image Processing delves into groundbreaking theories, algorithms, and structures concerning the generation, acquisition, manipulation, transmission, scrutiny, and presentation of images, video, and multidimensional signals across diverse applications. Topics span mathematical, statistical, and perceptual aspects, encompassing modeling, representation, formation, coding, filtering, enhancement, restoration, rendering, halftoning, search, and analysis of images, video, and multidimensional signals. Pertinent applications range from image and video communications to electronic imaging, biomedical imaging, image and video systems, and remote sensing.
期刊最新文献
GeodesicPSIM: Predicting the Quality of Static Mesh with Texture Map via Geodesic Patch Similarity A Versatile Framework for Unsupervised Domain Adaptation based on Instance Weighting Revisiting Domain-Adaptive Semantic Segmentation via Knowledge Distillation RegSeg: An End-to-End Network for Multimodal RGB-Thermal Registration and Semantic Segmentation Salient Object Detection in RGB-D Videos
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
现在去查看 取消
×
提示
确定
0
微信
客服QQ
Book学术公众号 扫码关注我们
反馈
×
意见反馈
请填写您的意见或建议
请填写您的手机或邮箱
已复制链接
已复制链接
快去分享给好友吧!
我知道了
×
扫码分享
扫码分享
Book学术官方微信
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术
文献互助 智能选刊 最新文献 互助须知 联系我们:info@booksci.cn
Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。
Copyright © 2023 Book学术 All rights reserved.
ghs 京公网安备 11010802042870号 京ICP备2023020795号-1