Parallel multi-view HEVC for heterogeneously embedded cluster system

IF 2.1 · Zone 4, Computer Science · Q2 COMPUTER SCIENCE, THEORY & METHODS · Parallel Computing · Pub Date: 2022-09-01 · DOI: 10.1016/j.parco.2022.102948

Seo Jin Jang , Wei Liu , Wei Li , Yong Beom Cho

Parallel Computing, Volume 112, Article 102948. https://www.sciencedirect.com/science/article/pii/S0167819122000448

Citations: 0

Abstract

In this paper, we present a computer cluster with heterogeneous computing components that uses embedded processors to provide the concurrency and parallelism needed for a real-time Multi-View High-Efficiency Video Coding (MV-HEVC) encoder/decoder at resolutions up to 1088p. The latest MV-HEVC standard represents a significant improvement over the previous video coding standard (MVC), but it also has higher computational complexity. Until now, research on MV-HEVC has had to rely on the Central Processing Unit (CPU) of a Personal Computer (PC) or workstation for decompression, because MV-HEVC is much more complex than High-Efficiency Video Coding (HEVC) and because decompressors need higher parallelism to decompress in real time. Encoding and decoding on an embedded device is particularly difficult. We therefore propose a novel framework for an MV-HEVC encoder/decoder based on a heterogeneously distributed embedded system. We use a parallel computing method that divides the video into multiple blocks and codes the blocks independently on each worker node, at the group-of-pictures (GOP) level and at the coding-tree-unit (CTU) level. To assign tasks appropriately to each worker node, we propose a new allocation method that makes the operation of the entire heterogeneously distributed system more efficient. Our experimental results show that, compared with a single device (3D-HTM, single-threaded), the performance of the proposed distributed MV-HEVC decoder and encoder increased by factors of approximately 20.39 and 68.7, respectively, on 20 devices (multithreaded) at the CTU level for a 1088p video. At the proposed GOP level, the decoder and encoder performance on 20 devices (multithreaded) increased by factors of approximately 20.78 and 77, respectively, for a 1088p video, compared with the same single-device baseline (3D-HTM, single-threaded).
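
The abstract does not spell out the allocation method, so the following is only a minimal sketch of one common way to split GOP-level work across unequal nodes: give each node a contiguous range of GOPs in proportion to its relative speed, using the largest-remainder rule to hand out the leftover GOPs. The function name `allocate_gops`, the node names, and the speed figures are hypothetical and are not taken from the paper.

```python
from typing import Dict, List


def allocate_gops(num_gops: int, node_speeds: Dict[str, float]) -> Dict[str, List[int]]:
    """Assign GOP indices 0..num_gops-1 to nodes in proportion to node speed.

    node_speeds maps a (hypothetical) node name to a positive relative
    throughput, e.g. frames per second measured in a calibration run.
    """
    if num_gops < 0:
        raise ValueError("num_gops must be non-negative")
    if not node_speeds or any(s <= 0 for s in node_speeds.values()):
        raise ValueError("node_speeds must be non-empty with positive values")

    total_speed = sum(node_speeds.values())

    # Ideal (fractional) share of GOPs for each node.
    exact = {name: num_gops * speed / total_speed for name, speed in node_speeds.items()}

    # Start from the integer part of each share.
    counts = {name: int(share) for name, share in exact.items()}

    # Largest-remainder rule: leftover GOPs go to the nodes whose
    # fractional part is largest.
    leftover = num_gops - sum(counts.values())
    by_remainder = sorted(node_speeds, key=lambda n: exact[n] - counts[n], reverse=True)
    for name in by_remainder[:leftover]:
        counts[name] += 1

    # Hand out contiguous index ranges so each node codes whole GOPs.
    assignment: Dict[str, List[int]] = {}
    start = 0
    for name in node_speeds:
        assignment[name] = list(range(start, start + counts[name]))
        start += counts[name]
    return assignment


if __name__ == "__main__":
    # Hypothetical cluster: one board is twice as fast as the other.
    plan = allocate_gops(10, {"board_a": 2.0, "board_b": 1.0})
    for node, gops in plan.items():
        print(node, gops)
```

A static split like this assumes node speeds are known in advance and stay stable; a dynamic work queue is the usual alternative when they are not.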

Source journal

Parallel Computing (Engineering & Technology: Computer Science, Theory & Methods)

CiteScore

3.50

Self-citation rate

7.10%

Publication volume

Review time

4.5 months

About the journal: Parallel Computing is an international journal presenting the practical use of parallel computer systems, including high performance architecture, system software, programming systems and tools, and applications. Within this context the journal covers all aspects of high-end parallel computing from single homogeneous or heterogeneous computing nodes to large-scale multi-node systems. Parallel Computing features original research work and review articles as well as novel or illustrative accounts of application experience with (and techniques for) the use of parallel computers. We also welcome studies reproducing prior publications that either confirm or disprove prior published results. Particular technical areas of interest include, but are not limited to:
- System software for parallel computer systems, including programming languages (new languages as well as compilation techniques), operating systems (including middleware), and resource management (scheduling and load-balancing).
- Enabling software, including debuggers, performance tools, and system and numeric libraries.
- General hardware (architecture) concepts, new technologies enabling the realization of such new concepts, and details of commercially available systems.
- Software engineering and productivity as it relates to parallel computing.
- Applications (including scientific computing, deep learning, machine learning) or tool case studies demonstrating novel ways to achieve parallelism.
- Performance measurement results on state-of-the-art systems.
- Approaches to effectively utilize large-scale parallel computing, including new algorithms or algorithm analysis with demonstrated relevance to real applications using existing or next generation parallel computer architectures.
- Parallel I/O systems, both hardware and software.
- Networking technology for support of high-speed computing, demonstrating the impact of high-speed computation on parallel applications.