H.264 Color Components Video Decoding Parallelization on Multi-core Processors

2010 13th Euromicro Conference on Digital System Design: Architectures, Methods and Tools. Pub Date: 2010-09-01. DOI: 10.1109/DSD.2010.76

Elias Baaklini, Hassan Sbeity, S. Niar, N. Amaneddine


Citations: 7

Abstract

Multiprocessor systems-on-chip (MPSoC) are expected to become the dominant architecture in embedded systems because they improve performance through increased concurrency rather than through higher clock speeds, which raise power consumption. However, this concurrency must be exploited for performance to improve across different application environments. The emerging H.264/AVC coding standard is designed to cover a wide range of applications, including real-time conversational services such as videoconferencing and video telephony. It has many new features that require more complex computation than previous video coding standards, making it a challenging workload for future MPSoC embedded systems. Parallelism in video codec applications can be exploited at the data level, the functional level, or both simultaneously. In this paper we explore the parallelism that exists naturally in the H.264 decoder software [2] itself, without any modification to the encoder, rather than forcing parallelization techniques onto it. Our idea is based on the fact that the H.264 decoder decodes the luminance and chrominance signals separately, although the decoder is implemented so that they are decoded in series. Our approach is to execute the decoding phases of the luminance signal in parallel with those of the chrominance signals. Using two cores to decode the luma and chroma signals in parallel gives a gain of 15-20% in decoding processing time, and when this is combined with a functional pipelined implementation over four or more cores, the gain can reach 60% compared with the current sequential execution.
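The luma/chroma split described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation or the reference decoder's API: the macroblock dictionary, the function names, the toy "prediction plus residual" reconstruction, and the 4:2:0 block sizes are all assumptions made for illustration. The sketch shows the serial ordering, the same work submitted as two independent tasks, and a simple timing model in which only the luma and chroma work overlap.

```python
from concurrent.futures import ThreadPoolExecutor


def reconstruct_plane(prediction, residual):
    """Toy stand-in for one colour component's decode: prediction + residual, clipped to 8 bits."""
    return [
        [max(0, min(255, p + r)) for p, r in zip(pred_row, res_row)]
        for pred_row, res_row in zip(prediction, residual)
    ]


def reconstruct_chroma(mb):
    """Reconstruct both chroma planes (Cb, Cr) of a hypothetical macroblock dict."""
    return [
        reconstruct_plane(pred, res)
        for pred, res in zip(mb["chroma_pred"], mb["chroma_res"])
    ]


def decode_sequential(mb):
    """Luma first, then chroma: the serial ordering the abstract describes."""
    luma = reconstruct_plane(mb["luma_pred"], mb["luma_res"])
    return luma, reconstruct_chroma(mb)


def decode_parallel(mb, pool):
    """Submit luma and chroma as two independent tasks and wait for both."""
    luma_future = pool.submit(reconstruct_plane, mb["luma_pred"], mb["luma_res"])
    chroma_future = pool.submit(reconstruct_chroma, mb)
    return luma_future.result(), chroma_future.result()


def modelled_gain(serial_time, luma_time, chroma_time):
    """Fraction of decode time saved when only luma and chroma work overlap."""
    sequential = serial_time + luma_time + chroma_time
    parallel = serial_time + max(luma_time, chroma_time)
    return 1.0 - parallel / sequential


def make_toy_macroblock():
    """Build a 16x16 luma block and two 8x8 chroma blocks (assumes 4:2:0 sampling)."""
    def plane(size, value):
        return [[value] * size for _ in range(size)]

    def residual(size):
        return [[(x - y) * 20 for x in range(size)] for y in range(size)]

    return {
        "luma_pred": plane(16, 128),
        "luma_res": residual(16),
        "chroma_pred": [plane(8, 128), plane(8, 128)],
        "chroma_res": [residual(8), residual(8)],
    }


if __name__ == "__main__":
    mb = make_toy_macroblock()
    with ThreadPoolExecutor(max_workers=2) as pool:
        # Both orderings must produce identical reconstructed samples.
        assert decode_parallel(mb, pool) == decode_sequential(mb)
    # Illustrative timings only; these numbers are not taken from the paper.
    print(modelled_gain(serial_time=20.0, luma_time=60.0, chroma_time=20.0))
```

With the illustrative timings above, the model gives a gain of roughly 0.2, which shows why overlapping only the shorter chroma work bounds the two-core gain by the chroma share of total decode time. Note that in standard CPython builds, threads do not run pure-Python bytecode in parallel, so this sketch demonstrates the task structure rather than a measured speedup; a real decoder would map the two tasks onto separate cores.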
