H.264 Color Components Video Decoding Parallelization on Multi-core Processors

Elias Baaklini, Hassan Sbeity, S. Niar, N. Amaneddine
Published in: 2010 13th Euromicro Conference on Digital System Design: Architectures, Methods and Tools, September 2010
DOI: 10.1109/DSD.2010.76 (https://doi.org/10.1109/DSD.2010.76)
Citations: 7

Abstract

Multiprocessor system-on-chip (MPSoC) will be the dominant architecture in embedded systems because it improves performance by increasing concurrency rather than by raising the clock speed, which increases power consumption. However, this concurrency must be exploited to improve system performance across different application environments. The emerging H.264/AVC coding standard is designed to cover a wide range of applications (real-time conversational services such as videoconferencing, video telephony, etc.). It has many new features that require more complex computations than previous video coding standards, making it a challenging workload for future MPSoC embedded systems. Parallelism in video codec applications can be exploited at the data level, the functional level, or both simultaneously. Our intention in this paper is to explore the parallelism that exists naturally in the H.264 decoder software [2] itself, without any modification to the encoder, rather than forcing parallelization techniques onto it. Our novel idea is based on the fact that the H.264 decoder decodes the luminance and chrominance signals separately, although the decoder is implemented to decode them in series. Our approach is to execute the decoding phases of the luminance signal in parallel with those of the chrominance signals. Using two cores to decode the luma and chroma signals in parallel yields a gain of 15-20% in decoding processing time, and when this is combined with a functional pipelined implementation over four or more cores, the gain can reach 60% compared to the current sequential execution.
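
The luma/chroma split described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation or the reference decoder's code: the function names, the dictionary layout of a macroblock, the 4:2:0 block sizes (16x16 luma, 8x8 per chroma plane), and the single "prediction plus residual" phase are all assumptions chosen for illustration. The sketch only shows the task structure, with the luma work and the chroma work submitted as two independent tasks that produce the same result as the sequential order. In CPython, threads do not run pure-Python arithmetic in parallel, so no speedup should be expected from this toy version.

```python
from concurrent.futures import ThreadPoolExecutor


def reconstruct_plane(prediction, residual):
    """Hypothetical stand-in for one decoding phase: add the residual to the
    prediction and clip each sample to the 8-bit range [0, 255]."""
    return [
        [max(0, min(255, p + r)) for p, r in zip(pred_row, res_row)]
        for pred_row, res_row in zip(prediction, residual)
    ]


def decode_chroma(pred_cb, res_cb, pred_cr, res_cr):
    """Reconstruct both chroma planes (Cb and Cr) as a single task."""
    return reconstruct_plane(pred_cb, res_cb), reconstruct_plane(pred_cr, res_cr)


def decode_macroblock_sequential(mb):
    """Baseline: luma first, then chroma, one after the other."""
    y = reconstruct_plane(mb["pred_y"], mb["res_y"])
    cb, cr = decode_chroma(mb["pred_cb"], mb["res_cb"], mb["pred_cr"], mb["res_cr"])
    return y, cb, cr


def decode_macroblock_parallel(mb, pool):
    """Luma and chroma submitted as two independent tasks to a 2-worker pool."""
    luma_task = pool.submit(reconstruct_plane, mb["pred_y"], mb["res_y"])
    chroma_task = pool.submit(
        decode_chroma, mb["pred_cb"], mb["res_cb"], mb["pred_cr"], mb["res_cr"]
    )
    cb, cr = chroma_task.result()
    return luma_task.result(), cb, cr


def make_plane(size, scale, offset):
    """Deterministic synthetic samples (illustrative data, not real video)."""
    return [[(x * scale + y + offset) % 256 for x in range(size)] for y in range(size)]


def make_residual(size):
    """Small signed residuals in the range [-8, 7]."""
    return [[((x + 3 * y) % 16) - 8 for x in range(size)] for y in range(size)]


def make_macroblock():
    """Assumed 4:2:0 layout: one 16x16 luma block and two 8x8 chroma blocks."""
    return {
        "pred_y": make_plane(16, 7, 0), "res_y": make_residual(16),
        "pred_cb": make_plane(8, 5, 100), "res_cb": make_residual(8),
        "pred_cr": make_plane(8, 3, 200), "res_cr": make_residual(8),
    }


if __name__ == "__main__":
    mb = make_macroblock()
    with ThreadPoolExecutor(max_workers=2) as pool:
        y, cb, cr = decode_macroblock_parallel(mb, pool)
    print(len(y), len(cb), len(cr))
```

Under the assumed 4:2:0 layout the two chroma planes together hold half as many samples as the luma plane, so the two tasks are unevenly loaded. The sketch makes no attempt to reproduce the 15-20% figure reported in the abstract.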