
2009 IEEE/ACM/IFIP 7th Workshop on Embedded Systems for Real-Time Multimedia: Latest Papers

Fast configuration of MEMS-based storage devices for streaming applications
Pub Date : 2009-11-17 DOI: 10.1109/ESTMED.2009.5336829
Mohammed G. Khatib, H. W. V. Dijk
An exciting class of storage devices is emerging: Micro-Electro-Mechanical storage Systems (MEMS). MEMS-based storage devices offer high density, a small form factor, and low power consumption. The use of this type of device in mobile infotainment systems, such as video cameras, is not at all obvious. We must explore their configuration and assess their benefit relative to existing devices, such as Flash.
Citations: 0
Exploring trade-offs between performance and resource requirements for synchronous dataflow graphs
Pub Date : 2009-11-17 DOI: 10.1109/ESTMED.2009.5336821
Yang Yang, M. Geilen, T. Basten, S. Stuijk, H. Corporaal
Synchronous dataflow graphs (SDFGs) are widely used to model streaming applications such as signal processing and multimedia applications. These are often implemented on resource-constrained embedded platforms ranging from PDAs and cell phones to automobile equipment and printing systems. Trade-off analysis between resource usage and performance is critical throughout the life cycle of those products, from tailoring platforms to target applications at design time to resource management at runtime. We present a trade-off analysis method for SDFGs based on model-checking techniques that leverages knowledge from the dataflow domain. We develop results to prune the state space of an SDFG for multi-objective model checking without losing optimality. To achieve scalability to large state spaces, we combine these pruning techniques with pragmatic heuristics. We evaluate our techniques with two sets of experiments. One set shows that we can now perform throughput-storage trade-off analysis for shared-memory architectures, with reductions in memory usage of 10–50% compared to existing analysis based on distributed memory. A second set of experiments shows how our techniques support design-space exploration for the digital datapath of a professional printer system. Analysis times range from less than a second to at most several minutes.
Citations: 27
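The trade-off analysis in the abstract above builds on the standard SDF model, in which every actor produces and consumes a fixed number of tokens per firing. As background only (this is not the authors' model-checking method), here is a minimal Python sketch that solves the SDF balance equations for the repetition vector, that is, how often each actor fires in one graph iteration. The function name and the actor names and rates in the usage example are hypothetical.

```python
from fractions import Fraction
from functools import reduce
from math import gcd, lcm


def repetition_vector(actors, edges):
    """Solve the SDF balance equations q[src] * prod == q[dst] * cons.

    actors: list of actor names (the graph is assumed to be connected).
    edges:  list of (src, dst, prod, cons) tuples with positive rates.
    Returns the smallest positive integer repetition vector as a dict,
    or raises ValueError if the graph is disconnected or inconsistent.
    """
    rate = {actors[0]: Fraction(1)}
    changed = True
    while changed:  # propagate relative firing rates along the edges
        changed = False
        for src, dst, prod, cons in edges:
            if src in rate and dst not in rate:
                rate[dst] = rate[src] * prod / cons
                changed = True
            elif dst in rate and src not in rate:
                rate[src] = rate[dst] * cons / prod
                changed = True
    if len(rate) != len(actors):
        raise ValueError("graph is not connected")
    for src, dst, prod, cons in edges:  # every edge must balance
        if rate[src] * prod != rate[dst] * cons:
            raise ValueError("inconsistent SDF graph")
    # Scale the rational rates to the smallest integer vector.
    denom = reduce(lcm, (r.denominator for r in rate.values()))
    ints = {a: int(r * denom) for a, r in rate.items()}
    g = reduce(gcd, ints.values())
    return {a: v // g for a, v in ints.items()}
```

For a chain A → B → C in which A produces 2 tokens per firing and B consumes 3, and B produces 1 token and C consumes 2, the sketch returns A: 3, B: 2, C: 1. Buffer sizes (the storage side of the trade-off) are then chosen per edge, subject to this firing schedule.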
Inter-kernel data reuse and pipelining on chip-multiprocessors for multimedia applications
Pub Date : 2009-11-17 DOI: 10.1109/ESTMED.2009.5336815
L. A. Bathen, Yongjin Ahn, N. Dutt, S. Pasricha
The increasing demand for low-power, high-performance multimedia embedded systems has motivated the need for effective solutions that satisfy application bandwidth and latency requirements under a tight power budget. As technology scales, applications must be optimized to take full advantage of the underlying resources and meet both power and performance requirements. We propose a methodology that discovers and enables parallelism opportunities via code transformations, efficiently distributes the computational load across resources, and minimizes unnecessary data transfers. Our approach decomposes the application's tasks into smaller units of computation called kernels, which are distributed and pipelined across the different processing resources. We exploit inter-kernel data reuse to minimize unnecessary data transfers between kernels, and early execution edges to drive performance. Our experimental results on a JPEG2000 case study show up to 80% performance improvement and 60% dynamic power reduction over standard application mapping approaches.
Citations: 11
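To show why pipelining kernels across processing resources pays off, here is a back-of-the-envelope Python sketch that compares sequential execution with an idealized linear pipeline. It assumes one kernel per processor, no transfer cost, and unbounded buffers between stages. It deliberately leaves out the inter-kernel data reuse and early-execution-edge optimizations described in the abstract. The kernel times are hypothetical.

```python
def sequential_time(kernel_times, n_items):
    """All kernels run one after another on a single resource."""
    return n_items * sum(kernel_times)


def pipelined_time(kernel_times, n_items):
    """Closed form for a linear pipeline with unbounded buffers.

    The first item fills the pipeline (sum of all stage times). After
    that, one item completes every max(kernel_times) time units, since
    the slowest stage is the bottleneck.
    """
    if n_items == 0:
        return 0
    return sum(kernel_times) + (n_items - 1) * max(kernel_times)


def simulated_time(kernel_times, n_items):
    """Cross-check by recurrence: F(i, j) = max(F(i-1, j), F(i, j-1)) + t_i."""
    finish = [0] * len(kernel_times)  # finish time of the previous item per stage
    for _ in range(n_items):
        prev_stage_done = 0
        for i, t in enumerate(kernel_times):
            finish[i] = max(finish[i], prev_stage_done) + t
            prev_stage_done = finish[i]
    return finish[-1] if n_items else 0
```

With hypothetical kernel times of 4, 10, and 6 units and 100 items, sequential execution takes 2000 units and the pipeline takes 20 + 99 × 10 = 1010 units. The speed-up is capped by the slowest kernel, which is why balancing the load across resources matters.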
Robust image processing for an omnidirectional camera-based smart car door
Pub Date : 2009-11-17 DOI: 10.1145/2362336.2362354
C. Scharfenberger, S. Chakraborty, G. Färber
Over the last decade there has been increasing emphasis on driver-assistance systems in the automotive domain. In this paper we report our work on designing a camera-based surveillance system embedded in a “smart” car door. The camera monitors the environment outside the car (e.g., obstacles such as approaching cars or cyclists that might collide with the car door if it were opened) and automatically controls the operation of the car door. This is an enhancement to the currently available side-view mirrors, which the driver or passenger checks before opening the car door. The focus of this paper is on fast and robust image processing algorithms specifically targeting such a smart car door system. The requirement is to quickly detect traffic objects of interest in gray-scale images captured by omnidirectional cameras. Whereas known object-extraction algorithms from the image processing literature rely on color information and are sensitive to shadows and illumination changes, our proposed algorithms are highly robust, operate on gray-scale images (color images are not available in our setup), and output results in real time. To illustrate this, we present a number of experimental results based on image sequences captured in real-life traffic scenarios.
Citations: 19
Optimal stack frame placement and transfer for energy reduction targeting embedded processors with scratch-pad memories
Pub Date : 2009-11-17 DOI: 10.1109/ESTMED.2009.5336819
L. Gauthier, T. Ishihara
Memory accesses are a major cause of energy consumption in embedded systems, and the stack is a frequent target of data accesses. This paper presents a purely software technique that reduces stack-related energy consumption by allocating and transferring frames, or parts of frames, between a scratch-pad memory and the main memory. The technique uses an integer linear formulation of the problem to find the optimal management of the frames at compile time. It is also extended to integrate existing methods for static memory objects and for recursive functions. Experimental results show that, with an available scratch-pad memory only half the size the stack requires, our technique reduces stack-related energy consumption by more than 90% for several applications, and by 84% on average, compared with placing all stack frames in the main memory.
Citations: 7
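The abstract above formulates frame placement as an integer linear program. Its simplest static special case assigns whole frames to the scratch-pad for the entire run. That case reduces to a 0/1 knapsack problem: each frame has a size and an energy saving if placed in the scratch-pad, and the scratch-pad capacity is the budget. Here is a minimal Python sketch of that simplified problem, solved by dynamic programming. This is a swapped-in technique, not the authors' ILP, and it ignores frame transfers, partial frames, and recursion. The function name, frame names, sizes, and savings are hypothetical.

```python
def place_frames(frames, spm_capacity):
    """Choose stack frames for a scratch-pad memory (SPM) as a 0/1 knapsack.

    frames: list of (name, size_bytes, energy_saving) tuples.
    Returns (best_total_saving, chosen_frame_names).
    """
    # best[c] = (saving, chosen names) achievable with at most c bytes of SPM
    best = [(0, ())] * (spm_capacity + 1)
    for name, size, saving in frames:
        # Iterate capacities downward so each frame is placed at most once.
        for c in range(spm_capacity, size - 1, -1):
            candidate = best[c - size][0] + saving
            if candidate > best[c][0]:
                best[c] = (candidate, best[c - size][1] + (name,))
    return best[spm_capacity]
```

With frames main (64 B, saving 5), fir (128 B, 40), dct (96 B, 30), and quant (32 B, 12), and a 160-byte scratch-pad, the sketch selects fir and quant for a total saving of 52. A full ILP, as in the paper, can additionally model frames moving between memories at call sites.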
Efficient execution of Kahn process networks on multi-processor systems using protothreads and windowed FIFOs
Pub Date : 2009-11-17 DOI: 10.1109/ESTMED.2009.5336828
Wolfgang Haid, Lars Schor, Kai Huang, Iuliana Bacivarov, L. Thiele
As single-processor systems cease to scale effectively, multi-processor systems are becoming increasingly popular. While designing multi-processor hardware poses many challenges, writing efficient parallel applications that exploit the computing capability of multiple processors may prove even more challenging. In this paper, we introduce a framework for efficiently executing applications expressed as Kahn process networks on multi-processor systems, using protothreads and windowed FIFOs. We show that application developers can use this framework to achieve considerable speed-ups on the Cell Broadband Engine without writing architecture-specific code.
Citations: 52
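A windowed FIFO lets a process acquire a window of buffer slots, read or write those slots in place, and then release the window. This avoids the copy that a plain read()/write() FIFO performs. Here is a minimal single-threaded Python sketch of the idea on a ring buffer. The class and method names (`acquire_write`, `release_write`, `acquire_read`, `release_read`) are hypothetical and are not the API of the authors' framework. A real implementation would block, or with protothreads yield, instead of returning None when a window is unavailable.

```python
class WindowedFIFO:
    """Ring buffer whose producer and consumer access windows of slots in place."""

    def __init__(self, capacity):
        self.buf = [None] * capacity
        self.capacity = capacity
        self.head = 0   # index of the oldest unread slot
        self.count = 0  # slots written and released by the producer

    def _indices(self, start, n):
        return [(start + k) % self.capacity for k in range(n)]

    def acquire_write(self, n):
        """Slot indices for an n-slot write window, or None if space is short."""
        if self.capacity - self.count < n:
            return None
        return self._indices(self.head + self.count, n)

    def release_write(self, n):
        """Commit n written slots, making them visible to the consumer."""
        if n > self.capacity - self.count:
            raise ValueError("releasing more slots than are free")
        self.count += n

    def acquire_read(self, n):
        """Slot indices for an n-slot read window, or None if data is short."""
        if self.count < n:
            return None
        return self._indices(self.head, n)

    def release_read(self, n):
        """Free the n oldest slots for reuse by the producer."""
        if n > self.count:
            raise ValueError("releasing more slots than are filled")
        self.head = (self.head + n) % self.capacity
        self.count -= n
```

The producer writes directly into `buf` at the indices it receives from `acquire_write`, and the consumer reads in place in the same way. Windows may wrap around the end of the ring, which is why the methods return explicit indices rather than a contiguous slice.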
The wizard of OS: a heartbeat for Legacy multimedia applications
Pub Date : 2009-11-17 DOI: 10.1109/ESTMED.2009.5336825
T. Cucinotta, Luca Abeni, L. Palopoli, Fabio Checconi
Multimedia applications are often characterised by implicit temporal constraints, but in many cases they are not programmed using any specialised real-time API. These “Legacy applications” have no way to communicate their temporal constraints to the OS kernel, and their quality of service (QoS), which is necessarily linked to their temporal behaviour, fails to meet acceptable standards. In this paper we propose an innovative way of dealing with these applications. It combines an on-line identification mechanism, which extracts important parameters such as the execution rate from high-level observations, with an adaptive scheduler specialised for legacy applications that identifies the correct amount of CPU needed by each application. Preliminary experimental results are reported. Through an appropriate choice of scheduling parameters, they show that the proposed idea is effective in providing a widely used multimedia player on Linux with appropriate QoS guarantees. Finally, a detailed roadmap of possible extensions to the approach is presented.
Citations: 14
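The abstract above pairs an on-line estimate of an application's execution rate with an adaptive CPU reservation. Here is a minimal Python sketch of that combination, built on our own assumptions:

- The period is estimated as the median interval between observed job-start timestamps (e.g., frame-decode events).
- The budget is a high quantile of observed per-job execution times, inflated by a safety margin.

The function names, the median and quantile choices, and the 10% margin are hypothetical. They are not the authors' identification mechanism or scheduler.

```python
from statistics import median


def estimate_period(timestamps):
    """Median inter-arrival time of observed job-start events."""
    gaps = [b - a for a, b in zip(timestamps, timestamps[1:])]
    if not gaps:
        raise ValueError("need at least two timestamps")
    return median(gaps)


def reservation(timestamps, exec_times, quantile=0.9, margin=0.10):
    """Return (budget, period) for a CPU reservation of `budget` per `period`.

    budget: the `quantile` of the observed execution times, inflated by
    `margin`. All values use the same time unit as the inputs.
    """
    period = estimate_period(timestamps)
    ordered = sorted(exec_times)
    k = min(len(ordered) - 1, int(quantile * len(ordered)))
    budget = ordered[k] * (1 + margin)
    return budget, period
```

For example, take frame starts (in ms) at 0, 40, 80, 121, and 160, and ten decode times between 9 and 15 ms. The sketch estimates a 40 ms period and a 16.5 ms budget, i.e., a reservation of about 41% of the CPU. Re-running the estimate periodically is what makes such a reservation adaptive.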
Software parallel CAVLC encoder based on stream processing
Pub Date : 2009-11-17 DOI: 10.1109/ESTMED.2009.5336822
Ju Ren, Yi He, Wei Wu, M. Wen, N. Wu, Chunyuan Zhang
Real-time encoding of high-definition H.264 video is a challenge for current embedded programmable processors. Emerging stream processing methods, supported by most GPUs and programmable processors, provide a powerful mechanism for achieving surprisingly high performance in media and signal processing, which brings an opportunity to address this challenge. However, traditional serial CAVLC has highly input-dependent execution and precedence constraints, which become a bottleneck when implementing an H.264 encoder efficiently. This paper presents a software parallel CAVLC encoder based on stream processing. Many approaches are explored to overcome the restrictions on parallelizing CAVLC caused by data dependencies and branch/loop instructions. Experimental results show that our parallel CAVLC encoder achieves 3.03x and 2.08x speedups over the original serial CAVLC on two stream processing platforms, STORM and a GPU, respectively. Finally, the proposed parallel CAVLC encoder, coupled with a stream processor, enables real-time encoding of 1080p H.264 video.
Citations: 9
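One way to expose data parallelism in CAVLC is to separate the per-block statistics that drive code-table selection from the serial bit-packing stage. This split is our illustrative assumption, not necessarily the paper's approach. The statistics are TotalCoeff, TrailingOnes, and TotalZeros. For each 4×4 block they depend only on that block's coefficients, so many blocks can be processed independently. Here is a minimal Python sketch of that per-block step, following the H.264 definitions. The function name is hypothetical. The VLC table lookups, the nC context prediction, and the bitstream writing are omitted.

```python
def cavlc_block_stats(coeffs):
    """CAVLC statistics for one block of coefficients in zigzag order.

    Returns (total_coeff, trailing_ones, total_zeros):
      total_coeff   - number of nonzero coefficients
      trailing_ones - consecutive +/-1 values at the high-frequency end
                      of the nonzero coefficients (zeros skipped), max 3
      total_zeros   - zeros that precede the last nonzero coefficient
    """
    nonzero_pos = [i for i, c in enumerate(coeffs) if c != 0]
    total_coeff = len(nonzero_pos)
    if total_coeff == 0:
        return 0, 0, 0
    trailing_ones = 0
    for i in reversed(nonzero_pos):  # scan from the highest frequency back
        if abs(coeffs[i]) == 1 and trailing_ones < 3:
            trailing_ones += 1
        else:
            break
    total_zeros = nonzero_pos[-1] + 1 - total_coeff
    return total_coeff, trailing_ones, total_zeros
```

For the zigzag sequence 0, 3, 0, 1, −1, −1, 0, 1 followed by eight zeros, the sketch returns (5, 3, 3). Applied to a list of blocks, e.g. `[cavlc_block_stats(b) for b in blocks]`, each call is independent, which is the kind of work that maps naturally onto stream or GPU lanes.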
An effective dictionary-based display frame compressor
Pub Date : 2009-11-17 DOI: 10.1109/ESTMED.2009.5336820
Hui-Ting Yang, Jian-Wen Chen, Huang-Chih Kuo, Y. Lin
In all video applications, large amounts of data must be processed within a bounded time. These data are usually stored in a low-cost, slow external DRAM, which results in a high memory bandwidth requirement. Memory bandwidth therefore dominates system performance, especially for applications running on embedded systems. In this paper, we propose an effective dictionary-based compression and decompression algorithm for display frames in a video decoding system and present its hardware implementation. We have integrated the proposed design into an H.264/AVC video decoder. Simulation results show that the proposed algorithm achieves a 54% compression ratio and a 34% reduction in memory traffic when decoding 1080HD video, making it much more effective than all previous work.
Citations: 15
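The abstract above does not detail its compression scheme. As a generic illustration only, here is a minimal Python sketch of a small-dictionary pixel coder:

- If a pixel matches one of the few most recently seen distinct values, it is emitted as a short index.
- Otherwise it is emitted as a literal and inserted into the dictionary.

The dictionary size, the move-to-front update, and the bit costs (a 1-bit flag plus either a 2-bit index or an 8-bit literal) are hypothetical choices, not the authors' algorithm. The decoder mirrors the encoder, so the scheme is lossless.

```python
DICT_SIZE = 4     # hypothetical: 4 entries, so an index fits in 2 bits
INDEX_BITS = 2
LITERAL_BITS = 8  # 8-bit gray-scale pixels assumed


def encode(pixels):
    """Return (tokens, total_bits); tokens are ('idx', i) or ('lit', value)."""
    dictionary, tokens, bits = [], [], 0
    for p in pixels:
        if p in dictionary:
            i = dictionary.index(p)
            tokens.append(("idx", i))
            bits += 1 + INDEX_BITS      # flag bit + dictionary index
            dictionary.pop(i)
        else:
            tokens.append(("lit", p))
            bits += 1 + LITERAL_BITS    # flag bit + raw pixel value
            if len(dictionary) == DICT_SIZE:
                dictionary.pop()        # evict the least recently used entry
        dictionary.insert(0, p)         # move-to-front
    return tokens, bits


def decode(tokens):
    """Rebuild the pixel list by replaying the encoder's dictionary updates."""
    dictionary, pixels = [], []
    for kind, value in tokens:
        if kind == "idx":
            p = dictionary.pop(value)
        else:
            p = value
            if len(dictionary) == DICT_SIZE:
                dictionary.pop()
        dictionary.insert(0, p)
        pixels.append(p)
    return pixels
```

For the row 10, 10, 10, 12, 12, 10, 200, 10, this sketch spends 42 bits instead of 64 raw bits and decodes back to the same row. Smooth regions of a display frame favor short index codes, which is the intuition behind dictionary-based frame compression in general.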
A high-throughput pipelined architecture for JPEG XR encoding
Pub Date : 2009-11-17 DOI: 10.1109/ESTMED.2009.5336818
Koichi Hattori, Hiroshi Tsutsui, H. Ochi, Yukihiro Nakamura
JPEG XR is an emerging image coding standard based on HD Photo, developed by Microsoft. It offers compression performance twice as high as that of the de facto image coding standard, JPEG, and also has an advantage over JPEG 2000 in terms of computational cost. JPEG XR is expected to become widespread in many devices, including embedded systems, in the near future. In this paper, we propose a novel architecture for JPEG XR encoding. In previous architectures, entropy coding was the throughput bottleneck because it was implemented as a sequential algorithm to handle data dependencies. We found that there is no dependency in intra-macroblock data, so all the encoding processes, including entropy coding, can be safely pipelined. The proposed fully pipelined architecture achieves 100 Mpixel/s at 125 MHz, a throughput that previous work could not achieve.
Citations: 12
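As a quick check of what the reported figures imply, 100 Mpixel/s at 125 MHz corresponds to an average of 0.8 pixels per clock cycle. The 1920×1080 frame size in the second line is our own illustrative assumption, not a figure from the abstract. At the reported throughput, such a frame could be encoded roughly 48 times per second, ignoring any per-frame overhead.

```latex
\frac{100 \times 10^{6}\ \text{pixel/s}}{125 \times 10^{6}\ \text{cycle/s}} = 0.8\ \text{pixel/cycle}
\qquad
\frac{100 \times 10^{6}\ \text{pixel/s}}{1920 \times 1080\ \text{pixel/frame}}
  = \frac{10^{8}}{2\,073\,600} \approx 48.2\ \text{frame/s}
```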