
2nd Workshop on Embedded Systems for Real-Time Multimedia, 2004. ESTImedia 2004. Latest publications

Application design trajectory towards reusable coprocessors - MPEG case study
Pub Date : 2004-11-22 DOI: 10.1109/ESTMED.2004.1359699
M. Rutten, O. P. Gangwal, J. V. Eijndhoven, E. Jaspers, E. Pol
This work presents a structured application design trajectory to transform media-processing applications, modeled as Kahn process networks, into a set of function-specific hardware units called coprocessors. The proposed design trajectory focuses on identifying hardware-implementable computation kernels that are common to a predetermined set of applications. The design trajectory is exercised in a case study that maps MPEG video decoding and encoding applications onto a set of coprocessors in a heterogeneous multiprocessor architecture. The resulting set of coprocessors can simultaneously perform both encoding and decoding functions for multiple MPEG-2 streams in an estimated 4 mm² (excluding memory) in 0.18 µm technology.
Citations: 7
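The abstract above models applications as Kahn process networks: concurrent processes that communicate only through FIFO channels with blocking reads. The following is a minimal toy sketch of that model, not the paper's MPEG network. The process names (`source`, `scale`, `sink`), the `None` end-of-stream marker and the use of Python threads are all assumptions made for illustration.

```python
import queue
import threading


def source(out_q, values):
    """Producer process: writes every value to its output channel."""
    for v in values:
        out_q.put(v)
    out_q.put(None)  # end-of-stream marker (an assumption of this sketch)


def scale(in_q, out_q, factor):
    """Kernel process: blocking read, compute, write."""
    while True:
        token = in_q.get()  # blocks until a token is available
        if token is None:
            out_q.put(None)
            return
        out_q.put(token * factor)


def sink(in_q, results):
    """Consumer process: collects tokens in arrival order."""
    while True:
        token = in_q.get()
        if token is None:
            return
        results.append(token)


def run_network(values, factor):
    """Wire source -> scale -> sink with unbounded FIFO channels."""
    a, b = queue.Queue(), queue.Queue()
    results = []
    threads = [
        threading.Thread(target=source, args=(a, values)),
        threading.Thread(target=scale, args=(a, b, factor)),
        threading.Thread(target=sink, args=(b, results)),
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

Because each process only performs blocking reads on its own input channel, the result of `run_network([1, 2, 3], 2)` does not depend on how the threads are scheduled. In a design flow like the one described, a kernel such as `scale` would be a candidate for mapping to a coprocessor.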
A scalable VLIW for smart imaging
Pub Date : 2004-11-22 DOI: 10.1109/ESTMED.2004.1359703
A. Lundgren, W. Kruijtzer
This work presents a VLIW co-processor template for smart imaging. The template is highly scalable and can easily be instantiated for a specific degree of data-level parallelism. The co-processor is built to operate on frame segments rather than only on full frames. Eight different instances of the co-processor have been generated, each exploiting a different amount of parallelism. Each instance is generated in about 30 minutes using C-based high-level synthesis tools. The generated co-processors have been evaluated, and the results show that the template can be used effectively to balance area, performance and power consumption against the application requirements.
Citations: 0
Algebraic techniques in the memory size computation of multimedia processing applications
Pub Date : 2004-11-22 DOI: 10.1109/ESTMED.2004.1359708
Hongwei Zhu, Karthik Chandramouli, Yan Yue, F. Balasa
In real-time multimedia processing systems, a very large part of the power consumption is due to data storage and data transfer. Moreover, the area cost is often largely dominated by memories. Hence, the optimization of the memory architecture is a crucial step in the design methodology for this type of application. In deriving an optimized memory architecture, memory size computation is an important step in the data transfer and storage exploration stage. This work investigates non-scalar methods for computing the memory size in real-time multimedia algorithms. The approach is based on more recent algebraic techniques specific to the data-flow analysis used in modern compilers. In contrast with previous work, which relies only on approximate methods because of the size of the problems (in terms of the number of scalars) and the single-assignment specifications, this research aims to obtain exact determinations even for large applications.
Citations: 1
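To make "memory size computation" concrete, here is a minimal sketch of the scalar-level baseline that the abstract above contrasts itself with: every scalar has a lifetime, and the required storage is the peak number of scalars alive at the same time. This is not the paper's algebraic, non-scalar method. The function name `max_live` and the half-open `(start, end)` lifetime convention are assumptions of this sketch.

```python
def max_live(lifetimes):
    """Peak number of simultaneously live scalars.

    Each lifetime is a half-open interval (start, end): the scalar is
    produced at `start` and its storage can be reused from `end` onwards.
    """
    events = []
    for start, end in lifetimes:
        events.append((start, 1))   # scalar becomes live
        events.append((end, -1))    # scalar dies
    # Tuples sort by time first; at equal times -1 sorts before +1,
    # so a location freed at time t can be reused at time t.
    events.sort()
    live = peak = 0
    for _, delta in events:
        live += delta
        peak = max(peak, live)
    return peak
```

The event list grows linearly with the number of scalars, which illustrates why enumerating scalars becomes costly for large array-based specifications.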
Tool-aided performance analysis and optimization of multimedia applications
Pub Date : 2004-11-22 DOI: 10.1109/ESTMED.2004.1359717
H. Hübert, B. Stabernack, H. Richter
Current embedded system platforms fulfill the requirements of computation- and data-intensive multimedia applications. However, the software and hardware architecture must be optimized in order to use the available resources efficiently and achieve the required real-time performance. We present an analysis tool that aids the system designer during the optimization process by providing detailed performance and data transfer statistics for the multimedia application. Exemplary optimizations of an H.264 decoder application show how the tool can be utilized.
Citations: 1
Reducing memory accesses with a system-level design methodology in customized dynamic memory management
Pub Date : 2004-11-22 DOI: 10.1109/ESTMED.2004.1359716
David Atienza Alonso, S. Mamagkakis, F. Catthoor, J. Mendias, D. Soudris
Portable consumer embedded devices are steadily increasing their capabilities and can now implement new algorithms (e.g. multimedia and wireless protocols) that a few years ago were reserved for powerful workstations. Unfortunately, the original design characteristics of such applications often do not allow them to be ported directly to current embedded devices. These applications share complex and intensive memory use. Furthermore, they must rely heavily on dynamic memory due to the unpredictability of the input data (e.g. 3D stream features) and system behaviour (e.g. the number of concurrently running applications, as defined by the user). Thus they require the dynamic memory subsystem to provide the necessary level of performance for these new dynamic applications. However, current embedded systems have very limited resources (e.g. speed and power consumed in the memory subsystem) for providing efficient general-purpose dynamic memory management. We propose a new methodology to design custom dynamic memory managers that provide the performance required in new embedded devices by reducing the number of memory accesses needed to handle these new dynamic multimedia and wireless network applications. Our results in real-life dynamic applications show significant improvements in the memory accesses of dynamic memory managers, i.e. up to 58%, compared to state-of-the-art dynamic memory management solutions for complex applications.
Citations: 2
Low energy data and concurrency management of highly dynamic real-time multi-media systems
Pub Date : 2004-11-22 DOI: 10.1109/ESTMED.2004.1359690
F. Catthoor
Summary form only given. The merging of the computer, consumer and communication disciplines gives rise to very fast-growing markets for personal communication, multi-media and broadband networks in the information technology (IT) area. Rapid evolution in sub-micron process technology allows ever more complex systems to be mapped onto platforms that become integrated on a single chip (system-on-chip). Technology advances are, however, not followed by an increase in system design productivity. One of the most critical bottlenecks is the very dynamic concurrent behaviour of many of these new applications. They are fully specified in software-oriented languages (like Java, UML, SDL, C++) and still need to be executed in a real-time, cost- and energy-sensitive way on heterogeneous SoC platforms. The main issue is that fully design-time-based solutions, as proposed earlier in compiler and system synthesis research, cannot solve the problem, while run-time solutions, as present in today's operating systems, are too inefficient in terms of cost optimisation (especially energy consumption) and are also not adapted to the real-time constraints (even in RTOS kernels). This dynamic nature is emerging especially because of the quality-of-service (QoS) aspects of these multi-media and networking applications. Prominent examples can be found in the recent MPEG4/JPEG2000 standards and especially the new MPEG21 standard. The emerging Ambient Intelligence and virtual reality paradigms will stimulate this further. In order to deal with these dynamic issues, where tasks and complex data types are created and deleted at run-time based on non-deterministic events, a novel system design paradigm is required. This presentation will focus on the new requirements that result for system-level synthesis. In particular, both a "dynamic data management" and a "task concurrency management" problem formulation will be presented, which have to deal with the very dynamic nature of these systems. The concept of Pareto curve based exploration is crucial in these problem formulations and their solutions.
Citations: 0
A queuing-theoretic performance model for context-flow system-on-chip platforms
Pub Date : 2004-11-22 DOI: 10.1109/ESTMED.2004.1359697
Rami Beidas, Jianwen Zhu
Few analytical performance models that relate performance figures of merit to architectural design decisions are reported in recent studies of network-on-chip, which hinders the development of effective system-level synthesis techniques. We propose an analytical performance model based on queuing theory for a recently reported network-on-chip platform, which features an extremely simple programming model while providing superior performance measures compared with alternative architectures. To validate the analytical performance model, we developed a multi-processor simulation framework that can simulate an application at the instruction set level given an architecture configuration. The accuracy and applicability of the proposed model are illustrated by two real-life applications, namely an SSL security acceleration processor and an MP3 decoder.
Citations: 3
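The abstract above does not state which queuing model the authors use. As a hedged illustration of how a queuing-theoretic model ties a performance figure to a design parameter, the sketch below evaluates the textbook M/M/1 queue, where a single server with service rate `service_rate` handles Poisson arrivals at `arrival_rate`. The choice of M/M/1 and the function name `mm1_metrics` are assumptions, not the paper's model.

```python
def mm1_metrics(arrival_rate, service_rate):
    """Steady-state metrics of an M/M/1 queue.

    Returns (utilization, mean number in system, mean time in system).
    """
    if arrival_rate <= 0 or service_rate <= 0:
        raise ValueError("rates must be positive")
    if arrival_rate >= service_rate:
        raise ValueError("queue is unstable: arrival_rate >= service_rate")
    rho = arrival_rate / service_rate                  # utilization
    mean_in_system = rho / (1.0 - rho)                 # L = rho / (1 - rho)
    mean_time = 1.0 / (service_rate - arrival_rate)    # W = 1 / (mu - lambda)
    return rho, mean_in_system, mean_time
```

For `mm1_metrics(2.0, 4.0)` the utilization is 0.5, and the two remaining values satisfy Little's law, L = λW. A closed-form model like this can be evaluated far faster than an instruction-level simulation, which is what makes it useful inside a synthesis loop.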
A hardware accelerator IP for EBCOT tier-1 coding in JPEG2000 standard
Pub Date : 2004-11-22 DOI: 10.1109/ESTMED.2004.1359713
Tien-Wei Hsieh, Y. Lin
We propose a hardware accelerator IP for the Tier-1 portion of Embedded Block Coding with Optimal Truncation (EBCOT) used in the JPEG2000 next-generation image compression standard. EBCOT Tier-1 accounts for more than 70% of encoding time due to extensive bit-level processing. Our architecture consists of a 16-way parallel context formation module and a 3-stage pipelined arithmetic encoder. We reduce power consumption by properly shutting down parts of the circuit. Compared with the best known design, we reduce the cycle count by 17% and reach a level within 5% of the theoretical lower bound. We have implemented the design in synthesizable Verilog RTL with an AMBA-AHB interface for SoC design. FPGA prototyping has been successfully demonstrated, and a substantial speedup has been achieved.
Citations: 3
High performance visibility testing with screen segmentation
Pub Date : 2004-11-22 DOI: 10.1109/ESTMED.2004.1359711
Péter Szántó, B. Fehér
Two factors determine the performance a 3D accelerator can achieve: the available computational power and the available memory bandwidth. In embedded systems, these resources are even more limited than in desktop environments, so the efficiency of the hardware architecture and the exploitation of the logic resources become even more important. Most resources are wasted in the visibility testing process: traditional implementations require a lot of bandwidth and process pixels that are not visible in the final image. By segmenting the screen, the presented architecture can use high-performance on-chip buffers to lower memory requirements and provide high performance. The order of the processing guarantees that only the colors that are truly visible are computed. The modular architecture allows different requirements to be satisfied: a trade-off can be made between the number of processing units and performance.
Citations: 0
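The abstract above gives no architectural detail. As a rough software analogy of screen segmentation with visibility resolved before colour computation, the sketch below bins fragments into square tiles, finds the nearest fragment per pixel inside each tile using a small per-tile buffer, and only then "shades" the winners. The fragment tuple layout, the function name `render_tiled` and the smaller-depth-is-nearer convention are assumptions of this sketch.

```python
def render_tiled(width, height, tile, fragments):
    """Resolve visibility per screen tile, then shade only visible pixels.

    fragments: iterable of (x, y, depth, color); smaller depth is nearer.
    Returns (image, shaded) where image maps (x, y) -> color and shaded
    counts how many colours were actually computed.
    """
    # Phase 1: bin fragments by the tile that contains them.
    bins = {}
    for x, y, depth, color in fragments:
        if 0 <= x < width and 0 <= y < height:
            bins.setdefault((x // tile, y // tile), []).append((x, y, depth, color))

    image = {}
    shaded = 0
    for frags in bins.values():
        # Phase 2: per-tile depth test. `nearest` stands in for a small
        # on-chip buffer, so no external depth buffer traffic is needed.
        nearest = {}
        for x, y, depth, color in frags:
            if (x, y) not in nearest or depth < nearest[(x, y)][0]:
                nearest[(x, y)] = (depth, color)
        # Phase 3: compute colour only for the fragments that won.
        for (x, y), (_, color) in nearest.items():
            image[(x, y)] = color
            shaded += 1
    return image, shaded
```

With two overlapping fragments at one pixel and a third fragment elsewhere, only two colours are computed instead of three, which is the saving that the processing order is meant to deliver.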
Data assignment and access scheduling exploration for multi-layer memory architectures
Pub Date : 2004-11-22 DOI: 10.1109/ESTMED.2004.1359707
R. Szymanek, F. Catthoor, K. Kuchcinski
This work presents an exploration framework that performs data assignment and access scheduling exploration for applications, given a multilayer memory architecture. Our framework uses multiobjective criteria during exploration, such as application execution time, energy, bandwidth, and data size. In order to tackle the complexity of the exploration, it is divided into three phases: Pareto diagram composition, data assignment, and access scheduling. The first phase produces multidimensional Pareto points for the application. After this phase, our framework produces distinct data assignments, which are represented as Pareto points in a two-dimensional space defined by bandwidth requirements and size requirements. Finally, the scheduling phase finds a possibly optimal order of the tasks and performs precise scheduling of the tasks. Three feedback paths are present, which can be used to iteratively improve the exploration results. It is possible to trade off the quality of the results against the algorithm runtime. We have evaluated our framework on a medical image processing application. We have shown that our algorithms can explore the huge design space in an iterative manner and obtain good Pareto diagram coverage.
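As a small illustration of what a Pareto point means in the bandwidth/size space mentioned above, the following sketch keeps only the candidates that no other candidate dominates when every coordinate is a cost to be minimized. It is a generic brute-force filter, not the framework's exploration algorithm; the function name `pareto_front` and the sample numbers are made up for illustration.

```python
def pareto_front(points):
    """Return the non-dominated points, assuming every coordinate is a cost.

    A point q dominates p when q is no worse than p in every coordinate
    and differs from p in at least one.
    """
    front = []
    for p in points:
        dominated = any(
            q != p and all(q_i <= p_i for q_i, p_i in zip(q, p))
            for q in points
        )
        if not dominated:
            front.append(p)
    return front


# Hypothetical (bandwidth, size) costs of four data assignments.
candidates = [(1, 5), (2, 2), (3, 3), (5, 1)]
front = pareto_front(candidates)
```

Here `(3, 3)` is dropped because `(2, 2)` is better in both dimensions, while the other three remain as distinct trade-offs between bandwidth and size.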