2011 9th IEEE Symposium on Embedded Systems for Real-Time Multimedia: Latest Papers

Multi-ASIP based parallel and scalable implementation of motion estimation kernel for high definition videos
Pub Date : 2011-12-01 DOI: 10.1109/ESTIMedia.2011.6088526
H. Doan, Haris Javaid, S. Parameswaran
Parallel implementations of motion estimation for high definition videos typically exploit various forms of parallelism (GOP-, frame-, slice- and macroblock-level) to deliver real-time throughput. Although such implementations deliver real-time throughput, they often suffer from limited flexibility and scalability due to the form of parallelism and the architecture used. In this work, we combine Group Of MacroBlocks (GOMB) and Intra-MB (IMB) parallelism with a multi-ASIP (Application Specific Instruction set Processor) architecture to provide a flexible and scalable platform for motion estimation of high definition videos. Multiple GOMBs are processed by the ASIPs in parallel (GOMB level), and each ASIP is equipped with custom instructions that process the pixels of an MB in parallel (IMB level). The system is flexible and scalable because the number of ASIPs (i.e., the number of GOMBs) and the custom instructions are not fixed, but are determined through design space exploration. We evaluated the multi-ASIP architecture in Tensilica's commercial design environment with a varying number of ASIPs (up to nine), and compared hand-coded and automatically generated custom instructions. The results show that systems with three and seven ASIPs delivered real-time throughput of 30 and 60 fps, respectively, for the “pedestrian”, “rush hour” and “tractor” HD1080p video sequences. In addition, the results indicate that, owing to its flexibility and scalability, the multi-ASIP platform can be extended to even higher resolutions such as Ultra High Definition (UHD).
Citations: 11
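The abstract does not give the search algorithm or the custom instructions, so the following is only a minimal Python sketch of the two levels of parallelism it names. It assumes full-search block matching with a sum-of-absolute-differences (SAD) cost, 16×16 macroblocks and a ±8-pixel search range. `full_search` is the per-macroblock work whose independent pixel differences the custom instructions would compute in parallel (IMB level), and the hypothetical `split_into_gombs` assumes a GOMB is a contiguous run of macroblock rows handed to one ASIP (GOMB level); the paper's actual grouping may differ.

```python
import random

def sad(cur, ref, cy, cx, ry, rx, n):
    """Sum of absolute differences between the n x n block of `cur` at
    (cy, cx) and the n x n block of `ref` at (ry, rx)."""
    return sum(abs(cur[cy + i][cx + j] - ref[ry + i][rx + j])
               for i in range(n) for j in range(n))

def full_search(cur, ref, mb_y, mb_x, n=16, search_range=8):
    """Try every displacement in [-search_range, search_range] on both axes
    and return ((dy, dx), best_sad) for the macroblock at (mb_y, mb_x)."""
    h, w = len(ref), len(ref[0])
    best_mv, best_cost = None, None
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            ry, rx = mb_y + dy, mb_x + dx
            # Skip candidates that would extend past the reference frame.
            if ry < 0 or rx < 0 or ry + n > h or rx + n > w:
                continue
            cost = sad(cur, ref, mb_y, mb_x, ry, rx, n)
            if best_cost is None or cost < best_cost:
                best_mv, best_cost = (dy, dx), cost
    return best_mv, best_cost

def split_into_gombs(mb_rows, n_workers):
    """Hypothetical GOMB split: give each worker a contiguous run of
    macroblock rows, with group sizes differing by at most one."""
    base, extra = divmod(mb_rows, n_workers)
    groups, start = [], 0
    for i in range(n_workers):
        size = base + (1 if i < extra else 0)
        groups.append(range(start, start + size))
        start += size
    return groups

if __name__ == "__main__":
    rng = random.Random(0)
    H = W = 48
    ref = [[rng.randrange(256) for _ in range(W)] for _ in range(H)]
    # Build a current frame whose content is the reference shifted so that
    # the true match for the macroblock at (16, 16) lies at (+2, -3).
    cur = [[ref[y + 2][x - 3] if y + 2 < H and x - 3 >= 0 else 0
            for x in range(W)] for y in range(H)]
    print(full_search(cur, ref, 16, 16))               # ((2, -3), 0)
    print([len(g) for g in split_into_gombs(68, 3)])   # [23, 23, 22]
```

A 1080p frame is 120 macroblocks wide and, assuming the usual padding of the 1080-line height to 1088, 68 macroblock rows tall, so in this sketch three workers would receive 23, 23 and 22 rows.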
Towards an ESL design framework for adaptive and fault-tolerant MPSoCs: MADNESS or not?
Pub Date : 2011-12-01 DOI: 10.1109/ESTIMedia.2011.6088518
E. Cannella, L. D. Gregorio, Leandro Fiorin, M. Lindwer, P. Meloni, Olaf Neugebauer, A. Pimentel
The MADNESS project aims to define innovative system-level design methodologies for embedded MPSoCs, extending the classic concept of design space exploration in multi-application domains to cope with high heterogeneity, technology scaling and system reliability. The main goal of the project is to provide a framework able to guide designers and researchers to the optimal composition of embedded MPSoC architectures, according to the requirements and features of a given target application field. The proposed approach will tackle the new architectural and design-methodology challenges arising from technology scaling, system reliability and the ever-growing computational needs of modern applications. The methodologies proposed in this project act at different levels of the design flow, advancing the state of the art with novel features in system-level synthesis, architectural evaluation and prototyping. Support for fault resilience and efficient adaptive runtime management is introduced at the hardware and middleware levels, and is treated by system-level synthesis as one of the optimization factors to be taken into account. This paper presents the first stable results obtained in the MADNESS project, which already demonstrate the effectiveness of the proposed methods.
Citations: 10
Support of software framework for embedded multi-core systems with Android environments
Pub Date : 2011-12-01 DOI: 10.1109/ESTIMedia.2011.6088522
Yu-Hao Chang, Chi-Bang Kuan, Cheng-Yen Lin, Te-Feng Su, Chun-Ta Chen, J. Jang, S. Lai, Jenq-Kuen Lee
Applications on mobile devices are becoming more complex with the new wave of mobile applications. The computing power of embedded devices is increasing with this trend, and embedded multi-core platforms are in a position to help boost system performance. Software frameworks that integrate multi-core platforms are often needed to boost system performance and reduce programming complexity. In this paper, we present a software framework based on Android and multi-core embedded systems. In the framework, we integrate the compiler toolchain for the multi-core programming environment, which includes DSP C/C++ compilers, a streaming RPC programming model, a debugger, an ESL simulator, and power management models. We also develop software frameworks for face detection, voice recognition, and mobile streaming management. These frameworks are designed as multi-core programs and are used to illustrate the design flow for applications on embedded multi-core environments running Android. We demonstrate the proposed mechanisms by implementing two applications: Face RMS and voice recognition. The proposed framework serves as a case study illustrating the software framework and design flow for emerging RMS-based and voice recognition applications on embedded multi-core systems running Android.
Citations: 5
On the management of multichannel architectures of solid-state disks
Pub Date : 2011-12-01 DOI: 10.1109/ESTIMedia.2011.6088524
Li-Pin Chang, Yi-Hsun Huang, Chen-Yi Wen
Solid-state disks use arrays of flash-memory chips for data storage. They adopt multichannel architectures to exploit parallelism among flash operations. Under real disk workloads, a channel spends nearly the same amount of time writing host data as it does collecting garbage. Thus, the key to the success of multichannel architectures is to achieve high parallelism among channel operations under both kinds of activity. This study presents a channel management scheme that comprises a write-buffer design and two channel management policies. The proposed scheme is designed to be generic, and it is applicable to both hybrid mapping and page-level mapping. Our experimental results show that the proposed management scheme doubles the average number of write requests completed per second (i.e., write IOPS) of a baseline multichannel architecture. We also successfully implemented the proposed scheme in a real solid-state disk, demonstrating the feasibility of our approach.
Citations: 5
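To illustrate why channel-level parallelism depends on where writes are placed while some channels are busy collecting garbage, here is a toy Python model. It is not the paper's write-buffer design or either of its two policies. It assumes that every page write takes the same fixed time, that all writes arrive at once, and that the FTL may place any page on any channel; it then compares fixed striping (`static`) with sending each write to whichever channel frees up first (`least_busy`). The function and parameter names are hypothetical.

```python
import heapq

def makespan(n_writes, write_us, busy_us, policy):
    """Time (us) at which the last of n_writes page writes completes.

    busy_us[c] is how long channel c is already occupied (for example by
    garbage collection) when all writes arrive at time 0.
    """
    n = len(busy_us)
    if policy == "static":
        # Fixed striping: page i always goes to channel i mod n.
        free_at = list(busy_us)
        last = 0
        for i in range(n_writes):
            c = i % n
            free_at[c] += write_us
            last = max(last, free_at[c])
        return last
    if policy == "least_busy":
        # Dynamic placement: each write goes to the channel that frees up first.
        heap = [(t, c) for c, t in enumerate(busy_us)]
        heapq.heapify(heap)
        last = 0
        for _ in range(n_writes):
            t, c = heapq.heappop(heap)
            t += write_us
            last = max(last, t)
            heapq.heappush(heap, (t, c))
        return last
    raise ValueError(f"unknown policy: {policy}")
```

With four channels, channel 0 still busy for 400 µs of garbage collection, and eight 100 µs writes, `makespan(8, 100, [400, 0, 0, 0], "static")` returns 600 while the `least_busy` policy returns 300, because static striping queues two writes behind the busy channel.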
Shadow-based vehicle model refinement and tracking in advanced automotive driver assistance systems
Pub Date : 2011-12-01 DOI: 10.1109/ESTIMedia.2011.6088525
F. Rattei, Philipp H. Kindt, Alma Pröbstl, S. Chakraborty
Vision-based automotive driver assistance systems have to cope with complex outdoor illumination conditions. Many vehicle detection and tracking applications treat shadows cast by sunlight as distracting effects and try to compensate for or disregard them computationally. In this paper, we propose a shape-from-shadow approach that uses cast shadows as additional supporting information for vehicle model refinement and tracking. To take the position of the sun into account, we use only sensor systems that mid-range vehicles are normally equipped with. Analysing shadows is a suitable method to support rear-view based vehicle tracking. A vehicle's shadow turned out to be a strong feature, since a vehicle can be tracked solely on the basis of its shadow. Another benefit is that we are able to set up and refine three-dimensional shape models of vehicles driving ahead in the same lane. In selected situations, it is possible to visually track two vehicles driving one behind the other in the same lane.
Citations: 2
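The geometry that makes a cast shadow informative can be written down for the simplest case. The Python sketch below assumes a flat road, parallel sun rays, and a sun elevation and azimuth that are already known (the paper obtains the sun's position from sensors found in mid-range vehicles; here they are plain inputs). A point at height h casts its shadow h / tan(elevation) away from its foot, in the direction opposite the sun's azimuth, so a measured shadow length L gives h = L · tan(elevation). The function names are illustrative, not taken from the paper.

```python
import math

def shadow_offset(height_m, sun_elevation_deg, sun_azimuth_deg):
    """Ground-plane (east, north) offset in metres from a point's foot to the
    shadow of that point, assuming flat ground and parallel sun rays.
    Azimuth is measured clockwise from north."""
    if not 0 < sun_elevation_deg <= 90:
        raise ValueError("sun must be above the horizon")
    length = height_m / math.tan(math.radians(sun_elevation_deg))
    az = math.radians(sun_azimuth_deg)
    # Shadows point away from the sun, i.e. opposite the sun's azimuth.
    return (-length * math.sin(az), -length * math.cos(az))

def height_from_shadow(shadow_length_m, sun_elevation_deg):
    """Invert the same geometry: h = L * tan(elevation)."""
    return shadow_length_m * math.tan(math.radians(sun_elevation_deg))
```

For example, a roof edge 1.5 m high with the sun due south at 45° elevation casts its shadow 1.5 m to the north, and a 3 m shadow at 30° elevation corresponds to a height of about 1.73 m. Comparing predicted and observed shadow outlines in this way is one plausible source of constraints on a vehicle's 3D shape, though the paper's own model refinement is not reproduced here.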
Exploiting set-level write non-uniformity for energy-efficient NVM-based hybrid cache
Pub Date : 2011-12-01 DOI: 10.1109/ESTIMedia.2011.6088521
Jianhua Li, Liang Shi, C. Xue, Chengmo Yang, Yinlong Xu
Hybrid cache architectures have been proposed to mitigate increasing on-chip power dissipation by exploiting emerging non-volatile memories (NVMs). To overcome the high energy and long latency of NVM write operations, a small SRAM is typically incorporated into the hybrid cache to accommodate write-intensive cache blocks. Efficiently managing this SRAM and handling write operations is crucial to the performance of the hybrid cache. In this paper, we first present our observation that, for real applications such as multimedia, multi-programmed and multithreaded workloads, the intensity of write operations is usually non-uniform across cache sets. Previously proposed hybrid cache schemes cannot efficiently and symmetrically use the small SRAM to accommodate such widespread non-uniform writes across cache sets. Based on this observation, we propose a novel hybrid cache design, the Dual Associative Hybrid Cache (DAHYC), together with a corresponding cache management policy. By organizing the SRAM blocks in the hybrid cache as a semi-independent set-associative cache, several hybrid cache sets can efficiently share and cooperatively use their SRAM blocks, instead of each cache set exclusively using its own SRAM blocks as in previous hybrid cache schemes, which improves power efficiency. By carefully managing the locality information of SRAM blocks in both the NVM sets and the SRAM sets, the proposed cache management policy also delivers high performance. Experimental results show that, compared with previous work, DAHYC reduces the dynamic power of the hybrid cache by 24.8% on average and by up to 54% for SPEC2000 INT benchmarks, while at the same time improving the performance of the hybrid cache by 1.16% on average.
Citations: 39
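The set-level write non-uniformity that motivates DAHYC can be observed by binning a write trace by cache set. This minimal Python sketch assumes the conventional set index `(address // block_bytes) mod n_sets` and uses the peak-to-mean ratio as a simple non-uniformity measure; the default 64-byte blocks and 1024 sets are illustrative, and nothing here models DAHYC's shared SRAM organization or its management policy.

```python
from collections import Counter

def writes_per_set(write_addrs, block_bytes=64, n_sets=1024):
    """Count write accesses per cache set for a stream of byte addresses,
    using the conventional index (addr // block_bytes) mod n_sets."""
    counts = Counter((a // block_bytes) % n_sets for a in write_addrs)
    return [counts.get(s, 0) for s in range(n_sets)]

def peak_to_mean(counts):
    """Simple non-uniformity measure: the busiest set's writes divided by the
    average writes per set (1.0 means perfectly uniform)."""
    total = sum(counts)
    if total == 0:
        return 0.0
    return max(counts) / (total / len(counts))
```

For the six writes `[0, 64, 0, 0, 256, 128]` with 64-byte blocks and 4 sets, the per-set counts are `[4, 1, 1, 0]` and the ratio is about 2.67: one set absorbs most of the writes, which is the situation in which a fixed per-set SRAM allocation sits idle in the other sets.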
Evaluation of scheduling heuristics for jitter reduction of real-time streaming applications on multi-core general purpose hardware
Pub Date : 2011-12-01 DOI: 10.1109/ESTIMedia.2011.6088520
M. Westmijze, M. Bekooij, G. Smit, M. Schrijver
The real-time systems research community has paid a lot of attention to the design of safety-critical hard real-time systems, for which the use of non-standard hardware and operating systems can be justified. However, stream processing applications such as medical imaging systems are often not considered safety-critical enough to justify hard real-time techniques that would significantly increase the cost of these systems. Instead, commercial off-the-shelf (COTS) hardware and operating systems are used, and application-level techniques are employed to reduce the variation in the end-to-end latency of these image processing systems. In this paper, we study the effectiveness of a number of scheduling heuristics intended to reduce the latency and jitter of stream processing applications executed on COTS multiprocessor systems. The proposed scheduling heuristics take into account the execution times of tasks as well as the dependencies between tasks, the data structures accessed by the tasks, and the memory hierarchy. Experiments were carried out on a quad-core symmetric multiprocessing (SMP) Intel processor. These experiments show that, compared with a naive scheduling heuristic that does not consider execution times, dependencies and the memory hierarchy, the proposed heuristics can reduce the end-to-end latency by almost 60% and the variation in the latency by more than 90%.
Citations: 5
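As a point of reference for the kind of heuristic being evaluated, the following Python sketch is a generic critical-path list scheduler. It uses task execution times and inter-task dependencies, but unlike the heuristics in the paper it ignores the data structures each task touches and the memory hierarchy. Ready tasks are ordered by bottom level (the longest execution-time path from the task to the end of the graph) and each is placed on the core that becomes free earliest, starting no earlier than its predecessors finish. The task names in the usage note are hypothetical.

```python
import heapq

def list_schedule(exec_time, deps, n_cores):
    """Non-preemptive list scheduling of a task DAG on n_cores identical cores.

    exec_time: {task: duration}; deps: {task: [predecessor tasks]}.
    Returns {task: (core, start, finish)}.
    """
    succs = {t: [] for t in exec_time}
    for t, preds in deps.items():
        for p in preds:
            succs[p].append(t)

    level = {}
    def bottom_level(t):
        # Longest path (sum of execution times) from t to any sink, inclusive.
        if t not in level:
            level[t] = exec_time[t] + max(
                (bottom_level(s) for s in succs[t]), default=0)
        return level[t]

    remaining = {t: len(deps.get(t, [])) for t in exec_time}
    # Max-heap on bottom level via negated priorities.
    ready = [(-bottom_level(t), t) for t, k in remaining.items() if k == 0]
    heapq.heapify(ready)
    cores = [(0, c) for c in range(n_cores)]  # (time core becomes free, id)
    schedule = {}
    while ready:
        _, t = heapq.heappop(ready)
        free_at, core = heapq.heappop(cores)
        # Start once the core is free and every predecessor has finished.
        start = max([free_at] + [schedule[p][2] for p in deps.get(t, [])])
        finish = start + exec_time[t]
        schedule[t] = (core, start, finish)
        heapq.heappush(cores, (finish, core))
        for s in succs[t]:
            remaining[s] -= 1
            if remaining[s] == 0:
                heapq.heappush(ready, (-bottom_level(s), s))
    return schedule
```

For a hypothetical pipeline in which `capture` (2 time units) feeds `filterA` (4) and `filterB` (1), which both feed `merge` (2), two cores give a makespan of 8: `filterA` and `filterB` run in parallel from time 2, and `merge` starts at 6.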
Model checking a SystemC/TLM design of the AMBA AHB protocol
Pub Date : 2011-12-01 DOI: 10.1109/ESTIMedia.2011.6088527
Marcel Pockrandt, Paula Herber, S. Glesner
Transaction Level Modeling (TLM) is becoming increasingly important for quickly evaluating design alternatives in multimedia systems and other mixed HW/SW systems. However, comprehensive and automated verification of TLM models is still a difficult challenge. In previous work, we presented an approach for model checking SystemC/TLM designs based on a transformation into Uppaal timed automata. In this paper, we present an optimized version of that transformation and show its effectiveness with experimental results from an industrial case study. The key idea is to generate a Uppaal model that is specifically tailored for model checking. This significantly reduces the semantic state space and makes model checking considerably faster and less memory-consuming. We demonstrate this by comparing the verification times of both versions on our previously used case study, and by presenting results from a new and larger case study, namely a TLM implementation of the AMBA Advanced High-performance Bus (AHB). The AMBA bus is one of the most popular on-chip bus architectures in IP-based embedded SoCs, and it is used in many multimedia applications. The case study shows that, with the proposed optimizations, our approach is applicable to real-world industrial examples. The detection of a serious bug, namely a deadlock in a certain scenario, and the verification of several important safety, liveness and timing properties provide evidence for the usefulness of our approach.
在多媒体系统和其他软硬件混合系统中,事务级建模(TLM)对于快速评估设计方案越来越重要。然而,对TLM模型进行全面、自动化的验证仍然是一项艰巨的挑战。在之前的工作中,我们提出了一种基于转换到Uppaal时间自动机的SystemC/TLM设计模型检查方法。在本文中,我们提出了先前提出的转换的优化版本,并通过工业案例研究的实验结果证明了其有效性。关键思想是生成一个特别为模型检查而定制的Uppaal模型。这大大减少了语义状态空间,使模型检查速度更快,内存消耗更少。我们通过比较我们之前使用的案例研究的两个版本的验证时间,并通过展示来自一个新的更大的案例研究的结果,即AMBA高级高性能总线(AHB)的TLM实现,来证明这一点。AMBA总线是基于ip的嵌入式soc中最流行的片上总线体系结构之一,它被用于许多多媒体应用中。案例研究表明,通过提出的优化,我们的方法适用于工业现实世界的示例。检测一个严重的错误,即某个场景中的死锁情况,以及验证一些重要的安全性、活动性和定时属性,为我们的方法的有效性提供了证据。
Title: Model checking a SystemC/TLM design of the AMBA AHB protocol
Pub Date : 2011-12-01 DOI: 10.1109/ESTIMedia.2011.6088527
Citations: 18
Journal: 2011 9th IEEE Symposium on Embedded Systems for Real-Time Multimedia