2012 IEEE 10th Symposium on Embedded Systems for Real-time Multimedia: Latest Articles

A novel low-power embedded object recognition system working at multi-frames per second (Extended abstract)

Pub Date : 2012-10-01 DOI: 10.1145/2435227.2435229

A. Nikitakis, Savvas Papaioannou, I. Papaefstathiou

One important challenge in multimedia is implementing fast and detailed object detection and recognition systems. In particular, in current state-of-the-art mobile multimedia systems it is highly desirable to detect and locate certain objects within a video frame in real time. In this paper, we present a novel FPGA-based embedded implementation of a very efficient object recognition algorithm, the Receptive Field Cooccurrence Histograms (RFCH) algorithm. Our main focus was to increase its performance so that it can handle the object recognition tasks of today's highly sophisticated embedded multimedia systems while keeping its energy consumption very low. Our low-power embedded reconfigurable system is at least 15 times faster than the software implementation on a low-voltage high-end CPU, while consuming at least 60 times less energy. It is also 88 times more energy efficient than the recently introduced low-power multi-core Intel devices that are optimized for embedded systems.
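The speed and energy figures quoted above can be related through energy = average power × time. The following is a minimal sketch of that arithmetic, assuming (this is not stated in the abstract) that both ratios refer to the same workload and hold with equality; since both are given as lower bounds, the result is only indicative. The function name is hypothetical.

```python
# Ratios quoted in the abstract (both stated as "at least").
SPEEDUP = 15.0           # CPU runtime / FPGA runtime
ENERGY_REDUCTION = 60.0  # CPU energy / FPGA energy


def implied_power_ratio(energy_ratio: float, time_ratio: float) -> float:
    """Return P_cpu / P_fpga, using E = P * t for each platform.

    E_cpu / E_fpga = (P_cpu / P_fpga) * (T_cpu / T_fpga),
    so the power ratio is the energy ratio divided by the time ratio.
    """
    if time_ratio <= 0:
        raise ValueError("time_ratio must be positive")
    return energy_ratio / time_ratio


if __name__ == "__main__":
    ratio = implied_power_ratio(ENERGY_REDUCTION, SPEEDUP)
    print(f"Implied average power ratio (CPU / FPGA): {ratio:.1f}x")
```

Under these assumptions the energy saving comes from both a shorter runtime and a lower average power draw, not from speed alone.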

Citations: 5

Static prediction of recursion frequency using machine learning to enable hot spot optimizations

Pub Date : 2012-10-01 DOI: 10.1109/ESTIMedia.2012.6507027

D. Tetzlaff, S. Glesner

Recursion poses a severe problem for static optimizations because its execution frequency usually depends on runtime values and is therefore rarely predictable at compile time. As a consequence, optimization potential is sacrificed, since hot paths where most of the execution time is spent, and where optimization would be most beneficial, may go undiscovered. In this paper, we propose a sophisticated machine-learning-based approach to statically predict the recursion frequency of functions for programs in real-world application domains, which can be used to guide various hot spot optimizations. Our experiments with 369 programs from 25 benchmark suites in different domains demonstrate that our approach is applicable to a wide range of programs with different behavior and yields more precise heuristics than those generated by pure static analyses. Moreover, our results provide valuable insights into recursive structures in general: when they appear and how deep they are.

Citations: 3

TEACA: Thread ProgrEss Aware Coherence Adaption for hybrid coherence protocols

Pub Date : 2012-10-01 DOI: 10.1109/ESTIMedia.2012.6507024

Jianhua Li, Liang Shi, Qing'an Li, C. Xue, Yinlong Xu

Hybrid coherence protocols can simultaneously provide the scalability of directory protocols and the low-latency sharing-miss handling of snooping protocols. Unfortunately, how to adapt hybrid protocols at runtime is not well studied. This paper proposes Thread ProgrEss Aware Coherence Adaption (TEACA), which uses thread progress information as hints to adapt hybrid coherence protocols. Specifically, TEACA fuses memory system statistics to estimate the progress of threads. Based on the estimated thread progress, TEACA dynamically categorizes threads into leader threads and laggard threads. The categorization decisions are then leveraged for efficient coherence adaption in hybrid coherence protocols. A case study on a recently proposed hybrid protocol (PATCH [29]) shows that, with the hints from TEACA, the enhanced hybrid protocol outperforms its baseline in both application execution time and energy dissipation.
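The leader/laggard categorization described above can be illustrated with a minimal sketch. The abstract does not say which memory statistics are fused or how the split is made, so everything here is an assumption: the `ThreadStats` fields, the progress proxy (fraction of cycles not stalled on memory), and the mean-based threshold are hypothetical stand-ins, not TEACA's actual mechanism.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class ThreadStats:
    """Hypothetical per-thread counters sampled over one interval."""
    thread_id: int
    stall_cycles: int   # cycles stalled on memory (assumed counter)
    total_cycles: int   # cycles elapsed in the interval


def estimate_progress(stats: ThreadStats) -> float:
    """Assumed progress proxy: fraction of cycles not stalled on memory."""
    if stats.total_cycles <= 0:
        raise ValueError("total_cycles must be positive")
    return 1.0 - stats.stall_cycles / stats.total_cycles


def categorize(all_stats: list[ThreadStats]) -> dict[int, str]:
    """Label each thread 'leader' or 'laggard' relative to the mean progress."""
    scores = {s.thread_id: estimate_progress(s) for s in all_stats}
    threshold = mean(scores.values())  # assumed split point
    return {
        tid: "leader" if score >= threshold else "laggard"
        for tid, score in scores.items()
    }


if __name__ == "__main__":
    sample = [
        ThreadStats(0, stall_cycles=100, total_cycles=1000),
        ThreadStats(1, stall_cycles=400, total_cycles=1000),
        ThreadStats(2, stall_cycles=250, total_cycles=1000),
        ThreadStats(3, stall_cycles=700, total_cycles=1000),
    ]
    print(categorize(sample))
```

A coherence controller could then consult these labels when choosing how to service a given thread's misses; how the labels actually drive protocol adaption is described in the paper, not in this sketch.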

Citations: 2

Enhancing user experiences by exploiting energy and launch delay tradeoff of mobile multimedia applications (Extended abstract)

Pub Date : 2012-10-01 DOI: 10.1109/ESTIMedia.2012.6507034

Yi-Fan Chung, Yin-Tsung Lo, C. King

The growing number of multimedia applications on smartphones places ever more stringent demands on user experience. A key factor affecting user experience is the delay in launching applications, which shapes a user's perception of the responsiveness of both the phone and its multimedia applications.

Citations: 0

Loop instruction caching for energy-efficient embedded multitasking processors

Pub Date : 2012-10-01 DOI: 10.1109/ESTIMedia.2012.6507036

Ji Gu, T. Ishihara, Kyungsoo Lee

With the exponential increase of power consumption across processor generations, energy dissipation has become one of the most critical constraints in system design. Cache memories are usually the most energy-consuming components on the processor chip because of their large die area and frequent access operations. Furthermore, in step with the increased complexity of modern embedded applications, microprocessors are increasingly executing multitasking applications. In multitasking processors, the conventional L1 instruction cache (I-cache) is usually shared by multiple tasks and therefore sustains highly intensive read/write operations, which can consume even more energy than in a single-task system. This paper presents an energy-efficient shared multitasking loop instruction cache (SMLIC), which is designed to address task sharing and context switch issues so that it can be used efficiently to reduce I-cache accesses and save energy in multitasking processors. Experiments on a set of multitasking applications demonstrate that the proposed SMLIC design scheme can reduce I-cache accesses by 12–86% and energy consumption in instruction supply by 11–79% for multitasking systems, depending on the frequency of context switches.
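The link between fewer I-cache accesses and lower instruction-supply energy can be sketched with a first-order model. This is an illustration under stated assumptions, not the paper's energy model: it assumes every fetch costs a fixed energy, that a fraction `loop_hit_fraction` of fetches is served by the loop cache instead of the I-cache, and that a loop-cache access costs `relative_loop_energy` times an I-cache access. Both parameter names and the value 0.1 used below are hypothetical, and controller and miss overheads are ignored.

```python
def instruction_supply_saving(loop_hit_fraction: float,
                              relative_loop_energy: float) -> float:
    """Fractional energy saving in instruction supply (first-order model).

    Baseline energy:    N * e
    With a loop cache:  (1 - f) * N * e + f * N * r * e
    Saving fraction:    1 - ((1 - f) + f * r) = f * (1 - r)
    where f = loop_hit_fraction and r = relative_loop_energy.
    """
    if not 0.0 <= loop_hit_fraction <= 1.0:
        raise ValueError("loop_hit_fraction must be in [0, 1]")
    if relative_loop_energy < 0.0:
        raise ValueError("relative_loop_energy must be non-negative")
    return loop_hit_fraction * (1.0 - relative_loop_energy)


if __name__ == "__main__":
    ASSUMED_RELATIVE_ENERGY = 0.1  # hypothetical loop-cache cost per access
    for f in (0.12, 0.50, 0.86):
        saving = instruction_supply_saving(f, ASSUMED_RELATIVE_ENERGY)
        print(f"I-cache accesses avoided: {f:.0%} -> energy saving: {saving:.1%}")
```

In this model the energy saving is always somewhat smaller than the reduction in I-cache accesses, because the redirected fetches still cost loop-cache energy; that is qualitatively in line with the reported ranges.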

Citations: 0

Mapping of streaming applications considering alternative application specifications (Extended abstract)

Pub Date : 2012-10-01 DOI: 10.1145/2435227.2435230

J. Zhai, Hristo Nikolov, T. Stefanov

Streaming applications often require a parallel Model of Computation (MoC) to specify their behavior and to facilitate mapping onto Multi-Processor System-on-Chip (MPSoC) platforms. The varied performance requirements and resource budgets of embedded systems call for an efficient design space exploration (DSE) approach that selects the best design from a design space consisting of a large number of design choices. However, existing DSE approaches explore a design space that includes only architecture and mapping alternatives for the initial application specification given by the application designer. In this paper, we first show that a design often might not be optimal if alternative specifications of a given application are not taken into account. We further argue that the best alternative specification consists of only independent and load-balanced application tasks. Based on the Polyhedral Process Network (PPN) MoC, we present an approach to analyze an initial PPN and transform it, if possible, into an alternative one that contains only independent processes. Finally, by prototyping real-life applications on both FPGA-based MPSoCs and desktop multi-core platforms, we demonstrate that mapping the alternative application specification yields a large performance gain compared to approaches in which alternative application specifications are not taken into account.

Citations: 8

Keynote: “Design space exploration and run-time resource management in the embedded multi-core era”

Pub Date : 2012-10-01 DOI: 10.1109/ESTIMedia.2012.6507016

S. Bampi

Increasingly demanding, complex algorithms for multimedia systems and higher resolutions for multiview video hit the power and memory walls in portable hardware. Silicon IC technology scaling is reaching two-dimensional limits, accompanied by an escalating technology cost wall. In this scenario, the severe costs of power density, circuit performance variability and energy constraints call for new algorithm-to-architecture approaches. This talk will highlight the architectures and circuit techniques that will influence multimedia system architectures in the future. Design challenges and specific solutions that deal with energy dissipation in the case of multiview video are addressed. In this presentation, the interactions among technology, design, architecture and algorithms are identified as drivers for new cross-layer optimizations in energy-constrained multimedia systems.

Citations: 0

O2render: An OpenCL-to-Renderscript translator for porting across various GPUs or CPUs

Pub Date : 2012-10-01 DOI: 10.1109/ESTIMedia.2012.6507031

Cheng-yan Yang, Yi-jui Wu, S. Liao

The more than half a billion Android devices in use are the world's most impactful open-source, real-time, interactive multimedia systems. Google introduced the Renderscript language and runtime in Android releases starting in 2011. Renderscript delivers performance and portability without losing usability. However, it is difficult to reuse software written in existing compute languages such as OpenCL. Thus, we develop the O2render system to enable OpenCL programs on Android devices. We analyze the fundamental differences between OpenCL and Renderscript, and present our design of a translator between them using the Low Level Virtual Machine (LLVM). We extend LLVM's frontend, Clang, and show that the translated Renderscript code achieves about the same performance with minimal translation overhead.

Citations: 10
