
2011 9th IEEE Symposium on Embedded Systems for Real-Time Multimedia: Latest Papers

Automatic derivation of polyhedral process networks from while-loop affine programs
Pub Date : 2011-12-01 DOI: 10.1109/ESTIMedia.2011.6088516
D. Nadezhkin, T. Stefanov
Process networks (PNs) are a parallel model of computation (MoC) well suited to specifying embedded streaming applications in a parallel form that facilitates efficient mapping onto embedded parallel execution platforms. Unfortunately, specifying an application using a parallel MoC is a very difficult and highly error-prone task. To overcome these difficulties, an automated procedure exists for deriving a specific kind of PN, the polyhedral process network (PPN), from static affine nested loop programs (SANLPs). This procedure is implemented in the pn compiler. However, many applications, e.g., multimedia and signal processing applications, have adaptive and dynamic behavior that cannot be expressed as SANLPs. Therefore, in order to handle more dynamic applications, this paper addresses the important question of whether some of the restrictions of SANLPs can be relaxed while keeping the ability to perform compile-time analysis and to derive PPNs. Achieving this would significantly extend the range of applications that can be parallelized in an automated way. The main contribution of this paper is a first approach for the automated translation of affine nested loop programs with while-loops into input-output equivalent PPNs.
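To make the distinction in the abstract concrete, the sketch below contrasts a loop nest whose bounds and array indices are affine functions of the iterators and a parameter with a loop whose trip count depends on run-time data. Both functions and their names are hypothetical illustrations, not code from the paper or from the pn compiler.

```python
def sanlp_example(a, n):
    """Static affine nested loop: bounds and indices are affine in i, j, n."""
    out = [[0] * n for _ in range(n)]
    for i in range(n):                # bound: 0 <= i < n
        for j in range(i, n):         # bound: i <= j < n (affine in i and n)
            out[i][j] = a[i] + a[j]   # indices i and j are affine expressions
    return out


def while_loop_example(x, tol):
    """Data-dependent while-loop: the trip count is unknown at compile time."""
    steps = 0
    while x > tol:                    # condition depends on a run-time value
        x = x / 2.0
        steps += 1
    return x, steps
```

In the first function every iteration can be enumerated at compile time from `n` alone, which is what makes compile-time dependence analysis possible. In the second, the number of iterations is only known once `x` and `tol` are known, which is the kind of behavior the paper sets out to support.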
Citations: 8
Tractable real-time schedulability analysis for mode changes under temporal isolation
Pub Date : 2011-12-01 DOI: 10.1109/ESTIMedia.2011.6088519
N. Fisher, Masud Ahmed
Real-time multimedia subsystems often require support for switching between different resource and application execution modes. To ensure that timing constraints are not violated during or after a subsystem mode change, real-time schedulability analysis is required. However, existing time-efficient multi-mode schedulability analysis techniques for application-only mode changes are not appropriate for subsystems that require changes in the resource execution behavior (e.g., processors with dynamic power modes). Furthermore, all existing multi-mode schedulability analyses that handle both resource and application mode changes have exponential time complexity and do not scale to subsystems with a moderate or large number of modes. We address the lack of tractable schedulability analysis for such subsystems by proposing a model for characterizing multiple resource and application modes and by deriving a sufficient schedulability test that has pseudo-polynomial time complexity. Simulation results show that our proposed schedulability test, when compared with previously proposed approaches, requires significantly less time and is just as precise.
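For readers unfamiliar with pseudo-polynomial schedulability tests, the following minimal sketch shows the classic processor-demand test for EDF on a dedicated uniprocessor in a single mode. It is not the multi-mode test derived in the paper. The task representation `(C, D, T)` with integer values and `C <= D <= T`, and the function names, are assumptions made for this illustration.

```python
from fractions import Fraction


def dbf(tasks, t):
    """Demand bound function: work that must complete in any window of length t."""
    return sum(max(0, (t - d) // p + 1) * c for (c, d, p) in tasks)


def edf_schedulable(tasks):
    """tasks: list of (C, D, T) integer tuples with C <= D <= T."""
    u = sum(Fraction(c, p) for (c, d, p) in tasks)
    if u > 1:
        return False
    if u == 1:
        raise ValueError("this sketch only handles total utilization below 1")
    d_max = max(d for (_, d, _) in tasks)
    slack = sum(Fraction((p - d) * c, p) for (c, d, p) in tasks)
    # Beyond this point dbf(t) <= t is guaranteed, so no check is needed there.
    limit = max(Fraction(d_max), slack / (1 - u))
    # dbf only changes at absolute deadlines, so only those are checked.
    for (c, d, p) in tasks:
        t = d
        while t <= limit:
            if dbf(tasks, t) > t:
                return False
            t += p
    return True
```

The number of points checked depends on the numeric values of the task parameters rather than only on the number of tasks, which is what "pseudo-polynomial" refers to.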
Citations: 26
Parallelization of a Bokeh application on embedded multicore DSP systems
Pub Date : 2011-12-01 DOI: 10.1109/ESTIMedia.2011.6088531
Chi-Bang Kuan, Shao-Chung Wang, Wen-Li Shih, Kun-Hsien Tsai, S. Lai, Jenq-Kuen Lee
A Bokeh application renders the blur, or the aesthetic quality of the blur, in out-of-focus areas of an image. The out-of-focus effect of Bokeh results depends on the accuracy of depth information and on the blurring effects produced by image postprocessing. However, current stereo vision techniques consume a huge amount of processing time to obtain accurate depth information. In this paper, we present a case study on parallelizing a Bokeh application on an embedded multicore platform, which features one MPU and one DSP sub-system consisting of two VLIW DSP processors. The Bokeh application employs a Belief Propagation method to obtain depth information from input images and uses that information to generate output images with an out-of-focus effect. This study also illustrates how to deliver performance for applications on embedded multicore systems. To sustain the heavy computation requirements of the stereo vision techniques, DSPs with their SIMD instructions are leveraged to exploit data parallelism in critical kernels. In addition, DMAs on the multicore system are used to facilitate data transmission between processors. Access to SIMD and DMAs is provided by two essential programming models we developed for embedded multicore systems. Our work also gives firsthand experience of how C++ classes and abstractions can be used to help parallelize applications on embedded multicore DSP systems. Finally, in our experiments, using DSPs, SIMD, and DMAs yields speedups of 1.67 and 2.75, respectively, for two key components of the Bokeh application.
Citations: 2
Scheduling of stream programs onto SPM enhanced processors with code overlay
Pub Date : 2011-12-01 DOI: 10.1109/ESTIMedia.2011.6088530
W. Che, Karam S. Chatha
Scratch Pad Memories (SPM) have emerged as an alternative to caches in embedded processor architectures due to their lower power consumption, smaller chip area, and superior performance. However, the advantages of SPM come at the expense of an increased load on the programmer, as she is responsible for memory management. Consequently, there is a need for novel compilation techniques for mapping applications onto SPM enhanced embedded processors. Stream programs (which describe a large class of embedded applications) demonstrate stable memory access patterns and are particularly suitable for SPM based processors. In this paper we present a heuristic approach for scheduling and compiling streaming applications (modeled by synchronous data flow graphs) for SPM enhanced processors. The technique maximizes application performance by minimizing the code overlay overheads that are introduced when executing a large code base on a smaller SPM. We also present an extension of our approach that further reduces the overheads by selective code pre-fetching. The effectiveness of our approaches is evaluated by compiling ten streaming applications onto one Synergistic Processing Engine (SPE) of the IBM Cell processor.
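The effect of the firing order on code overlay overhead can be illustrated with a deliberately simplified model. The sketch below assumes a single overlay region that holds the code of exactly one actor at a time, so every switch to a different actor costs one code load. This model and the function name are assumptions for illustration, not the heuristic presented in the paper.

```python
def count_overlay_loads(schedule):
    """Count code loads when the overlay region holds one actor's code at a time."""
    loads = 0
    resident = None                 # actor whose code currently sits in the SPM
    for actor in schedule:
        if actor != resident:       # code is not in the SPM: it must be copied in
            loads += 1
            resident = actor
    return loads
```

Under this model, the interleaved schedule `["A", "B", "A", "B"]` needs four loads while the batched schedule `["A", "A", "B", "B"]` needs two, which hints at why the schedule and the overlay decisions are worth optimizing together.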
Citations: 12
SVC or MDC? That's the question
Pub Date : 2011-12-01 DOI: 10.1109/ESTIMedia.2011.6088528
Yi-Hsuan Chiang, Polly Huang, Homer H. Chen
Multi-rate scalable video codecs, SVC and MDC, are plausible solutions for dealing with the heterogeneous environment of the Internet. They have, however, also given rise to a wide debate over which one supports P2P IPTV systems more efficiently. Our goal in this work is to resolve the debate by providing a quantitative comparison of P2P IPTV systems under different choices of coding scheme and P2P network formation. The answer is rather subtle. MDC-based systems, though they outperform SVC-based ones in network throughput under certain network formations with bottlenecks, suffer from lower perceptual quality in terms of PSNR due to coding inefficiency. The results of this paper provide not only a lesson for the design of large-scale heterogeneous P2P IPTV systems but also strong evidence that a poor choice of codec at the higher level might overshadow the network-level design, and that the codec and network formation components ought to be co-designed for optimal user experience.
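The abstract uses PSNR as its quality measure. As a reminder of how that figure is obtained, here is a minimal sketch of the standard definition, PSNR = 10 log10(MAX^2 / MSE), over two equally sized sequences of sample values. The function name and the flat-list input format are assumptions for this illustration; the paper's measurement setup is not described in the abstract.

```python
import math


def psnr(reference, decoded, max_value=255):
    """Peak signal-to-noise ratio in dB between two equally long sample lists."""
    if len(reference) != len(decoded) or not reference:
        raise ValueError("inputs must be non-empty and of equal length")
    mse = sum((r - d) ** 2 for r, d in zip(reference, decoded)) / len(reference)
    if mse == 0:
        return math.inf             # identical signals: PSNR is unbounded
    return 10.0 * math.log10(max_value ** 2 / mse)
```

A higher value means the decoded signal is closer to the reference, so a codec that spends more bits on redundancy at a fixed bit rate tends to score lower.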
Citations: 8
System perspective on embedded multimedia
Pub Date : 2011-12-01 DOI: 10.1109/ESTIMedia.2011.6088533
Liang-Gee Chen
Today's user demand for multimedia has moved into an anywhere, anytime paradigm. This ubiquitous usage model creates many requirements for embedded multimedia system design, and the traditional module-wise design concept will not suffice. In this talk, the system design view of modern VLSI architectures for multimedia applications, including H.264, scalable video coding (SVC), and stereo/3D video coding, will be reviewed. In addition, several emerging applications in which machine-to-machine and machine-to-human design factors also become important, such as distributed video coding (DVC), free-viewpoint TV, and intelligent image recognition, will be introduced. With the growth of research on these embedded architectures, we can expect a fruitful future for multimedia ICs and systems.
Citations: 0
A whole layer performance analysis method for Android platforms
Pub Date : 2011-12-01 DOI: 10.1109/ESTIMedia.2011.6088515
Namseung Lee, Sung-Soo Lim
As products based on the Android platform have spread widely in the consumer electronics market, the need for systematic performance analysis has increased significantly. Conventional approaches rely on publicly available performance analysis tools from the Android SDK or the Linux community, such as DDMS (Dalvik Debug Monitor Server), LTTng, OProfile, and Ftrace. Although these approaches provide analysis or measurement results for certain aspects and specific software layers, none of them gives a view across all software layers. For example, once a method in an Android application turns out to be a performance bottleneck, it is very hard to locate the code fragments that actually caused the bottleneck across the software layers: the application code does not reveal the direct reason for the bottleneck, while the underlying native layers, including kernel events, often cause it.
Citations: 5
A flash-friendly B+-tree with endurance-awareness
Pub Date : 2011-12-01 DOI: 10.1109/ESTIMedia.2011.6088523
Hua-Wei Fang, Mi-Yen Yeh, Pei-Lun Suei, Tei-Wei Kuo
This work is motivated by the strong demand for flash-friendly index designs that resolve reliability and performance concerns for data manipulation over flash memory. Unlike past work, we explore the impact of hot-data access and sibling-link updates on a tree index structure over flash memory. In particular, a flash-friendly B+-tree, referred to as a Durable B+-tree, is proposed to improve not only the endurance but also the performance of a tree index structure over flash memory. The proposed methodology and index design were evaluated in a series of experiments, which showed a significant improvement in endurance compared with past work.
Citations: 2
A greedy approach to tolerate defect cores for multimedia applications
Pub Date : 2011-12-01 DOI: 10.1109/ESTIMedia.2011.6088517
Kelvin K. Yue, Soumia Ghalim, Zheng Li, Frank Lockom, Shangping Ren, Lei Zhang, Xiaowei Li
Computation-intensive multimedia applications are emerging on mobile devices. A System-on-Chip (SoC) offers high performance at a reduced size for these devices. An SoC often integrates tens of cores and uses a Network-on-Chip (NoC) as its communication infrastructure. Core-level redundancy is often used as an effective approach to ensure a high yield of manycore processors and to improve the reliability of manycore chips. However, when defective cores are replaced by redundant ones, the NoC topology changes. As a result, an application fine-tuned to the timing parameters of one topology may not meet the expected timing behavior under the new one. To address this issue, we first define a metric that measures the timing resemblance between different NoC topologies. Based on this metric, we develop a greedy algorithm that reconfigures a defect-tolerant manycore platform and forms a unified application-specific virtual topology on which the timing variations caused by the reconfiguration are minimized. Our simulation results clearly indicate the effectiveness of the developed algorithm.
Citations: 5
Resource minimized static mapping and dynamic scheduling of SDF graphs
Pub Date : 2011-12-01 DOI: 10.1109/ESTIMedia.2011.6088529
Jinwoo Kim, Tae-ho Shin, S. Ha, Hyunok Oh
In this paper, we focus on the throughput-constrained parallel execution of synchronous data flow (SDF) graphs. This paper assumes static mapping and dynamic scheduling of nodes, in contrast to related work that assumes static scheduling. Since the scheduling order in dynamic scheduling depends on the priority assignment, three priority assignment methods are proposed and compared. If no task execution time varies at run-time, priority assignment is another way of storing a static schedule. We propose a static mapping technique that minimizes the resource overhead, considering both the processor cost and the total buffer size on all arcs, under a given throughput constraint. Since the problem is NP-complete, a multi-objective evolutionary algorithm is used to discover the mapping that minimizes the processor cost and the buffer requirement simultaneously. The experimental results show that the proposed technique requires fewer resources or achieves higher average throughput than the previous approaches.
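The abstract describes minimizing processor cost and buffer requirement simultaneously. A common building block of multi-objective search is a Pareto-dominance filter, sketched below in Python. This is only an illustration, not the authors' algorithm: the `Mapping` type, the candidate names, and all cost values are hypothetical, and every candidate is assumed to have already passed the throughput constraint.

```python
from typing import NamedTuple


class Mapping(NamedTuple):
    """A hypothetical candidate mapping with its two resource costs."""
    name: str
    processor_cost: int
    buffer_size: int


def dominates(a: Mapping, b: Mapping) -> bool:
    """True if a is no worse than b in both costs and strictly better in one."""
    no_worse = (a.processor_cost <= b.processor_cost
                and a.buffer_size <= b.buffer_size)
    better = (a.processor_cost < b.processor_cost
              or a.buffer_size < b.buffer_size)
    return no_worse and better


def pareto_front(candidates: list[Mapping]) -> list[Mapping]:
    """Keep only the candidates that no other candidate dominates."""
    return [c for c in candidates
            if not any(dominates(other, c) for other in candidates)]


# Made-up candidates, all assumed to meet the throughput constraint.
candidates = [
    Mapping("m1", processor_cost=4, buffer_size=10),
    Mapping("m2", processor_cost=3, buffer_size=14),
    Mapping("m3", processor_cost=4, buffer_size=12),  # dominated by m1
    Mapping("m4", processor_cost=5, buffer_size=8),
]
front = pareto_front(candidates)
print([m.name for m in front])
```

An evolutionary algorithm would typically apply a filter of this kind to each generation, so that the search keeps the trade-off mappings rather than a single weighted optimum.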
Citations: 5
Journal: 2011 9th IEEE Symposium on Embedded Systems for Real-Time Multimedia