Code generation for embedded heterogeneous architectures on Android
Richard Membarth, Oliver Reiche, Frank Hannig, J. Teich
2014 Design, Automation & Test in Europe Conference & Exhibition (DATE), pp. 1-6. Published 2014-03-24.
DOI: 10.7873/DATE.2014.099 (https://doi.org/10.7873/DATE.2014.099)
Citations: 33
Abstract
The success of Android is based on its unified Java programming model, which allows developers to write platform-independent programs for a variety of target platforms. However, this comes at the cost of performance. As a consequence, Google introduced APIs that allow developers to write native applications and to exploit multiple cores as well as embedded GPUs for compute-intensive parts. This paper proposes code generation techniques that target the Renderscript and Filterscript APIs. Renderscript harnesses multi-core CPUs and unified shader GPUs, while the more restricted Filterscript also supports GPUs with earlier shader models. Our techniques focus on image processing applications and make it possible to target these APIs and OpenCL from a common description. We further avoid memory transfers by sharing the same memory region among different processing elements on HSA platforms. As a reference, we use an embedded platform hosting a multi-core ARM CPU and an ARM Mali GPU. We show that our generated source code is faster than native implementations in OpenCV as well as the pre-implemented script intrinsics provided by Google for acceleration on the embedded GPU.