
2015 International Conference on Embedded Computer Systems: Architectures, Modeling, and Simulation (SAMOS): Latest Papers

Using VLIW softcore processors for image processing applications
J. Hoozemans, Stephan Wong, Z. Al-Ars
The ever-increasing complexity of advanced high-resolution image processing applications requires innovative solutions that address this complexity efficiently and cost-effectively. This paper discusses the use of reconfigurable general-purpose softcore processors in image processing applications, such that hardware resources are used efficiently while high image processing performance is ensured for the targeted application. Results show that the rVEX softcore processor can achieve remarkably better performance than the industry-standard Xilinx MicroBlaze (up to 3.2 times faster) on image processing applications.
DOI: 10.1109/SAMOS.2015.7363691 (https://doi.org/10.1109/SAMOS.2015.7363691) | Published: 2015-07-19 | Citations: 19
Towards self-adaptive MPSoC systems with adaptivity throttling
W. Quan, A. Pimentel
Today's multi-processor system-on-chip (MPSoC) systems increasingly have to deal with dynamically changing application workload scenarios. To cope with such dynamic application behavior, these systems could dynamically adapt the mapping of application tasks onto the underlying system resources to improve the system's performance. However, such performance improvement comes at the cost of a system reconfiguration in which application tasks may have to be migrated between processors. This trade-off implies that reconfiguring the system is only beneficial when the performance gains outweigh the reconfiguration overhead. To address this problem for MPSoCs, this paper presents a scenario-based run-time resource management framework with adaptivity throttling. The framework uses the history of application scenario execution behavior to predict the actual benefit of a system reconfiguration, so that it can explicitly decide at run time whether or not to reconfigure. Experimental results reveal that our proposed approach substantially improves the system's efficiency compared to MPSoCs that do not provide such intelligent reconfiguration control.
DOI: 10.1109/SAMOS.2015.7363671 (https://doi.org/10.1109/SAMOS.2015.7363671) | Published: 2015-07-19 | Citations: 3
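The throttling decision in the Quan and Pimentel abstract weighs a predicted gain against a reconfiguration overhead. The Python sketch below illustrates only that trade-off. The class name `ThrottledReconfigurator`, the mean-of-recent-durations predictor, and the fixed `overhead_ms` cost are hypothetical assumptions, not the paper's framework.

```python
from collections import defaultdict, deque
from statistics import mean


class ThrottledReconfigurator:
    """Hypothetical history-based throttle (illustrative, not the paper's framework).

    Reconfigure only when the time predicted to be saved in the current
    scenario exceeds the fixed cost of remapping/migrating tasks.
    """

    def __init__(self, overhead_ms: float, window: int = 8) -> None:
        self.overhead_ms = overhead_ms
        # Most recent observed durations (ms) per application scenario.
        self.history = defaultdict(lambda: deque(maxlen=window))

    def record(self, scenario: str, duration_ms: float) -> None:
        """Log how long a finished run of `scenario` lasted."""
        self.history[scenario].append(duration_ms)

    def should_reconfigure(self, scenario: str, speedup: float) -> bool:
        """Decide whether a remapping with the given predicted speedup pays off."""
        past = self.history[scenario]
        if not past:
            return False  # no history yet: stay with the current mapping
        expected_ms = mean(past)  # naive predictor: mean of recent durations
        saved_ms = expected_ms - expected_ms / speedup
        return saved_ms > self.overhead_ms
```

Here is a worked case with a 50 ms overhead and two past runs of 100 ms and 120 ms. A predicted 1.25× speedup saves about 22 ms, so the reconfiguration is throttled. A 2× speedup saves about 55 ms, so it goes ahead.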
The AXIOM project (Agile, eXtensible, fast I/O Module)
D. Theodoropoulos, D. Pnevmatikatos, C. Álvarez, E. Ayguadé, Javier Bueno, Antonio Filgueras, Daniel Jiménez-González, X. Martorell, N. Navarro, Carlos Segura, Carles Fernández, David Oro, J. Saeta, Paolo Gai, A. Rizzo, R. Giorgi
The AXIOM project (Agile, eXtensible, fast I/O Module) aims at researching new software/hardware architectures for future Cyber-Physical Systems (CPSs). These systems are expected to react in real time, provide enough computational power for the assigned tasks, consume as little energy as possible for those tasks (energy efficiency), scale up through modularity, remain easy to program as performance scales, and make the best use of existing standards at minimal cost.
DOI: 10.1109/SAMOS.2015.7363684 (https://doi.org/10.1109/SAMOS.2015.7363684) | Published: 2015-07-19 | Citations: 19
Improving accuracy of source level timing simulation for GPUs using a probabilistic resource model
Christoph Gerum, W. Rosenstiel, O. Bringmann
After their success in the high-performance and desktop markets, Graphics Processing Units (GPUs) that can be used for general-purpose computing are being introduced into embedded systems-on-chip (SoCs). Due to advanced architectural features such as massive simultaneous multithreading, static performance analysis and high-level timing simulation are difficult to apply to code running on these systems. This paper extends a method for performance simulation of GPUs. The method uses automated performance annotations in the application's OpenCL C source code. It also uses an extended performance model that derives a kernel's runtime from the metrics produced by executing the annotated kernels. The final results are then generated using a probabilistic resource conflict model. The model reaches an accuracy of 90% on most test cases and delivers a higher average accuracy than previous methods.
DOI: 10.1109/SAMOS.2015.7363655 (https://doi.org/10.1109/SAMOS.2015.7363655) | Published: 2015-07-19 | Citations: 1
Framework for parameter analysis of FPGA-based image processing architectures
M. Reichenbach, B. Pfundt, D. Fey
Image processing algorithms that operate only on a local neighbourhood are used in nearly every image processing application. Very often several iterations are performed over a fixed neighbourhood, and such computations are described as stencil codes. A promising approach in embedded systems is to use the massively parallel computation power of an FPGA for this kind of algorithm. If the FPGA is placed directly inside the image acquisition unit, forming a smart camera, this not only speeds up processing but also reduces or even eliminates PC-based hardware, saving space and power. However, most designers start from scratch when they have to implement stencil computations in smart cameras. This leads to FPGAs that are not fully utilized, because the most efficient use of the given resources is only secondary to functional correctness. Therefore, this paper presents a framework for stencil code applications that immediately delivers the best architecture with respect to prominent resource criteria. An analytical model is used to find an optimized parameter set (degree of parallelism, usage of buffers, etc.) for a highly flexible FPGA implementation. A graphical tool allows designers to further evaluate the effects of certain parameters. Our results show that we are able to create an optimized hardware architecture for this application domain.
DOI: 10.1109/SAMOS.2015.7363664 (https://doi.org/10.1109/SAMOS.2015.7363664) | Published: 2015-07-19 | Citations: 1
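For readers new to the term, a stencil code repeatedly updates each pixel from a fixed neighbourhood. The minimal Python sketch below iterates a 3×3 mean filter in software. The function names, the choice of a box filter, and leaving border pixels unchanged are illustrative assumptions. The sketch models none of the FPGA parameters (degree of parallelism, buffering) that the framework explores.

```python
from typing import List

Image = List[List[float]]


def box3x3(img: Image) -> Image:
    """One sweep of a 3x3 mean stencil; border pixels are copied unchanged."""
    h, w = len(img), len(img[0])
    out = [row[:] for row in img]  # fresh output so the input is never overwritten
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            acc = 0.0
            # The fixed neighbourhood: the pixel itself plus its 8 neighbours.
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    acc += img[y + dy][x + dx]
            out[y][x] = acc / 9.0
    return out


def iterate_stencil(img: Image, iterations: int) -> Image:
    """Apply the same stencil several times, as iterative stencil codes do."""
    for _ in range(iterations):
        img = box3x3(img)
    return img
```

Each sweep reads only a 3-row window of the input. That locality is what hardware implementations typically exploit with line buffers, although this software version does not model them.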
Efficient distribution of Triggered Synchronous Block Diagrams on asynchronous platforms
Yang Yang, S. Tripakis, A. Sangiovanni-Vincentelli
As the complexity of embedded systems rapidly increases in terms of both scale and functionality, there has been a strong interest in design languages and methodologies that facilitate the use of formal methods. These languages and methodologies are mostly based on a synchronous paradigm. While this paradigm satisfies the need for formalization, it often results in an inefficient implementation with substantial overhead compared to approaches that do not enforce synchronicity on the execution platform. There is therefore high interest in techniques that maintain the formal properties of synchronous models while also enabling the use of asynchronous and distributed execution platforms with little overhead. In this paper, we propose an approach for efficient distribution of Triggered Synchronous Block Diagrams (SBDs) on asynchronous platforms while preserving the correct semantics. Compared to previous work that relies on trigger elimination, our approach aims to reduce unnecessary communication overhead and thus improve the efficiency of the implementation. We consider both general Triggered SBDs, where the values of triggers are dynamically computed, and Timed SBDs, where triggers are statically known and usually specified by (period, initial phase) pairs.
DOI: 10.1109/SAMOS.2015.7363666 (https://doi.org/10.1109/SAMOS.2015.7363666) | Published: 2015-07-19 | Citations: 1
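In a Timed SBD, according to the Yang, Tripakis, and Sangiovanni-Vincentelli abstract, triggers are statically known and given by (period, initial phase) pairs. The Python sketch below assumes discrete logical ticks and that a block with pair (p, φ) fires at ticks φ, φ + p, φ + 2p, and so on. All helper names are hypothetical.

The sketch only enumerates firing instants and the hyperperiod. The hyperperiod is the least common multiple of the periods, after which the pattern repeats once every phase has elapsed. Knowing these instants statically is one way an implementation could avoid sending messages to blocks that will not fire. The paper's actual distribution scheme is not reproduced here.

```python
from math import lcm  # multi-argument lcm requires Python 3.9+
from typing import Dict, List, Tuple

# Hypothetical representation: block name -> (period, initial phase), period > 0.
Triggers = Dict[str, Tuple[int, int]]


def fires(period: int, phase: int, t: int) -> bool:
    """True if a block with this (period, phase) trigger fires at tick t."""
    return t >= phase and (t - phase) % period == 0


def schedule(triggers: Triggers, horizon: int) -> List[List[str]]:
    """For each tick in [0, horizon), list the blocks whose trigger is active."""
    return [
        [name for name, (p, ph) in triggers.items() if fires(p, ph, t)]
        for t in range(horizon)
    ]


def hyperperiod(triggers: Triggers) -> int:
    """Length after which the combined firing pattern repeats."""
    return lcm(*(p for p, _ in triggers.values()))
```

With `{"A": (2, 0), "B": (3, 1)}`, block A fires at ticks 0, 2, 4, and B at ticks 1, 4, 7. Ticks 3 and 5 activate nothing, and the hyperperiod is 6.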
ESL power estimation using virtual platforms with black box processor models
Stefan Schürmans, Gereon Onnebrink, R. Leupers, G. Ascheid, Xiaotao Chen
Processor models for electronic system level (ESL) simulations are usually provided by their vendors as binary object code. These binaries appear as black boxes whose internals cannot be observed. This prevents the application of most existing ESL power estimation methodologies. To remedy this situation, this work presents an estimation methodology for the case of black box models. An evaluation on the ARM Cortex-A9 processor shows that the proposed approach achieves high accuracy. Compared to hardware power measurements obtained from the OMAP4460 chip on the PandaBoard, the ESL estimation error is below 5%.
DOI: 10.1109/SAMOS.2015.7363698 (https://doi.org/10.1109/SAMOS.2015.7363698) | Published: 2015-07-19 | Citations: 12
Experiences in speeding up computer vision applications on mobile computing platforms
Luna Backes, Alejandro Rico, Björn Franke
Computer vision (CV) is widely expected to be the next big thing in mobile computing. The availability of a camera and a large number of sensors in mobile devices will enable CV applications that understand the environment and enhance people's lives through augmented reality. One of the problems yet to be solved is how to transfer demanding state-of-the-art CV algorithms, designed to run on powerful desktop computers with several GPUs, onto the energy-efficient but slow processors and GPUs found in mobile devices. To accommodate the lack of performance, current CV applications for mobile devices are simpler versions of more complex algorithms, which generally run slowly and unreliably and provide a poor user experience. In this paper, we investigate ways to make demanding CV applications run faster on mobile devices. We selected KinectFusion (KF) as a representative CV application. The KF application constructs a 3D model from the images captured by a Kinect. After porting it to an ARM platform, we applied several optimisation and parallelisation techniques using OpenCL to exploit all the available computing resources. We evaluated the impact on performance and power and demonstrate a 4× speedup with only a 1.38× power increase. We also evaluated the performance portability of our optimisations by running on a different platform, and observed similar improvements despite the different multi-core configuration and memory system. By measuring processor temperature, we found overheating to be the main limiting factor for running such high-performance code on a mobile device not designed for full continuous utilisation.
DOI: 10.1109/SAMOS.2015.7363653 (https://doi.org/10.1109/SAMOS.2015.7363653) | Published: 2015-07-19 | Citations: 10
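The KinectFusion abstract reports speed and power but not energy. Since energy is power multiplied by time, the two reported factors imply the following energy ratio for the same workload. This assumes the 1.38× figure refers to average power over the run.

$$
\frac{E_{\text{opt}}}{E_{\text{base}}}
= \frac{P_{\text{opt}}\, t_{\text{opt}}}{P_{\text{base}}\, t_{\text{base}}}
= 1.38 \times \frac{1}{4}
= 0.345
$$

Under that assumption, the optimised version would use roughly 65.5% less energy per run. The paper remains the authority on any directly measured energy figures.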
Platform-aware dynamic data type refinement methodology for radix tree Data Structures
Thomas Papastergiou, Lazaros Papadopoulos, D. Soudris
Modern embedded systems are now capable of executing complex and demanding applications that are often based on large data structures. The design of the application's critical data structures affects the performance and memory requirements of the whole system. The Dynamic Data Structure Refinement methodology provides optimizations, mainly for list and array data structures, based on the application's features and access patterns. In this work, we extend several aspects of the methodology. First, we integrate radix tree optimizations. Then, we provide a set of platform-aware data structure implementations for performing optimizations based on hardware features. The extended methodology is evaluated on a wide set of synthetic and real-world benchmarks, in which we achieved performance and memory trade-offs of up to 29.6%. Additionally, the extended methodology identifies Pareto-optimal data structure implementations that were not reachable with the previous methodology.
DOI: 10.1109/SAMOS.2015.7363662 (https://doi.org/10.1109/SAMOS.2015.7363662) | Published: 2015-07-19 | Citations: 3
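The abstract does not show the radix tree layouts themselves. As background, a radix tree (compressed trie) stores string fragments on edges and splits an edge when a new key diverges partway along it. The Python sketch below is a generic textbook variant. The names `RadixTree`, `insert`, and `get` are hypothetical, and the sketch is not one of the platform-aware implementations from the paper.

```python
from typing import Dict, Optional


class _Node:
    def __init__(self) -> None:
        self.children: Dict[str, "_Node"] = {}  # edge label -> child node
        self.terminal = False                   # True if a stored key ends here
        self.value: Optional[object] = None


def _common_prefix_len(a: str, b: str) -> int:
    n = 0
    while n < len(a) and n < len(b) and a[n] == b[n]:
        n += 1
    return n


class RadixTree:
    """Generic radix tree (compressed trie) over string keys; illustrative only."""

    def __init__(self) -> None:
        self.root = _Node()

    def insert(self, key: str, value: object) -> None:
        node = self.root
        while key:
            # Sibling edges never share a first character, so at most one matches.
            label = next((lbl for lbl in node.children if lbl[0] == key[0]), None)
            if label is None:
                child = _Node()  # no shared prefix: hang the rest of the key here
                node.children[key] = child
                node, key = child, ""
                continue
            common = _common_prefix_len(label, key)
            child = node.children[label]
            if common < len(label):
                # Split the edge: shared prefix -> new inner node -> old suffix.
                inner = _Node()
                inner.children[label[common:]] = child
                del node.children[label]
                node.children[label[:common]] = inner
                child = inner
            node, key = child, key[common:]
        node.terminal, node.value = True, value

    def get(self, key: str) -> Optional[object]:
        node = self.root
        while key:
            label = next((lbl for lbl in node.children if key.startswith(lbl)), None)
            if label is None:
                return None
            node, key = node.children[label], key[len(label):]
        return node.value if node.terminal else None
```

Compare this with a plain trie. Inserting "team", "tea", and "toast" leaves a single root edge "t", which then branches into "ea" and "oast". A plain trie would instead create one node per character. That node count is the kind of memory-versus-access-time trade-off that refinement methodologies explore.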
HARPA: Solutions for dependable performance under physically induced performance variability
D. Rodopoulos, S. Corbetta, G. Massari, Simone Libutti, F. Catthoor, Yiannakis Sazeides, C. Nicopoulos, A. Portero, Etienne Cappe, R. Vavrík, V. Vondrák, D. Soudris, Federico Sassi, A. Fritsch, W. Fornaciari
Transistor miniaturization, combined with the dawn of novel switching semiconductor structures, calls for careful examination of the variability and aging of the computer fabric. Time-zero and time-dependent phenomena need to be carefully considered so that the dependability of digital systems can be guaranteed. Architectures already contain many mechanisms that detect and correct physically induced reliability violations. In many cases, guarantees of functional correctness come at a quantifiable performance cost. This paper discusses the European Commission's FP7-612069-HARPA project and its approach towards dependable performance. The project provides solutions for mitigating performance variability at run time, in the presence of fabric variability/aging and built-in reliability, availability and serviceability (RAS) techniques. In this paper, we briefly present and discuss the modeling and mitigation techniques developed within HARPA, covering many abstraction levels of digital system design, from the transistor to the application layer.
DOI: 10.1109/SAMOS.2015.7363685 (published 2015-07-19)
Citations: 5
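As a rough illustration of run-time performance-variability mitigation, the sketch below uses a simple feedback loop. When the observed progress rate falls below a target (for instance because RAS mechanisms spend time on error correction or retries), it raises an operating level such as a DVFS step, and it lowers the level again when there is slack. This is not HARPA's actual run-time manager; `HeartbeatController`, its parameters and the meaning of the levels are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class HeartbeatController:
    """Hypothetical run-time controller that keeps a progress rate near a target.

    A generic feedback sketch, not HARPA's actual mechanism. 'level' stands
    for an assumed operating point, e.g. a DVFS step (0 = lowest).
    """
    target_rate: float       # desired heartbeats per second
    tolerance: float = 0.05  # relative slack band around the target
    level: int = 0           # current operating level
    max_level: int = 3       # highest available operating level

    def update(self, observed_rate: float) -> int:
        """Adjust the level from one rate measurement and return the new level."""
        low = self.target_rate * (1.0 - self.tolerance)
        high = self.target_rate * (1.0 + self.tolerance)
        if observed_rate < low and self.level < self.max_level:
            self.level += 1  # too slow (e.g. RAS retries cost time): speed up
        elif observed_rate > high and self.level > 0:
            self.level -= 1  # comfortable slack: step down to save power
        return self.level


if __name__ == "__main__":
    ctrl = HeartbeatController(target_rate=30.0)
    for rate in [30.0, 26.0, 27.0, 31.0, 33.0]:
        print(rate, ctrl.update(rate))
```

With a target of 30 heartbeats/s and a 5% band, the observed rates 30, 26, 27, 31 and 33 move the level through 0, 1, 2, 2 and 1. A real manager would also have to weigh power, temperature and aging, which this sketch ignores.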
Journal: 2015 International Conference on Embedded Computer Systems: Architectures, Modeling, and Simulation (SAMOS)