2015 International Conference on Embedded Computer Systems: Architectures, Modeling, and Simulation (SAMOS): latest papers


Using VLIW softcore processors for image processing applications

2015 International Conference on Embedded Computer Systems: Architectures, Modeling, and Simulation (SAMOS)

Pub Date : 2015-07-19 DOI: 10.1109/SAMOS.2015.7363691

J. Hoozemans, Stephan Wong, Z. Al-Ars

The ever-increasing complexity of advanced high-resolution image processing applications requires innovative solutions that address it efficiently and cost-effectively. This paper discusses the use of reconfigurable general-purpose softcore processors in image processing applications, so that hardware resources are used efficiently while high image processing performance is maintained for the targeted application. Results show that the rVEX softcore processor achieves substantially better performance than the industry-standard Xilinx MicroBlaze (up to 3.2 times faster) on image processing applications.

Citations: 19

Towards self-adaptive MPSoC systems with adaptivity throttling

2015 International Conference on Embedded Computer Systems: Architectures, Modeling, and Simulation (SAMOS)

Pub Date : 2015-07-19 DOI: 10.1109/SAMOS.2015.7363671

W. Quan, A. Pimentel

Today's multi-processor system-on-chip (MPSoC) systems increasingly have to deal with dynamically changing application workload scenarios. To cope with such dynamic application behavior, these systems could dynamically adapt the mapping of application tasks onto the underlying system resources to improve the system's performance. However, such performance improvement comes at the cost of a system reconfiguration in which application tasks may have to be migrated between processors. This trade-off implies that reconfiguring the system is only beneficial when the performance gains outweigh the reconfiguration overhead. To address this problem for MPSoCs, this paper presents a scenario-based run-time resource management framework with an adaptivity throttling capability. It uses the history of application scenario execution behavior to predict the actual benefit of a system reconfiguration, which allows for explicitly deciding (at run time) whether or not to reconfigure. Experimental results reveal that the proposed approach substantially improves the system's efficiency compared to MPSoCs that do not provide such intelligent reconfiguration control.
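
The decision rule described in this abstract (reconfigure only when the predicted gain outweighs the reconfiguration overhead) can be illustrated with a minimal sketch. This is not the authors' framework: the class name, the averaging predictor, and the units (seconds saved per second of scenario execution, overhead in seconds) are all assumptions made for illustration.

```python
from collections import defaultdict, deque


class ReconfigurationThrottle:
    """Hypothetical helper: decide whether a remapping is worth its cost."""

    def __init__(self, history_len=8):
        # Recently observed durations (seconds) for each application scenario.
        self.durations = defaultdict(lambda: deque(maxlen=history_len))

    def record(self, scenario, duration_s):
        """Remember how long a scenario stayed active the last time it ran."""
        self.durations[scenario].append(duration_s)

    def predicted_duration(self, scenario):
        """Assumed predictor: plain average of the recorded history."""
        hist = self.durations[scenario]
        return sum(hist) / len(hist) if hist else 0.0

    def should_reconfigure(self, scenario, gain_per_s, overhead_s):
        # Expected benefit = gain rate * how long the scenario usually lasts.
        # Reconfigure only if that benefit exceeds the one-off overhead.
        return gain_per_s * self.predicted_duration(scenario) > overhead_s
```

With a history of 2 s and 4 s for a scenario, the predicted duration is 3 s, so a gain of 0.5 s per second justifies an overhead of 1 s but not one of 2 s. A scenario with no history is never reconfigured for, which is one conservative choice among several possible ones.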

Citations: 3

The AXIOM project (Agile, eXtensible, fast I/O Module)

2015 International Conference on Embedded Computer Systems: Architectures, Modeling, and Simulation (SAMOS)

Pub Date : 2015-07-19 DOI: 10.1109/SAMOS.2015.7363684

D. Theodoropoulos, D. Pnevmatikatos, C. Álvarez, E. Ayguadé, Javier Bueno, Antonio Filgueras, Daniel Jiménez-González, X. Martorell, N. Navarro, Carlos Segura, Carles Fernández, David Oro, J. Saeta, Paolo Gai, A. Rizzo, R. Giorgi

The AXIOM project (Agile, eXtensible, fast I/O Module) aims at researching new software/hardware architectures for future Cyber-Physical Systems (CPSs). These systems are expected to react in real time, provide enough computational power for the assigned tasks, consume as little energy as possible for those tasks (energy efficiency), scale up through modularity, allow easy programmability across performance scaling, and make the best use of existing standards at minimal cost.

Citations: 19

Improving accuracy of source level timing simulation for GPUs using a probabilistic resource model

2015 International Conference on Embedded Computer Systems: Architectures, Modeling, and Simulation (SAMOS)

Pub Date : 2015-07-19 DOI: 10.1109/SAMOS.2015.7363655

Christoph Gerum, W. Rosenstiel, O. Bringmann

After their success in the high-performance and desktop markets, Graphics Processing Units (GPUs) that can be used for general-purpose computing are being introduced in embedded systems on chip (SoCs). Due to some advanced architectural features, such as massive simultaneous multithreading, static performance analysis and high-level timing simulation are difficult to apply to code running on these systems. This paper extends a method for performance simulation of GPUs. The method uses automated performance annotations in the application's OpenCL C source code, and an extended performance model that derives a kernel's runtime from metrics produced by executing the annotated kernels. The final results are then generated using a probabilistic resource conflict model. The model reaches an accuracy of 90% on most test cases and delivers a higher average accuracy than previous methods.

Citations: 1

Efficient distribution of Triggered Synchronous Block Diagrams on asynchronous platforms

2015 International Conference on Embedded Computer Systems: Architectures, Modeling, and Simulation (SAMOS)

Pub Date : 2015-07-19 DOI: 10.1109/SAMOS.2015.7363666

Yang Yang, S. Tripakis, A. Sangiovanni-Vincentelli

As the complexity of embedded systems rapidly increases in terms of both scale and functionality, there has been strong interest in design languages and methodologies that facilitate the use of formal methods. These languages and methodologies are mostly based on a synchronous paradigm that, while it satisfies the need for formalization, often results in an inefficient implementation requiring substantial overhead compared to approaches that do not enforce synchronicity on the execution platform. Therefore, interest is high in techniques that, on the one hand, maintain the formal properties of synchronous models and, on the other hand, enable the use of asynchronous and distributed execution platforms with little overhead. In this paper, we propose an approach for efficient distribution of Triggered Synchronous Block Diagrams (SBDs) on asynchronous platforms while preserving the correct semantics. Compared to previous work that utilizes trigger elimination, our approach aims to reduce unnecessary communication overhead and thus improve the efficiency of the implementation. We consider both general Triggered SBDs, where the values of triggers are dynamically computed, and Timed SBDs, where triggers are statically known and usually specified by (period, initial phase) pairs.
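
To make the (period, initial phase) notation for Timed SBDs concrete, here is a small sketch of how a statically known trigger could be evaluated. The interpretation used (a block fires at the initial phase and every period ticks afterwards) and the function names are assumptions for illustration, not taken from the paper.

```python
def fires(tick, period, phase):
    """Return True if a block with trigger (period, phase) fires at `tick`.

    Assumed semantics: first firing at tick == phase, then every `period` ticks.
    """
    if period <= 0:
        raise ValueError("period must be positive")
    return tick >= phase and (tick - phase) % period == 0


def firing_ticks(period, phase, horizon):
    """List all ticks in [0, horizon) at which the block fires."""
    return [t for t in range(horizon) if fires(t, period, phase)]
```

Because such triggers are known before execution, a distributed implementation can in principle work out ahead of time at which ticks a block needs fresh inputs, which is the kind of information that helps avoid sending messages nobody will read.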

Citations: 1

Framework for parameter analysis of FPGA-based image processing architectures

2015 International Conference on Embedded Computer Systems: Architectures, Modeling, and Simulation (SAMOS)

Pub Date : 2015-07-19 DOI: 10.1109/SAMOS.2015.7363664

M. Reichenbach, B. Pfundt, D. Fey

Image processing algorithms that work only on a local neighbourhood are used in nearly every image processing application. Very often several iterations are performed on a fixed neighbourhood, which leads to the description of stencil codes. A promising approach in embedded systems is to use the massively parallel computation power of an FPGA for this kind of algorithm. If the FPGA is placed directly inside the image acquisition unit, forming a smart camera, this not only speeds up processing but also reduces or even eliminates the PC-based hardware, which saves space and power. However, most designers begin from scratch when they have to implement stencil computations in smart cameras. This leads to an underutilized FPGA, because the most efficient usage of the given resources is only secondary to functional correctness. Therefore, this paper presents a framework for stencil code applications that immediately delivers the best architecture with respect to prominent resource criteria. An analytical model is used to find an optimized parameter set (degree of parallelism, usage of buffers, etc.) for a highly flexible FPGA implementation. A graphical tool allows the effects of certain parameters to be evaluated further. Our results show that we are able to create an optimized hardware architecture for this application domain.
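
For readers unfamiliar with the term, a stencil code updates each pixel from a fixed local neighbourhood and often repeats this for several iterations. The following is a minimal software sketch with a 3x3 mean filter; the choice of filter, the unchanged border handling, and the function name are illustrative assumptions and say nothing about the FPGA architecture generated by the framework.

```python
def apply_stencil(image, iterations=1):
    """Apply a 3x3 mean stencil to the interior of a 2D list, repeatedly."""
    h, w = len(image), len(image[0])
    for _ in range(iterations):
        # Work on a copy so every pixel reads values from the previous
        # iteration; border pixels are simply carried over unchanged.
        out = [row[:] for row in image]
        for y in range(1, h - 1):
            for x in range(1, w - 1):
                total = sum(image[y + dy][x + dx]
                            for dy in (-1, 0, 1) for dx in (-1, 0, 1))
                out[y][x] = total / 9
        image = out
    return image
```

Every output pixel depends only on nine inputs from the previous iteration, so all interior pixels can be computed independently. That independence is what makes the degree of parallelism and the buffering of neighbouring rows natural tuning parameters on an FPGA.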

Citations: 1

ESL power estimation using virtual platforms with black box processor models

2015 International Conference on Embedded Computer Systems: Architectures, Modeling, and Simulation (SAMOS)

Pub Date : 2015-07-19 DOI: 10.1109/SAMOS.2015.7363698

Stefan Schürmans, Gereon Onnebrink, R. Leupers, G. Ascheid, Xiaotao Chen

Processor models for electronic system level (ESL) simulations are usually provided by their vendors as binary object code. Those binaries appear as black boxes whose internals cannot be observed. This prevents the application of most existing ESL power estimation methodologies. To remedy this situation, this work presents an estimation methodology for the case of black box models. The evaluation for the ARM Cortex-A9 processor shows that the proposed approach achieves high accuracy. In comparison to hardware power measurements obtained from the OMAP4460 chip on the PandaBoard, the ESL estimation error is below 5%.

Citations: 12

Parallelism extraction in embedded software for Android devices

2015 International Conference on Embedded Computer Systems: Architectures, Modeling, and Simulation (SAMOS)

Pub Date : 2015-07-19 DOI: 10.1109/SAMOS.2015.7363654

M. Aguilar, Juan Fernando Eusse Giraldo, Projjol Ray, R. Leupers, G. Ascheid, Weihua Sheng, Prashant Sharma

In recent years the presence of embedded devices in everyday life has grown exponentially. The market for these devices imposes conflicting requirements on cost, performance and energy. The use of Multiprocessor Systems on Chip (MPSoCs) is a widely accepted solution to provide a trade-off between these demands. However, programming MPSoCs is still a cumbersome task. Several research efforts have addressed this challenge in two complementary directions: paradigms for parallel programming and tools for parallelism extraction. However, most of these efforts are focused on the high-performance domain and do not consider the characteristics of the underlying platform. In this paper, we present an approach to extract multiple forms of parallelism from sequential C code, which is applied to widespread Android mobile devices. We show the effectiveness of our work by parallelizing relevant embedded benchmarks on a quad-core Nexus 7 tablet.

Citations: 9

Experiences in speeding up computer vision applications on mobile computing platforms

2015 International Conference on Embedded Computer Systems: Architectures, Modeling, and Simulation (SAMOS)

Pub Date : 2015-07-19 DOI: 10.1109/SAMOS.2015.7363653

Luna Backes, Alejandro Rico, Björn Franke

Computer vision (CV) is widely expected to be the next big thing in mobile computing. The availability of a camera and a large number of sensors in mobile devices will enable CV applications that understand the environment and enhance people's lives through augmented reality. One of the problems yet to be solved is how to transfer demanding state-of-the-art CV algorithms (designed to run on powerful desktop computers with several GPUs) onto the energy-efficient, but slow, processors and GPUs found in mobile devices. To compensate for the lack of performance, current CV applications for mobile devices are simpler versions of more complex algorithms, which generally run slowly and unreliably and provide a poor user experience. In this paper, we investigate ways to speed up demanding CV applications on mobile devices. We selected KinectFusion (KF) as a representative CV application. The KF application constructs a 3D model from the images captured by a Kinect. After porting it to an ARM platform, we applied several optimisation and parallelisation techniques using OpenCL to exploit all the available computing resources. We evaluated the impact on performance and power and demonstrate a 4× speedup with just a 1.38× power increase. We also evaluated the performance portability of our optimisations by running on a different platform, and observed similar improvements despite the different multi-core configuration and memory system. By measuring processor temperature, we found overheating to be the main limiting factor for running such high-performance codes on a mobile device not designed for full continuous utilisation.
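
The reported figures also imply an energy effect, which a short calculation makes explicit. The sketch below assumes that the 1.38× figure is average power over the run and that energy is simply power multiplied by execution time; both are assumptions, since the abstract does not state an energy result, and the function name is hypothetical.

```python
def relative_energy(speedup, power_ratio):
    """Energy of the optimised run relative to the baseline.

    Assumes energy = average power * execution time, so the ratio is
    (power_ratio * P * T / speedup) / (P * T) = power_ratio / speedup.
    """
    if speedup <= 0:
        raise ValueError("speedup must be positive")
    return power_ratio / speedup


# Figures taken from the abstract: 4x speedup, 1.38x power increase.
ratio = relative_energy(4.0, 1.38)
```

Under these assumptions the ratio is about 0.345, meaning the optimised version would use roughly a third of the baseline energy for the same amount of work, even though it draws more power while running.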

Citations: 10

Platform-aware dynamic data type refinement methodology for radix tree Data Structures

2015 International Conference on Embedded Computer Systems: Architectures, Modeling, and Simulation (SAMOS)

Pub Date : 2015-07-19 DOI: 10.1109/SAMOS.2015.7363662

Thomas Papastergiou, Lazaros Papadopoulos, D. Soudris

Modern embedded systems are now capable of executing complex and demanding applications that are often based on large data structures. The design of the application's critical data structures affects the performance and the memory requirements of the whole system. The Dynamic Data Structure Refinement methodology provides optimizations, mainly for list and array data structures, that are based on the application's features and access patterns. In this work, we extend various aspects of the methodology. First, we integrate radix tree optimizations. Then, we provide a set of platform-aware data structure implementations for performing optimizations based on the hardware features. The extended methodology is evaluated using a wide set of synthetic and real-world benchmarks, in which we achieved performance and memory trade-offs of up to 29.6%. Additionally, the extended methodology identifies Pareto-optimal data structure implementations that were not available with the previous one.
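
As background for the data structure named in this abstract, the sketch below shows a generic textbook radix tree (a trie whose edges carry whole substrings). It is not one of the platform-aware implementations from the paper; the class and function names are chosen here for illustration only.

```python
class RadixNode:
    def __init__(self):
        self.children = {}    # edge label (non-empty str) -> RadixNode
        self.is_key = False   # True if a stored key ends at this node


def _common_prefix_len(a, b):
    n = 0
    while n < len(a) and n < len(b) and a[n] == b[n]:
        n += 1
    return n


def insert(root, key):
    node = root
    while True:
        for label in list(node.children):
            n = _common_prefix_len(label, key)
            if n == 0:
                continue
            child = node.children[label]
            if n < len(label):
                # Split the edge: keep the shared prefix, push the rest down.
                mid = RadixNode()
                mid.children[label[n:]] = child
                del node.children[label]
                node.children[label[:n]] = mid
                child = mid
            node, key = child, key[n:]
            break
        else:
            # No edge shares a prefix with what is left of the key.
            if key:
                leaf = RadixNode()
                leaf.is_key = True
                node.children[key] = leaf
            else:
                node.is_key = True
            return
        if not key:
            node.is_key = True
            return


def contains(root, key):
    node = root
    while key:
        for label, child in node.children.items():
            if key.startswith(label):
                node, key = child, key[len(label):]
                break
        else:
            return False
    return node.is_key
```

Storing substrings on edges keeps the tree shallow for keys with long shared prefixes, at the price of edge splitting on insert. How children are stored per node (a dictionary here) is exactly the kind of choice whose performance and memory cost depends on the access pattern and the target platform.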

Citations: 3
