
2012 International Conference on Embedded Computer Systems (SAMOS): Latest Publications

Energy efficient stream-based configurable architecture for embedded platforms
Pub Date : 2012-07-16 DOI: 10.1109/SAMOS.2012.6404174
F. Pratas, P. Tomás, P. Trancoso, L. Sousa
Reconfigurable hardware can be used as an energy- and performance-efficient co-processing solution to accelerate certain types of applications. To facilitate the design of hardware accelerators, we have proposed a methodology that adopts the stream-based computing model and uses Graphics Processing Units as prototyping platforms. In this paper we go a step further and propose a new modular architecture for low-power reconfigurable systems onto which stream-based algorithms can be easily mapped. In particular, the architecture consists of a semi-programmable accelerator set that can be adapted to the application's needs in terms of functional units and number of streaming engines. The proposed embedded architecture combines the flexibility of reconfigurable hardware with the advantages of stream computing to meet the strict requirements of embedded reconfigurable devices. We show a possible organization for this architecture. Moreover, we provide a general case study to analyze the scalability of the proposed architecture on an Altera FPGA. Our experimental results show that, using low-power FPGA devices, a significant speed-up can be achieved compared with general-purpose processors. Our preliminary estimates show that energy savings of up to 118x are also possible.
Citations: 1
System modeling and multicore simulation using transactions
Pub Date : 2012-07-16 DOI: 10.1109/SAMOS.2012.6404156
Amine Anane, E. Aboulhamid, Y. Savaria
As digital systems become more complex and increasingly parallel, a better abstraction for describing them has become a necessity. This paper shows how, by using the powerful mechanism of transactions as a concurrency model and by taking advantage of .NET introspection and attribute programming capabilities, we were able to develop a system-level modeling and parallel simulation environment. We kept the same concepts for describing the architecture of high-level models, such as modules and communication channels. However, unlike in SystemC, behaviour is no longer described as processes and events but as transactions. We implemented scheduling algorithms that enable transactional models to be simulated in parallel on a multicore machine. These algorithms take into account the dependencies between transactions and the number of cores of the simulation machine. We studied two synchronisation strategies: one using locking and the other using partitioning. An experiment on a WiFi 802.11a transmitter achieved a speedup of about 1.9 using two threads. With 8 threads, we reached a speedup of 5.1 even though the workload of individual transactions was not significant. When the workload is significant, the speedup can reach 6.3.
Citations: 3
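The abstract contrasts a locking strategy with a partitioning strategy but does not spell out either. As a rough illustration only, the Python sketch below shows one generic way partitioning could be done, assuming each transaction declares the set of channels it accesses (a hypothetical representation): transactions that share a channel, directly or transitively, are merged into one partition with a union-find structure, so different partitions can run on different threads without locks. Partitions are then placed on cores with a greedy longest-processing-time heuristic. The transaction names, channel names, and costs are invented for the example; this is not the authors' scheduler.

```python
from collections import defaultdict


class UnionFind:
    """Disjoint-set forest with path halving."""

    def __init__(self):
        self.parent = {}

    def find(self, x):
        self.parent.setdefault(x, x)
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra != rb:
            self.parent[rb] = ra


def partition_transactions(transactions):
    """Group transactions that share a channel, directly or transitively.

    transactions: dict mapping a transaction name to the set of channels it accesses.
    Returns a sorted list of partitions, each a sorted list of transaction names.
    """
    uf = UnionFind()
    first_user = {}  # channel -> first transaction seen accessing it
    for name, channels in transactions.items():
        uf.find(name)  # register transactions that touch no channel at all
        for ch in channels:
            if ch in first_user:
                uf.union(first_user[ch], name)
            else:
                first_user[ch] = name
    groups = defaultdict(list)
    for name in transactions:
        groups[uf.find(name)].append(name)
    return sorted(sorted(g) for g in groups.values())


def assign_to_cores(partitions, cost, n_cores):
    """Greedy LPT: put the heaviest remaining partition on the least-loaded core."""
    loads = [0.0] * n_cores
    placement = [[] for _ in range(n_cores)]
    for part in sorted(partitions, key=lambda p: -sum(cost[t] for t in p)):
        core = loads.index(min(loads))
        placement[core].append(part)
        loads[core] += sum(cost[t] for t in part)
    return placement, loads


if __name__ == "__main__":
    # Hypothetical transmitter transactions and the channels they use.
    txs = {"scramble": {"c0", "c1"}, "encode": {"c1", "c2"},
           "ifft": {"c3"}, "cp_insert": {"c4"}}
    cost = {"scramble": 1, "encode": 2, "ifft": 4, "cp_insert": 1}
    parts = partition_transactions(txs)
    print(parts)  # [['cp_insert'], ['encode', 'scramble'], ['ifft']]
    print(assign_to_cores(parts, cost, n_cores=2))
```

Because `scramble` and `encode` both touch channel `c1`, they must stay in one partition; the other two transactions are independent and can be balanced freely across cores. The trade-off against locking is that a large connected group of transactions serializes onto a single core.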
Interleaving methods for hybrid system-level MPSoC design space exploration
Pub Date : 2012-07-16 DOI: 10.1109/SAMOS.2012.6404152
R. Piscitelli, A. Pimentel
System-level design space exploration (DSE), which is performed early in the design process, is of great importance to the design of complex multi-processor embedded system architectures. During system-level DSE, system parameters such as the number and type of processors, the type and size of memories, or the mapping of application tasks to architectural resources are considered. Simulation-based DSE, in which different design instances are evaluated using system-level simulations, is typically computationally costly. Even with high-level simulations and efficient exploration algorithms, the simulation time needed to evaluate design points is a real bottleneck in such DSE. The vast design space that must be searched therefore requires effective design space pruning techniques. This paper presents and studies different strategies for interleaving fast but less accurate analytical performance estimations with slower but more accurate simulations during DSE. By interleaving these analytical estimations with simulations, our hybrid approach significantly reduces the number of simulations needed during DSE. Experimental results demonstrate that such hybrid DSE is a promising technique that can yield solutions of similar quality to simulation-based DSE in only a fraction of the execution time.
Citations: 7
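To make the interleaving idea concrete, here is a minimal Python sketch of one possible scheme, not the strategies studied in the paper: in each round a batch of candidate design points is ranked with a cheap analytical estimate, and only the `top_k` best-ranked points are passed to the expensive simulator. The design-point encoding (number of processors, memory size in KB) and both cost models are hypothetical toy functions standing in for a real analytical model and a real system-level simulator.

```python
import random


def hybrid_dse(propose, analytic_estimate, simulate,
               rounds=10, candidates_per_round=50, top_k=3, seed=0):
    """Interleave a fast analytical model with a slow, accurate simulator.

    Each round proposes candidate design points, ranks them with
    analytic_estimate (cheap, approximate) and simulates only the top_k
    best-ranked ones (expensive, accurate). Returns the best simulated
    point, its simulated cost, and the number of simulations run.
    """
    rng = random.Random(seed)
    best_point, best_cost, n_sims = None, float("inf"), 0
    for _ in range(rounds):
        pool = [propose(rng) for _ in range(candidates_per_round)]
        pool.sort(key=analytic_estimate)   # cheap pruning of the design space
        for point in pool[:top_k]:         # costly evaluation of the survivors only
            cost = simulate(point)
            n_sims += 1
            if cost < best_cost:
                best_point, best_cost = point, cost
    return best_point, best_cost, n_sims


# Hypothetical toy models: a design point is (number of processors, memory in KB).
def propose(rng):
    return (rng.randint(1, 16), rng.choice([32, 64, 128, 256]))


def simulate(point):
    """Stand-in for a slow, accurate system-level simulation."""
    n_procs, mem_kb = point
    return 100.0 / n_procs + 0.5 * n_procs + 200.0 / mem_kb + 0.01 * mem_kb


def analytic_estimate(point):
    """Stand-in for a fast analytical model; it ignores the memory cost term."""
    n_procs, mem_kb = point
    return 100.0 / n_procs + 0.5 * n_procs + 200.0 / mem_kb


if __name__ == "__main__":
    point, cost, n_sims = hybrid_dse(propose, analytic_estimate, simulate)
    print(point, round(cost, 2), n_sims)
```

With 10 rounds of 50 candidates and `top_k=3`, the loop runs 30 simulations instead of the 500 needed to simulate every proposed candidate; how much solution quality is lost depends entirely on how well the analytical model ranks candidates.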
Adaptive reinforcement learning method for networks-on-chip
Pub Date : 2012-07-16 DOI: 10.1109/SAMOS.2012.6404180
F. Farahnakian, M. Ebrahimi, M. Daneshtalab, J. Plosila, P. Liljeberg
In this paper, we propose a congestion-aware routing algorithm based on Dual Reinforcement Q-routing. In this method, each router is provided with local and global congestion information about the network by means of learning packets. This information must be updated dynamically as traffic conditions in the network change. For this purpose, a congestion detection method is presented that measures the average number of free buffer slots over a specific time interval. This value is compared with maximum and minimum threshold values, and the learning rate is updated based on the result. A large learning rate indicates that the network is congested, so global information is emphasized over local information. Conversely, local information is more important than global information when a router receives few packets in a time interval. Experimental results for different traffic patterns and network loads show that the proposed method improves network performance compared with the standard Q-routing, DRQ-routing, and Dynamic XY-routing algorithms.
Citations: 28
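The abstract describes adjusting the learning rate from the average number of free buffer slots, compared against a minimum and a maximum threshold. The sketch below illustrates that rule together with the standard Q-routing update of Boyan and Littman, Q_x(d, y) ← Q_x(d, y) + η(q + s + t − Q_x(d, y)), where q is the queueing delay at router x, s the transmission delay to neighbour y, and t neighbour y's best estimate of the remaining delivery time to d. The step size, bounds, threshold values, and table layout are hypothetical, and the backward (dual) learning of DRQ-routing is omitted; the paper's exact update rule may differ.

```python
def adapt_learning_rate(eta, free_slot_samples, min_thr, max_thr,
                        step=0.1, eta_min=0.1, eta_max=0.9):
    """Adjust the learning rate from the average free buffer slots in an interval.

    step, eta_min and eta_max are hypothetical tuning values.
    """
    avg_free = sum(free_slot_samples) / len(free_slot_samples)
    if avg_free < min_thr:
        # Few free slots: the router is congested, so weight global information more.
        return min(eta + step, eta_max)
    if avg_free > max_thr:
        # Plenty of free slots: light traffic, so rely more on local information.
        return max(eta - step, eta_min)
    return eta


def best_estimate(q_table, dest):
    """t in the update rule: a neighbour's lowest estimated delivery time to dest.

    q_table maps (dest, next_hop) -> estimated delivery time (hypothetical layout).
    """
    return min(v for (d, _), v in q_table.items() if d == dest)


def q_update(q_old, queue_delay, link_delay, neighbour_best, eta):
    """Move Q_x(d, y) toward the observed estimate q + s + t with learning rate eta."""
    return q_old + eta * (queue_delay + link_delay + neighbour_best - q_old)


if __name__ == "__main__":
    neighbour_table = {("R9", "E"): 7.0, ("R9", "S"): 5.0, ("R3", "N"): 2.0}
    t = best_estimate(neighbour_table, "R9")
    eta = adapt_learning_rate(0.5, [1, 0, 2, 1], min_thr=2, max_thr=6)
    print(eta, q_update(10.0, 2.0, 1.0, t, eta))
```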
BEE technology overview
Pub Date : 2012-07-16 DOI: 10.1109/SAMOS.2012.6404186
Joseph Rothman, Chen Chang
This presentation will focus on a technology overview of the BEE4 and miniBEE FPGA-based reconfigurable platforms. BEEcube supplies advanced system-level FPGA prototyping platforms targeting a wide range of uses, including multi-core computer architecture, wireless communications, 100Gbps+ networking solutions, HD video processing, signal intelligence, radar/sonar arrays, and High Performance Computing (HPC). This overview will review the features, capabilities, unique technology, and uses of both BEE platforms: the state-of-the-art Virtex 6-based multi-array FPGA BEE4™ system, and the miniBEE™, introduced as the first Research in a Box solution. miniBEE combines the latest FPGA, a multicore CPU, and high-speed networking technology, all tightly coupled in one integrated, cost-effective solution targeting the research and lab community. This flexible system removes the need for disjointed FPGA boards, PCs, networking devices, and test equipment. The presentation will describe how both algorithm-oriented researchers and seasoned FPGA experts can use BEE technology to achieve proof-of-concept or application-level prototyping goals based on real-time, real-world data or conditions. The unique BEE technologies covered include its symmetrical Honeycomb Architecture, the Full Speed Sting I/O interface, the Nectar OS for application control and debugging, and the BEEcube Platform Studio software environment. The presentation plans to show BEE technology in action, for real-time image manipulation or as a flexible testing platform, with an Arbitrary Waveform Generation example.
Citations: 6
Virtual prototyping for efficient multi-core ECU development of driver assistance systems
Pub Date : 2012-07-16 DOI: 10.1109/SAMOS.2012.6404155
Rainer Kiesel, M. Streubühr, C. Haubelt, A. Terzis, J. Teich
In recent years, road vehicles have seen an enormous increase in driver assistance systems such as traffic sign recognition, lane departure warning, and pedestrian detection. Cost-efficient development of electronic control units (ECUs) for these systems is a complex challenge. The demand for shorter time to market makes development even more challenging and thus calls for efficient design flows. This paper proposes a model-based design flow that permits simulation-based performance evaluation of multi-core ECUs for driver assistance systems at a pre-development stage. The approach is based on a system-level virtual prototype of a multi-core ECU and allows timing effects to be evaluated by mapping application tasks to different platforms. The results show that the performance of different parallel implementation candidates can be estimated with high accuracy even at a pre-development stage. By applying the best-fitting parallelization strategy to the final ECU, time to market can be reduced. The design flow is currently being evaluated by Daimler AG and applied to a pedestrian detection system. Results from this application illustrate the benefits of the proposed approach.
Citations: 0
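The paper evaluates timing with a system-level virtual prototype, i.e., by simulation. As a much cruder illustration of why the task-to-core mapping matters, the Python sketch below estimates the completion time of one frame for a given mapping of a task graph, under strong simplifying assumptions: fixed task execution times, no communication or memory contention, and tasks on the same core executed in a given topological order. The pedestrian-detection task names and timings are hypothetical.

```python
def estimate_makespan(exec_time, preds, mapping, order):
    """Estimate the completion time of one frame for a task-to-core mapping.

    exec_time: task -> execution time (hypothetical units, e.g. ms)
    preds:     task -> list of predecessor tasks
    mapping:   task -> core id
    order:     a topological order of all tasks; tasks sharing a core run in this order
    Communication, caches and bus contention are ignored in this sketch.
    """
    finish = {}
    core_free = {}
    for task in order:
        # A task may start once all predecessors are done and its core is idle.
        ready = max((finish[p] for p in preds.get(task, [])), default=0.0)
        start = max(ready, core_free.get(mapping[task], 0.0))
        finish[task] = start + exec_time[task]
        core_free[mapping[task]] = finish[task]
    return max(finish.values())


if __name__ == "__main__":
    # Hypothetical pedestrian-detection pipeline split into four tasks.
    exec_time = {"preprocess": 4.0, "detect_left": 10.0,
                 "detect_right": 10.0, "classify": 3.0}
    preds = {"detect_left": ["preprocess"], "detect_right": ["preprocess"],
             "classify": ["detect_left", "detect_right"]}
    order = ["preprocess", "detect_left", "detect_right", "classify"]
    single_core = {t: 0 for t in order}
    dual_core = {"preprocess": 0, "detect_left": 0, "detect_right": 1, "classify": 0}
    print(estimate_makespan(exec_time, preds, single_core, order))  # 27.0
    print(estimate_makespan(exec_time, preds, dual_core, order))    # 17.0
```

Comparing candidate mappings this way is only a first filter; effects such as shared-bus contention, which a virtual prototype captures, can change the ranking.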
Model-driven robot-software design using integrated models and co-simulation
Pub Date : 2012-07-16 DOI: 10.1109/SAMOS.2012.6404197
J. Broenink, Yunyun Ni
The work presented here concerns a methodology for designing hard real-time embedded control software for robots, i.e., mechatronic products. The behavior of the total robot system (machine, control, software, and I/O) is relevant, because the dynamics of the machine influence the robot software. Therefore, we use two appropriate Models of Computation: continuous-time equations for the machine/robot part, and discrete-event/discrete-time equations for the control software part.
Citations: 22
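The two Models of Computation can be illustrated with a minimal co-simulation loop in Python. This is a sketch under stated assumptions, not the authors' tooling: the plant is a hypothetical mass with viscous damping, m·dv/dt = u − b·v, integrated in continuous time with small forward-Euler steps, while the controller is a discrete-time PI law that only runs every `t_sample` seconds and holds its output in between (zero-order hold). All parameter values are invented.

```python
def cosimulate(setpoint=1.0, mass=1.0, damping=0.5, kp=4.0, ki=2.0,
               t_sample=0.01, h=0.001, t_end=10.0):
    """Co-simulate a continuous-time plant with a discrete-time PI controller.

    Plant (continuous time): mass * dv/dt = u - damping * v, integrated with
    forward Euler at the fine step h.
    Controller (discrete time): PI law evaluated every t_sample seconds; its
    output u is held constant between samples (zero-order hold).
    Returns the final velocity and a trace of (time, v, u) at each sample.
    """
    steps_per_sample = round(t_sample / h)
    n_steps = round(t_end / h)
    v, u, integral = 0.0, 0.0, 0.0
    trace = []
    for k in range(n_steps):
        if k % steps_per_sample == 0:
            # Discrete-time event: the control software samples v and updates u.
            error = setpoint - v
            integral += error * t_sample
            u = kp * error + ki * integral
            trace.append((k * h, v, u))
        # Continuous-time part: one Euler step of the plant dynamics.
        v += h * (u - damping * v) / mass
    return v, trace


if __name__ == "__main__":
    v_final, trace = cosimulate()
    print(f"final velocity {v_final:.4f} after {len(trace)} controller samples")
```

Because the controller only observes the plant at sampling instants, varying `t_sample` relative to the plant dynamics shows directly how the discrete-time software interacts with the continuous-time machine, which is the coupling the abstract refers to.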
Out-of-order execution of synchronous data-flow networks
Pub Date : 2012-07-16 DOI: 10.1109/SAMOS.2012.6404171
D. Baudisch, J. Brandt, K. Schneider
Data flow process networks (DPNs) have been introduced as a convenient model of computation for distributed and asynchronous systems, since each process node can work independently of the other nodes, i.e., without the need for global coordination. Synchronous and cyclo-static data flow process networks even allow efficient static schedules to be derived at compile time, so that these systems can run with efficient use of the available resources, e.g., in embedded systems. Single process nodes of DPNs are stream-based computing devices that transform input streams into uniquely defined corresponding output streams, such that single values of the output streams are computed as soon as sufficient input values are available. In this sense, they are related to the execution of an instruction stream by a conventional microprocessor. In this paper, we show how out-of-order execution, which was introduced for the efficient use of multiple functional units in microprocessors, can also be used to implement DPNs on multiprocessors. In this way, implementing DPNs on multiprocessors makes it possible to optimize the throughput of single process nodes and, as our experiments show, also of the entire DPN.
Citations: 14
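The core idea borrowed from superscalar processors can be sketched as follows: several workers fire a process node on different input tokens concurrently, so firings may finish out of order, and a reorder buffer commits the results to the output stream in their original order, so the stream is identical to that of in-order execution. The Python sketch below assumes the node is stateless (no value carried between firings), which is a simplifying assumption; it uses `concurrent.futures` and `heapq` from the standard library, and `ReorderBuffer` and `run_out_of_order` are hypothetical names, not the paper's implementation.

```python
import heapq
from concurrent.futures import ThreadPoolExecutor, as_completed


class ReorderBuffer:
    """Commit results of out-of-order firings in their original sequence order.

    Each firing gets a sequence number when its input token is consumed;
    results may complete in any order but are released strictly by sequence
    number, so the output stream matches in-order execution.
    """

    def __init__(self):
        self._pending = []  # min-heap of (seq, value) for firings that finished early
        self._next = 0      # sequence number expected next on the output stream

    def complete(self, seq, value):
        """Record a finished firing and return the values that can now be emitted."""
        heapq.heappush(self._pending, (seq, value))
        released = []
        while self._pending and self._pending[0][0] == self._next:
            released.append(heapq.heappop(self._pending)[1])
            self._next += 1
        return released


def run_out_of_order(node_fn, tokens, workers=4):
    """Fire a stateless node on every input token concurrently; commit outputs in order."""
    rob = ReorderBuffer()
    output = []
    with ThreadPoolExecutor(max_workers=workers) as pool:
        seq_of = {pool.submit(node_fn, tok): seq for seq, tok in enumerate(tokens)}
        for fut in as_completed(seq_of):  # completion order is arbitrary
            output.extend(rob.complete(seq_of[fut], fut.result()))
    return output


if __name__ == "__main__":
    print(run_out_of_order(lambda x: x * x, range(8)))  # [0, 1, 4, 9, 16, 25, 36, 49]
```

In CPython, threads only demonstrate the ordering logic here; actual throughput gains for compute-bound nodes would require processes or native parallelism.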
A template-based methodology for efficient microprocessor and FPGA accelerator co-design
Pub Date : 2012-07-16 DOI: 10.1109/samos.2012.6404153
A. Kritikakou, F. Catthoor, G. Athanasiou, Vasilios I. Kelefouras, C. Goutis
Embedded applications usually require Software/Hardware (SW/HW) designs that meet hard timing constraints and provide the required design flexibility. Exhaustive exploration of SW/HW designs is a very time-consuming task, while ad hoc approaches and partially automatic tools usually lead to less efficient designs. To support a more efficient co-design process for FPGA platforms, we propose a systematic methodology to map an application onto a SW/HW platform with a custom HW accelerator and a microprocessor core. The methodology's mapping steps are expressed through parametric templates for the SW/HW Communication Organization, the Foreground (FG) Memory Management, and the Data Path (DP) Mapping. Several Pareto points trading off performance and area are produced by instantiating the templates. A real-time bioimaging application is mapped onto an FPGA to evaluate the gains of our approach: 44.8% in performance compared with pure SW designs and 58% in area compared with pure HW designs.
Citations: 2
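The abstract describes producing several performance-area trade-off Pareto points by instantiating the templates. The following minimal Python sketch shows what selecting Pareto points means. It assumes each design instance is summarized by two costs to minimize (latency and area) and keeps only the non-dominated designs. The `DesignPoint` type, the design names and all numeric values are hypothetical and are not taken from the paper. This is a generic dominance filter, not the authors' template methodology.

```python
from typing import List, NamedTuple


class DesignPoint(NamedTuple):
    """One hypothetical SW/HW design instance; lower is better on both axes."""
    name: str
    latency: float  # execution time in arbitrary units
    area: float     # FPGA area in arbitrary units


def dominates(a: DesignPoint, b: DesignPoint) -> bool:
    """True if a is no worse than b on both costs and strictly better on one."""
    return (a.latency <= b.latency and a.area <= b.area
            and (a.latency < b.latency or a.area < b.area))


def pareto_front(points: List[DesignPoint]) -> List[DesignPoint]:
    """Keep the points that no other point dominates (simple O(n^2) check).

    A point never dominates itself, so no self-exclusion is needed.
    """
    return [p for p in points if not any(dominates(q, p) for q in points)]


# Hypothetical candidates, e.g. from different template instantiations.
candidates = [
    DesignPoint("pure_sw", latency=100.0, area=10.0),
    DesignPoint("mixed_a", latency=60.0, area=40.0),
    DesignPoint("mixed_b", latency=70.0, area=50.0),  # dominated by mixed_a
    DesignPoint("pure_hw", latency=30.0, area=100.0),
]

if __name__ == "__main__":
    print([p.name for p in pareto_front(candidates)])
```

Running it prints `['pure_sw', 'mixed_a', 'pure_hw']`: `mixed_b` is dropped because `mixed_a` is both faster and smaller. The quadratic check is adequate for the handful of points a template exploration typically yields. Larger sets would usually be sorted by one cost first.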
Efficient hardware implementation of data-flow parallel embedded systems
Pub Date : 2012-07-16 DOI: 10.1109/SAMOS.2012.6404202
P. Quinton, Anne-Marie Chana, Steven Derrien
Many modern computing systems deal with streams of data that must be processed in parallel to be handled in real time. This is particularly the case for some kinds of cyber-physical systems, which process data provided by physical devices. We consider here an approach, relying on the polyhedral model, to generate efficient hardware for a particular class of such systems. Flexible parallel components, described in the Alpha functional language, are modelled and assembled using a scheduling method that combines the synchronous data-flow principle of balance equations with the polyhedral scheduling technique. The modelling of flexible components relies on a simple, affine-periodic, delayable and stretchable time model, which allows a full system to be assembled and synthesized by combining the component hardware descriptions with automatically generated wrappers. We illustrate this method on a simplified WCDMA system and discuss the relationship of this approach with stream languages, latency-insensitive design, and multidimensional data-flow systems.
Citations: 3
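The scheduling method in the abstract above builds on the synchronous data-flow (SDF) balance equations. For every channel from actor i to actor j, the tokens produced per period must equal the tokens consumed: q[i] * p = q[j] * c, where q counts firings per period, p is the production rate and c the consumption rate. The following minimal sketch assumes a connected graph with constant integer rates and solves these equations for the smallest positive integer repetition vector. It is the standard SDF computation only. It does not model the polyhedral scheduling, the Alpha components or the affine-periodic time model described in the paper. The function name `repetition_vector`, the actor names and the rates are hypothetical.

```python
from collections import deque
from fractions import Fraction
from functools import reduce
from math import gcd
from typing import Dict, List, Tuple

# Hypothetical edge format: (source actor, production rate,
#                            destination actor, consumption rate)
Edge = Tuple[str, int, str, int]


def repetition_vector(actors: List[str], edges: List[Edge]) -> Dict[str, int]:
    """Solve q[src] * prod == q[dst] * cons for the smallest positive integers.

    Assumes a connected graph; raises ValueError if the rates are inconsistent.
    """
    adj: Dict[str, List[Tuple[str, Fraction]]] = {a: [] for a in actors}
    for src, prod, dst, cons in edges:
        # q[dst] = q[src] * prod / cons, plus the inverse relation.
        adj[src].append((dst, Fraction(prod, cons)))
        adj[dst].append((src, Fraction(cons, prod)))

    # Propagate rational firing rates breadth-first from the first actor.
    q: Dict[str, Fraction] = {actors[0]: Fraction(1)}
    queue = deque([actors[0]])
    while queue:
        a = queue.popleft()
        for b, ratio in adj[a]:
            if b not in q:
                q[b] = q[a] * ratio
                queue.append(b)
    if len(q) != len(actors):
        raise ValueError("graph is not connected")

    # Every balance equation must hold; cycles can expose inconsistent rates.
    for src, prod, dst, cons in edges:
        if q[src] * prod != q[dst] * cons:
            raise ValueError(f"inconsistent rates on edge {src}->{dst}")

    # Scale by the LCM of denominators, then divide by the common GCD.
    lcm_den = reduce(lambda x, y: x * y // gcd(x, y),
                     (f.denominator for f in q.values()), 1)
    ints = {a: int(f * lcm_den) for a, f in q.items()}
    g = reduce(gcd, ints.values())
    return {a: n // g for a, n in ints.items()}


if __name__ == "__main__":
    # A produces 2 tokens per firing, B consumes 3; B produces 1, C consumes 2.
    print(repetition_vector(["A", "B", "C"],
                            [("A", 2, "B", 3), ("B", 1, "C", 2)]))
```

For the example chain this prints `{'A': 3, 'B': 2, 'C': 1}`. Three firings of A produce 6 tokens, which two firings of B consume. B's 2 output tokens then feed one firing of C. Exact `Fraction` arithmetic avoids the rounding errors a floating-point solve could introduce when checking consistency.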