
2012 International Conference on Embedded Computer Systems (SAMOS): Latest Publications

K-Periodic schedules for evaluating the maximum throughput of a Synchronous Dataflow graph
Pub Date : 2012-07-16 DOI: 10.1109/SAMOS.2012.6404169
Bruno Bodin, Alix Munier Kordon, B. Dinechin
Synchronous Dataflow graphs, introduced by Lee and Messerschmitt in 1987, are a well-known formalism commonly used to model data exchanges between parallel processes. The model has been studied extensively over the last two decades because of the importance of its applications. However, determining the maximum throughput is a difficult problem for which no polynomial-time algorithm is known to date. In this context, several authors proved that a K-Periodic schedule, where K is a vector whose values are not polynomially bounded, reaches the maximum throughput. On the other hand, a 1-Periodic schedule can be built in polynomial time, but without any guarantee on the throughput achieved. The problem investigated is therefore the trade-off between the schedule size induced by the vector K (called the periodicity vector) and the corresponding throughput. Necessary and sufficient conditions for the existence of K-Periodic schedules are first shown for any fixed value of the vector K, from which the computation of the maximum throughput of a K-Periodic schedule is deduced. A set of dominant values of K is exhibited, and a relationship between the optimal throughputs of these values is proved. Real-life experiments measure the variation of the throughput with K.
Citations: 33
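As background to the abstract above: in a Synchronous Dataflow graph every actor produces and consumes a fixed number of tokens per firing, and throughput analyses typically start from the repetition vector, the smallest positive firing counts that return every buffer to its initial fill level. The Python sketch below computes that vector from the balance equations q[src] * produced = q[dst] * consumed. It is a minimal illustration of this standard preliminary step, not the K-Periodic scheduling method of the paper; the function name `repetition_vector`, the edge tuple layout, and the three-actor example graph are assumptions made for illustration.

```python
from fractions import Fraction
from math import lcm  # multi-argument lcm requires Python 3.9+


def repetition_vector(actors, edges):
    """Return the smallest positive integer firing counts q such that
    q[src] * produced == q[dst] * consumed holds on every edge.

    actors: list of actor names (hypothetical layout, connected graph assumed)
    edges:  list of (src, dst, produced, consumed) tuples
    Raises ValueError if the rates are inconsistent or the graph is disconnected.
    """
    adjacency = {a: [] for a in actors}
    for src, dst, produced, consumed in edges:
        # q[dst] = q[src] * produced / consumed, and the inverse going back.
        adjacency[src].append((dst, Fraction(produced, consumed)))
        adjacency[dst].append((src, Fraction(consumed, produced)))

    # Propagate relative firing rates from an arbitrary start actor.
    rate = {actors[0]: Fraction(1)}
    stack = [actors[0]]
    while stack:
        a = stack.pop()
        for b, ratio in adjacency[a]:
            expected = rate[a] * ratio
            if b not in rate:
                rate[b] = expected
                stack.append(b)
            elif rate[b] != expected:
                raise ValueError("inconsistent rates: no repetition vector exists")

    if len(rate) != len(actors):
        raise ValueError("graph is not connected")

    # Scale the rational rates to the smallest integer solution.
    scale = lcm(*(r.denominator for r in rate.values()))
    return {a: int(r * scale) for a, r in rate.items()}


# Hypothetical graph: A --(2 produced, 3 consumed)--> B --(1, 2)--> C
print(repetition_vector(["A", "B", "C"], [("A", "B", 2, 3), ("B", "C", 1, 2)]))
# prints {'A': 3, 'B': 2, 'C': 1}
```

In this example A fires three times, B twice, and C once per graph iteration: A produces 6 tokens that B consumes in two firings, and B produces 2 tokens that C consumes in one.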
Using OpenMP superscalar for parallelization of embedded and consumer applications
Pub Date : 2012-07-16 DOI: 10.1109/SAMOS.2012.6404154
M. Andersch, C. C. Chi, B. Juurlink
In recent years, research and industry have introduced several parallel programming models to simplify the development of parallel applications. A popular class among these is task-based programming models, which claim ease of use, portability, and high performance. A novel model in this class, OpenMP Superscalar (OmpSs), combines advanced features such as automated runtime dependency resolution with simple pragma-based programming for C/C++. OmpSs has proven effective in leveraging parallelism in HPC workloads. Embedded and consumer applications, however, are still mainly parallelized using traditional thread-based programming models. In this work, we investigate how effective OmpSs is for embedded and consumer applications in terms of usability and performance. To determine the usability of OmpSs, we show in detail how to implement complex parallelization strategies such as those used in parallel H.264 decoding. To evaluate the performance, we created a collection of ten embedded and consumer benchmarks parallelized in both OmpSs and Pthreads.
Citations: 13
System-on-Chip deployment with MCAPI abstraction and IP-XACT metadata
Pub Date : 2012-07-16 DOI: 10.1109/SAMOS.2012.6404176
Lauri Matilainen, Lasse Lehtonen, J. Määttä, E. Salminen, T. Hämäläinen
IP-XACT, the recent IEEE 1685 standard, defines a metadata format for IP packaging and integration in System-on-Chip designs. It was originally proposed for hardware descriptions, but we have extended it to software, HW/SW mappings, and application communication abstraction. The latter is realized with the Multicore Association's MCAPI, a lightweight message-passing interface. In this paper we present, as a work in progress, how we use all of these to deploy and move application tasks between different platforms for FPGA prototyping, execution acceleration, or verification. The focus is on the metadata format, since it is the foundation for automation and tool development. The design flow is illustrated with two case studies: a Motion JPEG encoder and a 12-node workload model of a video object plane decoder (VOPD). These are deployed to a PC and to Altera and Xilinx FPGA boards in five variations. The results are reported as the deployment time for both non-recurring and deployment-specific tasks. Setting up a new deployment is a matter of hours when an IP-XACT library of HW and SW components is available.
Citations: 8
HNOCS: Modular open-source simulator for Heterogeneous NoCs
Pub Date : 2012-07-16 DOI: 10.1109/SAMOS.2012.6404157
Y. Ben-Itzhak, E. Zahavi, I. Cidon, A. Kolodny
We present HNOCS (Heterogeneous Network-on-Chip Simulator), an open-source NoC simulator based on OMNeT++. To the best of our knowledge, HNOCS is the first simulator to support modeling of heterogeneous NoCs with variable link capacities and a variable number of VCs per unidirectional port. The HNOCS simulation platform provides an open-source, modular, scalable, extensible, and fully parameterizable framework for modeling NoCs. It includes three types of NoC routers: synchronous, synchronous virtual output queue (VoQ), and asynchronous. HNOCS provides a rich set of statistical measurements at the flit and packet levels: end-to-end latencies, throughput, VC acquisition latencies, transfer latencies, etc. We describe the architecture, structure, available models, and features that make HNOCS suitable for advanced NoC exploration. We also evaluate several case studies that cannot be evaluated with any other existing NoC simulator.
Citations: 121
Architecture-level fault-tolerance for biomedical implants
Pub Date : 2012-07-16 DOI: 10.1109/SAMOS.2012.6404163
R. M. Seepers, C. Strydis, G. Gaydadjiev
In this paper, we describe the design and implementation of a new fault-tolerant RISC processor architecture suitable for a design framework targeting biomedical implants. The design targets both soft and hard faults and is original in efficiently combining and enhancing classic fault-tolerance techniques. The proposed architecture allows run-time trade-offs between performance and fault tolerance by means of instruction-level configurability. The system design is synthesized for a UMC 90 nm standard CMOS process and is evaluated in terms of fault coverage, area, average power consumption, total energy consumption, and performance for various duplication policies and test-sequence schedules. It is shown that area and power overheads of approximately 25% and 32%, respectively, are required to implement our techniques on the baseline processor. The major overheads of the proposed architecture are performance (up to 107%) and energy consumption (up to 157%). It is observed that the average power consumption is often reduced when a higher degree of fault tolerance is targeted. It is shown that test sequences can effectively be scheduled during the available program stalls and that nearly all soft faults are tolerated by using instruction duplication. The main advantages of the proposed architecture are the high portability of the introduced architecture-level fault-tolerance techniques, the flexibility in trading processor overheads for the required degree of fault tolerance, and affordable area and power consumption overheads.
Citations: 8
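The abstract above notes that average power is often reduced at higher degrees of fault tolerance even though total energy rises. The two observations are compatible because energy = average power × execution time: when execution time grows faster than energy, average power falls. The Python sketch below works this through with hypothetical overheads; the function name `average_power_ratio` and the numbers are assumptions for illustration, not figures from the paper, whose "up to" values may come from different configurations.

```python
def average_power_ratio(energy_overhead: float, time_overhead: float) -> float:
    """Ratio of new to baseline average power.

    Both overheads are fractions relative to the baseline (1.0 means +100%).
    Follows from energy = average_power * execution_time.
    """
    if time_overhead <= -1.0:
        raise ValueError("time_overhead must be greater than -1.0")
    return (1.0 + energy_overhead) / (1.0 + time_overhead)


# Hypothetical configuration: execution time doubles (+100%) while
# total energy grows by 80%.
ratio = average_power_ratio(energy_overhead=0.80, time_overhead=1.00)
print(f"average power ratio: {ratio:.2f}")  # prints "average power ratio: 0.90"
```

In this hypothetical case average power drops by 10% while the implant still spends 80% more energy in total, which is why both metrics are worth reporting separately.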
Adaptive processor architecture - invited paper
Pub Date : 2012-07-16 DOI: 10.1109/SAMOS.2012.6404181
M. Hübner, D. Göhringer, Carsten Tradowsky, J. Henkel, J. Becker
This paper introduces a novel methodology to adapt the microarchitecture of a processor at run-time. The goal is to tailor the internal architecture to the requirements of an application and to the data to be processed. The latter is normally not known at design time, which leads to the development of more general-purpose processors that are capable of handling the data in any case. With the novel approach, which keeps the microarchitecture of a processor flexible, the processor can start as a general-purpose device and end up with a specific parameterization comparable to application-specific processor architectures. Furthermore, the paper describes the increased degree of freedom that the approach enables, leading to a novel quality of processors.
Citations: 3
Reconfigurable miniature sensor nodes for condition monitoring
Pub Date : 2012-07-16 DOI: 10.1109/SAMOS.2012.6404164
Teemu Nylanden, J. Boutellier, Karri Nikunen, J. Hannuksela, O. Silvén
Wireless sensor networks are being deployed at an escalating rate in various application fields. The ever-growing number of application areas requires a diverse set of algorithms with disparate processing needs. Wireless sensor networks also need to adapt to the prevailing energy conditions and processing requirements. These reasons rule out the use of a single fixed design. Instead, a general-purpose design that can rapidly adapt to different conditions and requirements is desired. In lieu of the traditional inflexible wireless sensor node consisting of a microcontroller, radio transceiver, sensor array, and energy storage, we propose a rapidly reconfigurable miniature sensor node, implemented with a transport triggered architecture processor on a low-power Flash FPGA. A comparison of power consumption and silicon area usage between 16-bit fixed-point, 16-bit floating-point, and 32-bit floating-point implementations is also presented. The implemented processors and algorithms are intended for rolling bearing condition monitoring, but can be extended to other applications as well.
Citations: 11
Simultaneous reconfiguration of issue-width and instruction cache for a VLIW processor
Pub Date : 2012-07-16 DOI: 10.1109/SAMOS.2012.6404173
Fakhar Anjam, Stephan Wong, L. Carro, G. Nazar, M. B. Rutzig
This paper presents an analysis of the impact of simultaneous instruction cache (I-cache) and issue-width reconfiguration for a very long instruction word (VLIW) processor. The issue-width of the processor can be adjusted at run-time to be 2-issue, 4-issue, or 8-issue, and the I-cache can be reconfigured in terms of associativity, cache size, and line size. We observe that, compared to reconfiguring only the I-cache for a fixed issue-width core, reconfiguring the issue-width and I-cache together can further reduce the execution time, energy consumption, and/or the energy-delay product (EDP). The results for the MiBench and the PowerStone benchmark suites show that compared to “2-issue + the best I-cache”, “4-issue + the best I-cache” can reduce execution time, energy consumption, and EDP by up to 37%, 11%, and 36%, respectively, for different applications. Similarly, compared to “2-issue + the best I-cache”, “8-issue + the best I-cache” can reduce execution time and EDP by up to 46% and 30%, respectively, for different applications.
Citations: 8
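The energy-delay product (EDP) used in the abstract above is the product of energy consumption and execution time, so fractional reductions in the two combine multiplicatively rather than additively. The Python sketch below shows the arithmetic for a single application; the function name `edp_reduction` and the input values are hypothetical. The "up to" figures in the abstract are maxima over different applications, so they are not expected to combine in this way.

```python
def edp_reduction(time_reduction: float, energy_reduction: float) -> float:
    """Fractional EDP reduction for one application.

    EDP = energy * delay, so the new EDP relative to the baseline is
    (1 - time_reduction) * (1 - energy_reduction).
    """
    new_time = 1.0 - time_reduction
    new_energy = 1.0 - energy_reduction
    return 1.0 - new_time * new_energy


# Hypothetical application: 20% faster and 5% less energy.
print(f"EDP reduction: {edp_reduction(0.20, 0.05):.2%}")  # prints "EDP reduction: 24.00%"
```

A negative `energy_reduction` models a configuration that is faster but consumes more energy; EDP can still improve in that case as long as the time saving outweighs the extra energy.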
Throughput driven transformations of Synchronous Data Flows for mapping to heterogeneous MPSoCs
Pub Date : 2012-07-16 DOI: 10.1109/SAMOS.2012.6404168
Anastasia Stulova, R. Leupers, G. Ascheid
Due to the energy efficiency requirements of modern embedded systems, chip vendors are inclined towards multicore architectures with different types of processing engines and non-uniform interconnect fabrics. At the same time, multiple applications are intended to run concurrently on devices with such heterogeneous architectures. This rapid growth in the complexity of the hardware and its use cases imposes new challenges on software development tools. To overcome this complexity, approaches based on models of computation are becoming increasingly promising. Synchronous Data Flow (SDF) is a popular specification formalism for streaming applications, which are inherently concurrent. However, the parallelism expressed in the original representation is often not sufficient to fully exploit the potential of multicore platforms. In this paper we present a holistic methodology for improving the throughput of streaming applications while mapping them onto heterogeneous architectures. The approach uses transformations that adapt the parallelism in SDF according to the available platform resources. We use a genetic algorithm to explore SDF instances with the objective of maximizing throughput on a target platform. Our model supports architecture heterogeneity and multi-application scenarios. The experiments indicate that our approach outperforms other techniques for exploiting parallelism on a single application in most of the test cases and enables the optimization of concurrent applications.
Citations: 23
An application-specific Network-on-Chip for control architectures in RF transceivers
Pub Date : 2012-07-16 DOI: 10.1109/SAMOS.2012.6404159
S. Brandstätter, M. Huemer
This paper focuses on the design of an on-chip communication system for control architectures used in RF (Radio Frequency) transceivers. Continuous development and enhancement of RF transceivers, especially of smart transceivers supporting multi-mode standards, have led to new and complex SoC (System-on-Chip) designs. These designs are defined by a distributed control concept that uses several processing modules connected over an advanced communication system. Based on the requirements and restrictions of this communication system, an application-specific NoC (Network-on-Chip) is presented and analyzed in this work.
Citations: 1