2015 International Conference on Embedded Computer Systems: Architectures, Modeling, and Simulation (SAMOS): latest papers

Deterministic event-based control of Virtual Platforms for MPSoC software debugging
L. Murillo, Robert Buecs, R. Leupers, G. Ascheid
Virtual Platforms (VPs) are advantageous to develop and debug complex software for multi- and many-processor systems-on-chip (MPSoCs). VPs provide unrivalled controllability and visibility of the target, which can be exploited to examine bugs that cannot be reproduced easily in real hardware. However, VPs as used for debugging provide only traditional interfaces, such as step-based debuggers and traces, that do little to help with the enormous complexity of MPSoCs and their parallel software. Finding a bug is still largely left to the developer's experience and intuition, using manual means rather than automated solutions. To bridge this gap, this paper presents a novel VP debug visualization and control framework for concurrent software that allows examining and steering the target by means of an abstract representation of its inter-task interactions. Our framework reduces the effort required to understand complex concurrency patterns and helps to expose bugs.
DOI: 10.1109/SAMOS.2015.7363697 (https://doi.org/10.1109/SAMOS.2015.7363697)
Published: 2015-12-28
Citations: 1
AEGLE: A big bio-data analytics framework for integrated health-care services
D. Soudris, S. Xydis, Christos Baloukas, A. Hadzidimitriou, I. Chouvarda, K. Stamatopoulos, N. Maglaveras, John Chang, Andreas Raptopoulos, D. Manset, B. Pierscionek, R. Kayyali, N. Philip, Tobias Becker, K. Vaporidi, Eumorphia Kondili, D. Georgopoulos, L. Sutton, R. Rosenquist, L. Scarfò, P. Ghia
The AEGLE project aims to build an innovative ICT solution addressing the whole data value chain for health, based on cloud computing enabling dynamic resource allocation, HPC infrastructures for computational acceleration, and advanced visualization techniques. In this paper, we provide an analysis of the addressed Big Data health scenarios and describe the key enabling technologies, as well as the data privacy and regulatory issues, to be integrated into AEGLE's ecosystem, enabling advanced health-care analytic services while also promoting related research activities.
DOI: 10.1109/SAMOS.2015.7363682 (https://doi.org/10.1109/SAMOS.2015.7363682)
Published: 2015-07-19
Citations: 9
Designing applications for heterogeneous many-core architectures with the FlexTiles Platform
Benedikt Janßen, Fynn Schwiegelshohn, Martijn Koedam, François Duhem, Leonard Masing, Stephan Werner, Christophe Huriaux, A. Courtay, Emilie Wheatley, K. Goossens, F. Lemonnier, P. Millet, J. Becker, O. Sentieys, M. Hübner
The FlexTiles Platform has been developed within a Seventh Framework Programme project co-funded by the European Union, with ten participants from five countries. It aims to create a self-adaptive heterogeneous many-core architecture that can dynamically manage load balancing, power consumption and faulty modules. Its focus is to make the architecture efficient and to keep programming effort low. Therefore, the concept contains a dedicated automated tool-flow for creating both the hardware and the software, a simulation platform that can execute the same binaries as the FPGA prototype, and a virtualization layer to manage the final heterogeneous many-core architecture for run-time adaptability. With this approach, software development productivity can be increased, and thus time-to-market and development costs can be decreased. In this paper we present the FlexTiles Development Platform with a many-core architecture demonstration. The steps to implement, validate and integrate two use-cases are discussed.
DOI: 10.1109/SAMOS.2015.7363683 (https://doi.org/10.1109/SAMOS.2015.7363683)
Published: 2015-07-19
Citations: 12
Bridging the semantic gap between heterogeneous modeling formalisms and FMI
S. Tripakis
FMI (Functional Mockup Interface) is a standard for exchanging and co-simulating model components (called FMUs) coming from potentially different modeling formalisms, languages, and tools. Previous work has proposed a formal model for the co-simulation part of the FMI standard, and also presented two co-simulation algorithms which can be proven to have desirable properties, such as determinacy, provided the FMUs satisfy a formal contract. In this paper we discuss the principles for encoding different modeling formalisms, including state machines (both untimed and timed), discrete-event systems, and synchronous dataflow, as FMUs. The challenge is to bridge the various semantic gaps (untimed vs. timed, signals vs. events, etc.) that arise because of the heterogeneity between these modeling formalisms and the FMI API.
DOI: 10.1109/SAMOS.2015.7363660 (https://doi.org/10.1109/SAMOS.2015.7363660)
Published: 2015-07-19
Citations: 52
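To make the "untimed vs. timed" gap in the FMI entry above more concrete, here is a minimal sketch of one way an untimed state machine could be given a timed, step-based interface. It is not taken from the paper and it is not the FMI C API: `UntimedToggle`, `PeriodicWrapper`, `run_master` and the fixed firing period are all hypothetical names and assumptions, loosely mirroring the set-input / do-step / get-output shape of co-simulation.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class UntimedToggle:
    """Hypothetical two-state untimed machine: each reaction may flip the state."""
    state: int = 0

    def react(self, enable: bool) -> int:
        if enable:
            self.state = 1 - self.state
        return self.state


class PeriodicWrapper:
    """Hypothetical FMU-like wrapper (an assumption, not the FMI C API).

    It bridges untimed and timed semantics by firing one reaction of the
    wrapped machine every `period` time units.
    """

    def __init__(self, machine: UntimedToggle, period: float) -> None:
        self.machine = machine
        self.period = period
        self.now = 0.0
        self.next_fire = 0.0
        self.enable = False
        self.output = machine.state

    def set_input(self, enable: bool) -> None:
        self.enable = enable

    def get_output(self) -> int:
        return self.output

    def do_step(self, h: float) -> None:
        """Advance local time by h, firing every reaction due in [now, now + h)."""
        end = self.now + h
        while self.next_fire < end:
            self.output = self.machine.react(self.enable)
            self.next_fire += self.period
        self.now = end


def run_master(fmu: PeriodicWrapper, step: float, n_steps: int) -> List[int]:
    """Fixed-step master loop: set inputs, step the component, read outputs."""
    trace = []
    for _ in range(n_steps):
        fmu.set_input(True)
        fmu.do_step(step)
        trace.append(fmu.get_output())
    return trace
```

With a period of 1.0 and a master step of 0.5, `run_master(PeriodicWrapper(UntimedToggle(), 1.0), 0.5, 4)` returns `[1, 1, 0, 0]`: the machine only reacts on every second master step, which is the kind of mismatch an encoding has to make explicit.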
Video chain demonstrator on Xilinx Kintex7 FPGA with EdkDSP floating point accelerators
J. Kadlec
This paper briefly describes the basic Kintex7 FPGA video pipe infrastructure for the UTIA demonstrator in the ARTEMIS JU project ALMARVI. The video pipeline is combined with run-time reprogrammable vector floating point EdkDSP accelerators on the same FPGA chip.
DOI: 10.1109/SAMOS.2015.7363690 (https://doi.org/10.1109/SAMOS.2015.7363690)
Published: 2015-07-19
Citations: 1
Multi-Constraint multi-processor Resource Allocation
A. Behrouzian, Dip Goswami, T. Basten, M. Geilen, Hadi Alizadeh Ara
This work proposes a Multi-Constraint Resource Allocation (MuCoRA) method for applications from multiple domains onto multi-processors. In particular, we address a mapping problem for multiple throughput-constrained streaming applications and multiple latency-constrained feedback control applications onto a multi-processor platform running under a Time-Division Multiple-Access (TDMA) policy. The main objective of the proposed method is to reduce resource usage while meeting constraints from both these two domains (i.e., throughput and latency constraints). We show by experiments that the overall resource usage for this mapping problem can be reduced by distributing the allocated resource (i.e., TDMA slots) to the control applications over the TDMA wheel instead of allocating consecutive slots.
DOI: 10.1109/SAMOS.2015.7363695 (https://doi.org/10.1109/SAMOS.2015.7363695)
Published: 2015-07-19
Citations: 5
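The MuCoRA entry above claims that spreading a control application's TDMA slots over the wheel can need fewer slots than a consecutive block. A rough way to see why is to count the longest stretch of foreign slots a request may have to wait through. The sketch below uses that simplified waiting model only; it is an assumption for illustration, not the paper's method, and `worst_case_wait`, `consecutive`, `spread` and `min_slots` are hypothetical helpers.

```python
from typing import Callable, List


def worst_case_wait(wheel_size: int, owned: List[int]) -> int:
    """Longest run of non-owned slots between two owned slots on a circular wheel."""
    slots = sorted(set(owned))
    if not slots:
        raise ValueError("at least one slot must be owned")
    worst = 0
    for i, slot in enumerate(slots):
        nxt = slots[(i + 1) % len(slots)]
        gap = (nxt - slot - 1) % wheel_size  # foreign slots between slot and nxt
        worst = max(worst, gap)
    return worst


def consecutive(wheel_size: int, k: int) -> List[int]:
    """k slots allocated as one contiguous block."""
    return list(range(k))


def spread(wheel_size: int, k: int) -> List[int]:
    """k slots distributed as evenly as possible over the wheel."""
    return [(i * wheel_size) // k for i in range(k)]


def min_slots(wheel_size: int, bound: int,
              layout: Callable[[int, int], List[int]]) -> int:
    """Fewest slots whose worst-case wait stays within `bound` for a given layout."""
    for k in range(1, wheel_size + 1):
        if worst_case_wait(wheel_size, layout(wheel_size, k)) <= bound:
            return k
    raise ValueError("bound cannot be met")
```

On a 12-slot wheel with a waiting bound of 3 slots, `min_slots(12, 3, consecutive)` gives 9 while `min_slots(12, 3, spread)` gives 3. Under this toy model, the slots freed by the distributed layout are the ones that remain available to the throughput-constrained streaming applications.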
Learning-based analytical cross-platform performance prediction
Xinnian Zheng, Pradeep Ravikumar, L. John, A. Gerstlauer
As modern processors are becoming increasingly complex, fast and accurate performance prediction is crucial during the early phases of hardware and software co-development. To accurately and efficiently predict the performance of a given software workload is, however, a challenging problem. Traditional cycle-accurate simulation is often too slow, while analytical models are not sufficiently accurate or still require target-specific execution statistics that may be slow or difficult to obtain. In this paper, we propose a novel learning-based approach for synthesizing analytical models that can accurately predict the performance of a workload on a target platform from various performance statistics obtained directly on a host platform using built-in hardware counters. Our learning approach relies on a one-time training phase using a cycle-accurate reference of the chosen target processor. We train our models on over 15,000 program instances from the ACM-ICPC programming contest database, and demonstrate the prediction accuracy on standard benchmark suites. Results show that our approach achieves on average more than 90% accuracy at 160× the speed compared to a cycle-accurate reference simulation.
DOI: 10.1109/SAMOS.2015.7363659 (https://doi.org/10.1109/SAMOS.2015.7363659)
Published: 2015-07-19
Citations: 30
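The cross-platform prediction entry above follows a train-once, predict-cheaply workflow: host hardware-counter statistics go in, target performance comes out. The sketch below shows that workflow with plain ordinary least squares on a global linear model. This is a stand-in chosen for brevity, not the learning method of the paper, and the feature meaning and the numbers in the usage note are synthetic, invented purely for illustration.

```python
from typing import List, Sequence


def solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def fit(features: Sequence[Sequence[float]], targets: Sequence[float]) -> List[float]:
    """Ordinary least squares with an intercept, via the normal equations."""
    rows = [[1.0] + list(f) for f in features]
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * t for r, t in zip(rows, targets)) for i in range(k)]
    return solve(xtx, xty)


def predict(weights: Sequence[float], feature: Sequence[float]) -> float:
    """Predicted target metric for one vector of host counter values."""
    return weights[0] + sum(w * x for w, x in zip(weights[1:], feature))


def relative_error(predicted: float, reference: float) -> float:
    """Error relative to a reference value, e.g. from a cycle-accurate run."""
    return abs(predicted - reference) / abs(reference)
```

As a usage example with made-up data, fitting on host features `[(10, 4), (20, 2), (30, 12), (40, 6)]` and reference values `[18.0, 32.5, 50.0, 63.5]` (generated from `2 + 1.5*x1 + 0.25*x2`) lets `predict(weights, (25, 8))` recover roughly 41.5. Real counter data is noisy and rarely globally linear, which is one reason a plain linear fit like this should be read as a placeholder only.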
An interval algebra for multiprocessor resource allocation
L. Indrusiak, P. Dziurzański
This paper presents an interval algebra created specifically to evaluate timing properties of multiprocessor systems. It models the application load as intervals, and considers allocation and scheduling as algebraic operations over those intervals, aiming to analyse the impact of resource allocation decisions on application response times or schedulability. The theoretical background is introduced informally, followed by the description of a reference implementation of the interval algebra in C++, aiming to appeal to the design practitioner rather than the formalist. Examples of the usage of the proposed algebra are also provided, showing its applicability to the performance evaluation of industrial systems implemented over bus-based and Network-on-Chip multiprocessor platforms. A particular design flow is highlighted, where the interval algebra is used as a fitness function in a genetic algorithm tailored to optimise resource allocation in hard real-time multiprocessors.
DOI: 10.1109/SAMOS.2015.7363672 (https://doi.org/10.1109/SAMOS.2015.7363672)
Published: 2015-07-19
Citations: 6
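The interval-algebra entry above treats load as intervals and allocation as an operation over them. The abstract does not list the actual operators, and the reference implementation it mentions is in C++, so the following Python sketch is only a guess at the flavour of the idea: a hypothetical `allocate` operation that serialises load intervals on one resource in priority order, without preemption, so that response times can be read off the resulting intervals.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Interval:
    """Half-open time interval [start, end) during which a resource is busy."""
    start: int
    end: int

    @property
    def length(self) -> int:
        return self.end - self.start


def allocate(loads: List[Tuple[int, int]]) -> List[Interval]:
    """Place (release, duration) loads on one resource, in list order.

    Assumptions of this sketch: list order is priority order, execution is
    non-preemptive, and a load starts as soon as it is released and the
    resource is free.
    """
    busy_until = 0
    scheduled = []
    for release, duration in loads:
        start = max(release, busy_until)
        scheduled.append(Interval(start, start + duration))
        busy_until = start + duration
    return scheduled


def response_times(loads: List[Tuple[int, int]],
                   scheduled: List[Interval]) -> List[int]:
    """Response time of each load: completion time minus release time."""
    return [iv.end - release for (release, _), iv in zip(loads, scheduled)]


def schedulable(loads: List[Tuple[int, int]], deadlines: List[int]) -> bool:
    """True if every load finishes within its relative deadline."""
    times = response_times(loads, allocate(loads))
    return all(t <= d for t, d in zip(times, deadlines))
```

For loads `[(0, 3), (1, 2), (2, 4)]` the allocated intervals are `[0, 3)`, `[3, 5)` and `[5, 9)`, giving response times `[3, 4, 7]`. A function like `schedulable` is the kind of check that could serve as a fitness term when searching over allocations, as the entry describes for its genetic algorithm.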
Power optimizations for transport triggered SIMD processors
Joonas Multanen, T. Viitanen, Henry Linjamaki, Heikki O. Kultala, P. Jääskeläinen, J. Takala, L. Koskinen, Jesse Simonsson, H. Berg, K. Raiskila, Tommi Zetterman
Power consumption is a key aspect of modern processor design. Optimizing the processor for power leads to direct savings in battery energy consumption in the case of mobile devices. At the same time, many mobile applications demand high computational performance. In the case of large-scale computing, low-power compute devices help in thermal design and in reducing the electricity bill. This paper presents a case study of a customized low-power vector processor design that was synthesized on a 28 nm process technology. The processor has a programmer-exposed datapath based on the transport triggered architecture programming model. The paper's focus is on the RTL and microarchitecture level power optimizations applied to the design. Using semi-automated interconnection network and register file optimization algorithms, power savings of up to 27% were achieved. Using this as a baseline and applying register file datapath gating, register file banking and clock gating of individual pipeline stages in pipelined function units, power and energy savings of up to 26% could be achieved with only a 3% area overhead. On top of this, for the measured radio applications, the exposed datapath architecture helped to achieve approximately 18% power improvement in comparison to a VLIW-like architecture by utilizing optimizations unique to transport triggered architectures.
DOI: 10.1109/SAMOS.2015.7363689 (https://doi.org/10.1109/SAMOS.2015.7363689)
Published: 2015-07-19
Citations: 2
FNOCEE: A framework for NoC evaluation by FPGA-based emulation
D. Pfefferkorn, Achim Schmider, G. P. Vayá, M. Neuenhahn, H. Blume
This paper introduces FNOCEE, a framework for the evaluation of NoC-based many-core systems by FPGA-based emulation. It uses a task graph-oriented approach to model applications, while a hardware-accelerated genetic algorithm is employed to find close-to-optimal solutions to the task mapping problem. The proposed genetic algorithm is analyzed in detail, e.g., in terms of mutation rate and number of elite individuals. In order to illustrate the framework's capabilities, several case studies have been performed, wherein scalability of relevant parallel applications is investigated with regard to the number and type of available processing cores and the generated traffic load as a result of inter-task communication.
DOI: 10.1109/SAMOS.2015.7363663 (https://doi.org/10.1109/SAMOS.2015.7363663)
Published: 2015-07-19
Citations: 2
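The FNOCEE entry above names two tuning knobs of its genetic algorithm, the mutation rate and the number of elite individuals. The software sketch below shows where those two parameters sit in a small genetic algorithm for task mapping. It is an illustration under stated assumptions, not the paper's hardware-accelerated algorithm: the 2D mesh, the hop-count cost, the one-task-per-core encoding and every function name are hypothetical.

```python
import random
from typing import List, Tuple

Edge = Tuple[int, int, int]  # (source task, destination task, traffic)


def hops(a: int, b: int, width: int) -> int:
    """Manhattan distance between two cores on a mesh that is `width` wide."""
    return abs(a % width - b % width) + abs(a // width - b // width)


def cost(mapping: List[int], edges: List[Edge], width: int) -> int:
    """Traffic-weighted hop count; mapping[task] is the core running that task."""
    return sum(t * hops(mapping[s], mapping[d], width) for s, d, t in edges)


def crossover(a: List[int], b: List[int], rng: random.Random) -> List[int]:
    """Keep a prefix of parent a, then fill in the missing cores in b's order."""
    head = a[:rng.randrange(1, len(a))]
    return head + [core for core in b if core not in head]


def mutate(mapping: List[int], rate: float, rng: random.Random) -> List[int]:
    """With probability `rate`, swap the cores of two tasks."""
    child = mapping[:]
    if rng.random() < rate:
        i, j = rng.sample(range(len(child)), 2)
        child[i], child[j] = child[j], child[i]
    return child


def evolve(edges: List[Edge], n_cores: int, width: int, pop_size: int = 20,
           elite: int = 2, rate: float = 0.3, generations: int = 50,
           seed: int = 0) -> List[int]:
    """Return the best mapping found; the `elite` best survive each generation."""
    rng = random.Random(seed)
    pop = [rng.sample(range(n_cores), n_cores) for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=lambda m: cost(m, edges, width))
        parents = pop[:pop_size // 2]
        children = [
            mutate(crossover(rng.choice(parents), rng.choice(parents), rng), rate, rng)
            for _ in range(pop_size - elite)
        ]
        pop = pop[:elite] + children
    return min(pop, key=lambda m: cost(m, edges, width))
```

A call such as `evolve([(0, 1, 10), (1, 2, 10), (2, 3, 10)], n_cores=4, width=2)` maps a four-task pipeline onto a 2×2 mesh. Because the elite individuals are copied unchanged, the best cost never gets worse from one generation to the next, whereas a higher mutation rate trades that stability for more exploration.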