
2015 International Conference on Embedded Computer Systems: Architectures, Modeling, and Simulation (SAMOS): Latest Publications

A framework for reducing the modeling and simulation complexity of Cyberphysical Systems
N. Zompakis, K. Siozios
As systems continue to evolve, they rely less on human decision-making and more on computational intelligence. This trend, in conjunction with the technologies available for advanced sensing, measurement, process control, and communication, leads us towards the new field of Cyber-Physical Systems (CPS). Although these systems exhibit remarkable characteristics, the complexity imposed by their numerous components and services makes their design extremely difficult. This paper proposes a software-supported framework for reducing the design complexity of both modeling and simulating CPS. For this purpose, a novel technique based on system scenarios is applied. Evaluation results demonstrate the effectiveness of the introduced framework, which noticeably reduces modeling and simulation complexity at a controllable cost in accuracy.
DOI: 10.1109/SAMOS.2015.7363699 (https://doi.org/10.1109/SAMOS.2015.7363699), published 2015-07-19
Citations: 3
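The Zompakis and Siozios abstract gives no algorithmic detail, but the general system-scenario idea can be sketched: run-time situations whose costs are close are merged into one scenario, so a model or simulator only needs one representative per scenario. The greedy clustering below is a hypothetical illustration, not the authors' technique; the situation names, the costs, and the relative `tolerance` parameter are assumptions. Using each scenario's largest cost as its representative keeps the estimate conservative and bounds every member's error, relative to the representative, by `tolerance`, which is one way to keep the accuracy overhead controllable.

```python
from typing import Dict, List, Tuple


def cluster_scenarios(costs: Dict[str, float], tolerance: float) -> List[Tuple[float, List[str]]]:
    """Hypothetical greedy clustering of run-time situations into scenarios.

    Situations are visited from most to least expensive. A situation joins the
    current scenario if its cost is within `tolerance` (relative) of that
    scenario's representative cost; otherwise it opens a new scenario.
    """
    scenarios: List[Tuple[float, List[str]]] = []
    for name, cost in sorted(costs.items(), key=lambda kv: kv[1], reverse=True):
        if scenarios and cost >= scenarios[-1][0] * (1 - tolerance):
            scenarios[-1][1].append(name)       # close enough: reuse the representative
        else:
            scenarios.append((cost, [name]))    # new scenario, represented by its max cost
    return scenarios


# Illustrative (made-up) normalized costs for five run-time situations
situations = {"idle": 1.0, "cruise": 9.5, "brake": 10.0, "accel": 8.0, "park": 1.05}
for rep, members in cluster_scenarios(situations, tolerance=0.1):
    print(f"representative cost {rep}: {members}")
```

With a 10% tolerance the five situations collapse into three scenarios, so only three representatives would have to be modeled or simulated.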
Visual processing sparks a new class of processors
Marco C. Jacobs
Summary form only given. Augmented reality, gesture interfaces, and automotive driver assistance systems enable novel user experiences, safer rides, and new usage models. Bringing these systems to market requires a power-efficient architecture capable of many billions of operations per second. Widely adopted processing architectures such as CPUs and GPUs cannot fulfill these requirements, which is sparking a new class of video and vision processors. In this talk we will give a quick overview of applications, typical algorithms, and their implications for computer architecture. We will focus on the automotive market, where computer vision is the key technology for putting autonomous vehicles on the road.
DOI: 10.1109/SAMOS.2015.7363651 (https://doi.org/10.1109/SAMOS.2015.7363651), published 2015-07-19
Citations: 3
Dynamic re-vectorization of binary code
Nabil Hallou, Erven Rohou, P. Clauss, A. Ketterlin
In many cases, applications are not optimized for the hardware on which they run. Several factors contribute to this unsatisfying situation, including legacy code, commercial code distributed in binary form, and deployment on compute farms. In fact, ISA backward compatibility guarantees only functionality, not the best exploitation of the hardware. In this work, we focus on maximizing CPU efficiency for SIMD extensions and propose to automatically convert, at runtime, loops vectorized for an older version of a SIMD extension into a newer one. We propose a lightweight mechanism that does not include a vectorizer but instead leverages what a static vectorizer previously did. We show that many loops compiled for x86 SSE can be dynamically converted to the more recent and more powerful AVX, and how correctness is maintained with regard to challenges such as data dependences and reductions. We obtain speedups in line with those of a native compiler targeting AVX. The re-vectorizer is implemented inside a dynamic optimization platform; it is completely transparent to the user, does not require rewriting binaries, and operates during program execution.
DOI: 10.1109/SAMOS.2015.7363680 (https://doi.org/10.1109/SAMOS.2015.7363680), published 2015-07-19
Citations: 13
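The following Python sketch emulates, lane by lane, the two correctness concerns the Hallou et al. abstract mentions when a loop vectorized for 4-lane SSE (128-bit registers, four single-precision floats) is widened to 8-lane AVX (256-bit registers). It does not reproduce the authors' mechanism, which rewrites machine code inside a dynamic optimizer; the function names are hypothetical. A loop-carried dependence of distance d allows a vector width of at most d, so a loop that was safe at width 4 can be unsafe at width 8. A reduction is handled with one partial accumulator per lane, combined after the loop; with floating-point data this reorders the additions, so results may differ in the last bits from the scalar loop.

```python
from typing import List, Optional


def can_widen(dep_distance: Optional[int], new_width: int) -> bool:
    """A loop may be re-vectorized to `new_width` lanes only if every
    loop-carried dependence spans at least `new_width` iterations.
    `dep_distance` is None when the loop carries no dependence."""
    return dep_distance is None or dep_distance >= new_width


def vectorized_add(a: List[float], b: List[float], width: int) -> List[float]:
    """Emulate c[i] = a[i] + b[i] executed `width` lanes at a time,
    followed by a scalar epilogue for the leftover iterations."""
    n = len(a)
    c = [0.0] * n
    main = n - n % width                      # iterations covered by full vectors
    for i in range(0, main, width):
        # one emulated vector instruction: all lanes computed together
        c[i:i + width] = [x + y for x, y in zip(a[i:i + width], b[i:i + width])]
    for i in range(main, n):                  # scalar remainder loop
        c[i] = a[i] + b[i]
    return c


def vectorized_sum(a: List[float], width: int) -> float:
    """Emulate a sum reduction with one partial accumulator per lane,
    combined horizontally after the vector loop."""
    n = len(a)
    main = n - n % width
    acc = [0.0] * width
    for i in range(0, main, width):
        for lane in range(width):
            acc[lane] += a[i + lane]
    total = sum(acc)                          # horizontal reduction of the lanes
    for i in range(main, n):                  # scalar remainder
        total += a[i]
    return total


# A loop such as a[i + 4] = a[i] + x has dependence distance 4:
print(can_widen(4, 4), can_widen(4, 8))       # safe at SSE width, unsafe at AVX width
```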
Physical design aware system level synthesis of hardware
Nasim Farahini, A. Hemani, Hasan Sohofi, Shuo Li
In spite of decades of research, only a small percentage of hardware is designed using high-level synthesis, because of the large gap between the abstraction levels of standard cells and the algorithmic level. We propose a grid-based, regular physical design platform composed of large-grain hardened building blocks called SiLago blocks. This platform is divided into regions specialized for different functionalities, such as computation, storage, and system control. The characterized micro-architectural operations of the SiLago platform serve as the interface to a meet-in-the-middle high-level and system-level synthesis framework. This framework was used to generate three hardware macro instances, derived from the SiLago platform, for three applications from the signal processing domain. Results show a two-orders-of-magnitude improvement in the efficiency of system-level design space exploration and in synthesis time, with an average loss in design quality of 18% in energy and 54% in area compared to the commercial SoC flow.
DOI: 10.1109/SAMOS.2015.7363669 (https://doi.org/10.1109/SAMOS.2015.7363669), published 2015-07-19
Citations: 4
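One plausible reason that composing pre-characterized blocks, as in the Farahini et al. abstract, speeds up design space exploration is that cost estimation becomes a table lookup instead of gate-level synthesis. The sketch below illustrates only that idea. The operation names and the (cycles, energy) figures in `CHARACTERIZED` are placeholder assumptions, not SiLago characterization data, and the scheduling model (operations within one step run concurrently on different blocks) is a simplification chosen for the example.

```python
from typing import Dict, List, Tuple

# Placeholder characterization table (NOT SiLago data): for each hypothetical
# micro-architectural operation, a pre-measured (latency in cycles, energy in pJ).
CHARACTERIZED: Dict[str, Tuple[int, float]] = {
    "mac": (1, 2.0),
    "load_row": (2, 5.0),
    "store_row": (2, 5.5),
}


def estimate(schedule: List[List[str]]) -> Tuple[int, float]:
    """Estimate latency and energy of a schedule by table lookup.

    Each inner list is one step whose operations run concurrently on different
    blocks, so a step lasts as long as its slowest operation; energy adds up.
    """
    cycles = 0
    energy = 0.0
    for step in schedule:
        cycles += max(CHARACTERIZED[op][0] for op in step)
        energy += sum(CHARACTERIZED[op][1] for op in step)
    return cycles, energy


# Two parallel loads, then three parallel MACs, then one store
print(estimate([["load_row", "load_row"], ["mac", "mac", "mac"], ["store_row"]]))
```

Because each candidate schedule is costed in a few lookups, many alternatives can be compared quickly; the price is that estimates are only as accurate as the characterization table.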
Parallel SystemC simulation for ESL design using flexible time decoupling
Jan Weinstock, R. Leupers, G. Ascheid
Engineers of next-generation embedded systems rely heavily on virtual platforms as central tools in their design process. Yet, the ever-increasing HW/SW complexity degrades the simulation performance of those platforms and threatens their viability as design tools. With multi-core workstations now widely available, the transition towards parallel simulation technologies seems obvious. Recently published parallel SystemC simulators use time decoupling to achieve high simulation performance on modern SMP machines. However, those simulators have to identify all cross-thread communication ahead of time. This work presents an approach that overcomes this limitation and enables time-decoupled simulation for mainstream SystemC simulators, achieving a speedup of up to 3.4× on a quad-core host.
DOI: 10.1109/SAMOS.2015.7363702 (https://doi.org/10.1109/SAMOS.2015.7363702), published 2015-07-19
Citations: 4
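Temporal decoupling lets each simulation thread run ahead of global time for up to a fixed quantum before synchronizing; the SystemC TLM-2.0 utilities expose the same idea through a quantum keeper (`tlm_utils::tlm_quantumkeeper`). The Python sketch below shows only the synchronization pattern and is not the Weinstock et al. simulator: the model names, step sizes, and the `run_decoupled` helper are hypothetical, and because of CPython's global interpreter lock it demonstrates structure rather than speedup. A model may overshoot a quantum boundary (the "cpu" model reaches local time 12 at boundary 10); this bounded timing error is the accuracy traded for fewer synchronizations.

```python
import threading
from typing import Dict, List


def run_decoupled(steps: Dict[str, int], quantum: int, end_time: int) -> Dict[str, List[int]]:
    """Run each model in its own thread with temporal decoupling.

    A model advances its *local* time by its fixed step (which must be > 0)
    without any synchronization until it reaches or passes the next quantum
    boundary; then all threads meet at a barrier before the next quantum.
    Returns each model's local time at every synchronization point.
    """
    barrier = threading.Barrier(len(steps))
    sync_log: Dict[str, List[int]] = {name: [] for name in steps}

    def worker(name: str, step: int) -> None:
        local_time = 0
        for boundary in range(quantum, end_time + 1, quantum):
            while local_time < boundary:       # run ahead freely inside the quantum
                local_time += step             # stands in for one model step
            sync_log[name].append(local_time)  # may overshoot the boundary
            barrier.wait()                     # global synchronization point

    threads = [threading.Thread(target=worker, args=(name, step)) for name, step in steps.items()]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sync_log


print(run_decoupled({"cpu": 3, "bus": 5}, quantum=10, end_time=30))
```

A larger quantum means fewer barrier waits but a larger possible overshoot, which is the usual speed versus timing-accuracy knob of time-decoupled simulation.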
Decentralized diagnosis of permanent faults in automotive E/E architectures
Peter Waszecki, M. Lukasiewycz, S. Chakraborty
This paper presents a novel decentralized approach for diagnosing permanent faults in automotive Electrical and Electronic (E/E) architectures. Both the safety-critical real-time requirements and the distributed nature of these systems make fault tolerance in general, and fault diagnosis in particular, a crucial and challenging issue. At the same time, high production volumes make cost efficiency an important criterion during system design, which conflicts with the use of often expensive explicit fault diagnosis hardware. To address these challenges, we propose a diagnosis framework that consists of two stages. In the first stage, diagnosis determination, potential fault scenarios such as defective Electronic Control Units (ECUs) are investigated to obtain a set of diagnosis functions. At runtime, specific diagnosis functions are used for each component in the network to determine whether a certain fault scenario is present. In the second stage, diagnosis optimization, the diagnosis functions are optimized to determine trade-offs between diagnosis times and the number of monitored message streams. Experimental results based on 100 synthetic test cases give evidence of the feasibility and efficiency of the presented framework. Finally, an automotive case study demonstrates the practicability and details of our fault diagnosis approach.
DOI: 10.1109/SAMOS.2015.7363675 (https://doi.org/10.1109/SAMOS.2015.7363675), published 2015-07-19
Citations: 4
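A diagnosis function of the kind the Waszecki et al. abstract describes can be imagined as a timeout check over the message streams a node already receives, so no dedicated diagnosis hardware is needed. The sketch below is a hypothetical illustration, not the authors' framework: the `Stream` type, the stream and ECU names, and the rule "a sender is suspected when every monitored stream from it has been silent for more than `tolerance` periods" are all assumptions.

```python
from dataclasses import dataclass
from typing import Dict, List, Set


@dataclass
class Stream:
    name: str        # message stream identifier
    sender: str      # ECU that transmits it
    period_ms: int   # nominal transmission period


def diagnose(monitored: List[Stream], last_seen_ms: Dict[str, int], now_ms: int,
             tolerance: int = 2) -> Set[str]:
    """Return the senders suspected of a permanent fault: every monitored stream
    from that sender has been silent for more than `tolerance` periods.
    Streams never received are treated as last seen at time 0."""
    overdue_by_sender: Dict[str, List[bool]] = {}
    for s in monitored:
        silent_for = now_ms - last_seen_ms.get(s.name, 0)
        overdue_by_sender.setdefault(s.sender, []).append(silent_for > tolerance * s.period_ms)
    return {sender for sender, flags in overdue_by_sender.items() if all(flags)}


streams = [Stream("brake", "ECU1", 10), Stream("status", "ECU1", 100), Stream("speed", "ECU2", 20)]
print(diagnose(streams, {"brake": 0, "status": 0, "speed": 240}, now_ms=250))
```

In this sketch a node that watches only ECU1's 10 ms brake stream can reach a verdict after more than 20 ms of silence, whereas the 100 ms status stream alone needs more than 200 ms; requiring both is more conservative but slower. That is a toy version of the trade-off between diagnosis time and monitored message streams mentioned in the abstract.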
High-level synthesizable dataflow MapReduce accelerator for FPGA-coupled data centers
D. Diamantopoulos, C. Kachris
Manipulating big-data entries of emerging server workloads requires a design paradigm shift towards more aggressive system-level architecture solutions. From a software perspective, the MapReduce framework is a prominent parallel data processing tool as the volume of data to analyze grows rapidly. FPGAs can be used to accelerate data processing and significantly reduce power consumption. However, FPGAs have not been deployed in data centers because of the high programming complexity of hardware. In this paper we present HLSMapReduceFlow, a novel reconfigurable MapReduce accelerator that can be scaled up to data centers and speeds up the processing of Map computation kernels, while promising a minimal energy footprint and high programming efficiency thanks to the use of HLS. We propose completely decoupling the data paths of MapReduce tasks onto distinct buses, accessed by individual processing engines. Such a dataflow approach implies a holistic C/C++-to-RTL, domain-level MapReduce transition. In this work, we further extend HLS tools with systematic source-to-source annotation of HLS optimization directives, added as a state-of-the-art system-level implementation toolflow. The proposed architecture is implemented, mapped, and evaluated on a Virtex-7 FPGA; the results show that the proposed scheme achieves up to 4.3× overall throughput improvement in MapReduce applications while offering two orders of magnitude power/energy improvement compared to a high-end multi-core processor.
DOI: 10.1109/SAMOS.2015.7363656 (https://doi.org/10.1109/SAMOS.2015.7363656), published 2015-07-19
Citations: 14
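In software terms, the decoupled data paths in the Diamantopoulos and Kachris abstract resemble giving each map engine its own private FIFO instead of a shared bus. The Python sketch below models that dataflow for a word-count job. It is neither HLS code nor the HLSMapReduceFlow design; the function names and the round-robin draining policy are assumptions, and while hardware engines and FIFOs would operate concurrently, here they are evaluated sequentially.

```python
from collections import Counter, deque
from typing import Deque, Dict, List, Tuple


def map_engine(chunk: str) -> List[Tuple[str, int]]:
    """Map kernel: emit (word, 1) for every word in the input chunk."""
    return [(word, 1) for word in chunk.split()]


def dataflow_wordcount(chunks: List[str]) -> Dict[str, int]:
    # One private FIFO per map engine, so engines never share a data path
    fifos: List[Deque[Tuple[str, int]]] = [deque(map_engine(c)) for c in chunks]
    counts = Counter()
    # The reducer drains the FIFOs round-robin, one record per FIFO per pass
    while any(fifos):
        for fifo in fifos:
            if fifo:
                word, n = fifo.popleft()
                counts[word] += n
    return dict(counts)


print(dataflow_wordcount(["a b a", "b c", "a"]))
```

Because no FIFO is shared, no engine ever waits for bus arbitration against another engine; the only shared resource in this model is the reducer.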
Parallel program = operator + schedule + parallel data structure
K. Pingali
Summary form only given. Multicore and manycore processors are now ubiquitous, but parallel programming remains as difficult as it was 30-40 years ago. In this talk, I will argue that these problems arise largely from the computation-centric abstractions that we currently use to think about parallelism. In their place, I will propose a novel data-centric foundation for parallel programming called the operator formulation in which algorithms are described in terms of unitary actions on data structures. This data-centric view of parallel algorithms shows that a generalized form of data-parallelism called amorphous data-parallelism is ubiquitous even in complex, irregular graph applications such as mesh generation and partitioning algorithms, graph analytics, and machine learning applications. Binding time considerations provide a unification of parallelization techniques ranging from static parallelization to speculative parallelization. We have built a system called Galois, based on these ideas, for exploiting amorphous data-parallelism on multicores and GPUs. I will present experimental results from our group as well as from other groups that are using the Galois system.
DOI: 10.1109/SAMOS.2015.7363652 (https://doi.org/10.1109/SAMOS.2015.7363652), published 2015-07-19
Citations: 0
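The operator formulation in the Pingali talk can be illustrated with data-driven single-source shortest paths: the operator relaxes the edges of one active node, the schedule decides which active node is processed next, and the worklist holds the active nodes. The sequential Python sketch below is for illustration only; it neither uses nor imitates the Galois API (Galois itself is a C++ system), and `relax`, `sssp`, and the schedule names are hypothetical. Both schedules reach the same distances here, showing that for this operator the schedule affects the amount of work, not the result.

```python
import heapq
from collections import deque
from typing import Dict, List, Tuple

Graph = Dict[str, List[Tuple[str, int]]]


def relax(graph: Graph, dist: Dict[str, float], node: str) -> List[str]:
    """Operator: act on one active node's neighborhood and return the
    neighbors that become active because their distance improved."""
    activated = []
    for nbr, weight in graph[node]:
        if dist[node] + weight < dist[nbr]:
            dist[nbr] = dist[node] + weight
            activated.append(nbr)
    return activated


def sssp(graph: Graph, source: str, schedule: str = "fifo") -> Dict[str, float]:
    dist: Dict[str, float] = {v: float("inf") for v in graph}
    dist[source] = 0
    if schedule == "fifo":                     # schedule 1: process in arrival order
        worklist = deque([source])
        while worklist:
            worklist.extend(relax(graph, dist, worklist.popleft()))
    elif schedule == "priority":               # schedule 2: smallest tentative distance first
        heap = [(0.0, source)]
        while heap:
            d, node = heapq.heappop(heap)
            if d > dist[node]:                 # stale entry: node was improved later
                continue
            for nbr in relax(graph, dist, node):
                heapq.heappush(heap, (dist[nbr], nbr))
    else:
        raise ValueError(f"unknown schedule: {schedule}")
    return dist


g: Graph = {"s": [("a", 4), ("b", 1)], "b": [("a", 2), ("c", 5)], "a": [("c", 1)], "c": []}
print(sssp(g, "s", "fifo"), sssp(g, "s", "priority"))
```

On this small graph the FIFO schedule relaxes node "a" twice, while the priority schedule processes each node's final distance once; in a parallel setting the same choice trades redundant work against ordering overhead.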
Current analysis approaches and performance needs for whole slide image processing in breast cancer diagnostics
I. Pöllänen, Billy Braithwaite, Keijo Haataja, Tiia Ikonen, Pekka J. Toivanen
In this paper, current approaches and performance needs for whole slide image (WSI) analysis in breast cancer diagnostics are discussed. WSIs provide high-resolution digital image data of the patient's diseased tissue. Digital whole slide images are typically very large and contain a large amount of information. Digitizing tissue specimens into digital images enables the development and application of computational analysis algorithms. Biological tissues are complex: tissue structure varies between healthy individuals as well as between patients with the same disease. Furthermore, tissue preparation and digitization usually introduce many artifacts and additional complexity, which makes classification challenging. This variance, together with the large size of the images, makes accurate and reliable automated breast cancer image analysis a challenge. In the ALMARVI project we aim to develop and implement efficient histopathological image analysis algorithms in our breast cancer analysis scheme. This paper focuses on discussing relevant information concerning histopathological breast cancer diagnosis and can also serve as an introduction to the concept of WSI analysis for non-experts. Since WSIs are very large (up to 40 GB uncompressed), computational analysis is challenging and requires computationally efficient tools and suitable approaches to relieve the problems caused by the large image size.
DOI: 10.1109/SAMOS.2015.7363692 (https://doi.org/10.1109/SAMOS.2015.7363692), published 2015-07-19
Citations: 7
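The WSI abstract notes that uncompressed slides can reach 40 GB, which is too large to load whole. The paper does not say how ALMARVI handles this. A common general approach is to cut the slide into fixed-size tiles and analyze them one at a time. The Python sketch below illustrates that idea under stated assumptions: 8-bit RGB pixels, square tiles, and a caller-supplied `read_tile` callback (hypothetical; in practice it might wrap a slide reader such as OpenSlide's `read_region`). The 100,000 × 100,000 px slide size is illustrative, not a figure from the paper.

```python
from typing import Callable, Iterator, List, Tuple, TypeVar

T = TypeVar("T")

def uncompressed_bytes(width: int, height: int, bytes_per_pixel: int = 3) -> int:
    """Size of one full-resolution plane; 3 bytes/pixel assumes 8-bit RGB."""
    return width * height * bytes_per_pixel

def tile_count(width: int, height: int, tile: int) -> int:
    """Number of square tiles needed to cover the slide (edge tiles included)."""
    # Integer ceiling division avoids float rounding on very large slides.
    return ((width + tile - 1) // tile) * ((height + tile - 1) // tile)

def tile_grid(width: int, height: int, tile: int) -> Iterator[Tuple[int, int, int, int]]:
    """Yield (x, y, w, h) for each tile in row-major order; edge tiles are clipped."""
    for y in range(0, height, tile):
        for x in range(0, width, tile):
            yield x, y, min(tile, width - x), min(tile, height - y)

def process_slide(width: int, height: int, tile: int,
                  read_tile: Callable[[int, int, int, int], object],
                  analyze: Callable[[object], T]) -> List[T]:
    """Stream tiles through `analyze` so only one tile's pixels are held at a time.
    `read_tile` is a hypothetical reader callback supplied by the caller."""
    return [analyze(read_tile(x, y, w, h)) for x, y, w, h in tile_grid(width, height, tile)]

# Illustrative numbers for a hypothetical 100,000 x 100,000 px slide:
print(uncompressed_bytes(100_000, 100_000) / 1e9)  # prints 30.0 (decimal gigabytes)
print(tile_count(100_000, 100_000, 512))           # prints 38416 (tiles of 512 x 512 px)
```

Edge tiles are clipped rather than padded, so the tile areas sum exactly to the slide area. A real pipeline would typically also skip background tiles and exploit the slide's lower-resolution pyramid levels, which this sketch leaves out.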
A scriptable, standards-compliant reporting and logging extension for SystemC
J. Wagner, Rolf Meyer, R. Buchty, Mladen Berekovic
The shift towards increasingly complex Systems-on-Chip fosters high-level modeling (HLM) of new systems in order to achieve the required time-to-first-virtual-prototype and adequate simulation speed. HLM furthermore makes exhaustive simulations feasible, enabling the developer to extract a wealth of information from the system during simulation. Reporting, logging, analyzing, and interpreting this vast amount of data requires a capable reporting and logging system. This paper proposes such a solution: the presented system is built on the foundations of SystemC's sc_report class and maintains full compatibility with it. To provide extensive search and analysis features, the proposed solution offers Python-based scripting capabilities and supports attaching key-value pairs to each report message. Highly efficient blacklist and whitelist filters let the user select which events are reported during runtime and suppress all irrelevant reports, keeping the simulation fast. Filter rules are fully scriptable and interpreted during simulation runtime, allowing the rules to adapt dynamically to events that have occurred. All proposed mechanisms were evaluated under real-world conditions in an existing virtual prototype platform, including a report database backend that enables easy analysis of the generated data.
DOI: 10.1109/SAMOS.2015.7363700 (https://doi.org/10.1109/SAMOS.2015.7363700), published 2015-07-19
Citations: 6
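The SystemC abstract describes blacklist and whitelist filters over report messages that carry key-value pairs, with Python-scriptable rules that can change while the simulation runs. The sketch below models that idea in plain Python. It is not the authors' implementation and does not call SystemC's API: `Report`, `ReportFilter`, `Logger` and the hook mechanism are hypothetical names, the severity strings only mirror SystemC's SC_INFO/SC_WARNING/SC_ERROR/SC_FATAL levels, and the rule that a non-empty whitelist is checked before the blacklist is an assumption.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

@dataclass
class Report:
    """Hypothetical report record: message type, severity and text as in
    sc_report, plus the attached key-value pairs."""
    msg_type: str
    severity: str  # "INFO", "WARNING", "ERROR" or "FATAL" (mirrors SystemC levels)
    msg: str
    attrs: Dict[str, object] = field(default_factory=dict)

Rule = Callable[[Report], bool]

class ReportFilter:
    """Whitelist/blacklist filter whose rule lists may be edited at runtime."""

    def __init__(self) -> None:
        self.whitelist: List[Rule] = []
        self.blacklist: List[Rule] = []

    def accepts(self, r: Report) -> bool:
        # Assumed precedence: a non-empty whitelist must match first...
        if self.whitelist and not any(rule(r) for rule in self.whitelist):
            return False
        # ...then any matching blacklist rule suppresses the report.
        return not any(rule(r) for rule in self.blacklist)

class Logger:
    """Hypothetical sink; `kept` stands in for a report database backend."""

    def __init__(self, flt: ReportFilter) -> None:
        self.filter = flt
        self.kept: List[Report] = []
        self.hooks: List[Callable[[Report, ReportFilter], None]] = []

    def report(self, r: Report) -> None:
        # Hooks see every report, even suppressed ones, so rules can adapt
        # to events that have occurred.
        for hook in self.hooks:
            hook(r, self.filter)
        if self.filter.accepts(r):
            self.kept.append(r)

# Usage: mute INFO reports until the first ERROR, then show everything.
flt = ReportFilter()
flt.blacklist.append(lambda r: r.severity == "INFO")
log = Logger(flt)

def unmute_on_error(r: Report, f: ReportFilter) -> None:
    if r.severity == "ERROR":
        f.blacklist.clear()

log.hooks.append(unmute_on_error)
log.report(Report("bus", "INFO", "read", {"addr": 0x100}))
log.report(Report("bus", "ERROR", "timeout", {"addr": 0x104}))
log.report(Report("bus", "INFO", "retry", {"addr": 0x104}))
print([r.msg for r in log.kept])  # prints ['timeout', 'retry']
```

Hooks run before the filter decision, so a report that triggers a rule change is itself judged by the updated rules. Whitelist rules can also match on the attached key-value pairs, e.g. `lambda r: r.attrs.get("core") == 0`.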
Venue: 2015 International Conference on Embedded Computer Systems: Architectures, Modeling, and Simulation (SAMOS)