Proceedings of the Computing Frontiers Conference: Latest Publications
Trading Fault Tolerance for Performance in AN Encoding
Pub Date: 2017-05-15 DOI: 10.1145/3075564.3075565
Norman A. Rink, J. Castrillón
Increasing rates of transient hardware faults pose a problem for computing applications. Current and future trends are likely to exacerbate this problem. When a transient fault occurs during program execution, data in the output can become corrupted. The severity of output corruptions depends on the application domain. Hence, different applications require different levels of fault tolerance. We present an LLVM-based AN encoder that can equip programs with an error detection mechanism at configurable levels of rigor. Based on our AN encoder, the trade-off between fault tolerance and runtime overhead is analyzed. It is found that, by suitably configuring our AN encoder, the runtime overhead can be reduced from 9.9x to 2.1x. At the same time, however, the probability that a hardware fault in the CPU will result in silent data corruption rises from 0.007 to over 0.022. The same probability for memory faults increases from 0.009 to over 0.032. It is further demonstrated, by applying different configurations of our AN encoder to the components of an arithmetic expression interpreter, that having fine-grained control over levels of fault tolerance can be beneficial.
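AN codes protect integer data by multiplying every value with a fixed constant A: valid code words are exactly the multiples of A, so any result that fails the divisibility check reveals a fault. The paper's LLVM-based encoder is not reproduced here; the following is a minimal Python sketch of the underlying idea, with A = 58659 assumed as an illustrative constant from the AN-coding literature, not necessarily the one used in the paper:

```python
A = 58659  # illustrative constant from the AN-coding literature (assumption);
           # the choice of A determines which fault patterns stay detectable

def encode(x: int) -> int:
    """AN-encode a value by multiplying it with the constant A."""
    return x * A

def decode(xc: int) -> int:
    """Check and decode: any value that is not a multiple of A signals a fault."""
    if xc % A != 0:
        raise RuntimeError("fault detected: not a valid code word")
    return xc // A

# Additions can be performed directly on encoded operands:
s = encode(7) + encode(35)
assert decode(s) == 42

# A single bit flip moves the result off the multiples of A and is detected:
corrupted = s ^ (1 << 13)
try:
    decode(corrupted)
except RuntimeError as e:
    print(e)  # the corruption does not go silent
```

Multiplication of two code words carries an extra factor of A that must be corrected, which is one source of the runtime overhead the paper trades against fault coverage.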
Citations: 5
Private inter-network routing for Wireless Sensor Networks and the Internet of Things
Pub Date: 2017-05-15 DOI: 10.1145/3075564.3079068
P. Palmieri, L. Calderoni, D. Maio
As computing becomes increasingly pervasive, different heterogeneous networks are connected and integrated. This is especially true in the Internet of Things (IoT) and Wireless Sensor Networks (WSN) settings. However, as different networks managed by different parties and with different security requirements are integrated, security becomes a primary concern. WSN nodes, in particular, are often deployed "in the open", where a potential attacker can gain physical access to the device. As nodes can be deployed in hostile or difficult scenarios, such as military battlefields or disaster recovery settings, it is crucial to avoid escalation from a successful attack on a single node to the whole network, and from there to other connected networks. It is therefore essential to secure the communication within the WSN and, in particular, to keep context information private, such as the network topology and the location and identity of base stations (which collect the data gathered by the sensors). In this paper, we propose a protocol achieving anonymous routing between different interconnected IoT or WSN networks, based on the Spatial Bloom Filter (SBF) data structure. The protocol enables communications between the nodes through the use of anonymous identifiers, thus hiding the location and identity of the nodes within the network. The proposed routing strategy preserves context privacy and prevents adversaries from learning the network structure and topology, as routing information is encrypted using a homomorphic encryption scheme and computed only in the encrypted domain. Preserving context privacy is crucial in preventing adversaries from gaining valuable network information from a successful attack on a single node of the network, and reduces the potential for attack escalation.
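The Spatial Bloom Filter underlying the protocol generalizes a Bloom filter: each cell stores an area label rather than a single bit, so the filter answers "which area does this element belong to" with one-sided error (an element may be misattributed to a higher-priority area, but membership is never under-reported). The homomorphic-encryption layer is omitted below; this is a plain-domain sketch with assumed parameters (table size, hash count), not the authors' implementation:

```python
import hashlib

class SpatialBloomFilter:
    """Minimal plain-domain sketch of a Spatial Bloom Filter. Area labels are
    positive integers; larger label = higher priority (0 marks an empty cell)."""

    def __init__(self, size: int = 1024, k: int = 3):
        self.size = size   # number of cells (assumed parameter)
        self.k = k         # number of hash functions (assumed parameter)
        self.cells = [0] * size

    def _positions(self, element: bytes):
        # Derive k cell indices from salted SHA-256 digests.
        for i in range(self.k):
            digest = hashlib.sha256(bytes([i]) + element).digest()
            yield int.from_bytes(digest[:8], "big") % self.size

    def insert(self, element: bytes, area: int) -> None:
        # On collision the higher-priority label wins.
        for pos in self._positions(element):
            self.cells[pos] = max(self.cells[pos], area)

    def query(self, element: bytes) -> int:
        # 0 means "not stored"; otherwise the minimum label over the k cells.
        labels = [self.cells[pos] for pos in self._positions(element)]
        return 0 if 0 in labels else min(labels)

# Nodes are mapped to areas; the cell array itself reveals neither identities
# nor topology:
sbf = SpatialBloomFilter()
sbf.insert(b"node-1", area=2)
sbf.insert(b"node-2", area=1)
assert sbf.query(b"node-1") == 2
```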
Citations: 13
Brain-Inspired Memory Architecture for Sparse Nonlocal and Unstructured Workloads
Pub Date: 2017-05-15 DOI: 10.1145/3075564.3075597
Y. Katayama
This paper presents a brain-inspired von Neumann memory architecture for sparse, nonlocal, and unstructured workloads. Memory at each node contains selectable windows for optimistic shared access. A low-latency multiple access control for various policies is provided inside the local memory controller, using conditional deferred queuing with shared address list entries and associated lock bits. When combined with a memory-side cache, the proposed architecture is expected to transparently accelerate and flexibly scale the performance of sparse, nonlocal, and unstructured workloads by better regulating the data-access pipelining across local and remote memory requests.
Citations: 0
Design of S-boxes Defined with Cellular Automata Rules
Pub Date: 2017-05-15 DOI: 10.1145/3075564.3079069
S. Picek, L. Mariot, Bohan Yang, D. Jakobović, N. Mentens
The aim of this paper is to find cellular automata (CA) rules that are used to describe S-boxes with good cryptographic properties and low implementation cost. Up to now, CA rules have been used in several ciphers to define an S-box, but in all those ciphers, the same CA rule is used. This CA rule is best known as the one defining the Keccak χ transformation. Since there exists no straightforward method for constructing CA rules that define S-boxes with good cryptographic/implementation properties, we use a special kind of heuristics for that -- Genetic Programming (GP). Although it is not possible to theoretically prove the efficiency of such a method, our experimental results show that GP is able to find a large number of CA rules that define good S-boxes in a relatively easy way. We focus on the 4 x 4 and 5 x 5 sizes and we implement the S-boxes in hardware to examine implementation properties like latency, area, and power. Particularly interesting is the internal encoding of the solutions in the considered heuristics using combinatorial circuits; this makes it easy to approximate S-box implementation properties like latency and area a priori.
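The Keccak χ rule mentioned above updates each cell from its two right neighbours on a circular array: b[i] = a[i] XOR (NOT a[i+1] AND a[i+2]). The sketch below shows how such a local rule yields an S-box: for odd lengths such as 5 (as in Keccak), χ is known to be invertible and hence a valid S-box, while for length 4 it is not, which is why bijective rules for 4 x 4 boxes must be found by search:

```python
def chi(bits):
    """Keccak's χ as a CA rule on a circular bit array:
    b[i] = a[i] XOR (NOT a[i+1] AND a[i+2])."""
    n = len(bits)
    return [bits[i] ^ ((1 - bits[(i + 1) % n]) & bits[(i + 2) % n])
            for i in range(n)]

def sbox_table(n):
    """Enumerate the n-bit S-box obtained by applying the rule to every input."""
    table = []
    for x in range(2 ** n):
        bits = [(x >> i) & 1 for i in range(n)]
        out = chi(bits)
        table.append(sum(b << i for i, b in enumerate(out)))
    return table

# For length 5, χ is a permutation of 0..31, i.e. a bijective S-box:
assert sorted(sbox_table(5)) == list(range(32))
# For length 4 it is not bijective, so other CA rules have to be searched for:
assert sorted(sbox_table(4)) != list(range(16))
```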
Citations: 39
Exploring the Performance Limits of Out-of-order Commit
Pub Date: 2017-05-15 DOI: 10.1145/3075564.3075581
M. Alipour, Trevor E. Carlson, S. Kaxiras
Out-of-order execution is essential for high performance, general-purpose computation, as it can find and execute useful work instead of stalling. However, it is limited by the requirement of visibly sequential, atomic instruction execution --- in other words in-order instruction commit. While in-order commit has its advantages, such as providing precise interrupts and avoiding complications with the memory consistency model, it requires the core to hold on to resources (reorder buffer entries, load/store queue entries, registers) until they are released in program order. In contrast, out-of-order commit releases resources much earlier, yielding improved performance with fewer traditional hardware resources. However, out-of-order commit is limited in terms of correctness by the conditions described in the work of Bell and Lipasti. In this paper we revisit out-of-order commit from a different perspective, not by proposing another hardware technique, but by examining these conditions one by one and in combination with respect to their potential performance benefit for both non-speculative and speculative out-of-order commit. While correctly handling recovery for all out-of-order commit conditions currently requires complex tracking and expensive checkpointing, this work aims to demonstrate the potential for selective, speculative out-of-order commit using an oracle implementation without speculative rollback costs. 
We learn that: a) there is significant untapped potential for aggressive variants of out-of-order commit; b) it is important to optimize the commit depth, or the search distance for out-of-order commit, for a balanced design: smaller cores can benefit from shorter depths while larger cores continue to benefit from aggressive parameters; c) focusing on a subset of out-of-order commit conditions could lead to efficient implementations; d) the benefits of out-of-order commit increase with higher memory latency, and it works well in conjunction with prefetching to further improve performance.
Citations: 9
Designing and Programming the Configurable Cloud
Pub Date: 2017-05-15 DOI: 10.1145/3075564.3095083
Andrew Putnam
Process technology improvements have historically allowed an effortless expansion of the capacity and capabilities of computers and the cloud with few changes to the underlying software or programming model. However, the end of Dennard Scaling means that performance and efficiency gains will rely on the customization of the hardware for each application. Yet customizing hardware for each application runs contrary to the trend of moving more and more applications to a common hardware infrastructure: the Cloud. Microsoft's Catapult project has brought the power and performance of FPGA-based reconfigurable computing to hyperscale datacenters, accelerating major production cloud applications such as Bing web search and Microsoft Azure, and enabling a new generation of machine learning and artificial intelligence applications. These diverse workloads are accelerated on the same underlying hardware by using highly programmable silicon. The presence of ubiquitous and programmable silicon in the datacenter enables a new era of hardware/software co-design, opening up affordable and efficient performance across an enormous set of workloads. Catapult is now deployed in nearly every new server across the more than a million machines that make up the Microsoft hyperscale cloud. In this talk, I will describe the next generation of the Catapult configurable cloud architecture, and the tools and techniques that have made Catapult successful to date.
Citations: 1
Recommending Resources to Cloud Applications based on Custom Templates Composition
Pub Date: 2017-05-15 DOI: 10.1145/3075564.3075582
Ronny Bazan Antequera, P. Calyam, A. Chandrashekara, Shivoam Malhotra
Emerging interdisciplinary data-intensive applications in science and engineering fields (e.g. bioinformatics, cybermanufacturing) demand the use of high-performance computing resources. However, data-intensive applications' local resources usually present limited capacity and availability due to sizable upfront costs. The applications' requirements warrant intelligent resource 'abstractions' coupled with 'reusable' approaches to save time and effort in deploying cyberinfrastructure (CI). In this paper, we present a novel 'custom templates' management middleware that overcomes this scarcity of resources by using advanced CI management technologies/protocols to deploy data-intensive applications on demand across distributed/federated cloud resources. Our middleware comprises a novel resource recommendation scheme that abstracts user requirements of data-intensive applications and matches them with federated cloud resources using custom templates in a catalog. We evaluate the accuracy of our recommendation scheme in two experiment scenarios. The experiments involve simulating a series of user interactions with diverse application requirements, and also feature a real-world data-intensive application case study.
Citations: 2
CAROL-FI: an Efficient Fault-Injection Tool for Vulnerability Evaluation of Modern HPC Parallel Accelerators
Pub Date: 2017-05-15 DOI: 10.1145/3075564.3075598
Daniel Oliveira, Vinicius Fratin, P. Navaux, I. Koren, P. Rech
Transient faults are a major problem for large-scale HPC systems, and the mitigation of adverse fault effects needs to be highly efficient as we approach exascale. We developed a fault injection tool (CAROL-FI) to identify the potential sources of adverse fault effects. With a deeper understanding of such effects, we provide useful insights for designing efficient mitigation techniques, such as selective hardening of critical portions of the code. We performed a fault injection campaign, injecting more than 67,000 faults into an Intel Xeon Phi executing six representative HPC programs. We show that selective hardening can be successfully applied to DGEMM and Hotspot, while LavaMD and NW may require complete code hardening.
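CAROL-FI itself injects faults into live processes on the Xeon Phi; the snippet below only illustrates the campaign logic in miniature: flip one random bit, run the workload, and classify each run as Masked, SDC (silent data corruption), or DUE (detected unrecoverable error/crash). The workload and input value are invented for the example:

```python
import random

def flip_random_bit(value: int, width: int = 32) -> int:
    """Emulate a transient fault: flip one uniformly chosen bit."""
    return value ^ (1 << random.randrange(width))

def run_campaign(workload, golden, reference_input, trials=1000, seed=0):
    """Toy fault-injection campaign: corrupt the input state and classify runs."""
    random.seed(seed)
    counts = {"Masked": 0, "SDC": 0, "DUE": 0}
    for _ in range(trials):
        try:
            result = workload(flip_random_bit(reference_input))
        except Exception:
            counts["DUE"] += 1  # fault detected / program crashed
            continue
        counts["Masked" if result == golden else "SDC"] += 1
    return counts

# A made-up workload that ignores the low half of its input masks those flips:
workload = lambda x: x >> 16
print(run_campaign(workload, golden=workload(42), reference_input=42))
```

In a real campaign, high Masked counts in a code region argue against hardening it, which is the kind of insight used for selective hardening.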
Citations: 12
Towards Big Data Visualization for Monitoring and Diagnostics of High Volume Semiconductor Manufacturing
Pub Date : 2017-05-15 DOI: 10.1145/3075564.3078883
D. Gkorou, A. Ypma, G. Tsirogiannis, Manuel Giollo, Dag Sonntag, Geert Vinken, Richard van Haren, Robert Jan van Wijk, Jelle Nije, Tomoko Hoogenboom
In semiconductor manufacturing, continuous on-line monitoring prevents production stops and yield loss. The challenges towards this accomplishment are: 1) the complexity of lithography machines, which are composed of hundreds of mechanical and optical components, 2) the high rate and volume of data acquisition from different lithography and metrology machines, and 3) the scarcity of performance measurements due to their cost. This paper addresses these challenges by 1) visualizing and ranking the factors most relevant to a performance metric, 2) efficiently organizing Big Data from different sources, and 3) predicting the performance with machine learning when measurements are lacking. Even though this project targets semiconductor manufacturing, its methodology is applicable to any case of monitoring complex systems with many potentially interesting features and imbalanced datasets.
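The first step, ranking factors by their relevance to a performance metric, can be sketched with a simple correlation ranking over synthetic machine signals. The signal names, the overlay metric, and the use of Pearson correlation as the relevance measure are all illustrative assumptions, not the paper's method:

```python
import random
random.seed(1)

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

# Synthetic machine signals: 'dose' drives the performance metric,
# the others are independent noise.
n = 200
signals = {
    "dose":     [random.gauss(0, 1) for _ in range(n)],
    "pressure": [random.gauss(0, 1) for _ in range(n)],
    "temp":     [random.gauss(0, 1) for _ in range(n)],
}
overlay = [2.0 * d + random.gauss(0, 0.3) for d in signals["dose"]]

# Rank signals by absolute correlation with the metric; 'dose' should rank first.
ranking = sorted(signals, key=lambda k: abs(pearson(signals[k], overlay)),
                 reverse=True)
print(ranking)
```

With hundreds of real machine signals, such a ranking is what gets visualized so engineers can focus diagnostics on the few factors that actually move the metric.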
Citations: 7
On Learning the Energy Model of an MPSoC for Convex Optimization
Pub Date : 2017-05-15 DOI: 10.1145/3075564.3078893
Erwan Nogues, D. Ménard, Alexandre Mercat, M. Pelcat
The energy efficiency of a Multiprocessor SoC (MPSoC) is enhanced by complex hardware features such as Dynamic Voltage and Frequency Scaling (DVFS) and Dynamic Power Management (DPM). This paper proposes a methodology to learn an energy model from real power measurements. From this energy model, a convex optimization framework can determine the optimal energy-efficient operating point in terms of frequency and number of active cores in an MPSoC. Experimental data are reported for a Samsung Exynos 5410 MPSoC; they show that a precise yet relatively simple model can be derived.
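The approach can be sketched in two steps: fit a power model to measurements, then search the discrete operating points for the minimum energy per unit of work. The model form P(n, f) = p_static + c·n·f³, the throughput proxy n·f, and all numbers below are illustrative assumptions, not the learned Exynos model or the paper's convex solver:

```python
freqs = [0.6, 0.8, 1.0, 1.2, 1.4, 1.6]   # candidate frequencies (GHz)
cores = [1, 2, 3, 4]                     # candidate active-core counts

# Synthetic "power measurements" generated from known coefficients.
true_static, true_c = 0.5, 0.8
meas = [(n, f, true_static + true_c * n * f**3)
        for n in cores for f in freqs]

# Closed-form least squares for the two unknowns (design matrix [1, n*f^3]).
xs = [n * f**3 for n, f, _ in meas]
ys = [p for _, _, p in meas]
m = len(xs)
sx, sy = sum(xs), sum(ys)
sxx = sum(x * x for x in xs)
sxy = sum(x * y for x, y in zip(xs, ys))
c = (m * sxy - sx * sy) / (m * sxx - sx * sx)
p_static = (sy - c * sx) / m

def energy(n, f):
    """Energy per unit of work = power / throughput, with throughput ~ n * f."""
    return (p_static + c * n * f**3) / (n * f)

# Exhaustive search over the discrete operating points.
best = min(((n, f) for n in cores for f in freqs), key=lambda nf: energy(*nf))
print(best)
```

With this model, static power favors using all cores while dynamic power favors low frequency, so the sketch picks the lowest frequency with all four cores active; a real deployment would fit the model to noisy measurements and respect a performance constraint.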
Citations: 1