
Proceedings of the ACM International Conference on Computing Frontiers: Latest Articles

CAOS: combined analysis with online sifting for dynamic compilation systems
Pub Date : 2016-05-16 DOI: 10.1145/2903150.2903151
Jie Fu, Guojie Jin, Longbing Zhang, Jian Wang
Dynamic compilation has a great impact on the performance of virtual machines. In this paper, we study the features of dynamic compilation and then identify objectives for optimizing dynamic compilation systems. Following these objectives, we propose a novel dynamic compilation scheduling algorithm called combined analysis with online sifting (CAOS). It consists of a combined priority analysis model and an online sifting mechanism. The combined priority analysis model determines the priority of methods during scheduling, aiming to reconcile responsiveness with the average delay of the compilation queue. Online sifting further reduces runtime overhead, since methods with little benefit to performance are sifted out. CAOS can significantly improve the startup performance of applications. Experimental results show that CAOS improves startup performance by 14.0% on average, with a highest performance boost of 55.1%. Being highly versatile and easy to implement, CAOS can be applied to most dynamic compilation systems.
Citations: 1
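The two parts named in the CAOS abstract, a priority model and an online sifting step, can be illustrated with a minimal sketch. This is not the authors' algorithm: the priority formula (invocation count divided by code size), the sifting threshold `min_invocations`, and all names below are assumptions made purely for illustration.

```python
import heapq
from dataclasses import dataclass, field


@dataclass(order=True)
class CompileRequest:
    sort_key: float                   # negated priority, so heapq pops the highest first
    name: str = field(compare=False)  # method name, ignored when comparing requests


def priority(invocations: int, code_size: int) -> float:
    """Hypothetical combined priority: hot, small methods come first."""
    return invocations / code_size


def schedule(methods, min_invocations=10):
    """Return method names in compilation order.

    methods: iterable of (name, invocations, code_size) tuples, code_size > 0.
    Methods below min_invocations are sifted out (assumed sifting criterion).
    """
    queue = []
    for name, invocations, code_size in methods:
        if invocations < min_invocations:
            continue  # sifted out: assumed to bring too little benefit
        request = CompileRequest(-priority(invocations, code_size), name)
        heapq.heappush(queue, request)
    order = []
    while queue:
        order.append(heapq.heappop(queue).name)
    return order
```

For example, `schedule([("a", 100, 50), ("b", 5, 10), ("c", 400, 40)])` drops `b` and compiles `c` before `a`, because `c` has the higher assumed priority.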
The ANTAREX approach to autotuning and adaptivity for energy efficient HPC systems
Pub Date : 2016-05-16 DOI: 10.1145/2903150.2903470
C. Silvano, G. Agosta, Stefano Cherubin, D. Gadioli, G. Palermo, Andrea Bartolini, L. Benini, J. Martinovič, M. Palkovic, K. Slaninová, João Bispo, João MP Cardoso, Rui Abreu, Pedro Pinto, C. Cavazzoni, N. Sanna, A. Beccari, R. Cmar, Erven Rohou
The ANTAREX project aims to express application self-adaptivity through a Domain Specific Language (DSL) and to manage and autotune applications at runtime for green and heterogeneous High Performance Computing (HPC) systems up to Exascale. The DSL approach allows the definition of energy-efficiency, performance, and adaptivity strategies as well as their enforcement at runtime through application autotuning and resource and power management. Using a mini-app extracted from one of the project's application use cases, we show some initial exploration of application precision tuning enabled by the DSL.
Citations: 36
Secure key-exchange protocol for implants using heartbeats
Pub Date : 2016-05-16 DOI: 10.1145/2903150.2903165
R. M. Seepers, J. Weber, Z. Erkin, I. Sourdis, C. Strydis
The cardiac interpulse interval (IPI) has recently been proposed to facilitate key exchange for implantable medical devices (IMDs) using a patient's own heartbeats as a source of trust. While this form of key exchange holds promise for IMD security, its feasibility is not fully understood due to the simplified approaches found in related works. For example, previously proposed protocols have been designed without considering the limited randomness available per IPI, or have overlooked aspects pertinent to a realistic system, such as imperfect heartbeat detection or the energy overheads imposed on an IMD. In this paper, we propose a new IPI-based key-exchange protocol and evaluate its use during medical emergencies. Our protocol employs fuzzy commitment to tolerate the expected disparity between IPIs obtained by an external reader and an IMD, as well as a novel way of tackling heartbeat misdetection through IPI classification. Using our protocol, the expected time for securely exchanging an 80-bit key with high probability (1 − 10⁻⁶) is roughly one minute, while consuming only 88 μJ from an IMD.
Citations: 27
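The fuzzy commitment mentioned in the abstract above can be illustrated with a generic toy sketch. It is not the protocol from the paper: the 3x repetition code, the use of SHA-256, and every function name are assumptions chosen to keep the example short. The idea is that a key is encoded into a codeword and masked with one party's measurement bits; another party whose measurement differs in only a few bits can still recover the key, because the error-correcting code absorbs the mismatch.

```python
import hashlib

REP = 3  # repetition factor of the toy error-correcting code (assumption)


def encode(bits):
    """Repeat every key bit REP times."""
    return [b for b in bits for _ in range(REP)]


def decode(bits):
    """Majority-vote each block of REP bits back to one key bit."""
    return [int(sum(bits[i:i + REP]) > REP // 2) for i in range(0, len(bits), REP)]


def digest(bits):
    return hashlib.sha256(bytes(bits)).hexdigest()


def commit(key_bits, witness):
    """Bind key_bits to a witness (e.g. bits derived from IPIs)."""
    codeword = encode(key_bits)
    assert len(witness) == len(codeword)
    delta = [c ^ w for c, w in zip(codeword, witness)]
    return digest(codeword), delta


def open_commitment(commitment, witness):
    """Recover the key with a nearby witness, or return None on failure."""
    expected_hash, delta = commitment
    noisy_codeword = [d ^ w for d, w in zip(delta, witness)]
    key_bits = decode(noisy_codeword)
    return key_bits if digest(encode(key_bits)) == expected_hash else None
```

With a 4-bit key and a 12-bit witness, a second witness that differs in one bit still opens the commitment, while two flipped bits inside the same 3-bit block exceed what this toy code can correct and opening returns `None`.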
Breadth first search vectorization on the Intel Xeon Phi
Pub Date : 2016-04-11 DOI: 10.1145/2903150.2903180
Mireya Paredes, G. Riley, M. Luján
Breadth First Search (BFS) is a building block for graph algorithms and has recently been used for large scale analysis of information in a variety of applications including social networks, graph databases and web searching. Due to its importance, a number of different parallel programming models and architectures have been exploited to optimize BFS. However, due to the irregular memory access patterns and the unstructured nature of large graphs, its efficient parallelization is a challenge. The Xeon Phi is a massively parallel architecture available as an off-the-shelf accelerator, which includes a powerful 512-bit vector unit with optimized scatter and gather functions. Given its potential benefits, work related to graph traversal on this architecture is an active area of research. We present a set of experiments in which we explore architectural features of the Xeon Phi and how best to exploit them in a top-down BFS algorithm; the techniques can also be applied to the current state-of-the-art hybrid (top-down plus bottom-up) algorithms. We focus on the exploitation of the vector unit by developing an improved, highly vectorized OpenMP parallel algorithm, using vector intrinsics, and understanding the use of data alignment and prefetching. In addition, we investigate the impact of hyperthreading and thread affinity on performance, a topic that appears under-researched in the literature. As a result, we achieve what we believe is the fastest published top-down BFS algorithm on the version of Xeon Phi used in our experiments. The vectorized top-down BFS source code presented in this paper is available on request as free-to-use software.
Citations: 12
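For readers unfamiliar with the term, a top-down BFS expands the current frontier level by level and claims each unvisited neighbour. The sketch below is a plain sequential version over a CSR (compressed sparse row) graph; it contains none of the vectorization, OpenMP parallelism, alignment, or prefetching work described in the abstract, and the function and parameter names are assumptions.

```python
def bfs_top_down(row_offsets, column_indices, source):
    """Level-synchronous top-down BFS over a CSR graph.

    row_offsets[u]:row_offsets[u + 1] delimits the neighbours of vertex u
    inside column_indices. Returns the parent array of the BFS tree,
    with -1 for vertices that are unreachable from source.
    """
    n = len(row_offsets) - 1
    parent = [-1] * n
    parent[source] = source
    frontier = [source]
    while frontier:
        next_frontier = []
        for u in frontier:
            # The inner loop over neighbours is the irregular gather/scatter
            # access pattern that makes BFS hard to vectorize.
            for v in column_indices[row_offsets[u]:row_offsets[u + 1]]:
                if parent[v] == -1:
                    parent[v] = u
                    next_frontier.append(v)
        frontier = next_frontier
    return parent
```

On a graph with edges 0-1, 0-2, 1-3, 2-3 and an isolated vertex 4, `bfs_top_down([0, 2, 4, 6, 8, 8], [1, 2, 0, 3, 0, 3, 1, 2], 0)` returns `[0, 0, 0, 1, -1]`.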
Towards co-designed optimizations in parallel frameworks: a MapReduce case study
Pub Date : 2016-03-31 DOI: 10.1145/2903150.2903162
Colin Barrett, Christos Kotselidis, M. Luján
The explosion of Big Data was followed by the proliferation of numerous complex parallel software stacks whose aim is to tackle the challenges of data deluge. A drawback of such a multi-layered hierarchical deployment is the inability to maintain and delegate vital semantic information between layers in the stack. Software abstractions increase the semantic distance between an application and its generated code. However, parallel software frameworks contain inherent semantic information that general purpose compilers are not designed to exploit. This paper presents a case study demonstrating how the specific semantic information of the MapReduce paradigm can be exploited on multicore architectures. MR4J has been implemented in Java and evaluated against hand-optimized C and C++ equivalents. The initial observed results led to the design of a semantically aware optimizer that runs automatically without requiring modification to application code. The optimizer is able to speed up the execution of MR4J by up to 2.0x. The introduced optimization not only improves the performance of the generated code during the map phase, but also reduces the pressure on the garbage collector. This demonstrates how semantic information can be harnessed without sacrificing sound software engineering practices when using parallel software frameworks.
Citations: 3
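As a reminder of the MapReduce paradigm the abstract above builds on, here is a minimal single-threaded sketch: a mapper emits key/value pairs, values are grouped by key, and a reducer folds each group. It is written in Python for brevity and is unrelated to the MR4J API, which the abstract says is implemented in Java; `map_reduce`, `word_count_mapper`, and `sum_reducer` are hypothetical names.

```python
from collections import defaultdict


def map_reduce(inputs, mapper, reducer):
    """Run mapper over every input, group emitted values by key, then reduce."""
    groups = defaultdict(list)
    for item in inputs:                    # map phase
        for key, value in mapper(item):
            groups[key].append(value)      # shuffle: group values by key
    # reduce phase: one reducer call per distinct key
    return {key: reducer(key, values) for key, values in groups.items()}


def word_count_mapper(line):
    """Emit (word, 1) for every whitespace-separated word in a line."""
    for word in line.split():
        yield word.lower(), 1


def sum_reducer(_key, values):
    return sum(values)
```

Calling `map_reduce(["a b a", "B c"], word_count_mapper, sum_reducer)` yields `{"a": 2, "b": 2, "c": 1}`.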
Proceedings of the ACM International Conference on Computing Frontiers
Citations: 8