
Proceedings of the ACM International Conference on Computing Frontiers: latest publications

Shared resource aware scheduling on power-constrained tiled many-core processors
Pub Date : 2016-05-16 DOI: 10.1145/2903150.2903490
S. S. Jha, W. Heirman, Ayose Falcón, Jordi Tubella, Antonio González, L. Eeckhout
Power management through dynamic core, cache and frequency adaptation is becoming a necessity in today's power-constrained many-core environments. Unfortunately, as core count grows, the complexity of both the adaptation hardware and the power management algorithms increases. In this paper, we propose a two-tier hierarchical power management methodology to exploit per-tile voltage regulators and clustered last-level caches. In addition, we include a novel thread migration layer that (i) analyzes threads running on the tiled many-core processor for shared resource sensitivity in tandem with core, cache and frequency adaptation, and (ii) co-schedules threads per tile with compatible behavior.
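To make the co-scheduling idea concrete, the following minimal sketch groups threads onto tiles by a cache-sensitivity proxy; the MPKI metric, tile count and round-robin heuristic are illustrative assumptions, not the paper's actual policy, whose migration layer measures sensitivity at runtime in tandem with core, cache and frequency adaptation.

```python
# Minimal sketch of shared-resource-aware co-scheduling onto tiles (assumed heuristic).
from dataclasses import dataclass

@dataclass
class Thread:
    tid: int
    llc_mpki: float   # last-level-cache misses per kilo-instruction (sensitivity proxy)

def co_schedule(threads, n_tiles=2):
    """Deal threads to tiles round-robin in descending sensitivity order, so the
    most cache-hungry threads land on different tiles and their clustered LLCs."""
    ordered = sorted(threads, key=lambda t: t.llc_mpki, reverse=True)
    tiles = [[] for _ in range(n_tiles)]
    for i, t in enumerate(ordered):
        tiles[i % n_tiles].append(t)
    return tiles

if __name__ == "__main__":
    workload = [Thread(i, mpki) for i, mpki in
                enumerate([0.4, 7.2, 1.1, 9.8, 0.2, 6.5, 0.9, 0.3])]
    for tile_id, tile in enumerate(co_schedule(workload)):
        print(tile_id, [(t.tid, round(t.llc_mpki, 1)) for t in tile])
```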
Citations: 11
CryoCMOS hardware technology a classical infrastructure for a scalable quantum computer
Pub Date : 2016-05-16 DOI: 10.1145/2903150.2906828
H. Homulle, Stefan Visser, B. Patra, G. Ferrari, E. Prati, C. G. Almudever, K. Bertels, F. Sebastiano, E. Charbon
We propose a classical infrastructure for a quantum computer implemented in CMOS. The peculiarity of the approach is to operate the classical CMOS circuits and systems at deep-cryogenic temperatures (cryoCMOS), so as to ensure physical proximity to the quantum bits, thus reducing thermal gradients and increasing compactness. CryoCMOS technology leverages the CMOS fabrication infrastructure and exploits the continuous effort of miniaturization that has sustained Moore's Law for over 50 years. Such an approach is believed to enable the growth of the number of qubits operating in a fault-tolerant fashion, paving the way to scalable quantum computing machines.
Citations: 9
InfiniCortex: present and future invited paper
Pub Date : 2016-05-16 DOI: 10.1145/2903150.2912887
M. Michalewicz, T. Lian, Lim Seng, Jonathan Low, D. Southwell, Jason Gunthorpe, Gabriel Noaje, Dominic Chien, Yves Poppe, Jakub Chrzeszczyk, Andrew Howard, Tin Wee Tan, Sing-Wu Liou
Commencing in June 2014, the A*STAR Computational Resource Centre (A*CRC) team in Singapore, together with dozens of partners world-wide, has been building the InfiniCortex. Four concepts are integrated to realise InfiniCortex: i) high-bandwidth (~10 to 100 Gbps) intercontinental connectivity between four continents: Asia, North America, Australia and Europe; ii) InfiniBand extension technology supporting transcontinental distances using Obsidian's Longbow range extenders; iii) connecting separate InfiniBand sub-nets with different net topologies to create a single computational resource, a Galaxy of Supercomputers [10]; iv) running workflows and applications on such a distributed computational infrastructure. We have successfully demonstrated InfiniCortex prototypes at the SC14 and SC15 conferences. The infrastructure comprised computing resources residing at multiple locations in Singapore, Japan, Australia, the USA, Canada, France and Poland. Various concurrent applications were demonstrated, including workflows, I/O-heavy applications enabled with the ADIOS system, Extempore real-time interactive applications, and in-situ real-time visualisations. In this paper we briefly report on the basic ideas behind the InfiniCortex construct, our recent successes, and some ideas about further growth and extension of this project.
Citations: 3
Lock-based synchronization for GPU architectures
Pub Date : 2016-05-16 DOI: 10.1145/2903150.2903155
Yunlong Xu, Lan Gao, Rui Wang, Zhongzhi Luan, Weiguo Wu, D. Qian
Modern GPUs have shown promising results in accelerating compute-intensive and numerical workloads with limited data sharing. However, emerging GPU applications manifest an ample amount of data sharing among concurrently executing threads. Data sharing often requires a mutual exclusion mechanism to ensure data integrity in a multithreaded environment. Although modern GPUs provide atomic primitives that can be leveraged to construct fine-grained locks, existing GPU lock implementations either incur frequent concurrency bugs or lead to extremely low hardware utilization due to the Single Instruction Multiple Threads (SIMT) execution paradigm of GPUs. To make more applications with data sharing benefit from GPU acceleration, we propose a new locking scheme for GPU architectures. The proposed locking scheme allows lock stealing within individual warps to avoid the concurrency bugs caused by the SIMT execution of GPUs. Moreover, it adopts lock virtualization to reduce the memory cost of fine-grained GPU locks. To illustrate the usage and the benefit of GPU locks, we apply the proposed GPU locking scheme to Delaunay mesh refinement (DMR), an application involving massive data sharing among threads. Our lock-based implementation achieves a 1.22x speedup over an algorithmic-optimization-based implementation (which uses a synchronization mechanism tailored for DMR) with 94% less memory cost.
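To make the lock-virtualization idea concrete, the following host-side sketch maps an unbounded space of logical locks onto a small fixed lock table via hashing; the table size, the hash function and the use of CPU threads in place of GPU threads are illustrative assumptions, and the sketch does not model the paper's intra-warp lock stealing.

```python
# Host-side sketch of lock virtualization: many logical locks share one small
# physical lock table, so per-lock memory no longer scales with the data size.
import threading

class VirtualLockTable:
    def __init__(self, physical_slots=64):
        self._locks = [threading.Lock() for _ in range(physical_slots)]

    def _slot(self, logical_id):
        # Hashing maps an unbounded logical-lock space onto the fixed table.
        # Distinct logical locks may alias the same slot, which only adds
        # contention and never breaks mutual exclusion.
        return hash(logical_id) % len(self._locks)

    def acquire(self, logical_id):
        self._locks[self._slot(logical_id)].acquire()

    def release(self, logical_id):
        self._locks[self._slot(logical_id)].release()

# Example: protect individual mesh elements with per-element logical locks.
table = VirtualLockTable(physical_slots=64)
counters = {i: 0 for i in range(8)}

def worker(element_id, repeats=1000):
    for _ in range(repeats):
        table.acquire(element_id)
        try:
            counters[element_id] += 1   # critical section on one mesh element
        finally:
            table.release(element_id)

threads = [threading.Thread(target=worker, args=(i,)) for i in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(all(v == 1000 for v in counters.values()))   # True: no lost updates
```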
Citations: 24
Malevolent app pairs: an Android permission overpassing scheme
Pub Date : 2016-05-16 DOI: 10.1145/2903150.2911706
Antonios Dimitriadis, P. Efraimidis, Vasilios Katos
Portable smart devices potentially store a wealth of personal data, making them attractive targets for data exfiltration attacks. Permission-based schemes are core security controls for reducing privacy and security risks. In this paper we demonstrate that current permission schemes cannot effectively mitigate the risks posed by covert channels. We show that a pair of apps with different permission settings may collude to effectively obtain the union of their permissions, creating opportunities to leak sensitive data while the leak potentially goes unnoticed. We then propose a solution for such attacks.
Citations: 9
Big data analytics and the LHC
Pub Date : 2016-05-16 DOI: 10.1145/2903150.2917755
M. Girone
The Large Hadron Collider is one of the largest and most complicated pieces of scientific apparatus ever constructed. The detectors along the LHC ring see as many as 800 million proton-proton collisions per second. An event in 10^11 is new physics, and a hierarchical series of steps is needed to extract a tiny signal from an enormous background. High energy physics (HEP) has long been a driver in managing and processing enormous scientific datasets and the largest-scale high-throughput computing centers. HEP developed one of the first scientific computing grids, which now regularly operates 500k processor cores and half an exabyte of disk storage located on 5 continents, including hundreds of connected facilities. In this presentation I will discuss the techniques used to extract scientific discovery from a large and complicated dataset. While HEP has developed many tools and techniques for handling big datasets, there is an increasing desire within the field to make more effective use of additional industry developments. I will discuss some of the ongoing work to adopt industry techniques in big data analytics to improve the discovery potential of the LHC and the effectiveness of the scientists who work on it.
Citations: 1
CAOS: combined analysis with online sifting for dynamic compilation systems
Pub Date : 2016-05-16 DOI: 10.1145/2903150.2903151
Jie Fu, Guojie Jin, Longbing Zhang, Jian Wang
Dynamic compilation has a great impact on the performance of virtual machines. In this paper, we study the features of dynamic compilation and then identify objectives for optimizing dynamic compilation systems. Following these objectives, we propose a novel dynamic compilation scheduling algorithm called combined analysis with online sifting (CAOS). It consists of a combined priority analysis model and an online sifting mechanism. The combined priority analysis model is used to determine the priority of methods during scheduling, aiming to reconcile responsiveness with the average delay of the compilation queue. By performing online sifting, runtime overhead can be further reduced, since methods with little benefit to performance are sifted out. CAOS can significantly improve the startup performance of applications. Experimental results show that CAOS achieves a 14.0% improvement in startup performance on average, with a peak performance boost of 55.1%. Owing to its high versatility and ease of implementation, CAOS can be applied to most dynamic compilation systems.
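A rough sketch of a priority-driven compilation queue with online sifting is given below; the hotness-per-cost priority formula and the sifting threshold are invented for illustration and are not CAOS's actual combined analysis model.

```python
# Toy model of a dynamic-compilation queue: methods are prioritized by an
# assumed hotness/cost ratio, and "sifting" drops methods whose expected
# benefit is too small to justify compiling them at all.
import heapq
from dataclasses import dataclass, field

@dataclass(order=True)
class CompileRequest:
    priority: float
    name: str = field(compare=False)

class CompilationQueue:
    def __init__(self, sift_threshold=0.5):
        self._heap = []
        self.sift_threshold = sift_threshold   # assumed cut-off, not from the paper

    def submit(self, name, invocation_count, estimated_compile_ms):
        benefit = invocation_count / max(estimated_compile_ms, 1.0)
        if benefit < self.sift_threshold:
            return False                       # sifted out: not worth compiling
        # heapq is a min-heap, so negate to pop the highest-benefit method first.
        heapq.heappush(self._heap, CompileRequest(-benefit, name))
        return True

    def next_to_compile(self):
        return heapq.heappop(self._heap).name if self._heap else None

q = CompilationQueue()
q.submit("String.hashCode", invocation_count=20000, estimated_compile_ms=4)
q.submit("Config.parseOnce", invocation_count=1, estimated_compile_ms=30)   # sifted
q.submit("ArrayList.get", invocation_count=5000, estimated_compile_ms=2)
print(q.next_to_compile())   # String.hashCode (highest benefit)
```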
Citations: 1
The ANTAREX approach to autotuning and adaptivity for energy efficient HPC systems
Pub Date : 2016-05-16 DOI: 10.1145/2903150.2903470
C. Silvano, G. Agosta, Stefano Cherubin, D. Gadioli, G. Palermo, Andrea Bartolini, L. Benini, J. Martinovič, M. Palkovic, K. Slaninová, João Bispo, João MP Cardoso, Rui Abreu, Pedro Pinto, C. Cavazzoni, N. Sanna, A. Beccari, R. Cmar, Erven Rohou
The ANTAREX project aims to express application self-adaptivity through a Domain Specific Language (DSL) and to manage and autotune applications at runtime for green and heterogeneous High Performance Computing (HPC) systems up to the Exascale. The DSL approach allows the definition of energy-efficiency, performance, and adaptivity strategies as well as their enforcement at runtime through application autotuning and resource and power management. Through a mini-app extracted from one of the project's application use cases, we show an initial exploration of application precision tuning using mechanisms enabled by the DSL.
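As a minimal illustration of what such autotuning amounts to, the sketch below exhaustively explores a small space of hypothetical precision and parallelism knobs and keeps the fastest configuration that satisfies an assumed accuracy constraint; the knob names, the measurement function and the error bound are invented for this example, and the ANTAREX DSL expresses such strategies declaratively rather than as hand-written loops like this one.

```python
# Hypothetical autotuning loop: try knob settings, reject those violating the
# accuracy constraint, and keep the fastest remaining configuration.
import itertools
import random

KNOBS = {
    "threads":   [1, 2, 4, 8],
    "precision": ["double", "single"],
}
MAX_ERROR = 1e-3   # assumed application-level quality constraint

def measure(config):
    """Stand-in for running the mini-app: returns (runtime_s, numerical_error).
    A real tuner would execute the kernel and measure time/energy here."""
    base = 1.0 / config["threads"]
    speedup = 0.7 if config["precision"] == "single" else 1.0
    error = 5e-4 if config["precision"] == "single" else 1e-6
    return base * speedup * random.uniform(0.95, 1.05), error

def autotune():
    """Exhaustive search over the knob space under the accuracy constraint."""
    best_cfg, best_time = None, float("inf")
    for values in itertools.product(*KNOBS.values()):
        config = dict(zip(KNOBS.keys(), values))
        runtime, error = measure(config)
        if error <= MAX_ERROR and runtime < best_time:
            best_cfg, best_time = config, runtime
    return best_cfg, best_time

print(autotune())
```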
Citations: 36
Secure key-exchange protocol for implants using heartbeats
Pub Date : 2016-05-16 DOI: 10.1145/2903150.2903165
R. M. Seepers, J. Weber, Z. Erkin, I. Sourdis, C. Strydis
The cardiac interpulse interval (IPI) has recently been proposed to facilitate key exchange for implantable medical devices (IMDs) using a patient's own heartbeats as a source of trust. While this form of key exchange holds promise for IMD security, its feasibility is not fully understood due to the simplified approaches found in related works. For example, previously proposed protocols have been designed without considering the limited randomness available per IPI, or have overlooked aspects pertinent to a realistic system, such as imperfect heartbeat detection or the energy overheads imposed on an IMD. In this paper, we propose a new IPI-based key-exchange protocol and evaluate its use during medical emergencies. Our protocol employs fuzzy commitment to tolerate the expected disparity between IPIs obtained by an external reader and an IMD, as well as a novel way of tackling heartbeat misdetection through IPI classification. Using our protocol, the expected time for securely exchanging an 80-bit key with high probability (1 - 10^-6) is roughly one minute, while consuming only 88 μJ from an IMD.
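A minimal sketch of the fuzzy-commitment construction is shown below, assuming a toy repetition code in place of a real error-correcting code and simulated bit strings in place of IPI-derived witnesses; the protocol in the paper differs in its coding, key length and IPI classification.

```python
# Fuzzy commitment sketch: the IMD hides an ECC-encoded key under its witness;
# a reader with a slightly different witness can still recover the key as long
# as the disparity stays within the code's correction capability.
import hashlib
import secrets

KEY_BITS = 16          # kept tiny for illustration (the paper uses 80-bit keys)
REPEAT = 5             # repetition-code length, a stand-in for a real ECC

def ecc_encode(bits):
    """Repetition-encode each key bit REPEAT times (toy error-correcting code)."""
    return [b for bit in bits for b in [bit] * REPEAT]

def ecc_decode(bits):
    """Majority-vote decode of the repetition code."""
    return [int(sum(bits[i:i + REPEAT]) > REPEAT // 2)
            for i in range(0, len(bits), REPEAT)]

def xor(a, b):
    return [x ^ y for x, y in zip(a, b)]

def commit(key_bits, witness_bits):
    """Fuzzy commitment: publish a hash of the key plus the masked codeword."""
    digest = hashlib.sha256(bytes(key_bits)).hexdigest()
    delta = xor(ecc_encode(key_bits), witness_bits)
    return digest, delta

def open_commitment(digest, delta, witness_bits):
    """Recover the key from a noisy witness; succeed only if the hash matches."""
    candidate = ecc_decode(xor(delta, witness_bits))
    if hashlib.sha256(bytes(candidate)).hexdigest() == digest:
        return candidate
    return None

# IMD side: key and witness (simulated here; really derived from its own IPIs).
key = [secrets.randbelow(2) for _ in range(KEY_BITS)]
imd_witness = [secrets.randbelow(2) for _ in range(KEY_BITS * REPEAT)]
digest, delta = commit(key, imd_witness)

# Reader side: its IPI-derived witness differs in a few bit positions.
reader_witness = list(imd_witness)
for i in (3, 20, 41):          # a few measurement discrepancies
    reader_witness[i] ^= 1
recovered = open_commitment(digest, delta, reader_witness)
print("key recovered:", recovered == key)
```

Because the commitment binds the key to its hash, a reader whose witness differs by more bits than the code can correct recovers nothing, which is what makes the heartbeat a usable source of trust.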
Citations: 27
Energy reduction in video systems: the GreenVideo project
Pub Date : 2016-05-16 DOI: 10.1145/2903150.2911716
M. Pelcat, Erwan Nogues, X. Ducloux
With the current progress in microelectronics and the constant increase of network bandwidth, video applications are becoming ubiquitous and are spreading, especially in the context of mobility. By 2019, 80% of worldwide Internet traffic will be video. Nevertheless, optimizing the energy consumption of video processing is still a challenge due to the large amount of data processed. This talk will concentrate on the energy optimization of video codecs. In the first part, the Green Metadata initiative will be presented. In November 2014, MPEG released a new standard, named Green Metadata, that fosters energy-efficient media on consumer devices. This standard specifies metadata to be transmitted between encoder and decoder to reduce power consumption during encoding, decoding and display. The different metadata considered in the standard will be presented. More specifically, the Green Adaptive Streaming proposition will be detailed. In the second part, the energy optimization of an HEVC decoder implemented on a modern MP-SoC will be presented. The different techniques used to implement an HEVC decoder efficiently on a general-purpose processor (GPP) will be detailed. Different levels of parallelism have been exploited to increase and exploit slack time. A sophisticated DVFS mechanism has been developed to handle the variability of the decoding process for each frame. To obtain further energy gains, the concept of approximate computing is exploited to propose a modified HEVC decoder capable of tuning its energy gains while managing the decoding quality versus energy trade-off. The work detailed in this second part of the talk is the result of the French GreenVideo FUI project.
Citations: 1