S. S. Jha, W. Heirman, Ayose Falcón, Jordi Tubella, Antonio González, L. Eeckhout
Power management through dynamic core, cache and frequency adaptation is becoming a necessity in today's power-constrained many-core environments. Unfortunately, as core count grows, the complexity of both the adaptation hardware and the power management algorithms increases. In this paper, we propose a two-tier hierarchical power management methodology to exploit per-tile voltage regulators and clustered last-level caches. In addition, we include a novel thread migration layer that (i) analyzes threads running on the tiled many-core processor for shared resource sensitivity in tandem with core, cache and frequency adaptation, and (ii) co-schedules threads per tile with compatible behavior.
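The co-scheduling idea above can be sketched, very roughly, as a greedy pairing of threads by shared-resource sensitivity. This is a toy illustration only: the sensitivity scores, the pairing rule and the tile size are invented stand-ins, not the paper's actual analysis or migration layer.

```python
# Hypothetical sketch: pair cache-sensitive threads with cache-insensitive
# ones on the same tile, so the clustered last-level cache is contended as
# little as possible. All numbers here are illustrative.

def co_schedule(threads, threads_per_tile=2):
    """threads: list of (name, cache_sensitivity) tuples.
    Returns a list of tiles, each a list of thread names."""
    # Sort by sensitivity, then greedily pair the most sensitive remaining
    # thread with the least sensitive remaining one (complementary pairing).
    ordered = sorted(threads, key=lambda t: t[1])
    tiles = []
    while ordered:
        tile = [ordered.pop()[0]]            # most sensitive remaining
        while ordered and len(tile) < threads_per_tile:
            tile.append(ordered.pop(0)[0])   # least sensitive remaining
        tiles.append(tile)
    return tiles

tiles = co_schedule([("A", 0.9), ("B", 0.1), ("C", 0.8), ("D", 0.2)])
print(tiles)  # [['A', 'B'], ['C', 'D']]: each tile mixes high and low sensitivity
```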
"Shared resource aware scheduling on power-constrained tiled many-core processors." In Proceedings of the ACM International Conference on Computing Frontiers, 16 May 2016. DOI: 10.1145/2903150.2903490
H. Homulle, Stefan Visser, B. Patra, G. Ferrari, E. Prati, C. G. Almudever, K. Bertels, F. Sebastiano, E. Charbon
We propose a classical infrastructure for a quantum computer implemented in CMOS. The peculiarity of the approach is to operate the classical CMOS circuits and systems at deep-cryogenic temperatures (cryoCMOS), so as to ensure physical proximity to the quantum bits, thus reducing thermal gradients and increasing compactness. CryoCMOS technology leverages the CMOS fabrication infrastructure and exploits the continuous effort of miniaturization that has sustained Moore's Law for over 50 years. Such an approach is believed to enable the growth of the number of qubits operating in a fault-tolerant fashion, paving the way to scalable quantum computing machines.
"CryoCMOS hardware technology a classical infrastructure for a scalable quantum computer." In Proceedings of the ACM International Conference on Computing Frontiers, 16 May 2016. DOI: 10.1145/2903150.2906828
M. Michalewicz, T. Lian, Lim Seng, Jonathan Low, D. Southwell, Jason Gunthorpe, Gabriel Noaje, Dominic Chien, Yves Poppe, Jakub Chrzeszczyk, Andrew Howard, Tin Wee Tan, Sing-Wu Liou
Commencing in June 2014, the A*STAR Computational Resource Centre (A*CRC) team in Singapore, together with dozens of partners worldwide, has been building the InfiniCortex. Four concepts are integrated to realise the InfiniCortex: i) high-bandwidth (~10 to 100 Gbps) intercontinental connectivity between four continents: Asia, North America, Australia and Europe; ii) InfiniBand extension technology supporting transcontinental distances using Obsidian's Longbow range extenders; iii) connecting separate InfiniBand subnets with different network topologies to create a single computational resource, the Galaxy of Supercomputers [10]; and iv) running workflows and applications on such a distributed computational infrastructure. We successfully demonstrated InfiniCortex prototypes at the SC14 and SC15 conferences. The infrastructure comprised computing resources residing at multiple locations in Singapore, Japan, Australia, the USA, Canada, France and Poland. Various concurrent applications were demonstrated, including workflows, I/O-heavy applications enabled with the ADIOS system, Extempore real-time interactive applications, and in-situ real-time visualisations. In this paper we briefly report on the basic ideas behind the InfiniCortex construct, our recent successes, and some ideas for further growth and extension of this project.
"InfiniCortex: present and future" (invited paper). In Proceedings of the ACM International Conference on Computing Frontiers, 16 May 2016. DOI: 10.1145/2903150.2912887
Yunlong Xu, Lan Gao, Rui Wang, Zhongzhi Luan, Weiguo Wu, D. Qian
Modern GPUs have shown promising results in accelerating compute-intensive and numerical workloads with limited data sharing. However, emerging GPU applications manifest an ample amount of data sharing among concurrently executing threads. Data sharing often requires a mutual exclusion mechanism to ensure data integrity in a multithreaded environment. Although modern GPUs provide atomic primitives that can be leveraged to construct fine-grained locks, existing GPU lock implementations either incur frequent concurrency bugs or lead to extremely low hardware utilization due to the Single Instruction Multiple Threads (SIMT) execution paradigm of GPUs. To make more applications with data sharing benefit from GPU acceleration, we propose a new locking scheme for GPU architectures. The proposed locking scheme allows lock stealing within individual warps to avoid the concurrency bugs caused by the SIMT execution of GPUs. Moreover, it adopts lock virtualization to reduce the memory cost of fine-grained GPU locks. To illustrate the usage and benefit of GPU locks, we apply the proposed locking scheme to Delaunay mesh refinement (DMR), an application involving massive data sharing among threads. Our lock-based implementation achieves a 1.22x speedup over an implementation based on algorithmic optimization (which uses a synchronization mechanism tailored for DMR), with 94% less memory cost.
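The lock-virtualization idea above can be illustrated with a small host-side sketch: many logical (per-element) locks are hashed onto a small physical lock table, trading some false contention for a large memory saving. The table size and hash function are placeholders, not the paper's GPU design.

```python
# Illustrative sketch of lock virtualization: distinct logical lock ids
# may map to the same physical lock (false sharing), but mutual exclusion
# is still preserved, and only `physical_slots` locks are ever allocated.
from threading import Lock

class VirtualLockTable:
    def __init__(self, physical_slots=1024):
        self.slots = [Lock() for _ in range(physical_slots)]

    def lock_for(self, logical_id):
        # Hash the logical id (e.g. a mesh element) onto a physical slot.
        return self.slots[hash(logical_id) % len(self.slots)]

table = VirtualLockTable(physical_slots=8)
with table.lock_for("mesh-triangle-12345"):
    pass  # critical section, guarded by the shared physical lock
```

The same id always maps to the same slot within a run, so two threads updating the same element always contend on the same physical lock.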
"Lock-based synchronization for GPU architectures." In Proceedings of the ACM International Conference on Computing Frontiers, 16 May 2016. DOI: 10.1145/2903150.2903155
Antonios Dimitriadis, P. Efraimidis, Vasilios Katos
Portable smart devices potentially store a wealth of personal data, making them attractive targets for data exfiltration attacks. Permission-based schemes are core security controls for reducing privacy and security risks. In this paper we demonstrate that current permission schemes cannot effectively mitigate risks posed by covert channels. We show that a pair of apps with different permission settings may collude to effectively obtain the union of their permissions, giving opportunities for leaking sensitive data while potentially keeping the leak unnoticed. We then propose a solution against such attacks.
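The permission-union problem can be sketched as a simple detection check: flag app pairs whose combined permissions cover a risky set that neither app covers alone. The permission names follow Android conventions, but the risky-set policy and app data are invented for illustration.

```python
# Toy detector for the "permission overpassing" pattern described above:
# one app can read a data source, the other has an exfiltration path.
RISKY_UNION = {"READ_CONTACTS", "INTERNET"}  # hypothetical policy

def colluding_pairs(apps):
    """apps: dict name -> set of granted permissions.
    Yields pairs whose union covers RISKY_UNION while neither app alone does."""
    names = sorted(apps)
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            union = apps[a] | apps[b]
            if RISKY_UNION <= union and not (RISKY_UNION <= apps[a]
                                             or RISKY_UNION <= apps[b]):
                yield (a, b)

apps = {"notes": {"READ_CONTACTS"},
        "wallpaper": {"INTERNET"},
        "browser": {"INTERNET", "ACCESS_FINE_LOCATION"}}
print(list(colluding_pairs(apps)))  # [('browser', 'notes'), ('notes', 'wallpaper')]
```

A static check like this only finds *potential* collusion; it says nothing about whether the apps actually share data over a covert channel.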
"Malevolent app pairs: an Android permission overpassing scheme." In Proceedings of the ACM International Conference on Computing Frontiers, 16 May 2016. DOI: 10.1145/2903150.2911706
M. Girone
The Large Hadron Collider is one of the largest and most complicated pieces of scientific apparatus ever constructed. The detectors along the LHC ring see as many as 800 million proton-proton collisions per second. Only about one event in 10^11 is new physics, and a hierarchical series of steps is used to extract this tiny signal from an enormous background. High energy physics (HEP) has long been a driver in managing and processing enormous scientific datasets and in operating the largest-scale high-throughput computing centers. HEP developed one of the first scientific computing grids, which now regularly operates 500,000 processor cores and half an exabyte of disk storage located on 5 continents, spanning hundreds of connected facilities. In this presentation I will discuss the techniques used to extract scientific discoveries from a large and complicated dataset. While HEP has developed many tools and techniques for handling big datasets, there is an increasing desire within the field to make more effective use of industry developments. I will discuss some of the ongoing work to adopt industry techniques in big data analytics to improve the discovery potential of the LHC and the effectiveness of the scientists who work on it.
M. Girone. "Big data analytics and the LHC." In Proceedings of the ACM International Conference on Computing Frontiers, 16 May 2016. DOI: 10.1145/2903150.2917755
Dynamic compilation has a great impact on the performance of virtual machines. In this paper, we study the features of dynamic compilation and identify objectives for optimizing dynamic compilation systems. Following these objectives, we propose a novel dynamic compilation scheduling algorithm called combined analysis with online sifting (CAOS). It consists of a combined priority analysis model and an online sifting mechanism. The combined priority analysis model determines the priority of methods during scheduling, aiming to reconcile responsiveness with the average delay of the compilation queue. Online sifting further reduces runtime overhead by sifting out methods that contribute little to performance. CAOS can significantly improve the startup performance of applications: experimental results show a 14.0% startup performance improvement on average, with the highest boost reaching 55.1%. By virtue of its high versatility and easy implementation, CAOS can be applied to most dynamic compilation systems.
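The two mechanisms described above can be sketched as a priority queue for compile requests: a combined score balances method hotness against time spent waiting in the queue, and methods below a benefit threshold are sifted out instead of compiled. The scoring formula and threshold are illustrative stand-ins, not CAOS's actual model.

```python
# Minimal sketch of a compile queue with combined priority and online sifting.
import heapq
import itertools

SIFT_THRESHOLD = 5  # hypothetical cutoff: colder methods are not worth compiling

class CompileQueue:
    def __init__(self):
        self.heap = []
        self.counter = itertools.count()  # tie-breaker for equal priorities

    def enqueue(self, method, hotness, wait_time=0):
        if hotness < SIFT_THRESHOLD:       # online sifting: drop low-benefit work
            return
        priority = -(hotness + wait_time)  # combined priority; negate for max-heap
        heapq.heappush(self.heap, (priority, next(self.counter), method))

    def next_to_compile(self):
        return heapq.heappop(self.heap)[2] if self.heap else None

q = CompileQueue()
q.enqueue("Foo.bar", hotness=100)
q.enqueue("Foo.coldPath", hotness=2)     # sifted out, never compiled
q.enqueue("Main.loop", hotness=80, wait_time=30)
print(q.next_to_compile())  # Main.loop: 80 + 30 outranks Foo.bar's 100
```

Folding wait time into the score is what keeps long-queued methods responsive instead of letting hot newcomers starve them.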
Jie Fu, Guojie Jin, Longbing Zhang, Jian Wang. "CAOS: combined analysis with online sifting for dynamic compilation systems." In Proceedings of the ACM International Conference on Computing Frontiers, 16 May 2016. DOI: 10.1145/2903150.2903151
C. Silvano, G. Agosta, Stefano Cherubin, D. Gadioli, G. Palermo, Andrea Bartolini, L. Benini, J. Martinovič, M. Palkovic, K. Slaninová, João Bispo, João MP Cardoso, Rui Abreu, Pedro Pinto, C. Cavazzoni, N. Sanna, A. Beccari, R. Cmar, Erven Rohou
The ANTAREX project aims to express application self-adaptivity through a Domain Specific Language (DSL) and to manage and autotune applications at runtime for green and heterogeneous High Performance Computing (HPC) systems up to the Exascale. The DSL approach allows the definition of energy-efficiency, performance, and adaptivity strategies as well as their enforcement at runtime through application autotuning and resource and power management. Using a mini-app extracted from one of the project's application use cases, we show an initial exploration of application precision tuning enabled by the DSL.
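Precision tuning of the kind mentioned above can be sketched as a search for the cheapest numeric precision whose error stays within a tolerance. The kernel, the mantissa-truncation model of "lower precision", and the tolerance are all invented for illustration and bear no relation to the project's DSL or mini-app.

```python
# Toy precision autotuner: try progressively narrower mantissas (a crude
# stand-in for lower-precision float types) and keep the cheapest one
# whose worst-case error against a full-precision reference is acceptable.
import math

def kernel(x):
    return math.sqrt(x) * math.sin(x)   # an arbitrary example kernel

def quantize(value, bits):
    # Truncate the mantissa to `bits` bits, emulating a narrower float type.
    m, e = math.frexp(value)
    return math.ldexp(round(m * (1 << bits)) / (1 << bits), e)

def tune_precision(xs, tolerance=1e-3):
    reference = [kernel(x) for x in xs]
    for bits in (8, 16, 24, 52):        # cheapest candidate first
        error = max(abs(quantize(kernel(quantize(x, bits)), bits) - r)
                    for x, r in zip(xs, reference))
        if error <= tolerance:
            return bits
    return 52

xs = [i / 100 for i in range(1, 1000)]
print(tune_precision(xs))  # smallest mantissa width meeting the tolerance
```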
"The ANTAREX approach to autotuning and adaptivity for energy efficient HPC systems." In Proceedings of the ACM International Conference on Computing Frontiers, 16 May 2016. DOI: 10.1145/2903150.2903470
R. M. Seepers, J. Weber, Z. Erkin, I. Sourdis, C. Strydis
The cardiac interpulse interval (IPI) has recently been proposed to facilitate key exchange for implantable medical devices (IMDs) using a patient's own heartbeats as a source of trust. While this form of key exchange holds promise for IMD security, its feasibility is not fully understood due to the simplified approaches found in related work. For example, previously proposed protocols have been designed without considering the limited randomness available per IPI, or have overlooked aspects pertinent to a realistic system, such as imperfect heartbeat detection or the energy overheads imposed on an IMD. In this paper, we propose a new IPI-based key-exchange protocol and evaluate its use during medical emergencies. Our protocol employs fuzzy commitment to tolerate the expected disparity between the IPIs obtained by an external reader and by the IMD, as well as a novel way of tackling heartbeat misdetection through IPI classification. Using our protocol, the expected time for securely exchanging an 80-bit key with high probability (1 − 10⁻⁶) is roughly one minute, while consuming only 88 μJ on the IMD.
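The fuzzy-commitment building block can be sketched as follows: the key is encoded with an error-correcting code, XOR-masked with the reader's IPI-derived bits, and unmasked on the IMD side with its own, slightly different, IPI bits; the code absorbs the mismatch. The 3x repetition code, the bit lengths, and the example IPI bits are simplified stand-ins for the paper's actual parameters.

```python
# Minimal fuzzy-commitment sketch with a 3x repetition code.

def encode(key_bits):                  # repeat each key bit three times
    return [b for b in key_bits for _ in range(3)]

def decode(code_bits):                 # majority vote per 3-bit group
    return [int(sum(code_bits[i:i + 3]) >= 2)
            for i in range(0, len(code_bits), 3)]

def commit(key_bits, witness_bits):    # reader side: mask codeword with IPIs
    return [c ^ w for c, w in zip(encode(key_bits), witness_bits)]

def open_commitment(commitment, witness_bits):  # IMD side: unmask and correct
    return decode([c ^ w for c, w in zip(commitment, witness_bits)])

key         = [1, 0, 1, 1]
reader_ipis = [0, 1, 1, 0, 0, 1, 1, 1, 0, 1, 0, 0]
imd_ipis    = [0, 1, 1, 0, 1, 1, 1, 1, 0, 1, 0, 0]  # one bit differs
recovered = open_commitment(commit(key, reader_ipis), imd_ipis)
print(recovered == key)  # True: the repetition code absorbs the mismatch
```

A repetition code is the simplest choice; a real protocol would pick a code whose correction capacity matches the measured IPI disparity.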
"Secure key-exchange protocol for implants using heartbeats." In Proceedings of the ACM International Conference on Computing Frontiers, 16 May 2016. DOI: 10.1145/2903150.2903165
M. Pelcat, Erwan Nogues, X. Ducloux
With the current progress in microelectronics and the constant increase of network bandwidth, video applications are becoming ubiquitous, especially in the context of mobility. By 2019, 80% of worldwide Internet traffic will be video. Nevertheless, optimizing the energy consumption of video processing is still a challenge due to the large amount of data processed. This talk will concentrate on the energy optimization of video codecs. In the first part, the Green Metadata initiative will be presented. In November 2014, MPEG released a new standard, named Green Metadata, that fosters energy-efficient media on consumer devices. This standard specifies metadata to be transmitted between encoder and decoder for reducing power consumption during encoding, decoding and display. The different types of metadata considered in the standard will be presented; more specifically, the Green Adaptive Streaming proposition will be detailed. In the second part, the energy optimization of an HEVC decoder implemented on a modern MP-SoC will be presented, along with the techniques used to implement an HEVC decoder efficiently on a general-purpose processor (GPP). Different levels of parallelism have been exploited to increase and exploit slack time. A sophisticated DVFS mechanism has been developed to handle the variability of the decoding process from frame to frame. To obtain further energy gains, the concept of approximate computing is exploited to propose a modified HEVC decoder capable of tuning its energy gains while managing the trade-off between decoding quality and energy. The work detailed in this second part of the talk is the result of the French GreenVideo FUI project.
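The per-frame DVFS idea can be sketched as a back-of-the-envelope policy: pick the lowest frequency that still finishes the predicted frame workload before its display deadline. The operating points, the workload figure, and the linear cycles-over-frequency model are illustrative only, not the project's actual mechanism.

```python
# Toy per-frame DVFS policy: lowest frequency that meets the frame deadline.

FREQS_MHZ = [400, 800, 1200, 1600]    # hypothetical DVFS operating points

def pick_frequency(predicted_cycles, deadline_ms):
    for f in FREQS_MHZ:               # cheapest (lowest) frequency first
        decode_time_ms = predicted_cycles / (f * 1000.0)  # f MHz = f*1000 cycles/ms
        if decode_time_ms <= deadline_ms:
            return f
    return FREQS_MHZ[-1]              # overloaded: run flat out and hope

# A 33.3 ms deadline (30 fps) with ~20M predicted cycles needs only 800 MHz:
print(pick_frequency(20_000_000, 33.3))  # 800
```

The interesting part in practice is predicting `predicted_cycles` per frame, since HEVC decoding cost varies strongly with frame content.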
M. Pelcat, Erwan Nogues, X. Ducloux. "Energy reduction in video systems: the GreenVideo project." In Proceedings of the ACM International Conference on Computing Frontiers, 16 May 2016. DOI: 10.1145/2903150.2911716