
Proceedings of the Computing Frontiers Conference: Latest Publications

Sorting big data on heterogeneous near-data processing systems
Pub Date : 2017-05-15 DOI: 10.1145/3075564.3078885
E. Vermij, Leandro Fiorin, C. Hagleitner, K. Bertels
Big data workloads have recently assumed considerable importance in many business and scientific applications, and sorting elements efficiently in such workloads is a key operation. In this work, we analyze the implementation of the mergesort algorithm on heterogeneous systems composed of CPUs and near-data processors located on the system memory channels. For configurations with an equal number of active CPU cores and near-data processors, our experiments show a performance speedup of up to 2.5x, as well as an energy-per-solution reduction of up to 2.5x.
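As an illustration of the algorithm analyzed above, a minimal single-threaded mergesort in Python; the paper's actual contribution, mapping merge phases onto CPU cores and near-data processors, is not shown here.

```python
# Plain recursive mergesort: split, sort each half, then merge.
def mergesort(a):
    if len(a) <= 1:
        return a
    mid = len(a) // 2
    left, right = mergesort(a[:mid]), mergesort(a[mid:])
    out, i, j = [], 0, 0
    # Merge step: repeatedly take the smaller head element.
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            out.append(left[i]); i += 1
        else:
            out.append(right[j]); j += 1
    return out + left[i:] + right[j:]

print(mergesort([5, 1, 4, 2, 3]))  # [1, 2, 3, 4, 5]
```

The merge step dominates the memory traffic, which is why it is the natural candidate for offloading to near-data processors.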
Citations: 3
Peak load optimization through 2-dimensional packing and multi-processor real-time scheduling
Pub Date : 2017-05-15 DOI: 10.1145/3075564.3075587
D. Martini, G. Benetti, Filippo Cipolla, Davide Caprino, M. L. D. Vedova, T. Facchinetti
The use of real-time scheduling methods to coordinate a set of power loads is being explored in the field of Cyber-Physical Energy Systems, with the goal of optimizing the aggregated peak load of power used by many electric loads. Real-time scheduling has attractive features in this domain: thanks to its inherent resource optimization, which limits the number of concurrent tasks running at the same time, it provides direct benefits for peak load optimization. This paper shows the combined use of a two-dimensional bin-packing method and an optimal multi-processor real-time scheduling algorithm to coordinate the activation of electric loads. The result is an effective global scheduling approach in which the activation of loads is organized into a pattern that takes into account the timing constraints of the loads and the actual combination of active loads. The validation is done by scheduling a set of thermal loads (heaters) in a building with accurately modeled temperature dynamics. The proposed method is shown to achieve a significant peak load reduction, up to around 70%, with respect to a traditional thermostat controller.
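The flavor of the packing step can be shown with a deliberately simplified one-dimensional sketch: first-fit-decreasing packing of single-slot load activations under a power budget. The real method packs in two dimensions (power and time) and adds optimal multi-processor real-time scheduling, neither of which this toy captures.

```python
# Toy peak-load packing: assign load activations (each lasting one time
# slot) to slots so that no slot exceeds the power budget.
def pack_loads(powers, budget):
    slots = []  # each slot holds the list of load powers active in it
    for p in sorted(powers, reverse=True):  # first-fit decreasing
        for slot in slots:
            if sum(slot) + p <= budget:
                slot.append(p)
                break
        else:
            slots.append([p])  # open a new time slot
    return slots

slots = pack_loads([3, 3, 2, 2, 2], budget=5)
print(slots)                        # [[3, 2], [3, 2], [2]]
print(max(sum(s) for s in slots))   # aggregate peak after packing: 5
```

Without coordination, all five loads could switch on together for a peak of 12; packing them spreads the activations so the peak never exceeds the budget.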
Citations: 2
Instruction level energy model for the Adapteva Epiphany multi-core processor
Pub Date : 2017-05-15 DOI: 10.1145/3075564.3078892
Gabriel Ortiz, L. Svensson, Erik Alveflo, P. Larsson-Edefors
Processor energy models can be used by developers to estimate the power consumption of software applications without the need for hardware implementation or additional measurement setups. Furthermore, these energy models can be used for energy-aware compiler optimization. This paper presents a measurement-based instruction-level energy characterization of the Adapteva Epiphany processor, a 16-core shared-memory architecture connected by a 2D network-on-chip. Based on a number of microbenchmarks, the instruction-level characterization was used to build an energy model that includes essential Epiphany instructions such as remote memory loads and stores. To validate the model, an FFT application was developed; the validation showed that the energy estimated by the model is within 0.4% of the measured energy.
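A linear instruction-level energy model of the kind described can be sketched as follows. The per-instruction costs and class names below are made-up placeholders for illustration, not measured Epiphany numbers.

```python
# Toy instruction-level energy model: total energy is the sum over
# instruction classes of (dynamic count x per-instruction energy cost).
# Costs in nanojoules are invented placeholders.
COST_NJ = {"alu": 0.01, "local_load": 0.05, "remote_load": 0.4, "store": 0.06}

def estimate_energy_nj(counts):
    """Estimate energy for a profile of per-class instruction counts."""
    return sum(n * COST_NJ[op] for op, n in counts.items())

profile = {"alu": 1000, "local_load": 200, "remote_load": 50, "store": 100}
print(estimate_energy_nj(profile))  # total nanojoules under the toy costs
```

Note how the remote loads, though only 50 of 1350 instructions, dominate the estimate; this mirrors why the paper characterizes remote memory accesses separately.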
Citations: 2
Optimal On-Line Computation of Stack Distances for MIN and OPT
Pub Date : 2017-05-15 DOI: 10.1145/3075564.3075571
G. Bilardi, K. Ekanadham, P. Pattnaik
The replacement policies known as MIN and OPT are optimal for a two-level memory hierarchy. The computation of the cache content for these policies requires the off-line knowledge of the entire address trace. However, the stack distance of a given access, that is, the smallest capacity of a cache for which that access results in a hit, is independent of future accesses and can be computed on-line. Off-line and on-line algorithms to compute the stack distance in time O(V) per access have been known for several decades, where V denotes the number of distinct addresses within the trace. The off-line time bound was recently improved to O(√V log V). This paper introduces the Critical Stack Algorithm for the online computation of the stack distance of MIN and OPT, in time O(log V) per access. The result exploits a novel analysis of properties of OPT and data structures based on balanced binary trees. A corresponding Ω(log V) lower bound is derived by a reduction from element distinctness; this bound holds in a variety of models of computation and applies even to the off-line simulation of just one cache capacity.
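For contrast with the MIN/OPT case solved in the paper, the classic on-line stack-distance computation for LRU (Mattson's stack algorithm, O(V) per access) is easy to sketch; it only illustrates the notion of a stack distance, not the paper's O(log V) data structures.

```python
# On-line LRU stack distances: for each access, the stack distance is
# the smallest cache capacity for which that access would be a hit.
def lru_stack_distances(trace):
    stack, dists = [], []   # stack[0] is the most recently used address
    for addr in trace:
        if addr in stack:
            d = stack.index(addr) + 1  # depth = smallest hitting capacity
            stack.remove(addr)
        else:
            d = float("inf")           # cold miss: no capacity hits
        stack.insert(0, addr)          # promote to most recently used
        dists.append(d)
    return dists

print(lru_stack_distances("abcba"))  # [inf, inf, inf, 2, 3]
```

Summing indicator functions of `d <= C` over the trace yields the hit count for every capacity C in one pass, which is what makes stack distances useful.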
Citations: 2
RAGuard: A Hardware Based Mechanism for Backward-Edge Control-Flow Integrity
Pub Date : 2017-05-15 DOI: 10.1145/3075564.3075570
Jun Zhang, Rui Hou, Junfeng Fan, KeKe Liu, Lixin Zhang, S. Mckee
Control-flow integrity (CFI) is considered a general and promising method to prevent code-reuse attacks, which exploit benign code sequences to realize arbitrary computation. Current approaches can efficiently protect control-flow transfers caused by indirect jumps and function calls (forward-edge CFI). However, they cannot effectively protect control flow caused by function returns (backward-edge CFI). The reason is that the set of return addresses of frequently called functions can be very large, which might bend the backward-edge CFI. We address this problem by proposing a novel hardware-assisted mechanism (RAGuard) that binds a message authentication code to each return address and enhances security via a physical unclonable function and a hardware hash function. The message authentication codes can be stored on the program stack with the return address, and RAGuard hardware automatically verifies the integrity of return addresses.
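The idea of binding a MAC to each return address can be illustrated in software. The real mechanism is implemented in hardware with a PUF-derived key and a hardware hash function, so the fixed HMAC key below is only a stand-in.

```python
# Software illustration of the RAGuard idea: tag a return address with
# a MAC before it is saved, verify the tag before returning through it.
import hmac
import hashlib

KEY = b"puf-derived-secret"  # stand-in for the PUF-derived hardware key

def protect(ret_addr: int) -> tuple:
    """Compute the MAC that would be pushed alongside the return address."""
    tag = hmac.new(KEY, ret_addr.to_bytes(8, "little"), hashlib.sha256).digest()
    return ret_addr, tag

def check(ret_addr: int, tag: bytes) -> bool:
    """Verify the (address, tag) pair popped off the stack."""
    expect = hmac.new(KEY, ret_addr.to_bytes(8, "little"), hashlib.sha256).digest()
    return hmac.compare_digest(tag, expect)

addr, tag = protect(0x400812)
assert check(addr, tag)             # legitimate return is accepted
assert not check(0xDEADBEEF, tag)   # overwritten return address is rejected
```

An attacker who overwrites the saved return address cannot forge the matching tag without the key, so the corrupted return is detected before control transfers.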
Our experiments show that for a subset of the SPEC CPU2006 benchmarks, RAGuard incurs 1.86% runtime overhead on average with no need for OS support.
Citations: 15
The Future of Deep Learning: Challenges & Solutions
Pub Date : 2017-05-15 DOI: 10.1145/3075564.3097267
M. Robins
Mark will begin with a brief overview of deep learning and what has led to its recent popularity. He will provide a few demonstrations and examples of deep learning applications based on recent work at Intel Nervana. He will explain some of the challenges to continued progress in deep learning - such as high compute requirements and lengthy training time - and will discuss some of the solutions (e.g. custom deep learning hardware) that Intel Nervana is developing to usher in a new era of even more powerful AI.
Citations: 1
Understanding the I/O Behavior of Desktop Applications in Virtualization
Pub Date : 2017-05-15 DOI: 10.1145/3075564.3076263
Yan Sui, Chun Yang, Xu Cheng
Input/Output (I/O) performance is very important when running desktop applications in virtualized environments. Previous research has focused on cold execution or installation of desktop applications, where the I/O requests are obvious; in many other scenarios, such as warm launch or web page browsing, I/O behaviors are less clear. In this paper, we analyze the I/O behavior of these desktop scenarios. Our analysis reveals several interesting I/O behaviors of desktop applications; for example, we show that many warm applications send random read requests during their launch, which makes these applications storage-sensitive. We also find that the write requests from web page browsing generate considerable I/O pressure, even when users only open a simple news page and take no further action. Our results have strong ramifications for the management of storage systems and the deployment of virtual machines in virtualized environments.
Citations: 0
Large-Scale Plant Classification with Deep Neural Networks
Pub Date : 2017-05-15 DOI: 10.1145/3075564.3075590
Ignacio Heredia
This paper discusses the potential of applying deep learning techniques for plant classification and its usage for citizen science in large-scale biodiversity monitoring. We show that plant classification using near state-of-the-art convolutional network architectures like ResNet50 achieves significant improvements in accuracy compared to the most widespread plant classification application in test sets composed of thousands of different species labels. We find that the predictions can be confidently used as a baseline classification in citizen science communities like iNaturalist (or its Spanish fork, Natusfera) which in turn can share their data with biodiversity portals like GBIF.
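One element of the workflow described, deciding when a prediction is confident enough to serve as a baseline label for a citizen-science observation, can be sketched with a toy confidence threshold. The threshold value and species names below are invented for illustration.

```python
# Toy baseline-labeling rule: accept the network's top prediction as a
# baseline species label only if its probability clears a threshold.
def baseline_label(probs, labels, threshold=0.8):
    p, label = max(zip(probs, labels))  # pick the highest-probability class
    return label if p >= threshold else None

print(baseline_label([0.05, 0.9, 0.05], ["oak", "beech", "birch"]))  # beech
print(baseline_label([0.4, 0.35, 0.25], ["oak", "beech", "birch"]))  # None
```

Observations that fall below the threshold would be left for the community to identify, while confident predictions seed the record that portals like GBIF can ingest.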
Citations: 22
Improving Error Resilience Analysis Methodology of Iterative Workloads for Approximate Computing
Pub Date : 2017-05-15 DOI: 10.1145/3075564.3078891
G. Gillani, A. Kokkeler
Assessing the error resilience inherent in digital processing workloads provides application-specific insights towards approximate computing strategies for improving power efficiency and/or performance. With the case study of radio astronomy calibration, our contributions for improving error resilience analysis focus primarily on iterative methods that use a convergence criterion as a quality metric to terminate the iterative computations. We propose an adaptive statistical approximation model for high-level resilience analysis that provides an opportunity to divide a workload into exact and approximate iterations. This improves the existing error resilience analysis methodology by quantifying the number of approximate iterations (23% of the total iterations in our case study) in addition to the other parameters used in state-of-the-art techniques. In this way, heterogeneous architectures composed of exact and inexact computing cores, as well as adaptive-accuracy architectures, can be exploited efficiently. Moreover, we demonstrate the importance of reconsidering the quality function for convergence-based iterative processes, as the original quality function (the convergence criterion) is not necessarily sufficient in the resilience analysis phase.
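The exact/approximate iteration split can be mimicked with a toy convergent method: run the first iterations with a degraded update and finish with exact updates until the convergence criterion is met. Newton's method for a square root stands in here for the paper's calibration loop; the degradation by rounding is only a crude stand-in for inexact hardware.

```python
# Toy exact/approximate split for a convergent iteration: the first
# `approx_iters` Newton updates are coarsened by rounding, the rest run
# exactly until the convergence criterion is satisfied.
def newton_sqrt(a, approx_iters=3, tol=1e-12):
    x = a
    it = 0
    while abs(x * x - a) > tol:   # convergence criterion (quality metric)
        x = 0.5 * (x + a / x)     # Newton update for sqrt(a)
        if it < approx_iters:
            x = round(x, 3)       # degraded "approximate" iteration
        it += 1
    return x, it

x, iters = newton_sqrt(2.0)
print(x)  # converges to sqrt(2) despite the coarse early iterations
```

The point the paper makes is exactly this tolerance: early iterations can be computed inexactly without affecting the converged result, so only the tail needs exact hardware.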
If such is the case, an additional quality function has to be defined to assess the viability of the approximate techniques.
Citations: 12
Trading Fault Tolerance for Performance in AN Encoding
Pub Date : 2017-05-15 DOI: 10.1145/3075564.3075565
Norman A. Rink, J. Castrillón
Increasing rates of transient hardware faults pose a problem for computing applications. Current and future trends are likely to exacerbate this problem. When a transient fault occurs during program execution, data in the output can become corrupted. The severity of output corruptions depends on the application domain. Hence, different applications require different levels of fault tolerance. We present an LLVM-based AN encoder that can equip programs with an error detection mechanism at configurable levels of rigor. Based on our AN encoder, the trade-off between fault tolerance and runtime overhead is analyzed. It is found that, by suitably configuring our AN encoder, the runtime overhead can be reduced from 9.9x to 2.1x. At the same time, however, the probability that a hardware fault in the CPU will result in silent data corruption rises from 0.007 to over 0.022. The same probability for memory faults increases from 0.009 to over 0.032. It is further demonstrated, by applying different configurations of our AN encoder to the components of an arithmetic expression interpreter, that having fine-grained control over levels of fault tolerance can be beneficial.
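The underlying AN-code mechanics can be sketched independently of the paper's LLVM-based encoder: encode a value by multiplying it by a constant A, and flag a fault whenever an intermediate result is no longer a multiple of A. The constant A = 58659 below is one reported in the AN-coding literature as a good choice, assumed here for illustration.

```python
# Minimal AN-code sketch: code words are exact multiples of A, so most
# bit flips make a value fail the divisibility check.
A = 58659  # code constant (assumed; the paper's encoder is configurable)

def encode(x: int) -> int:
    return x * A

def decode(xc: int) -> int:
    if xc % A != 0:
        raise ValueError("fault detected: value is not a multiple of A")
    return xc // A

# Encoded arithmetic: the sum of two code words is again a code word.
s = encode(20) + encode(22)
assert decode(s) == 42

# A single bit flip in the encoded sum is caught by the check (a flip
# can in principle land on another multiple of A, hence "most" above).
faulty = s ^ (1 << 7)
try:
    decode(faulty)
except ValueError:
    print("fault detected")
```

The runtime overhead the paper measures comes from carrying these wider encoded values and divisibility checks through the program; relaxing where the checks are placed is what trades fault tolerance for performance.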
Citations: 5