
Proceedings of the 16th ACM International Conference on Computing Frontiers: Latest Publications

Exploration of task-based scheduling for convolutional neural networks accelerators under memory constraints
Pub Date : 2019-04-30 DOI: 10.1145/3310273.3323162
Crefeda Faviola Rodrigues, G. Riley, M. Luján
The development of application-specific accelerators for deep convolutional neural networks (ConvNets) has mainly focused on accelerating the computationally intensive layers, that is, the convolutional layers, to improve performance and energy efficiency. Traditional approaches in this space have relied on handcrafted dataflow implementations to leverage the fine-grained parallelism and data-locality properties within these layers. However, ConvNet layers also have untapped potential from cross-layer data locality. In our work, we explore a novel approach in the context of deep neural network accelerators by modelling the computation as a task-dependency directed acyclic graph and proposing a memory-aware heuristic based on Heterogeneous Earliest Finish Time (HEFT) for task-graph scheduling on shared-memory systems. Our results show the benefits of task graphs in terms of better memory use (23.4% less) over conventional layer-by-layer processing in a simulated environment with the first three layers of LeNet-5. Certain task graphs trade off makespan (10% increase) for memory use (20% decrease). Finally, our exploration of graphs with different slicing configurations for the pooling layer, using memory-aware HEFT versus the original HEFT, reveals that regularly shaped tiles across layers offer better makespan and memory use than tiles with large dimensions along one axis.
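The scheduling idea can be illustrated with a minimal HEFT-style list scheduler: rank tasks by their longest path to an exit node, then greedily place each on the processor that finishes it earliest. This is a generic sketch, not the paper's memory-aware heuristic; the diamond task graph, costs, and two-processor setup below are invented for demonstration.

```python
# Minimal HEFT-style list scheduler on a task DAG (illustrative sketch;
# costs here are uniform across processors, unlike full HEFT).

def heft_schedule(tasks, deps, cost, n_procs=2):
    """tasks: list of task ids; deps: dict task -> set of predecessors;
    cost: dict task -> execution time. Returns (placement, finish times)."""
    # Upward rank: longest path (by cost) from a task to an exit node.
    rank = {}
    def upward(t):
        if t not in rank:
            succs = [s for s in tasks if t in deps.get(s, ())]
            rank[t] = cost[t] + max((upward(s) for s in succs), default=0)
        return rank[t]
    order = sorted(tasks, key=upward, reverse=True)

    proc_free = [0.0] * n_procs      # earliest free time per processor
    finish = {}                      # task -> finish time
    placement = {}                   # task -> processor index
    for t in order:
        ready = max((finish[p] for p in deps.get(t, ())), default=0.0)
        # Pick the processor giving the earliest finish time.
        best = min(range(n_procs), key=lambda p: max(proc_free[p], ready))
        start = max(proc_free[best], ready)
        finish[t] = start + cost[t]
        proc_free[best] = finish[t]
        placement[t] = best
    return placement, finish

# Toy 4-task diamond graph: a -> b, a -> c, {b, c} -> d
deps = {"b": {"a"}, "c": {"a"}, "d": {"b", "c"}}
cost = {"a": 2, "b": 3, "c": 1, "d": 2}
placement, finish = heft_schedule(["a", "b", "c", "d"], deps, cost)
print(finish["d"])  # makespan of the schedule -> 7.0
```

A memory-aware variant like the paper's would additionally track the buffers live at each scheduling step and penalize placements that grow the peak footprint.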
Citations: 3
Towards realistic battery-DoS protection of implantable medical devices
Pub Date : 2019-04-15 DOI: 10.1145/3310273.3321555
M. Siddiqi, C. Strydis
Modern Implantable Medical Devices (IMDs) feature wireless connectivity, which makes them vulnerable to security attacks. Particular to IMDs is the battery Denial-of-Service attack, whereby attackers aim to fully deplete the battery by occupying the IMD with continuous authentication requests. Zero-Power Defense (ZPD) based on energy harvesting is known to be an excellent protection against these attacks. This paper establishes essential design specifications for employing ZPD techniques in IMDs, offers a critical review of ZPD techniques found in the literature and, subsequently, gives crucial recommendations for developing comprehensive ZPD solutions.
Citations: 10
IMD security vs. energy: are we tilting at windmills?: POSTER
Pub Date : 2019-04-15 DOI: 10.1145/3310273.3323421
M. Siddiqi, C. Strydis
Implantable Medical Devices (IMDs) such as pacemakers and neurostimulators are highly constrained in terms of energy. In addition, the wireless-communication facilities of these devices also impose security requirements, considering their life-critical nature. However, security solutions that provide considerable coverage are generally considered too taxing on an IMD battery. Consequently, there has been a tendency in the literature to adopt ultra-lightweight security primitives for IMDs. In this work, we demonstrate that recent advances in embedded computing in fact enable IMDs to use more mainstream security primitives, which need not compromise significantly on security for fear of impacting IMD autonomy.
Citations: 7
Solving large minimum vertex cover problems on a quantum annealer
Pub Date : 2019-03-29 DOI: 10.1145/3310273.3321562
Elijah Pelofske, Georg Hahn, H. Djidjev
We consider the minimum vertex cover problem, which has applications in, e.g., biochemistry and network security. Quantum annealers can find the optimal solution of such NP-hard problems, provided the problems can be embedded on the hardware. This is often infeasible due to limitations of the hardware connectivity structure. This paper presents a decomposition algorithm for the minimum vertex cover problem: the algorithm recursively divides an arbitrary problem until the generated subproblems can be embedded and solved on the annealer. To speed up the decomposition, we propose several pruning and reduction techniques. The performance of our algorithm is assessed in a simulation study.
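Reduction techniques of the kind mentioned can be illustrated with two classic vertex-cover rules: isolated vertices are never needed in a cover, and the neighbour of a degree-1 vertex can always be forced into the cover. This standalone sketch is illustrative only, not the authors' code; the path graph at the end is an invented example.

```python
# Two classic reductions for minimum vertex cover, applied repeatedly
# until a fixed point, shrinking the instance before any solver is called.

def reduce_vertex_cover(adj):
    """adj: dict vertex -> set of neighbours (symmetric). Returns
    (forced, residual): vertices forced into every minimum cover,
    and the reduced residual graph."""
    adj = {v: set(ns) for v, ns in adj.items()}   # work on a copy
    forced = set()
    changed = True
    while changed:
        changed = False
        for v in list(adj):
            if v not in adj:
                continue
            # Rule 1: an isolated vertex never belongs in a minimum cover.
            if not adj[v]:
                del adj[v]
                changed = True
            # Rule 2: a degree-1 vertex's sole neighbour goes into the cover.
            elif len(adj[v]) == 1:
                (u,) = adj[v]
                forced.add(u)
                for w in adj.pop(u, set()):       # remove u from the graph
                    adj[w].discard(u)
                changed = True
    return forced, adj

# Path a-b-c: b must be in every minimum cover, and the graph collapses.
forced, rest = reduce_vertex_cover({"a": {"b"}, "b": {"a", "c"}, "c": {"b"}})
print(sorted(forced), rest)  # ['b'] {}
```

After such reductions, only the residual graph needs to be decomposed and embedded on the annealer.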
Citations: 27
MP net as abstract model of communication for message-passing applications
Pub Date : 2019-03-19 DOI: 10.1145/3310273.3322824
Martin Surkovský
MP net is a formal model specifically designed for the field of parallel applications that use the Message Passing Interface (MPI). The main idea is to use MP net as a comprehensible way of presenting the actual structure of communication within MPI applications. The goal is to provide users with the kind of feedback that can help them check quickly whether the actual communication within their application corresponds to the intended one. This paper introduces MP net, which focuses on the communication part of parallel applications and emphasizes its spatial character, a character that is rather hidden in the sequential (textual) form.
Citations: 0
A case for superconducting accelerators
Pub Date : 2019-02-12 DOI: 10.1145/3310273.3321561
Swamit S. Tannu, Poulami Das, Michael L. Lewis, Robert F. Krick, Douglas M. Carmean, Moinuddin K. Qureshi
As the scaling of CMOS slows down, there is growing interest in alternative technologies that can improve performance and energy efficiency. Superconducting circuits based on Josephson Junctions (JJs) are an emerging technology providing devices that can be switched with picosecond latencies and consume two orders of magnitude less switching energy than CMOS. While JJ-based circuits can operate at high frequencies and are energy-efficient, the technology faces three critical challenges: limited device density and a lack of area-efficient technology for memory structures, low gate fanout, and new failure modes (flux traps) that occur due to the operating environment. Limited memory density restricts the use of superconducting technology in the near term to application domains that have high compute intensity but require a negligible amount of memory. In this paper, we study the use of superconducting technology to build an accelerator for the SHA-256 engines commonly used in Bitcoin mining. We show that merely porting an existing CMOS-based accelerator to superconducting technology provides a 10.6X improvement in energy efficiency. Redesigning the accelerator to suit the unique constraints of superconducting technology (such as low fanout) improves the energy efficiency to 12.2X. We also investigate solutions to make the accelerator tolerant of the new fault modes and show how this fault-tolerant design can be leveraged to reduce the operating current, thereby improving the overall energy efficiency to 46X.
Citations: 10
nGraph-HE: a graph compiler for deep learning on homomorphically encrypted data
Pub Date : 2018-10-23 DOI: 10.1145/3310273.3323047
Fabian Boemer, Yixing Lao, Casimir Wierzynski
Homomorphic encryption (HE)---the ability to perform computation on encrypted data---is an attractive remedy to increasing concerns about data privacy in deep learning (DL). However, building DL models that operate on ciphertext is currently labor-intensive and requires simultaneous expertise in DL, cryptography, and software engineering. DL frameworks and recent advances in graph compilers have greatly accelerated the training and deployment of DL models on various computing platforms. We introduce nGraph-HE, an extension of nGraph, Intel's DL graph compiler, which enables deployment of trained models with popular frameworks such as TensorFlow while simply treating HE as another hardware target. Our graph-compiler approach enables HE-aware optimizations implemented at compile time, such as constant folding and HE-SIMD packing, and at run time, such as special-value plaintext bypass. Furthermore, nGraph-HE integrates with DL frameworks such as TensorFlow, enabling data scientists to benchmark DL models with minimal overhead.
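Constant folding, one of the compile-time optimizations named above, can be sketched on a toy expression graph: any subtree whose inputs are all constants is evaluated once at compile time, which for HE saves expensive homomorphic operations at run time. The node representation below is invented for this sketch and is unrelated to nGraph's actual intermediate representation.

```python
# Constant folding on a toy expression graph (illustrative sketch).
from dataclasses import dataclass

@dataclass
class Node:
    op: str                 # "const", "input", "add", or "mul"
    args: tuple = ()        # child nodes for "add"/"mul"
    value: object = None    # numeric payload for "const"

def fold(node):
    """Recursively replace all-constant subtrees with a single const node."""
    if node.op in ("const", "input"):
        return node
    args = tuple(fold(a) for a in node.args)
    if all(a.op == "const" for a in args):
        vals = [a.value for a in args]
        result = vals[0] + vals[1] if node.op == "add" else vals[0] * vals[1]
        return Node("const", value=result)
    return Node(node.op, args)

# (2 * 3) + x  folds to  6 + x : one homomorphic op at run time instead of two.
expr = Node("add", (Node("mul", (Node("const", value=2.0),
                                 Node("const", value=3.0))),
                    Node("input")))
folded = fold(expr)
print(folded.args[0].value)  # 6.0
```

In the encrypted setting the payoff is large: each avoided multiply or add on ciphertext is orders of magnitude more expensive than its plaintext counterpart.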
Citations: 124
ISA mapper: a compute and hardware agnostic deep learning compiler
Pub Date : 2018-10-12 DOI: 10.1145/3310273.3321559
Matthew Sotoudeh, Anand Venkat, Michael J. Anderson, E. Georganas, A. Heinecke, Jason Knight
Domain-specific accelerators present new challenges for code generation onto novel instruction sets, communication fabrics, and memory architectures. We introduce a shared intermediate representation that describes both deep learning programs and hardware capabilities, then formulate and apply instruction mapping to determine how a computation can be performed on a hardware system. Our scheduler chooses a specific mapping and determines data movement and computation order. With this system, we demonstrate automated extraction of matrix multiplication kernels from recent deep learning operations. We demonstrate 2-5X better performance on GEMM and GRU execution versus the state of the art on new hardware, and up to 85% of state-of-the-art performance on existing hardware.
Citations: 8
Quantum encoded quantum evolutionary algorithm for the design of quantum circuits
Pub Date : 2018-09-28 DOI: 10.1145/3310273.3322826
Georgiy Krylov, M. Lukac
In this paper we present the Quantum Encoded Quantum Evolutionary Algorithm (QEQEA) and compare its performance against a classical GPU-accelerated Genetic Algorithm (GPUGA). The proposed QEQEA differs from existing quantum evolutionary algorithms in several respects: candidate circuits are represented using qubits and qutrits, and the proposed evolutionary operators could theoretically be implemented on a quantum computer, provided a classical control exists. The synthesized circuits are obtained by a set of measurements performed on the encoding units of the quantum representation. Both algorithms are accelerated using general-purpose graphics processing units (GPGPUs). The main target of this paper is not to propose a completely novel quantum genetic algorithm but rather to experimentally estimate the advantages of encoding and implementing certain components of a genetic algorithm in a quantum-compatible manner. The algorithms are compared and evaluated on several reversible and quantum circuits. The results demonstrate that, on one hand, the quantum encoding and its quantum-compatible implementation have certain disadvantages with respect to classical evolutionary computation. On the other hand, encoding certain components in a quantum-compatible manner could in theory accelerate the search, incurring only a small overhead when built into a quantum computer. This acceleration would in turn counterbalance the implementation limitations.
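The classical evolutionary loop that such circuit-synthesis algorithms build on can be sketched as a toy genetic search over single-qubit gate sequences. This is purely illustrative: the quantum encoding that distinguishes QEQEA is omitted, and the gate set, fitness measure, and GA parameters below are invented for the sketch.

```python
# Toy genetic search for a single-qubit circuit matching a target unitary.
import random

A = 2 ** -0.5
GATES = {
    "X": [[0, 1], [1, 0]],
    "Z": [[1, 0], [0, -1]],
    "H": [[A, A], [A, -A]],
}

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def circuit_unitary(genome):
    u = [[1, 0], [0, 1]]
    for g in genome:                      # apply gates left to right
        u = matmul(GATES[g], u)
    return u

def fitness(genome, target):
    u = circuit_unitary(genome)
    # |Tr(U^dagger T)| / 2 equals 1 iff U matches T up to global phase.
    tr = sum(u[k][i].conjugate() * target[k][i]
             for i in range(2) for k in range(2))
    return abs(tr) / 2

def evolve(target, pop_size=20, generations=30, seed=0):
    rng = random.Random(seed)
    names = list(GATES)
    def rand_genome():
        return [rng.choice(names) for _ in range(rng.randint(1, 3))]
    def mutate(g):
        g = list(g)
        g[rng.randrange(len(g))] = rng.choice(names)
        return g
    pop = [rand_genome() for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=lambda g: -fitness(g, target))   # elitist selection
        elite = pop[:pop_size // 2]
        pop = elite + [mutate(rng.choice(elite))
                       for _ in range(pop_size - len(elite))]
    return max(pop, key=lambda g: fitness(g, target))

target = matmul(GATES["H"], GATES["X"])   # the unitary we want to rediscover
best = evolve(target)
print(best, round(fitness(best, target), 3))
```

QEQEA replaces the explicit genome above with qubit/qutrit encoding units, from which candidate circuits are obtained by measurement.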
Citations: 7
Personal volunteer computing
Pub Date : 2018-04-04 DOI: 10.1145/3310273.3322819
Erick Lavoie, L. Hendren
We propose personal volunteer computing, a novel paradigm to encourage technical solutions that leverage personal devices, such as smartphones and laptops, for personal applications that require significant computations, such as animation rendering and image processing. The paradigm requires no investment in additional hardware, relying instead on devices that are already owned by users and their community, and favours simple tools that can be implemented part-time by a single developer. We show that samples of personal devices of today are competitive with a top-of-the-line laptop from two years ago. We also propose new directions to extend the paradigm.
Citations: 23