
Latest publications — 2019 IEEE High Performance Extreme Computing Conference (HPEC)

Scaling and Quality of Modularity Optimization Methods for Graph Clustering
Pub Date : 2019-09-01 DOI: 10.1109/HPEC.2019.8916299
Sayan Ghosh, M. Halappanavar, Antonino Tumeo, A. Kalyanaraman
Real-world graphs exhibit structures known as “communities” or “clusters”, consisting of groups of vertices with relatively high connectivity among them compared to the rest of the network. Graph clustering, or community detection, is a fundamental graph operation used to analyze real-world graphs arising in computational biology, cybersecurity, electrical grids, and other areas. Like other graph algorithms, current community detection algorithms are challenging to parallelize owing to irregular memory accesses and their inherently sequential nature. However, to analyze large networks, it is important to develop scalable parallel implementations of graph clustering that can exploit the architectural features of modern supercomputers. In response to the 2019 Streaming Graph Challenge, we present a quality and performance analysis of our distributed-memory community detection using Vite, our distributed-memory implementation of the popular Louvain method, on the ALCF Theta supercomputer. Clustering methods such as Louvain that rely on modularity maximization are known to suffer from the resolution-limit problem, which prevents identification of clusters of certain sizes. Hence, we also include a quality analysis of our shared-memory implementation of the Fast-tracking Resistance method, in comparison with Louvain on the challenge datasets. Furthermore, we introduce an edge-balanced graph distribution for our distributed-memory implementation that significantly reduces communication, offering up to 80% improvement in overall execution time. In addition to the performance/quality analysis, we also include details on the power/energy consumption and memory traffic of the distributed-memory clustering implementation using real-world graphs with over a billion edges.
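The modularity objective that Louvain-style methods maximize can be sketched in a few lines. The following is a generic illustration, not the Vite implementation; the toy graph and partition are made up for the example:

```python
def modularity(edges, communities):
    """Newman modularity Q = (1/2m) * sum_ij [A_ij - k_i*k_j/(2m)] * delta(c_i, c_j)
    for an undirected, unweighted graph.

    edges: list of undirected (u, v) pairs; communities: dict vertex -> community id.
    """
    m = len(edges)
    deg = {}
    for u, v in edges:
        deg[u] = deg.get(u, 0) + 1
        deg[v] = deg.get(v, 0) + 1
    # Observed fraction of edges that fall inside a community.
    intra = sum(1 for u, v in edges if communities[u] == communities[v]) / m
    # Expected fraction under the configuration (random-rewiring) null model.
    comm_deg = {}
    for vtx, d in deg.items():
        c = communities[vtx]
        comm_deg[c] = comm_deg.get(c, 0) + d
    expected = sum((d / (2 * m)) ** 2 for d in comm_deg.values())
    return intra - expected

# Two triangles joined by a single bridge edge: the natural two-community split.
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
part = {0: 0, 1: 0, 2: 0, 3: 1, 4: 1, 5: 1}
print(round(modularity(edges, part), 4))  # → 0.3571
```

Louvain greedily moves vertices between communities to increase this Q; the resolution limit mentioned in the abstract is a property of the Q objective itself, which is why the authors also evaluate the Fast-tracking Resistance method.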
Citations: 10
C to D-Wave: A High-level C Compilation Framework for Quantum Annealers
Pub Date : 2019-09-01 DOI: 10.1109/HPEC.2019.8916231
Mohamed W. Hassan, S. Pakin, Wu-chun Feng
A quantum annealer solves optimization problems by exploiting quantum effects. Problems are represented as Hamiltonian functions that define an energy landscape. The quantum-annealing hardware relaxes to a solution corresponding to the ground state of the energy landscape. Expressing arbitrary programming problems in terms of real-valued Hamiltonian-function coefficients is unintuitive and challenging. This paper addresses the difficulty of programming quantum annealers by presenting a compilation framework that compiles a subset of C code to a quantum machine instruction (QMI) to be executed on a quantum annealer. Our work is based on a modular software stack that facilitates programming D-Wave quantum annealers by successively lowering code from C to Verilog to a symbolic “quantum macro assembly language” and finally to a device-specific Hamiltonian function. We demonstrate the capabilities of our software stack on a set of problems written in C and executed on a D-Wave 2000Q quantum annealer.
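The "energy landscape" the abstract describes is an Ising Hamiltonian H(s) = Σᵢ hᵢsᵢ + Σᵢⱼ Jᵢⱼsᵢsⱼ over spins s ∈ {−1,+1}ⁿ, and the annealer physically relaxes toward its ground state. A brute-force sketch (feasible only for tiny n, with made-up coefficients) shows what the hardware's final Hamiltonian encodes:

```python
from itertools import product

def ising_ground_state(h, J):
    """Exhaustively minimize H(s) = sum_i h[i]*s_i + sum_(i,j) J[(i,j)]*s_i*s_j
    over spin vectors s in {-1, +1}^n. A quantum annealer searches this same
    landscape physically; exhaustive search is only viable for small n."""
    n = len(h)
    best_s, best_e = None, float("inf")
    for s in product((-1, 1), repeat=n):
        e = sum(h[i] * s[i] for i in range(n))
        e += sum(c * s[i] * s[j] for (i, j), c in J.items())
        if e < best_e:
            best_s, best_e = s, e
    return best_s, best_e

# Antiferromagnetic couplings on a 3-spin chain, with a small bias on spin 0.
h = [-0.5, 0.0, 0.0]
J = {(0, 1): 1.0, (1, 2): 1.0}
spins, energy = ising_ground_state(h, J)
print(spins, energy)  # → (1, -1, 1) -2.5
```

The compilation framework's job is to lower C code down to the real-valued h and J coefficients of such a device-specific Hamiltonian.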
Citations: 5
Synthesis of Hardware Sandboxes for Trojan Mitigation in Systems on Chip
Pub Date : 2019-09-01 DOI: 10.1109/HPEC.2019.8916526
C. Bobda, Taylor J. L. Whitaker, Joel Mandebi Mbongue, S. Saha
In this work, we propose a high-level synthesis approach for hardware sandboxes in system-on-chip. Using interface formalism to capture interactions between non-trusted IPs and trusted parts of a system on chip, along with the properties specification language to specify non-authorized actions of non-trusted IPs, sandboxes are generated and made ready for inclusion as IP in a system-on-chip design. The concepts of composition, compatibility, and refinement are used to capture illegal actions and optimize resources across the boundary of single IPs. We have designed a tool that automatically generates the sandbox and facilitates their integration into system-on-chip. Our approach was validated with benchmarks from trust-hub.com and FPGA implementations. All our results showed 100% Trojan detection and mitigation, with only a minimal increase in resource overhead and no performance decrease.
Citations: 2
Performance of Training Sparse Deep Neural Networks on GPUs
Pub Date : 2019-09-01 DOI: 10.1109/HPEC.2019.8916506
Jianzong Wang, Zhangcheng Huang, Lingwei Kong, Jing Xiao, Pengyu Wang, Lu Zhang, Chao Li
Deep neural networks have revolutionized the field of machine learning by dramatically improving the state of the art in various domains. The sizes of deep neural networks (DNNs) are rapidly outgrowing the capacity of hardware to store and train them quickly. Over the past few decades, researchers have explored the prospect of sparsifying DNNs before, during, and after training by pruning edges from the underlying topology; the resulting network is known as a sparse neural network. More recent work has demonstrated the remarkable result that certain sparse DNNs can train to the same precision as dense DNNs at lower runtime and storage cost. While existing methods ease the situation in which the high demand for computational resources severely hinders the deployment of large-scale DNNs on resource-constrained devices, DNNs can be trained at higher speed and lower cost. In this work, we propose a Fine-tune Structured Sparsity Learning (FSSL) method to regularize the structure of DNNs and accelerate their training. FSSL can (1) learn a compact structure from a large sparse DNN to reduce computation cost, and (2) obtain a hardware-friendly structure to accelerate DNN evaluation efficiently. Experimental results on training time and compression rate show superior performance and efficiency compared to the Matlab example code. These speedups are roughly twice those of non-structured sparsity.
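The abstract does not spell out FSSL's mechanics, but the "hardware-friendly" distinction it draws can be illustrated with a generic structured-pruning sketch: zeroing whole columns of a weight matrix (entire output neurons) rather than scattered individual weights. The function and threshold below are hypothetical, for illustration only:

```python
import numpy as np

def prune_columns(W, keep_ratio=0.5):
    """Structured pruning sketch: zero out the columns of a weight matrix with
    the smallest L2 norm, keeping a fixed fraction. Removing whole columns
    keeps dense, regular compute patterns, unlike unstructured element-wise
    sparsity, which leaves irregular memory accesses."""
    norms = np.linalg.norm(W, axis=0)
    k = max(1, int(keep_ratio * W.shape[1]))
    keep = np.argsort(norms)[-k:]          # indices of the strongest columns
    mask = np.zeros(W.shape[1], dtype=bool)
    mask[keep] = True
    return W * mask, mask

rng = np.random.default_rng(0)
W = rng.normal(size=(8, 6))
W_pruned, mask = prune_columns(W, keep_ratio=0.5)
print(mask.sum())  # → 3 columns survive
```

In a training loop such a mask would typically be reapplied after each gradient step so the pruned structure persists while the surviving weights are fine-tuned.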
Citations: 10
Update on k-truss Decomposition on GPU
Pub Date : 2019-09-01 DOI: 10.1109/HPEC.2019.8916285
M. Almasri, Omer Anjum, Carl Pearson, Zaid Qureshi, Vikram Sharma Mailthody, R. Nagi, Jinjun Xiong, Wen-mei W. Hwu
In this paper, we present an update to our previous submission on k-truss decomposition from Graph Challenge 2018. For the single-k k-truss implementation, we propose multiple algorithmic optimizations that significantly improve performance, by up to 35.2x (6.9x on average) compared to our previous GPU implementation. In addition, we present a scalable multi-GPU implementation in which each GPU handles a different k value. Compared to our prior multi-GPU implementation, the proposed approach is faster by up to 151.3x (78.8x on average). When only the edges of the maximal k-truss are sought, incrementing k in each iteration is inefficient, particularly for graphs with a large maximum k-truss. Thus, we propose a binary search over k to find the maximal k-truss. The binary-search approach on a single GPU is up to 101.5x (24.3x on average) faster than our 2018 k-truss submission. Lastly, we show that the proposed binary search finds the maximum k-truss of the “Twitter” graph dataset, which has 2.8 billion bidirectional edges, in just 16 minutes on a single V100 GPU.
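The k-truss definition and the binary-search idea from the abstract can be sketched in pure Python: a k-truss is the maximal subgraph in which every edge is supported by at least k−2 triangles, computed by iterative peeling, and the maximum k is found by bisection rather than by stepping k one at a time. This is an illustrative serial sketch, not the GPU implementation; the toy graph and the `k_hi` bound are made up:

```python
def k_truss_edges(edges, k):
    """Return the edge set of the k-truss: iteratively peel edges supported
    by fewer than k - 2 triangles until a fixed point is reached."""
    E = {tuple(sorted(e)) for e in edges}
    changed = True
    while changed:
        adj = {}
        for u, v in E:
            adj.setdefault(u, set()).add(v)
            adj.setdefault(v, set()).add(u)
        # Triangle support of (u, v) = number of common neighbors in the
        # current (partially peeled) edge set.
        drop = {(u, v) for u, v in E if len(adj[u] & adj[v]) < k - 2}
        changed = bool(drop)
        E -= drop
    return E

def max_truss(edges, k_hi=64):
    """Binary-search the largest k with a non-empty k-truss, instead of
    incrementing k one step per iteration."""
    lo, hi = 2, k_hi
    while lo < hi:
        mid = (lo + hi + 1) // 2
        if k_truss_edges(edges, mid):
            lo = mid
        else:
            hi = mid - 1
    return lo

# A 4-clique (every edge in two triangles, hence a 4-truss) plus a pendant edge.
clique = [(a, b) for a in range(4) for b in range(a + 1, 4)]
print(max_truss(clique + [(3, 4)]))  # → 4
```

The nonemptiness test inside the bisection is the expensive step; the paper's contribution is making that test, and the peeling it relies on, fast on one or many GPUs.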
Citations: 21
Many-target, Many-sensor Ship Tracking and Classification
Pub Date : 2019-09-01 DOI: 10.1109/HPEC.2019.8916332
Leonard Kosta, John Irvine, Laura Seaman, H. Xi
Government agencies such as DARPA wish to know the numbers, locations, tracks, and types of vessels moving through strategically important regions of the ocean. We implement a multiple hypothesis testing algorithm to simultaneously track dozens of ships with longitude and latitude data from many sensors, then use a combination of behavioral fingerprinting and deep learning techniques to classify each vessel by type. The number of targets is unknown a priori. We achieve both high track purity and high classification accuracy on several datasets.
Citations: 0
A Survey on Hardware Security Techniques Targeting Low-Power SoC Designs
Pub Date : 2019-09-01 DOI: 10.1109/HPEC.2019.8916486
Alan Ehret, K. Gettings, B. R. Jordan, M. Kinsy
In this work, we survey hardware-based security techniques applicable to low-power system-on-chip designs. Techniques related to a system’s processing elements, volatile main memory and caches, non-volatile memory and on-chip interconnects are examined. Threat models for each subsystem and technique are considered. Performance overheads and other trade-offs for each technique are discussed. Defenses with similar threat models are compared.
Citations: 10
Scalable Inference for Sparse Deep Neural Networks using Kokkos Kernels
Pub Date : 2019-09-01 DOI: 10.1109/HPEC.2019.8916378
J. Ellis, S. Rajamanickam
Over the last decade, hardware advances have made training and inference feasible for very large deep neural networks. Sparsified deep neural networks (DNNs) can greatly reduce the memory cost and increase the throughput of standard DNNs, provided the loss of accuracy can be controlled. The IEEE HPEC Sparse Deep Neural Network Graph Challenge serves as a testbed for algorithmic and implementation advances that maximize the computational performance of sparse deep neural networks. We base our sparse DNN implementation, KK-SpDNN, on the sparse linear algebra kernels in the Kokkos Kernels library; using the sparse matrix-matrix multiplication from Kokkos Kernels allows us to reuse a highly optimized kernel. We focus on reducing the single-node and multi-node runtimes for 12 sparse networks. We test KK-SpDNN on Intel Skylake and Knights Landing architectures and see 120-500x improvement in single-node performance over the serial reference implementation. We run in data-parallel mode with MPI to further speed up network inference, ultimately obtaining an edge processing rate of 1.16e+12 on 20 Skylake nodes. This translates to a 13x speedup on 20 nodes compared to our highly optimized multithreaded implementation on a single Skylake node.
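The Sparse DNN Graph Challenge inference loop is a repeated sparse matrix-matrix multiply followed by a bias and ReLU, Y ← max(0, Y·W + b). A minimal sketch with SciPy standing in for the Kokkos Kernels SpMM (shapes, density, and layer count are made up):

```python
import numpy as np
from scipy import sparse

def sparse_dnn_infer(Y0, weights, biases):
    """Layer-by-layer inference Y <- ReLU(Y @ W + b), with each W stored in
    CSR so the product is a sparse matrix-matrix multiply (SpMM) -- the
    operation the paper reuses from Kokkos Kernels."""
    Y = Y0
    for W, b in zip(weights, biases):
        Y = (W.T @ Y.T).T            # SpMM: sparse weights times dense activations
        Y = np.maximum(Y + b, 0.0)   # bias + ReLU
    return Y

rng = np.random.default_rng(1)
Y0 = rng.random((4, 8))              # 4 input samples, 8 features
layers = [sparse.random(8, 8, density=0.25, random_state=2, format="csr")
          for _ in range(3)]
biases = [np.zeros(8) for _ in range(3)]
out = sparse_dnn_infer(Y0, layers, biases)
print(out.shape)  # → (4, 8)
```

Data parallelism as in the paper would partition the rows of Y (the input samples) across MPI ranks, with each rank holding a full copy of the weights.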
Citations: 17
HPEC 2019 Title Page
Pub Date : 2019-09-01 DOI: 10.1109/hpec.2019.8916315
{"title":"HPEC 2019 Title Page","authors":"","doi":"10.1109/hpec.2019.8916315","DOIUrl":"https://doi.org/10.1109/hpec.2019.8916315","url":null,"abstract":"","PeriodicalId":184253,"journal":{"name":"2019 IEEE High Performance Extreme Computing Conference (HPEC)","volume":"26 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115142422","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
A data-driven framework for uncertainty quantification of a fluidized bed
Pub Date : 2019-09-01 DOI: 10.1109/HPEC.2019.8916467
V. Kotteda, Anitha Kommu, Vinod Kumar
We carried out a nondeterministic analysis of flow in a fluidized bed. The flow in the fluidized bed is simulated with the National Energy Technology Laboratory's open-source multiphase fluid dynamics suite MFiX, which does not itself provide tools for uncertainty quantification. Therefore, we developed a C++ wrapper to integrate Dakota, an uncertainty quantification toolkit developed at Sandia National Laboratories, with MFiX. The wrapper exchanges uncertain input parameters and key output parameters between Dakota and MFiX. A data-driven framework is also developed to obtain reliable statistics, as it is not feasible to obtain them with Dakota-MFiX (MFiX integrated into Dakota) alone. The data generated from Dakota-MFiX simulations, using Latin Hypercube sampling with a sample size of 500, is used to train a machine-learning algorithm. The trained and tested deep neural network is integrated with Dakota via the wrapper to obtain low-order statistics of the bed height and the pressure drop across the bed.
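The Latin Hypercube design mentioned in the abstract stratifies each input dimension into n equal bins and hits every bin exactly once. A generic NumPy sketch of the sampling scheme (not Dakota's implementation; the dimensions are illustrative):

```python
import numpy as np

def latin_hypercube(n, d, rng=None):
    """Latin Hypercube sample of n points in [0, 1)^d: each dimension is
    split into n equal strata, and an independent random permutation per
    dimension ensures every stratum is hit exactly once."""
    rng = np.random.default_rng(rng)
    u = rng.random((n, d))                               # one draw per stratum
    strata = np.array([rng.permutation(n) for _ in range(d)]).T
    return (strata + u) / n

# The abstract's design: 500 samples over (here) two uncertain inputs.
samples = latin_hypercube(500, 2, rng=0)
print(samples.shape)  # → (500, 2)
```

Each column of `samples` would then be rescaled to the physical range of one uncertain MFiX input before the corresponding simulations are launched.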
Citations: 0