2012 IEEE Conference on High Performance Extreme Computing: Latest Papers

Optimized parallel distribution load flow solver on commodity multi-core CPU
Pub Date : 2012-09-01 DOI: 10.1109/HPEC.2012.6408675
Tao Cui, F. Franchetti
Solving a large number of load flow problems quickly is required for Monte Carlo analysis and various power system problems, including long-term steady-state simulation and system benchmarking, among others. Due to the computational burden, such applications are considered time-consuming and infeasible for online or real-time use. In this work we developed a high performance framework for high throughput distribution load flow computation, taking advantage of performance-enhancing features of multi-core CPUs and various code optimization techniques. We optimized data structures to better fit the memory hierarchy. We use the SPIRAL code generator to exploit inherent patterns of the load flow model through code specialization. We use SIMD instructions and multithreading to parallelize our solver. Finally, we designed a Monte Carlo thread scheduling infrastructure to enable real-time operation. The optimized solver is able to achieve more than 50% of peak performance on an Intel Core i7 CPU, which translates to solving millions of load flow problems within a second for the IEEE 37 test feeder.
Citations: 7
A MATLAB-to-target development workflow using Sourcery VSIPL++
Pub Date : 2012-09-01 DOI: 10.1109/HPEC.2012.6408682
S. Seefeld, Faheem Sheikh, B. Moses
A hybrid MATLAB/C++ programming model for high performance embedded computing is presented. It is shown how the use of a common data model and API can help not only to speed up the development process, but also to keep the original MATLAB model in sync with the evolving C++ code, thus allowing it to remain a gold standard for the project as it evolves.
Citations: 0
High locality and increased intra-node parallelism for solving finite element models on GPUs by novel element-by-element implementation
Pub Date : 2012-09-01 DOI: 10.1109/HPEC.2012.6408659
I. Kiss, Z. Badics, S. Gyimóthy, J. Pávó
The utilization of Graphical Processing Units (GPUs) for the element-by-element (EbE) finite element method (FEM) is demonstrated. EbE FEM is a long-known technique by which a conjugate gradient (CG) type iterative solution scheme can be entirely decomposed into computations on the element level, i.e., without assembling the global system matrix. In our implementation, NVIDIA's parallel computing solution, the Compute Unified Device Architecture (CUDA), is used to perform the required element-wise computations in parallel. Since element matrices need not be stored, the memory requirement can be kept extremely low. It is shown that this low-storage but computation-intensive technique is better suited for GPUs than those requiring the massive manipulation of large data sets. This study of the proposed parallel model illustrates a highly improved locality and minimization of data movement, which could also significantly reduce energy consumption in other heterogeneous HPC architectures.
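The matrix-free idea in this abstract can be illustrated with a small sequential sketch. It is not the authors' CUDA implementation: it assumes a 1D Poisson problem with linear elements, homogeneous Dirichlet boundaries, and a unit source term, and the names `ebe_matvec` and `cg_ebe` are hypothetical. The point is only that CG needs the operator's action, which can be accumulated element by element without a global matrix.

```python
def ebe_matvec(x, n_elem, h):
    """Apply the 1D stiffness operator element by element (no global matrix)."""
    y = [0.0] * (n_elem + 1)
    k = 1.0 / h  # element stiffness is k * [[1, -1], [-1, 1]]
    for e in range(n_elem):
        a, b = e, e + 1              # the two nodes of element e
        d = k * (x[a] - x[b])
        y[a] += d
        y[b] -= d
    # Homogeneous Dirichlet conditions: mask the boundary rows.
    # Callers keep the boundary entries of x at zero.
    y[0] = 0.0
    y[-1] = 0.0
    return y


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def cg_ebe(b, n_elem, h, tol=1e-10, max_iter=1000):
    """Conjugate gradient using only the element-by-element operator."""
    x = [0.0] * len(b)
    r = list(b)                      # residual for the zero initial guess
    p = list(r)
    rs = dot(r, r)
    for _ in range(max_iter):
        if rs ** 0.5 < tol:
            break
        ap = ebe_matvec(p, n_elem, h)
        alpha = rs / dot(p, ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * ai for ri, ai in zip(r, ap)]
        rs_new = dot(r, r)
        p = [ri + (rs_new / rs) * pi for ri, pi in zip(r, p)]
        rs = rs_new
    return x


# Assumed test problem: -u'' = 1 on [0, 1] with u(0) = u(1) = 0.
n_elem = 8
h = 1.0 / n_elem
load = [0.0] + [h] * (n_elem - 1) + [0.0]   # boundary entries stay zero
u = cg_ebe(load, n_elem, h)
```

For this assumed problem the nodal values should match the exact solution x(1 - x)/2. In a GPU version, the loop over elements in `ebe_matvec` is the part that would be distributed across threads, with some strategy (such as coloring or atomic updates) needed to handle nodes shared between elements.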
Citations: 10
Benchmarking parallel eigen decomposition for residuals analysis of very large graphs
Pub Date : 2012-09-01 DOI: 10.1109/HPEC.2012.6408677
E. Rutledge, B. A. Miller, M. Beard
Graph analysis is used in many domains, from the social sciences to physics and engineering. The computational driver for one important class of graph analysis algorithms is the computation of leading eigenvectors of matrix representations of a graph. This paper explores the computational implications of performing an eigen decomposition of a directed graph's symmetrized modularity matrix using commodity cluster hardware and freely available eigensolver software, for graphs with 1 million to 1 billion vertices and 8 million to 8 billion edges. For graphs of these sizes, parallel eigensolvers are of particular interest. Our results suggest that graph analysis approaches based on eigenspace analysis of graph residuals are feasible even for graphs of these sizes.
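As a rough illustration of the kernel described here, the sketch below computes a leading eigenvector of a modularity matrix by shifted power iteration. It makes several simplifying assumptions that differ from the paper: the graph is undirected (so no symmetrization step is needed), the solver is a plain sequential power iteration rather than a parallel eigensolver package, and the function names are hypothetical. The modularity matrix B = A - k kᵀ / (2m) is never formed; only its action on a vector is computed.

```python
def modularity_matvec(adj, degrees, two_m, x):
    """Compute B x with B = A - k k^T / (2m), without forming B."""
    scale = sum(k * xi for k, xi in zip(degrees, x)) / two_m
    return [sum(x[j] for j in adj[i]) - degrees[i] * scale
            for i in range(len(x))]


def leading_eigenvector(edges, n, iters=500):
    """Shifted power iteration for the most positive eigenvalue of B."""
    adj = [[] for _ in range(n)]
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    degrees = [len(nb) for nb in adj]
    two_m = sum(degrees)
    # The spectral norm of B is at most 2 * max degree, so this shift
    # makes B + shift * I positive semidefinite.
    shift = 2.0 * max(degrees)
    x = [1.0 + i for i in range(n)]  # deterministic, non-symmetric start
    for _ in range(iters):
        y = modularity_matvec(adj, degrees, two_m, x)
        y = [yi + shift * xi for yi, xi in zip(y, x)]
        norm = sum(yi * yi for yi in y) ** 0.5
        x = [yi / norm for yi in y]
    return x


# Assumed toy graph: two triangles joined by a single edge.
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
v = leading_eigenvector(edges, 6)
groups = [0 if vi > 0 else 1 for vi in v]  # sign pattern splits the graph
```

The shift is there because plain power iteration converges to the eigenvalue of largest magnitude, which for a modularity matrix may be negative. Production eigensolvers typically use Lanczos-type methods instead, which converge much faster on large sparse problems.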
Citations: 7
Anatomy of a globally recursive embedded LINPACK benchmark
Pub Date : 2012-09-01 DOI: 10.1109/HPEC.2012.6408679
J. Dongarra, P. Luszczek
We present a complete bottom-up implementation of an embedded LINPACK benchmark on iPad 2. We use a novel formulation of a recursive LU factorization that is recursive and parallel at the global scope. We believe our new algorithm presents an alternative to existing linear algebra parallelization techniques such as master-worker and DAG-based approaches. We show an assembly API that allows us a much higher level of abstraction and provides rapid code development within the confines of the mobile device SDK. We use performance modeling to help with the limitations of the device and the limited access to the device from a development environment not geared for HPC application tuning.
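To make the term "recursive LU factorization" concrete, here is a minimal sequential sketch. It is not the globally recursive parallel algorithm the abstract describes: it assumes a square matrix that needs no pivoting (for example, a diagonally dominant one), and the function name `recursive_lu` is hypothetical. It factors the leading block, solves for the off-diagonal blocks, forms the Schur complement, and recurses on it, all in place.

```python
def recursive_lu(a, lo=0, hi=None):
    """In-place block-recursive LU without pivoting on rows/cols [lo, hi).

    On return, the strict lower triangle holds L (unit diagonal implied)
    and the upper triangle holds U.
    """
    if hi is None:
        hi = len(a)
    n = hi - lo
    if n <= 1:
        return
    mid = lo + n // 2

    # Factor the leading block A11 = L11 * U11.
    recursive_lu(a, lo, mid)

    # U12 = L11^{-1} * A12 by forward substitution (unit lower triangular).
    for j in range(mid, hi):
        for i in range(lo, mid):
            for k in range(lo, i):
                a[i][j] -= a[i][k] * a[k][j]

    # L21 = A21 * U11^{-1}, solved one row at a time.
    for i in range(mid, hi):
        for j in range(lo, mid):
            for k in range(lo, j):
                a[i][j] -= a[i][k] * a[k][j]
            a[i][j] /= a[j][j]

    # Schur complement: A22 <- A22 - L21 * U12.
    for i in range(mid, hi):
        for j in range(mid, hi):
            for k in range(lo, mid):
                a[i][j] -= a[i][k] * a[k][j]

    # Factor the updated trailing block.
    recursive_lu(a, mid, hi)


# Assumed example: a diagonally dominant matrix, so no pivoting is needed.
m = [[4.0, 1.0, 2.0, 0.5],
     [1.0, 5.0, 1.0, 2.0],
     [2.0, 1.0, 6.0, 1.0],
     [0.5, 2.0, 1.0, 7.0]]
recursive_lu(m)
```

A benchmark-grade implementation would add partial pivoting and replace the triple loops with tuned matrix-multiply kernels; the recursion mainly serves to expose large blocks on which those kernels run efficiently.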
Citations: 24
Large scale network situational awareness via 3D gaming technology
Pub Date : 2012-09-01 DOI: 10.1109/HPEC.2012.6408670
M. Hubbell, J. Kepner
Obtaining situational awareness of network activity across an enterprise presents unique visualization challenges. IT analysts are required to quickly gather and correlate large volumes of disparate data to identify the existence of anomalous behavior. This paper will show how the MIT Lincoln Laboratory LLGrid Team has approached obtaining network situational awareness utilizing the Unity 3D video game engine. We have developed a 3D environment of the physical plant in the format of a networked multiplayer First Person Shooter (FPS) to demonstrate a virtual depiction of the current state of the network and the machines operating on the network. Within the game or virtual world, an analyst or player can gather critical information on all network assets as well as perform physical system actions on machines in question. 3D gaming technology provides tools to create an environment that is both visually familiar to the player and able to display immense amounts of system data in a meaningful and easy-to-absorb format. Our prototype system was able to monitor and display 5000 assets in ~10% of the time of our network time window.
Citations: 8