
2012 IEEE Conference on High Performance Extreme Computing: Latest Papers

A MATLAB-to-target development workflow using Sourcery VSIPL++
Pub Date : 2012-09-01 DOI: 10.1109/HPEC.2012.6408682
S. Seefeld, Faheem Sheikh, B. Moses
A hybrid MATLAB/C++ programming model for high performance embedded computing is presented. It is shown how the use of a common data model and API can help not only to speed up the development process but also to keep the original MATLAB model in sync with the evolving C++ code, thereby allowing it to remain the gold standard for the project as it evolves.
Citations: 0
Cluster-based 3D reconstruction of aerial video
Pub Date : 2012-09-01 DOI: 10.1109/HPEC.2012.6408681
S. Sawyer, K. Ni, N. Bliss
Large-scale 3D scene reconstruction using Structure from Motion (SfM) remains computationally very challenging despite much active research in the area. We propose an efficient, scalable processing chain designed for cluster computing and suitable for use on aerial video. The sparse bundle adjustment step, which is iterative and difficult to parallelize, is accomplished by partitioning the input image set, generating independent point clouds in parallel, and then fusing the clouds and combining duplicate points. We compare this processing chain to a leading parallel SfM implementation, which exploits fine-grained parallelism in various matrix operations and is not designed to scale beyond a multi-core workstation with a GPU. We show that our cluster-based approach offers significant improvements in scalability and runtime while producing comparable point cloud density and more accurate point location estimates.
Citations: 4
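The fusion step in the abstract above (independent point clouds merged, duplicate points combined) can be illustrated with a minimal Python sketch. It assumes the per-partition clouds have already been registered into one common coordinate frame, and it treats points that fall into the same cubic voxel of edge length `tol` as duplicates, replacing them with their centroid. The voxel criterion and the names `fuse_point_clouds` and `tol` are hypothetical illustrations, not the authors' method.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, float, float]


def fuse_point_clouds(clouds: Sequence[Sequence[Point]], tol: float) -> List[Point]:
    """Merge point clouds that (by assumption) share one coordinate frame.

    Points from any cloud that land in the same cubic voxel of edge length
    `tol` are treated as duplicates and replaced by their centroid.
    """
    buckets: Dict[Tuple[int, ...], List[Point]] = defaultdict(list)
    for cloud in clouds:
        for p in cloud:
            # Quantize each coordinate; float floor division rounds toward -inf.
            key = tuple(int(c // tol) for c in p)
            buckets[key].append(p)

    fused: List[Point] = []
    for pts in buckets.values():
        n = len(pts)
        # Combine duplicates by averaging their coordinates.
        fused.append(tuple(sum(p[i] for p in pts) / n for i in range(3)))
    return fused


# Usage: the two nearly coincident points near the origin are merged.
merged = fuse_point_clouds(
    [[(0.0, 0.0, 0.0), (5.0, 5.0, 5.0)], [(0.01, 0.0, 0.0), (9.0, 9.0, 9.0)]],
    tol=0.1,
)
print(len(merged))  # 3
```

Voxel hashing keeps the merge linear in the number of points, but two near-duplicates that straddle a voxel boundary are not merged; a radius search (for example with a k-d tree) avoids that at a higher cost.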
High locality and increased intra-node parallelism for solving finite element models on GPUs by novel element-by-element implementation
Pub Date : 2012-09-01 DOI: 10.1109/HPEC.2012.6408659
I. Kiss, Z. Badics, S. Gyimóthy, J. Pávó
The use of Graphics Processing Units (GPUs) for the element-by-element (EbE) finite element method (FEM) is demonstrated. EbE FEM is a long-known technique by which a conjugate gradient (CG)-type iterative solution scheme can be decomposed entirely into element-level computations, i.e., without assembling the global system matrix. In our implementation, NVIDIA's parallel computing platform, the Compute Unified Device Architecture (CUDA), is used to perform the required element-wise computations in parallel. Since element matrices need not be stored, the memory requirement can be kept extremely low. It is shown that this low-storage but computation-intensive technique is better suited to GPUs than techniques that require massive manipulation of large data sets. The proposed parallel model exhibits greatly improved locality and minimal data movement, which could also significantly reduce energy consumption on other heterogeneous HPC architectures.
Citations: 10
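To make the element-by-element idea concrete, here is a minimal serial Python sketch of a matrix-free CG solve: the product K x is formed by looping over elements, recomputing each small element matrix on the fly and scatter-adding into the result, so the global matrix K is never assembled or stored. The model problem (-u'' = 1 on [0, 1] with u(0) = u(1) = 0, discretized with 8 linear elements) and the names `ebe_matvec`, `cg` and `solve_poisson_1d` are hypothetical choices for illustration; the paper's CUDA implementation is not reproduced here.

```python
from typing import Callable, List

Vector = List[float]


def ebe_matvec(x: Vector, n_elems: int, h: float) -> Vector:
    """Return K x without assembling K (element-by-element product).

    Hypothetical model: 1D linear elements of length h, each with element
    stiffness (1/h) * [[1, -1], [-1, 1]]; both end nodes are Dirichlet.
    """
    y = [0.0] * (n_elems + 1)
    for e in range(n_elems):
        i, j = e, e + 1            # global node ids of element e
        xi, xj = x[i], x[j]        # gather the element's nodal values
        # Apply the element matrix, recomputed on the fly (never stored),
        # and scatter-add the local result into the global vector.
        y[i] += (xi - xj) / h
        y[j] += (xj - xi) / h
    y[0] = y[-1] = 0.0             # zero the Dirichlet rows; fixed values stay 0
    return y


def dot(a: Vector, b: Vector) -> float:
    return sum(p * q for p, q in zip(a, b))


def cg(matvec: Callable[[Vector], Vector], b: Vector,
       tol: float = 1e-10, max_iter: int = 1000) -> Vector:
    """Plain conjugate gradient; it only needs the action of the operator."""
    x = [0.0] * len(b)
    r = list(b)                    # residual b - K x for x = 0
    p = list(r)
    rs = dot(r, r)
    for _ in range(max_iter):
        if rs ** 0.5 < tol:
            break
        kp = matvec(p)
        alpha = rs / dot(p, kp)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * ki for ri, ki in zip(r, kp)]
        rs_new = dot(r, r)
        p = [ri + (rs_new / rs) * pi for ri, pi in zip(r, p)]
        rs = rs_new
    return x


def solve_poisson_1d(n_elems: int) -> Vector:
    """Solve -u'' = 1 on [0, 1], u(0) = u(1) = 0, with the EbE operator."""
    h = 1.0 / n_elems
    b = [h] * (n_elems + 1)        # consistent load vector for f = 1
    b[0] = b[-1] = 0.0             # boundary values are fixed at zero
    return cg(lambda v: ebe_matvec(v, n_elems, h), b)


u = solve_poisson_1d(8)
print(round(u[4], 6))  # 0.125, the exact value u(0.5)
```

On a GPU one would presumably assign elements to threads; because neighboring elements write to shared nodes, the scatter-add then needs atomic updates or an element coloring. That mapping is an assumption of this sketch, not a detail taken from the abstract.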
Benchmarking parallel eigen decomposition for residuals analysis of very large graphs
Pub Date : 2012-09-01 DOI: 10.1109/HPEC.2012.6408677
E. Rutledge, B. A. Miller, M. Beard
Graph analysis is used in many domains, from the social sciences to physics and engineering. The computational driver for one important class of graph analysis algorithms is the computation of the leading eigenvectors of matrix representations of a graph. This paper explores the computational implications of performing an eigendecomposition of a directed graph's symmetrized modularity matrix using commodity cluster hardware and freely available eigensolver software, for graphs with 1 million to 1 billion vertices and 8 million to 8 billion edges. At these sizes, parallel eigensolvers are of particular interest. Our results suggest that graph analysis approaches based on eigenspace analysis of graph residuals are feasible even for graphs this large.
Citations: 7
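The core kernel behind such an analysis can be sketched in Python as a matrix-free product with the symmetrized modularity matrix, fed to a simple eigensolver. The sketch assumes the directed modularity matrix B = A - k_out k_in^T / m (A the adjacency matrix, k_out and k_in the out- and in-degree vectors, m the number of edges) and symmetrizes it as (B + B^T) / 2; the dense n-by-n matrix is never formed. Power iteration is swapped in purely for brevity: it is not the eigensolver software the paper benchmarks, and it converges to the eigenvalue of largest magnitude, which equals the leading eigenvalue only when that one dominates. The names `make_sym_modularity_op` and `power_iteration` are hypothetical.

```python
import math
from typing import Callable, List, Sequence, Tuple

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(p * q for p, q in zip(a, b))


def make_sym_modularity_op(n: int, edges: Sequence[Tuple[int, int]]) -> Callable[[Vector], Vector]:
    """Return x -> B_sym x, with B = A - k_out k_in^T / m (assumed form)
    and B_sym = (B + B^T) / 2. Only the edge list and degrees are stored."""
    m = len(edges)
    k_out = [0.0] * n
    k_in = [0.0] * n
    for u, v in edges:
        k_out[u] += 1.0
        k_in[v] += 1.0

    def matvec(x: Vector) -> Vector:
        y = [0.0] * n
        for u, v in edges:         # (A + A^T) x / 2 in one pass over the edges
            y[u] += 0.5 * x[v]
            y[v] += 0.5 * x[u]
        s_in = dot(k_in, x) / m    # (k_in . x) / m
        s_out = dot(k_out, x) / m  # (k_out . x) / m
        for i in range(n):         # symmetrized rank-one "expected edges" term
            y[i] -= 0.5 * (k_out[i] * s_in + k_in[i] * s_out)
        return y

    return matvec


def power_iteration(matvec: Callable[[Vector], Vector], x0: Vector,
                    iters: int = 200) -> Tuple[float, Vector]:
    """Largest-magnitude eigenpair; x0 must not be orthogonal to it."""
    norm = math.sqrt(dot(x0, x0))
    x = [v / norm for v in x0]
    lam = 0.0
    for _ in range(iters):
        y = matvec(x)
        lam = dot(x, y)            # Rayleigh quotient (x has unit norm)
        norm = math.sqrt(dot(y, y))
        x = [v / norm for v in y]
    return lam, x


# Two disjoint directed 3-cycles: the leading eigenvector separates them.
edges = [(0, 1), (1, 2), (2, 0), (3, 4), (4, 5), (5, 3)]
lam, v = power_iteration(make_sym_modularity_op(6, edges), [1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
print(round(lam, 6))  # 1.0
```

For this toy graph the leading eigenvalue is 1 and the eigenvector entries have one sign on nodes 0 to 2 and the opposite sign on nodes 3 to 5, which is the community split a residuals analysis would look for. At the graph sizes in the abstract, Lanczos-type solvers would normally replace power iteration.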
Anatomy of a globally recursive embedded LINPACK benchmark
Pub Date : 2012-09-01 DOI: 10.1109/HPEC.2012.6408679
J. Dongarra, P. Luszczek
We present a complete bottom-up implementation of an embedded LINPACK benchmark on the iPad 2. We use a novel formulation of LU factorization that is recursive and parallel at the global scope. We believe our new algorithm presents an alternative to existing linear algebra parallelization techniques such as master-worker and DAG-based approaches. We show an assembly API that allows a much higher level of abstraction and enables rapid code development within the confines of the mobile device SDK. We use performance modeling to cope with the limitations of the device and with the limited access to the device from a development environment that is not geared toward HPC application tuning.
Citations: 24
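The recursive formulation of LU factorization mentioned in the abstract can be illustrated with a minimal serial Python sketch of the classic 2×2 block recursion: factor the leading diagonal block recursively, solve triangular systems for the off-diagonal blocks, update the Schur complement, and recurse on it. The sketch omits pivoting (so it assumes every leading principal minor is nonzero) as well as the global parallel scheduling the paper describes; the names `recursive_lu` and `split_lu` are hypothetical.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def _lu_block(a: Matrix, k: int, n: int) -> None:
    """Factor the trailing block a[k:n][k:n] in place (no pivoting)."""
    size = n - k
    if size <= 1:
        return                     # 1x1 block: L = 1, U = a[k][k]
    mid = k + size // 2
    _lu_block(a, k, mid)           # A11 = L11 U11, recursively
    # U12 = L11^{-1} A12 (forward substitution, L11 has a unit diagonal).
    for j in range(mid, n):
        for i in range(k, mid):
            a[i][j] -= sum(a[i][p] * a[p][j] for p in range(k, i))
    # L21 = A21 U11^{-1} (row-wise substitution against upper-triangular U11).
    for i in range(mid, n):
        for j in range(k, mid):
            a[i][j] = (a[i][j] - sum(a[i][p] * a[p][j] for p in range(k, j))) / a[j][j]
    # Schur complement update: A22 <- A22 - L21 U12.
    for i in range(mid, n):
        for j in range(mid, n):
            a[i][j] -= sum(a[i][p] * a[p][j] for p in range(k, mid))
    _lu_block(a, mid, n)           # factor the Schur complement, recursively


def recursive_lu(a: Matrix) -> Matrix:
    """Overwrite a with L (strictly below the diagonal) and U (on and above)."""
    _lu_block(a, 0, len(a))
    return a


def split_lu(a: Matrix) -> Tuple[Matrix, Matrix]:
    """Unpack the in-place result into a unit lower L and an upper U."""
    n = len(a)
    lower = [[a[i][j] if j < i else (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    upper = [[a[i][j] if j >= i else 0.0 for j in range(n)] for i in range(n)]
    return lower, upper


A = [[4.0, 3.0, 2.0], [6.0, 3.0, 1.0], [2.0, 5.0, 7.0]]
L, U = split_lu(recursive_lu([row[:] for row in A]))
print(L[1][0], U[1][1])  # 1.5 -1.5
```

Most of the arithmetic in this scheme lands in the Schur-complement update, a matrix-multiply-like kernel on progressively larger blocks; that is the usual motivation for recursive LU, although the paper's own performance argument is not reproduced here.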
Large scale network situational awareness via 3D gaming technology
Pub Date : 2012-09-01 DOI: 10.1109/HPEC.2012.6408670
M. Hubbell, J. Kepner
Obtaining situational awareness of network activity across an enterprise presents unique visualization challenges. IT analysts must quickly gather and correlate large volumes of disparate data to identify anomalous behavior. This paper shows how the MIT Lincoln Laboratory LLGrid Team has used the Unity 3D video game engine to obtain network situational awareness. We have developed a 3D environment of the physical plant in the form of a networked multiplayer First Person Shooter (FPS) that depicts the current state of the network and of the machines operating on it. Within the game or virtual world, an analyst or player can gather critical information on all network assets and perform physical system actions on the machines in question. 3D gaming technology provides tools to create an environment that is visually familiar to the player while displaying immense amounts of system data in a meaningful, easily absorbed format. Our prototype system was able to monitor and display 5000 assets in about 10% of our network time window.
Citations: 8