Latest papers from 2013 SC - International Conference for High Performance Computing, Networking, Storage and Analysis (SC)

Mr. Scan: Extreme scale density-based clustering using a tree-based network of GPGPU nodes
Benjamin Welton, Evan Samanas, B. Miller
Density-based clustering algorithms are a widely used class of data mining techniques that can find irregularly shaped clusters and can cluster data without prior knowledge of the number of clusters the data contains. DBSCAN is the most well-known density-based clustering algorithm. We introduce our version of DBSCAN, called Mr. Scan, which uses a hybrid parallel implementation that combines the MRNet tree-based distribution network with GPGPU-equipped nodes. Mr. Scan avoids the problems of existing implementations by effectively partitioning the point space and by optimizing DBSCAN's computation over dense data regions. We tested Mr. Scan on both a geolocated Twitter dataset and image data obtained from the Sloan Digital Sky Survey. At its largest scale, Mr. Scan clustered 6.5 billion points from the Twitter dataset on 8,192 GPU nodes on Cray Titan in 17.3 minutes. All other parallel DBSCAN implementations have only demonstrated the ability to cluster up to 100 million points.
DOI: https://doi.org/10.1145/2503210.2503262 (published 2013-11-17)
Citations: 54
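For readers unfamiliar with DBSCAN, the algorithm that Mr. Scan parallelizes, the following is a minimal single-node sketch in Python. It shows the textbook sequential algorithm with a brute-force neighbor search, not the authors' MRNet/GPGPU implementation. The names `dbscan`, `region_query`, `eps`, and `min_pts` are illustrative choices and do not come from the paper.

```python
from math import dist

NOISE = -1  # label for points that belong to no cluster


def region_query(points, i, eps):
    """Return indices of all points within distance eps of points[i] (brute force)."""
    return [j for j, q in enumerate(points) if dist(points[i], q) <= eps]


def dbscan(points, eps, min_pts):
    """Label each point with a cluster id (0, 1, ...) or NOISE."""
    labels = [None] * len(points)  # None means "not visited yet"
    cluster = 0
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_pts:
            labels[i] = NOISE  # may later be relabeled as a border point
            continue
        labels[i] = cluster  # i is a core point: start a new cluster
        seeds = list(neighbors)
        k = 0
        while k < len(seeds):
            j = seeds[k]
            k += 1
            if labels[j] == NOISE:
                labels[j] = cluster  # border point reached from a core point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_pts:
                seeds.extend(j_neighbors)  # j is also core: keep expanding
        cluster += 1
    return labels


if __name__ == "__main__":
    pts = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10), (50, 50)]
    print(dbscan(pts, eps=1.5, min_pts=3))
```

The brute-force `region_query` makes this sketch quadratic in the number of points, which is the kind of cost that spatial partitioning is meant to avoid at large scale.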
Performance evaluation of Intel® Transactional Synchronization Extensions for high-performance computing
Richard M. Yoo, C. Hughes, K. Lai, Ravi Rajwar
Intel has recently introduced Intel® Transactional Synchronization Extensions (Intel® TSX) in the Intel 4th Generation Core™ Processors. With Intel TSX, a processor can dynamically determine whether threads need to serialize through lock-protected critical sections. In this paper, we evaluate the first hardware implementation of Intel TSX using a set of high-performance computing (HPC) workloads, and demonstrate that applying Intel TSX to these workloads can provide significant performance improvements. On a set of real-world HPC workloads, applying Intel TSX provides an average speedup of 1.41x. When applied to a parallel user-level TCP/IP stack, Intel TSX provides a 1.31x average bandwidth improvement on network-intensive applications. We also demonstrate the ease with which we were able to apply Intel TSX to the various workloads.
DOI: https://doi.org/10.1145/2503210.2503232 (published 2013-11-17)
Citations: 277
Precimonious: Tuning assistant for floating-point precision
Cindy Rubio-González, Cuong Nguyen, Hong Diep Nguyen, J. Demmel, W. Kahan, Koushik Sen, D. Bailey, Costin Iancu, David G. Hough
Given the variety of numerical errors that can occur, floating-point programs are difficult to write, test and debug. One common practice employed by developers without an advanced background in numerical analysis is using the highest available precision. While more robust, this can degrade program performance significantly. In this paper we present Precimonious, a dynamic program analysis tool to assist developers in tuning the precision of floating-point programs. Precimonious performs a search on the types of the floating-point program variables trying to lower their precision subject to accuracy constraints and performance goals. Our tool recommends a type instantiation that uses lower precision while producing an accurate enough answer without causing exceptions. We evaluate Precimonious on several widely used functions from the GNU Scientific Library, two NAS Parallel Benchmarks, and three other numerical programs. For most of the programs analyzed, Precimonious reduces precision, which results in performance improvements as high as 41%.
DOI: https://doi.org/10.1145/2503210.2503296 (published 2013-11-17)
Citations: 285
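The idea of searching over variable types subject to an accuracy constraint can be illustrated with a toy Python sketch. This is only an assumption-laden illustration: it uses a simple greedy pass rather than whatever search strategy Precimonious actually implements, it emulates single precision by rounding through `struct`, and the names `to_single`, `run`, and `tune`, the variable names `h`, `x`, and `acc`, and the threshold value are all hypothetical.

```python
import math
import struct


def to_single(x: float) -> float:
    """Round a Python float (binary64) to the nearest binary32 value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def run(config: dict, n: int = 10000) -> float:
    """Midpoint-rule integral of sin on [0, pi].

    Each named variable is rounded to single precision whenever
    config[name] == "single"; otherwise it stays in double precision.
    """
    cast = {name: (to_single if prec == "single" else float)
            for name, prec in config.items()}
    h = cast["h"](math.pi / n)
    acc = cast["acc"](0.0)
    for i in range(n):
        x = cast["x"]((i + 0.5) * h)
        acc = cast["acc"](acc + math.sin(x) * h)
    return acc


def tune(variables, threshold):
    """Greedily lower one variable at a time, keeping a change only if the
    relative error against the all-double reference stays within threshold."""
    config = {v: "double" for v in variables}
    reference = run(config)
    for v in variables:
        trial = dict(config, **{v: "single"})
        err = abs(run(trial) - reference) / abs(reference)
        if err <= threshold:
            config = trial
    return config, reference


if __name__ == "__main__":
    config, reference = tune(["h", "x", "acc"], threshold=1e-6)
    print(config, reference, run(config))
```

Which variables end up in single precision depends on the threshold; a tighter threshold generally leaves more of them in double precision.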
2HOT: An improved parallel hashed oct-tree N-Body algorithm for cosmological simulation
Michael S. Warren
We report on improvements made over the past two decades to our adaptive treecode N-body method (HOT). A mathematical and computational approach to the cosmological N-body problem is described, with performance and scalability measured up to 256k (2^18) processors. We present error analysis and scientific application results from a series of more than ten 69 billion (4096^3) particle cosmological simulations, accounting for 4 × 10^20 floating point operations. These results include the first simulations using the new constraints on the standard model of cosmology from the Planck satellite. Our simulations set a new standard for accuracy and scientific throughput, while meeting or exceeding the computational efficiency of the latest generation of hybrid TreePM N-body methods.
DOI: https://doi.org/10.1145/2503210.2503220 (published 2013-10-16)
Citations: 59
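The abstract does not spell out what "hashed oct-tree" means. A common reading, assumed here rather than taken from the abstract, is that each tree cell is identified by an integer key built by interleaving the bits of its integer coordinates behind a leading placeholder bit, and that cells are stored in a hash table indexed by that key. The Python sketch below illustrates that scheme only; `morton_key`, `parent_key`, `level`, and `build_cells` are hypothetical names, and a plain `dict` stands in for the hash table.

```python
def morton_key(ix: int, iy: int, iz: int, bits: int) -> int:
    """Interleave the low `bits` bits of (ix, iy, iz) behind a leading 1 bit.

    The leading placeholder bit makes keys at different tree depths distinct.
    """
    key = 1
    for b in range(bits - 1, -1, -1):
        octant = ((((iz >> b) & 1) << 2)
                  | (((iy >> b) & 1) << 1)
                  | ((ix >> b) & 1))
        key = (key << 3) | octant
    return key


def parent_key(key: int) -> int:
    """Dropping the last three bits gives the key of the enclosing cell."""
    return key >> 3


def level(key: int) -> int:
    """Depth of the cell in the tree; the root (key 1) is at level 0."""
    return (key.bit_length() - 1) // 3


def build_cells(particles, bits: int) -> dict:
    """Map every occupied cell key (leaves and ancestors) to a particle count."""
    cells = {}
    for ix, iy, iz in particles:
        key = morton_key(ix, iy, iz, bits)
        while key >= 1:
            cells[key] = cells.get(key, 0) + 1
            key = parent_key(key)
    return cells


if __name__ == "__main__":
    cells = build_cells([(0, 0, 0), (1, 0, 0), (3, 3, 3)], bits=2)
    for key in sorted(cells):
        print(bin(key), level(key), cells[key])
```

With keys of this form, finding a cell's parent or testing whether a cell exists is a shift plus a hash lookup, with no pointer chasing.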