Preventing the explosion of exascale profile data with smart thread-level aggregation
Daniel Lorenz, Sergei Shudler, F. Wolf
DOI: 10.1145/2832106.2832107

State-of-the-art performance analysis tools, such as Score-P, record performance profiles on a per-thread basis. However, exascale systems are expected to run on the order of a billion threads, which would result in extremely large performance profiles, even though users rarely inspect the individual per-thread data. In this paper, we propose to aggregate the per-thread performance data within each process to reduce its volume to a reasonable size. Our goal is to aggregate the threads such that thread-level performance issues remain visible and analyzable. To this end, we implemented four aggregation strategies in Score-P: (i) SUM -- aggregates all threads of a process into a single process profile; (ii) SET -- calculates statistical key data in addition to the sum; (iii) KEY -- identifies three threads (i.e., key threads) of particular interest for performance analysis and aggregates the remaining threads; (iv) CALLTREE -- clusters threads that have the same call-tree structure. For each strategy, we evaluate the compression ratio and how well it preserves thread-level performance behavior. The aggregation does not incur any additional performance overhead at application run time.
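To make the first two strategies concrete, the sketch below aggregates per-thread metric values per call path. It is a minimal illustration only: the type and function names (ThreadProfile, aggregate_sum, aggregate_set) are hypothetical and do not correspond to the actual Score-P implementation or data model.

```python
# Illustrative sketch of the SUM and SET aggregation ideas; names are
# hypothetical and not part of Score-P.
from dataclasses import dataclass
from statistics import mean
from typing import Dict, List


@dataclass
class ThreadProfile:
    """Per-thread metric values keyed by call-path name."""
    metrics: Dict[str, float]


def aggregate_sum(threads: List[ThreadProfile]) -> Dict[str, float]:
    """SUM: collapse all threads of a process into one process profile."""
    total: Dict[str, float] = {}
    for t in threads:
        for call_path, value in t.metrics.items():
            total[call_path] = total.get(call_path, 0.0) + value
    return total


def aggregate_set(threads: List[ThreadProfile]) -> Dict[str, Dict[str, float]]:
    """SET: keep statistical key data (sum, min, max, mean) per call path."""
    summary: Dict[str, Dict[str, float]] = {}
    call_paths = {cp for t in threads for cp in t.metrics}
    for cp in call_paths:
        values = [t.metrics.get(cp, 0.0) for t in threads]
        summary[cp] = {
            "sum": sum(values),
            "min": min(values),
            "max": max(values),
            "mean": mean(values),
        }
    return summary
```

In this simplified view, SUM keeps one value per call path regardless of thread count, while SET adds a constant number of statistics per call path, which is why both compress well as the thread count grows.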
{"title":"Preventing the explosion of exascale profile data with smart thread-level aggregation","authors":"Daniel Lorenz, Sergei Shudler, F. Wolf","doi":"10.1145/2832106.2832107","DOIUrl":"https://doi.org/10.1145/2832106.2832107","url":null,"abstract":"State of the art performance analysis tools, such as Score-P, record performance profiles on a per-thread basis. However, for exascale systems the number of threads is expected to be in the order of a billion threads, and this would result in extremely large performance profiles. In most cases the user almost never inspects the individual per-thread data. In this paper, we propose to aggregate per-thread performance data in each process to reduce its amount to a reasonable size. Our goal is to aggregate the threads such that the thread-level performance issues are still visible and analyzable. Therefore, we implemented four aggregation strategies in Score-P: (i) SUM -- aggregates all threads of a process into a process profile; (ii) SET -- calculates statistical key data as well as the sum; (iii) KEY -- identifies three threads (i.e., key threads) of particular interest for performance analysis and aggregates the rest of the threads; (iv) CALLTREE -- clusters threads that have the same call-tree structure. For each one of these strategies we evaluate the compression ratio and how they maintain thread-level performance behavior information. The aggregation does not incur any additional performance overhead at application run-time.","PeriodicalId":424753,"journal":{"name":"ESPT '15","volume":"89 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-11-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115631776","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
HPC I/O trace extrapolation
Xiaoqing Luo, F. Mueller, P. Carns, John Jenkins, R. Latham, R. Ross, S. Snyder
DOI: 10.1145/2832106.2832108
The rapid development of today's supercomputers has made I/O performance a major bottleneck for many scientific applications. Trace analysis tools have thus become vital for diagnosing the root causes of I/O problems. This work contributes an I/O tracing framework with elastic traces. After gathering a set of traces at smaller scales, we extrapolate the application trace to a large number of nodes. The traces can in principle be extrapolated even beyond the scale of present-day systems. Experiments with I/O benchmarks on up to 320 processors indicate that extrapolated I/O trace replays closely resemble the I/O behavior of equivalent applications.
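As a loose illustration of projecting trace-derived quantities beyond the measured scales, the sketch below fits a linear trend over observed node counts and evaluates it at a larger target scale. The function name, the linear model, and the sample numbers are assumptions for illustration only; the paper's framework extrapolates full traces, not scalar summaries.

```python
# Illustrative sketch of scale extrapolation, not the authors' framework.
import numpy as np


def extrapolate_io_volume(node_counts, io_volumes, target_nodes):
    """Fit aggregate I/O volume as a linear function of node count and
    project it to target_nodes (hypothetical model, for illustration)."""
    slope, intercept = np.polyfit(node_counts, io_volumes, deg=1)
    return slope * target_nodes + intercept


# Example: measurements gathered at 32, 64, and 128 nodes, projected to 1024.
observed_nodes = [32, 64, 128]
observed_volumes_gib = [4.1, 8.0, 16.2]  # made-up sample values
print(extrapolate_io_volume(observed_nodes, observed_volumes_gib, 1024))
```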
{"title":"HPC I/O trace extrapolation","authors":"Xiaoqing Luo, F. Mueller, P. Carns, John Jenkins, R. Latham, R. Ross, S. Snyder","doi":"10.1145/2832106.2832108","DOIUrl":"https://doi.org/10.1145/2832106.2832108","url":null,"abstract":"Today's rapid development of supercomputers has caused I/O performance to become a major performance bottleneck for many scientific applications. Trace analysis tools have thus become vital for diagnosing root causes of I/O problems.\u0000 This work contributes an I/O tracing framework with elastic traces. After gathering a set of smaller traces, we extrapolate the application trace to a large numbers of nodes. The traces can in principle be extrapolated even beyond the scale of present-day systems. Experiments with I/O benchmarks on up to 320 processors indicate that extrapolated I/O trace replays closely resemble the I/O behavior of equivalent applications.","PeriodicalId":424753,"journal":{"name":"ESPT '15","volume":"132 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-11-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123037548","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}