
Latest Publications: 2020 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW)

Using Small-Scale History Data to Predict Large-Scale Performance of HPC Application
Pub Date : 2020-05-01 DOI: 10.1109/IPDPSW50202.2020.00135
Wenju Zhou, Jiepeng Zhang, Jingwei Sun, Guangzhong Sun
Performance modeling is an important problem in high-performance computing (HPC). Machine learning (ML) is a powerful approach to HPC performance modeling: it can learn complex relations between application parameters and application performance from historical execution data. However, extrapolating large-scale performance from only small-scale execution data with ML is difficult, because the independent and identically distributed hypothesis (the basic assumption of most ML algorithms) does not hold in this situation. To solve the extrapolation problem, we propose a two-level model consisting of an interpolation level and an extrapolation level. The interpolation level predicts small-scale performance from small-scale executions. The extrapolation level predicts the large-scale performance of a fixed input parameter setting from its small-scale performance predictions. We use random forests to build the interpolation models that predict small-scale performance at the interpolation level. At the extrapolation level, to reduce the negative influence of interpolation errors, we employ the multitask lasso with clustering to construct scalability models that predict large-scale performance. To validate the utility of our two-level model, we conduct experiments on a real HPC platform and build models for two HPC applications. Compared with existing ML methods, our method achieves higher prediction accuracy.
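As a rough illustration of the two-level idea (not the authors' code), the sketch below trains a random-forest interpolation model on synthetic small-scale runtimes and then fits a multitask-lasso scalability model on its predictions; the feature layout, the synthetic data, and the choice of scikit-learn estimators are all assumptions.

```python
# Hedged sketch of a two-level performance model (illustrative only).
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import MultiTaskLasso

rng = np.random.default_rng(0)

# --- Interpolation level: app parameters -> runtimes at small core counts ---
small_scales = np.array([2, 4, 8, 16])          # scales seen in training
X_params = rng.uniform(0, 1, size=(200, 3))     # hypothetical application parameters
# synthetic "measured" runtimes at each small scale
y_small = np.stack([50 * (1 + X_params.sum(1)) / s + rng.normal(0, 0.5, 200)
                    for s in small_scales], axis=1)

interp = RandomForestRegressor(n_estimators=200, random_state=0)
interp.fit(X_params, y_small)                   # predicts runtime at all small scales

# --- Extrapolation level: small-scale predictions -> large-scale runtimes ---
large_scales = np.array([64, 128])
y_large = np.stack([50 * (1 + X_params.sum(1)) / s + rng.normal(0, 0.1, 200)
                    for s in large_scales], axis=1)

scal = MultiTaskLasso(alpha=0.01)
scal.fit(interp.predict(X_params), y_large)     # learn scaling from predicted small-scale perf

# Predict large-scale runtime for a new parameter setting from small-scale info only.
x_new = rng.uniform(0, 1, size=(1, 3))
print(scal.predict(interp.predict(x_new)))
```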
{"title":"Using Small-Scale History Data to Predict Large-Scale Performance of HPC Application","authors":"Wenju Zhou, Jiepeng Zhang, Jingwei Sun, Guangzhong Sun","doi":"10.1109/IPDPSW50202.2020.00135","DOIUrl":"https://doi.org/10.1109/IPDPSW50202.2020.00135","url":null,"abstract":"Performance modeling is an important problem in high-performance computing (HPC). Machine Learning (ML) is a powerful approach for HPC performance modeling. ML can learn complex relations between application parameters and the performance of HPC applications from historical execution data. However, extrapolation of large-scale performance with only small-scale execution data using ML is difficult, because the independent and identically distributed hypothesis (the basic hypothesis of most ML algorithms) does not hold in this situation. To solve the extrapolation problem, we propose a two-level model consisting of interpolation level and extrapolation level. The interpolation level predicts small-scale performance with small-scale execution. The extrapolation level predicts the large-scale performance of the fixed input parameter with its small-scale performance predictions. We use the random forest to build interpolation models to predict small-scale performance in the interpolation level. In the extrapolation level, to reduce the negative influence of interpolation errors, we employ the multitask lasso with clustering to construct the scalability models to predict large-scale performance. To validate the utility of our two-level model, we conduct experiments on a real HPC platform. We build models for two HPC applications using our two-level model. Compare with existing ML methods, our method can achieve higher prediction accuracy.","PeriodicalId":398819,"journal":{"name":"2020 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW)","volume":"118 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115382551","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 1
Message from the HCW Steering Committee Chair
Pub Date : 2020-05-01 DOI: 10.1109/ipdpsw50202.2020.00008
Behrooz A. Shiraz
These are the proceedings of the “29th Heterogeneity in Computing Workshop,” also known as HCW 2020. A few years ago, the title of the workshop was changed from the original title of “Heterogeneous Computing Workshop” to reflect the breadth of the impact of heterogeneity, as well as to stress that the focus of the workshop is on the management and exploitation of heterogeneity. All of this is, of course, taken in the context of the parent conference, the International Parallel and Distributed Processing Symposium (IPDPS), and so explores heterogeneity in parallel and distributed computing systems.
{"title":"Message from the HCW Steering Committee Chair","authors":"Behrooz A. Shiraz","doi":"10.1109/ipdpsw50202.2020.00008","DOIUrl":"https://doi.org/10.1109/ipdpsw50202.2020.00008","url":null,"abstract":"These are the proceedings of the “29th Heterogeneity in Computing Workshop,” also known as HCW 2020. A few years ago, the title of the workshop was changed from the original title of “Heterogeneous Computing Workshop” to reflect the breadth of the impact of heterogeneity, as well as to stress that the focus of the workshop is on the management and exploitation of heterogeneity. All of this is, of course, taken in the context of the parent conference, the International Parallel and Distributed Processing Symposium (IPDPS), and so explores heterogeneity in parallel and distributed computing systems.","PeriodicalId":398819,"journal":{"name":"2020 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW)","volume":"95 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127052145","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Message from the workshop chairs
Scott McMillan, Manoj Kumar, Danai Koutra, M. Halappanavar, T. Mattson, Antonino Tumeo
GrAPL 2020, the Workshop on Graphs, Architectures, Programming, and Learning, brings together two closely related topics: how the synthesis (representation) and analysis of graphs is supported in hardware and software, and how graph algorithms interact with machine learning. Driven by the natural outgrowth of the wide range of methods used in large-scale data analytics workflows, GrAPL's scope is broad. GrAPL 2020 is the second edition of the merger of two successful workshop series at IPDPS: GABB and GraML. GABB started at IPDPS'14 with a program of invited talks and panel discussions. GraML was held at IPDPS in 2017 and 2018.
{"title":"Message from the workshop chairs","authors":"Scott McMillan, Manoj Kumar, Danai Koutra, M. Halappanavar, T. Mattson, Antonino Tumeo","doi":"10.1109/cahpc.2018.8645919","DOIUrl":"https://doi.org/10.1109/cahpc.2018.8645919","url":null,"abstract":"GrAPL 2020: Workshop on Graphs, Architectures, Programming, and Learning, brings together two closely related topics - how the synthesis (representation) and analysis of graphs is supported in hardware and software, and the ways graph algorithms interact with machine learning. Driven by the natural outgrowth of a wide range of methods used in large-scale data analytics workflows, GrAPL’s scope is broad. GrAPL’2020 is the second edition of the merger between two successful workshop series at IPDPS: GABB and GraML. GABB started at IPDPS’14 with a program of invited-talks and panel discussions. GraML was held at IPDPS in 2017 and 2018.","PeriodicalId":398819,"journal":{"name":"2020 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW)","volume":"28 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124980019","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Development of Parallel CFD Applications on Distributed Memory with Chapel
Pub Date : 2020-05-01 DOI: 10.1109/ipdpsw50202.2020.00110
M. Parenteau, Simon Bourgault-Cote, Frédéric Plante, E. Laurendeau
Traditionally, Computational Fluid Dynamics (CFD) software uses the Message Passing Interface (MPI) to handle parallelism over distributed-memory systems. For a new developer, such as a student or a new employee, the barrier to entry can be high, and additional training is required for each particular software package, which slows down research on the actual science. The Chapel programming language offers an interesting alternative for the research and development of CFD applications. In this paper, the development of two CFD applications is presented: the first, an experiment in rewriting a 2D structured flow solver; the second, a research 3D unstructured multi-physics simulation code written from scratch. Details are given on both applications, with emphasis on the Chapel features that benefited the code design, in particular for improving flexibility and extending to distributed memory. Some performance pitfalls are discussed along with solutions to avoid them. The performance of the unstructured software is then studied and compared to a traditional open-source CFD package programmed in C++ with MPI for communication (SU2). The results show that our Chapel implementation achieves performance similar to other CFD software written in C and C++, confirming that Chapel is a viable language for high-performance CFD applications.
{"title":"Development of Parallel CFD Applications on Distributed Memory with Chapel","authors":"M. Parenteau, Simon Bourgault-Cote, Frédéric Plante, E. Laurendeau","doi":"10.1109/ipdpsw50202.2020.00110","DOIUrl":"https://doi.org/10.1109/ipdpsw50202.2020.00110","url":null,"abstract":"Traditionally, Computational Fluid Dynamics (CFD) software uses Message Passing Interface (MPI) to handle the parallelism over distributed memory systems. For a new developer, such as a student or a new employee, the barrier of entry can be high and more training is required for each particular software package, which slows down the research process on actual science. The Chapel programming language offers an interesting alternative for research and development of CFD applications.In this paper, the developments of two CFD applications are presented: the first one as an experiment by re-writing a 2D structured flow solver and the second one as writing from scratch a research 3D unstructured multi-physics simulation software. Details are given on both applications with emphasis on the Chapel features which were used positively in the code design, in particular to improve flexibility and extend to distributed memory. Some performance pitfalls are discussed with solutions to avoid them.The performance of the unstructured software is then studied and compared to a traditional open-source CFD software package programmed in C++ with MPI for communication (SU2). The results show that our Chapel implementation achieves performances similar to other CFD software written in C and C++, thus confirming that Chapel is a viable language for high-performance CFD applications.","PeriodicalId":398819,"journal":{"name":"2020 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW)","volume":"32 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128701708","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Acceleration of Structural Analysis Simulations using CNN-based Auto-Tuning of Solver Tolerance
Pub Date : 2020-05-01 DOI: 10.1109/IPDPSW50202.2020.00134
Amir Haderbache, Koichi Shirahata, T. Yamamoto, Y. Tomita, H. Okuda
With the emergence of AI, we observe a surge of interest in applying machine learning to traditional HPC workloads. One example is the use of surrogate models that approximate the output of scientific simulations at very low latency. However, such a black-box approach usually suffers from significant accuracy loss. An alternative is to leverage the large amount of data generated at simulation runtime to improve the efficiency of numerical methods. However, there is still no clear solution for applying AI inside HPC simulations. Thus, we propose to incorporate AI into structural analysis simulations and develop an auto-tuning scheme for the iterative solver tolerance used in the Newton-Raphson method. We leverage residual data to train a performance model that is aware of the time-accuracy trade-off. By controlling the tuning with AI softmax probability values, we achieve a 1.58x acceleration compared to traditional simulations while maintaining accuracy at 1e-02 precision.
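The abstract does not spell out the tuning rule, so the sketch below only illustrates the general mechanism: a Newton-Raphson loop whose inner linear-solver tolerance is loosened when a confidence score is high and tightened otherwise. The toy system, the hand-written CG solver, and the confidence heuristic are all assumptions, not the paper's CNN.

```python
# Illustrative Newton-Raphson loop with an adaptively tuned linear-solver tolerance.
import numpy as np

def residual(u):                       # toy nonlinear system F(u) = 0
    return u**3 + u - 1.0

def jacobian(u):                       # its (diagonal, SPD) Jacobian
    return np.diag(3.0 * u**2 + 1.0)

def cg_solve(A, b, tol):
    # Plain conjugate gradient with a relative-residual stopping tolerance.
    x = np.zeros_like(b)
    r = b - A @ x
    p = r.copy()
    rs = r @ r
    b_norm = np.linalg.norm(b) + 1e-30
    for _ in range(10 * len(b)):
        Ap = A @ p
        alpha = rs / (p @ Ap)
        x += alpha * p
        r -= alpha * Ap
        rs_new = r @ r
        if np.sqrt(rs_new) / b_norm < tol:
            break
        p = r + (rs_new / rs) * p
        rs = rs_new
    return x

def confidence(res_norm, prev_norm):
    # Placeholder for a learned model: confident when the residual drops quickly.
    return 1.0 if res_norm < 0.1 * prev_norm else 0.0

u = np.full(5, 0.5)
prev_norm = np.inf
for it in range(20):
    r = residual(u)
    res_norm = np.linalg.norm(r)
    if res_norm < 1e-10:
        break
    # Loose tolerance when confident (cheaper inner solves), tight tolerance otherwise.
    tol = 1e-2 if confidence(res_norm, prev_norm) > 0.5 else 1e-8
    u += cg_solve(jacobian(u), -r, tol)
    prev_norm = res_norm
print(it, u)
```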
{"title":"Acceleration of Structural Analysis Simulations using CNN-based Auto-Tuning of Solver Tolerance","authors":"Amir Haderbache, Koichi Shirahata, T. Yamamoto, Y. Tomita, H. Okuda","doi":"10.1109/IPDPSW50202.2020.00134","DOIUrl":"https://doi.org/10.1109/IPDPSW50202.2020.00134","url":null,"abstract":"With the emergence of AI, we observe a surge of interest in applying machine learning to traditional HPC workloads. An example is the use of surrogate models that approximate the output of scientific simulations at very low latency. However, such a black-box approach usually suffers from significant accuracy loss. An alternative method is to leverage the large amount of data generated at simulations’ runtime to improve the efficiency of numerical methods. However, there is still no clear solution to apply AI inside HPC simulations. Thus, we propose to incorporate AI into structural analysis simulations and develop an auto-tuning of the iterative solver tolerance used in the Newton-Raphson method. We leverage residual data to train a performance model that is aware of the time-accuracy trade-off. By controlling the tuning using AI softmax probability values, we achieve 1.58x acceleration compared to traditional simulations and maintain accuracy with 1e-02 precision.","PeriodicalId":398819,"journal":{"name":"2020 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW)","volume":"22 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129313483","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 1
Predicting near-optimal skin distance in Verlet buffer approach for Discrete Element Method
Pub Date : 2020-05-01 DOI: 10.1109/IPDPSW50202.2020.00093
Abdoul Wahid Mainassara Checkaraou, Xavier Besseron, A. Rousset, Emmanuel Kieffer, B. Peters
The Verlet list method is a well-known bookkeeping technique for the interaction list used both in Molecular Dynamics (MD) and the Discrete Element Method (DEM). The Verlet buffer technique is an enhancement of the Verlet list that extends the interaction radius of each particle by an extra margin (the skin distance) so that more particles are kept in the interaction list. The extra margin is based on the local flow regime of each particle, to account for the different flow regimes that can coexist in the domain. However, unlike in MD, the choice of the near-optimal extra margin (which ensures the best performance) for each particle, and of the related parameters, remains unexplored in DEM. In this study, we demonstrate that the near-optimal extra margin can be fairly well characterised by four parameters that describe each particle's local flow regime: the particle velocity, the ratio of the containing cell size to the particle size, the containing cell solid fraction, and the total number of particles in the system. For this purpose, we model the near-optimal extra margin as a function of these parameters using a quadratic polynomial function. We use the DAKOTA software to carry out the Design and Analysis of Computer Experiments (DACE) and the sampling of the parameters for the simulations. For a given instance of the set of parameters, a global optimisation method is used to find the near-optimal extra margin, which is required for the construction of the quadratic polynomial model. The numerous simulations generated by the parameter sampling were performed on a High-Performance Computing (HPC) environment allowing parallel and concurrent executions. This work provides a better understanding of the Verlet buffer method in DEM simulations by analysing its performance and behaviour in various configurations. The near-optimal extra margin can reasonably be predicted by two out of the four chosen parameters using the quadratic polynomial model. This model has been integrated in XDEM in order to automatically choose the extra margin without any input from the user. Evaluations on real industrial-level test cases show up to a 26% reduction of the execution time.
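As a hedged sketch of the modelling step described above, the snippet below fits a degree-2 polynomial regression that maps the four local-flow features to a near-optimal skin distance; the data are synthetic and the scikit-learn pipeline is an illustrative stand-in for the DAKOTA-driven DACE workflow.

```python
# Sketch: quadratic polynomial surrogate for the near-optimal Verlet skin distance
# from four local-flow features (synthetic data; illustrative only).
import numpy as np
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(1)

# Hypothetical features: particle velocity, cell-to-particle size ratio,
# cell solid fraction, total particle count (log-scaled).
X = np.column_stack([
    rng.uniform(0.0, 2.0, 500),            # velocity
    rng.uniform(2.0, 6.0, 500),            # cell size / particle size
    rng.uniform(0.0, 0.6, 500),            # solid fraction
    np.log10(rng.uniform(1e4, 1e7, 500)),  # number of particles
])
# Synthetic "near-optimal margin", standing in for values found by a global optimiser.
y = 0.05 + 0.1 * X[:, 0] + 0.02 * X[:, 1] * X[:, 0] + rng.normal(0, 0.005, 500)

model = make_pipeline(PolynomialFeatures(degree=2), LinearRegression())
model.fit(X, y)

# Predict the skin distance for one particle's current local flow state.
print(model.predict([[1.2, 4.0, 0.3, 6.0]]))
```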
{"title":"Predicting near-optimal skin distance in Verlet buffer approach for Discrete Element Method","authors":"Abdoul Wahid Mainassara Checkaraou, Xavier Besseron, A. Rousset, Emmanuel Kieffer, B. Peters","doi":"10.1109/IPDPSW50202.2020.00093","DOIUrl":"https://doi.org/10.1109/IPDPSW50202.2020.00093","url":null,"abstract":"The Verlet list method is a well-known bookkeeping technique of the interaction list used both in Molecular Dynamic (MD) and Discrete Element Method (DEM). The Verlet butter technique is an enhancement of the Verlet list that consists of extending the interaction radius of each particle by an extra margin to take into account more particles in the interaction list. The extra margin is based on the local flow regime of each particle to account for the different flow regimes that can coexist in the domain. However, the choice of the near-optimal extra margin (which ensures the best performance) for each particle and the related parameters remains unexplored in DEM unlike in MD.In this study, we demonstrate that the near-optimal extra margin can fairly be characterised by four parameters that describe each particle local flow regime: the particle velocity, the ratio of the containing cell size to particle size, the containing cell solid fraction, and the total number of particles in the system. For this purpose, we model the near-optimal extra margin as a function of these parameters using a quadratic polynomial function. We use the DAKOTA SOFTWARE to carry out the Design and Analysis of Computer Experiments (DACE) and the sampling of the parameters for the simulations. For a given instance of the set of parameters, a global optimisation method is considered to find the near-optimal extra margin. The latter is required for the construction of the quadratic polynomial model. The numerous simulations generated by the sampling of the parameter were performed on a High-Performance Computing (HPC) environment granting parallel and concurrent executions.This work provides a better understanding of the Verlet butter method in DEM simulations by analysing its performances and behaviour in various configurations. The near-optimal extra margin can reasonably be predicted by two out of the four chosen parameters using the quadratic polynomial model. This model has been integrated in XDEM in order to automatically choose the extra margin without any input from the user. Evaluations on real industrial-level test-cases show up to 26% of reduction of the execution time.","PeriodicalId":398819,"journal":{"name":"2020 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129324949","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 1
Recorder 2.0: Efficient Parallel I/O Tracing and Analysis
Pub Date : 2020-05-01 DOI: 10.1109/IPDPSW50202.2020.00176
Chen Wang, Jinghan Sun, M. Snir, K. Mohror, Elsa Gonsiorowski
Recorder is a multi-level I/O tracing tool that captures HDF5, MPI-I/O, and POSIX I/O calls. In this paper, we present a new version of Recorder that adds support for most metadata POSIX calls, such as stat, link, and rename. We also introduce a compressed tracing format to reduce the trace file size and the runtime overhead incurred by collecting the trace data. Moreover, we add a set of post-mortem analysis and visualization routines to the new version of Recorder that manage the compressed trace data for users. Our experiments with four HPC applications show a file size reduction of over 2× and a 20% reduction in post-processing time when using our new compressed trace file format.
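Recorder itself intercepts these calls from C at link time; purely as a language-agnostic illustration of the record-then-compress idea, the sketch below wraps a couple of POSIX-style file operations in Python and run-length encodes the resulting trace. The wrapper, the record layout, and the compression scheme are assumptions and do not reflect Recorder's actual format.

```python
# Illustration of the trace-and-compress idea (not Recorder's actual format).
import os, time

trace = []  # list of (timestamp, call name, arguments)

def traced(name, fn):
    # Wrap a function so every call appends a trace record before executing.
    def wrapper(*args):
        trace.append((time.time(), name, args))
        return fn(*args)
    return wrapper

stat   = traced("stat", os.stat)
rename = traced("rename", os.rename)

def compress(records):
    # Run-length encode consecutive records sharing the same call name,
    # a crude stand-in for a real compressed trace format.
    out = []
    for _, name, _ in records:
        if out and out[-1][0] == name:
            out[-1] = (name, out[-1][1] + 1)
        else:
            out.append((name, 1))
    return out

with open("demo.txt", "w") as f:
    f.write("hello")
for _ in range(3):
    stat("demo.txt")
rename("demo.txt", "demo2.txt")
print(compress(trace))   # e.g. [('stat', 3), ('rename', 1)]
os.remove("demo2.txt")
```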
{"title":"Recorder 2.0: Efficient Parallel I/O Tracing and Analysis","authors":"Chen Wang, Jinghan Sun, M. Snir, K. Mohror, Elsa Gonsiorowski","doi":"10.1109/IPDPSW50202.2020.00176","DOIUrl":"https://doi.org/10.1109/IPDPSW50202.2020.00176","url":null,"abstract":"Recorder is a multi-level I/O tracing tool that captures HDF5, MPI-I/O, and POSIX I/O calls. In this paper, we present a new version of Recorder that adds support for most metadata POSIX calls such as stat, link, and rename. We also introduce a compressed tracing format to reduce trace file size and run time overhead incurred from collecting the trace data. Moreover, we add a set of post-mortem and visualization routines to our new version of Recorder that manage the compressed trace data for users. Our experiments with four HPC applications show a file size reduction of over 2× and reduced post-processing time by 20% when using our new compressed trace file format.","PeriodicalId":398819,"journal":{"name":"2020 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW)","volume":"74 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130632915","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 21
A Work-Time Optimal Parallel Exhaustive Search Algorithm for the QUBO and the Ising model, with GPU implementation
Pub Date : 2020-05-01 DOI: 10.1109/IPDPSW50202.2020.00098
Masaki Tao, K. Nakano, Yasuaki Ito, Ryota Yasudo, Masaru Tatekawa, Ryota Katsuki, Takashi Yazane, Yoko Inaba
The main contribution of this paper is to present a simple exhaustive search algorithm for the quadratic unconstrained binary optimization (QUBO) problem. It computes the values of the objective function $E(X)$ for all $n$-bit input vectors $X$ in $O(2^{n})$ time. Since $\Omega(2^{n})$ time is necessary to output $E(X)$ for all $2^{n}$ vectors $X$, this sequential algorithm is optimal. We also present a work-time optimal parallel algorithm running in $O(\log n)$ time using $2^{n}/\log n$ processors on the CREW-PRAM. This parallel algorithm is work optimal, because the total number of computational operations equals the running time of the optimal sequential algorithm. It is also time optimal, because any parallel algorithm using any number of processors takes at least $\Omega(\log n)$ time to evaluate $E(X)$. Further, we have implemented this parallel algorithm to run on the GPU. The experimental results on an NVIDIA GeForce RTX 2080Ti GPU show that our GPU implementation runs more than 1000 times faster than the sequential algorithm running on an Intel Core i7-8700K CPU (3.70GHz) for the QUBO with an $n$-bit vector whenever $n \geq 33$. We also compare our exhaustive search parallel algorithm with several non-exhaustive search approaches for solving the QUBO, including the D-Wave 2000Q quantum annealer, a simulated annealing algorithm, and the Gurobi optimizer.
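A minimal serial sketch of the exhaustive evaluation of the standard QUBO objective $E(X)=X^{T}QX$ over all $2^{n}$ bit vectors is shown below; the paper's work-time optimal parallel algorithm and GPU kernel are not reproduced, and the random instance is made up.

```python
# Brute-force QUBO evaluation: enumerate all 2^n bit vectors and track the minimum.
import numpy as np
from itertools import product

def exhaustive_qubo(Q):
    n = Q.shape[0]
    best_val, best_x = np.inf, None
    for bits in product((0, 1), repeat=n):      # all 2^n assignments
        x = np.array(bits)
        val = x @ Q @ x                         # E(X) = X^T Q X
        if val < best_val:
            best_val, best_x = val, x
    return best_val, best_x

rng = np.random.default_rng(42)
Q = rng.integers(-5, 6, size=(12, 12)).astype(float)
Q = (Q + Q.T) / 2                               # symmetrise for readability
print(exhaustive_qubo(Q))
```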
{"title":"A Work-Time Optimal Parallel Exhaustive Search Algorithm for the QUBO and the Ising model, with GPU implementation","authors":"Masaki Tao, K. Nakano, Yasuaki Ito, Ryota Yasudo, Masaru Tatekawa, Ryota Katsuki, Takashi Yazane, Yoko Inaba","doi":"10.1109/IPDPSW50202.2020.00098","DOIUrl":"https://doi.org/10.1109/IPDPSW50202.2020.00098","url":null,"abstract":"The main contribution of this paper is to present a simple exhaustive search algorithm for the quadratic un-constraint binary optimization (QUBO) problem. It computes the values of the objective function $E(X)$ for all n-bit input vector X in $O(2^{n})$ time. Since $Omega(2^{n})$ time is necessary to output $E(X)$ for all 2n vectors X, this sequential algorithm is optimal. We also present a work-time optimal parallel algorithm running $O(log n)$ time using $2^{n}/log n$ processors on the CREW-PRAM. This parallel algorithm is work optimal, because the total number of computational operations is equal to the running time of the optimal sequential algorithm. Also, it is time optimal because any parallel algorithm using any large number of processors takes at least $Omega(log n)$ time for evaluating E(X). Further, we have implemented this parallel algorithm to run on the GPU. The experimental results on NVIDIA GeForce RTX 2080Ti GPU show that our GPU implementation runs more than 1000 times faster than the sequential algorithm running on Intel Corei7-8700K CPU(3.70GHz) for the QUBO with n-bit vector whenever n$geq$33. We also compare our exhaustive search parallel algorithm with several non-exhaustive search approaches for solving the QUBO including D-Wave 2000Q quantum annealer, simulated annealing algorithm, and Gurobi optimizer.","PeriodicalId":398819,"journal":{"name":"2020 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW)","volume":"29 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123798679","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 8
I/O Performance of the SX-Aurora TSUBASA
Pub Date : 2020-05-01 DOI: 10.1109/IPDPSW50202.2020.00014
M. Yokokawa, Ayano Nakai, K. Komatsu, Yuta Watanabe, Yasuhisa Masaoka, Yoko Isobe, Hiroaki Kobayashi
File outputs or checkpoints of intermediate results appear at regular time intervals in large-scale time-advancement numerical simulations, where they are used for post-processing and/or for restarting consecutive simulations. However, file input/output (I/O) for large-scale data often takes excessive time due to bandwidth limitations between the processors and/or secondary storage systems such as hard disk drives (HDDs) and solid state drives (SSDs). Accordingly, efforts are ongoing to reduce the time required for file I/O operations in order to speed up such simulations, which makes it necessary to acquire detailed knowledge of the I/O performance of the high-performance computing systems used. In this study, the I/O performance with respect to the connection bandwidth between the vector host (VH) server and the vector engines (VEs) was measured and evaluated for three configurations of the SX-Aurora TSUBASA supercomputer system, namely the A300-2, A300-4, and A300-8. The accelerated I/O function, a distinctive feature of the SX-Aurora TSUBASA I/O system, was demonstrated to deliver excellent performance compared to the normal I/O function.
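The measurements above require the SX-Aurora TSUBASA hardware; as a generic illustration of how sequential file-I/O bandwidth is commonly estimated, the sketch below times a buffered write followed by an fsync. The block size, file path, and repetition count are arbitrary choices, not the benchmark used in the paper.

```python
# Generic sequential-write bandwidth measurement (illustrative; not tied to SX-Aurora).
import os, time

path = "io_test.bin"
block = b"\0" * (8 * 1024 * 1024)    # 8 MiB write block
n_blocks = 64                        # 512 MiB total

start = time.perf_counter()
with open(path, "wb") as f:
    for _ in range(n_blocks):
        f.write(block)
    f.flush()
    os.fsync(f.fileno())             # make sure data reaches the device
elapsed = time.perf_counter() - start

size_mib = len(block) * n_blocks / 2**20
print(f"wrote {size_mib:.0f} MiB in {elapsed:.2f} s "
      f"({size_mib / elapsed:.1f} MiB/s)")
os.remove(path)
```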
{"title":"I/O Performance of the SX-Aurora TSUBASA","authors":"M. Yokokawa, Ayano Nakai, K. Komatsu, Yuta Watanabe, Yasuhisa Masaoka, Yoko Isobe, Hiroaki Kobayashi","doi":"10.1109/IPDPSW50202.2020.00014","DOIUrl":"https://doi.org/10.1109/IPDPSW50202.2020.00014","url":null,"abstract":"File outputs or checkpoints for intermediate results frequently appear at appropriate time intervals in large-scale time-advancement numerical simulations where they are utilized for simulation post-processing and/or for restarting consecutive simulations. However, file input/output (I/O) for large-scale data often takes excessive time due to bandwidth limitations between processors and/or secondary storage systems like hard disk drives (HDDs) and solid state drives (SSDs). Accordingly, efforts are ongoing to reduce the time required for file I/O operations in order to speed up such simulations, which means it is necessary to acquire advanced I/O performance knowledge related to high-performance computing systems used.In this study, I/O performance with respect to the connection bandwidth between the vector host (VH) server and the vector engines (VEs) for three configurations of the SX-Aurora TSUB-ASA supercomputer system, specifically the A300–2, A300–4, and A300–8 configurations, were measured and evaluated. The accelerated I/O function, which is a distinctive feature of the SX-Aurora TSUBASA I/O system, was demonstrated to have excellent performance compared to its normal I/O function.","PeriodicalId":398819,"journal":{"name":"2020 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW)","volume":"140 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123309484","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 9
Multiperspective Automotive Labeling
Pub Date : 2020-05-01 DOI: 10.1109/ipdpsw50202.2020.00155
Luke Jacobs, Akhil Kodumuri, Jim James, Seongha Park, Yongho Kim
Supervised machine learning techniques inherently rely on datasets for training. With image datasets traditionally annotated by humans, many advances in image annotation tools have been made to ensure the creation of rich datasets with accurate labels. Nevertheless, users still find it challenging to create and use their own datasets with labels that reflect their problem domain. We propose a streamlined labeling process that aligns multiperspective images and allows a transition from a labeled perspective to other perspectives. The main goal of this work is to reduce the human effort required for labeling vehicle images under favorable conditions where the image perspectives are correlated and one or more perspectives are known. A case study is described and analyzed to show the effectiveness of the process, as well as the constraints and limitations that arise when it is applied to other cases.
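The abstract does not detail the alignment step; as one common way to carry a label from a labeled perspective to a correlated one, the sketch below maps a box's corners through a known planar homography and re-fits an axis-aligned box in the target view. The homography matrix and the box coordinates are made up for illustration.

```python
# Propagate a 2D box label from a labeled view to another view via a homography.
import numpy as np

def project_points(H, pts):
    # Apply a 3x3 homography to Nx2 pixel coordinates (homogeneous divide).
    pts_h = np.hstack([pts, np.ones((len(pts), 1))])
    mapped = pts_h @ H.T
    return mapped[:, :2] / mapped[:, 2:3]

# Hypothetical homography relating perspective A to perspective B
# (in practice it would come from camera calibration or feature matching).
H_ab = np.array([[1.05, 0.02, 30.0],
                 [0.01, 0.98, -12.0],
                 [1e-5, 2e-5, 1.0]])

# Box labeled in perspective A: four corners (x, y) in pixels.
box_a = np.array([[100, 200], [260, 200], [260, 320], [100, 320]], float)

corners_b = project_points(H_ab, box_a)
# Convert the warped quadrilateral back to an axis-aligned box in perspective B.
x_min, y_min = corners_b.min(axis=0)
x_max, y_max = corners_b.max(axis=0)
print([round(v, 1) for v in (x_min, y_min, x_max, y_max)])
```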
{"title":"Multiperspective Automotive Labeling","authors":"Luke Jacobs, Akhil Kodumuri, Jim James, Seongha Park, Yongho Kim","doi":"10.1109/ipdpsw50202.2020.00155","DOIUrl":"https://doi.org/10.1109/ipdpsw50202.2020.00155","url":null,"abstract":"Supervised machine learning techniques inherently rely on datasets to be trained. With image datasets traditionally being annotated by humans, many advancements in image annotation tools have been made to ensure creation of rich datasets with accurate labels. Nevertheless, users still find it challenging to create and use their own datasets with labels that reflect their problem domain. We propose a streamlined labeling process that aligns multiperspective images and allows a transition from a labeled perspective to other perspectives. The main goal of this work is to reduce the human effort required for labeling vehicle images under favorable conditions where the image perspectives are correlated and one or more perspectives are known. A case study is described and analyzed to show the effectiveness of the process, as well as constraints and limitations when applied to other cases.","PeriodicalId":398819,"journal":{"name":"2020 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW)","volume":"7 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126468737","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 1