Using Small-Scale History Data to Predict Large-Scale Performance of HPC Application
Pub Date: 2020-05-01 DOI: 10.1109/IPDPSW50202.2020.00135
Wenju Zhou, Jiepeng Zhang, Jingwei Sun, Guangzhong Sun
Performance modeling is an important problem in high-performance computing (HPC). Machine learning (ML) is a powerful approach for HPC performance modeling: ML can learn complex relations between application parameters and the performance of HPC applications from historical execution data. However, extrapolating large-scale performance from only small-scale execution data with ML is difficult, because the independent and identically distributed hypothesis (the basic assumption of most ML algorithms) does not hold in this situation. To solve the extrapolation problem, we propose a two-level model consisting of an interpolation level and an extrapolation level. The interpolation level predicts small-scale performance from small-scale execution data. The extrapolation level predicts the large-scale performance of a fixed input parameter from its small-scale performance predictions. At the interpolation level, we use random forests to build models that predict small-scale performance. At the extrapolation level, to reduce the negative influence of interpolation errors, we employ multitask lasso with clustering to construct scalability models that predict large-scale performance. To validate the utility of our two-level model, we conduct experiments on a real HPC platform, building models for two HPC applications. Compared with existing ML methods, our method achieves higher prediction accuracy.
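The two-level structure can be sketched in a few lines of Python with scikit-learn. Everything below is an illustrative assumption rather than the paper's implementation: the data layout, the choice of three small scales and two large target scales, and the omission of the clustering step the paper applies before the multitask lasso.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import MultiTaskLasso

# Hypothetical data layout (not from the paper):
#   X_params : application input parameters, shape (n_samples, n_features)
#   y_small  : runtimes measured at small scales, e.g. 2/4/8 nodes
#   y_large  : runtimes at large target scales, e.g. 16/32 nodes
rng = np.random.default_rng(0)
X_params = rng.random((200, 4))
y_small = rng.random((200, 3))
y_large = y_small @ np.array([[0.8, 1.5], [0.5, 0.9], [0.3, 0.6]]) \
          + 0.01 * rng.random((200, 2))

# Interpolation level: one random forest per small scale.
interp = [RandomForestRegressor(n_estimators=100, random_state=0)
          .fit(X_params, y_small[:, j]) for j in range(y_small.shape[1])]

# Extrapolation level: a multitask lasso mapping the *predicted*
# small-scale runtimes (not the measured ones) to large-scale runtimes,
# so the scalability model is trained under interpolation noise.
small_pred = np.column_stack([m.predict(X_params) for m in interp])
scal = MultiTaskLasso(alpha=0.01).fit(small_pred, y_large)

# Large-scale prediction for an unseen parameter setting.
x_new = rng.random((1, 4))
print(scal.predict(np.column_stack([m.predict(x_new) for m in interp])))
```

Feeding the extrapolation level predictions rather than measurements is one plausible way to expose it to interpolation error during training, in the spirit of the error-robustness the abstract mentions.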
{"title":"Using Small-Scale History Data to Predict Large-Scale Performance of HPC Application","authors":"Wenju Zhou, Jiepeng Zhang, Jingwei Sun, Guangzhong Sun","doi":"10.1109/IPDPSW50202.2020.00135","DOIUrl":"https://doi.org/10.1109/IPDPSW50202.2020.00135","url":null,"abstract":"Performance modeling is an important problem in high-performance computing (HPC). Machine Learning (ML) is a powerful approach for HPC performance modeling. ML can learn complex relations between application parameters and the performance of HPC applications from historical execution data. However, extrapolation of large-scale performance with only small-scale execution data using ML is difficult, because the independent and identically distributed hypothesis (the basic hypothesis of most ML algorithms) does not hold in this situation. To solve the extrapolation problem, we propose a two-level model consisting of interpolation level and extrapolation level. The interpolation level predicts small-scale performance with small-scale execution. The extrapolation level predicts the large-scale performance of the fixed input parameter with its small-scale performance predictions. We use the random forest to build interpolation models to predict small-scale performance in the interpolation level. In the extrapolation level, to reduce the negative influence of interpolation errors, we employ the multitask lasso with clustering to construct the scalability models to predict large-scale performance. To validate the utility of our two-level model, we conduct experiments on a real HPC platform. We build models for two HPC applications using our two-level model. Compare with existing ML methods, our method can achieve higher prediction accuracy.","PeriodicalId":398819,"journal":{"name":"2020 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW)","volume":"118 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115382551","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Message from the HCW Steering Committee Chair
Pub Date: 2020-05-01 DOI: 10.1109/ipdpsw50202.2020.00008
Behrooz A. Shirazi
These are the proceedings of the “29th Heterogeneity in Computing Workshop,” also known as HCW 2020. A few years ago, the workshop was renamed from its original title, “Heterogeneous Computing Workshop,” to reflect the breadth of heterogeneity’s impact and to stress that the workshop’s focus is on the management and exploitation of heterogeneity. All of this is, of course, taken in the context of the parent conference, the International Parallel and Distributed Processing Symposium (IPDPS), and so the workshop explores heterogeneity in parallel and distributed computing systems.
{"title":"Message from the HCW Steering Committee Chair","authors":"Behrooz A. Shiraz","doi":"10.1109/ipdpsw50202.2020.00008","DOIUrl":"https://doi.org/10.1109/ipdpsw50202.2020.00008","url":null,"abstract":"These are the proceedings of the “29th Heterogeneity in Computing Workshop,” also known as HCW 2020. A few years ago, the title of the workshop was changed from the original title of “Heterogeneous Computing Workshop” to reflect the breadth of the impact of heterogeneity, as well as to stress that the focus of the workshop is on the management and exploitation of heterogeneity. All of this is, of course, taken in the context of the parent conference, the International Parallel and Distributed Processing Symposium (IPDPS), and so explores heterogeneity in parallel and distributed computing systems.","PeriodicalId":398819,"journal":{"name":"2020 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW)","volume":"95 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127052145","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Message from the workshop chairs
Pub Date: 2020-05-01 DOI: 10.1109/cahpc.2018.8645919
Scott McMillan, Manoj Kumar, Danai Koutra, M. Halappanavar, T. Mattson, Antonino Tumeo
GrAPL 2020, the Workshop on Graphs, Architectures, Programming, and Learning, brings together two closely related topics: how the synthesis (representation) and analysis of graphs are supported in hardware and software, and how graph algorithms interact with machine learning. As a natural outgrowth of the wide range of methods used in large-scale data analytics workflows, GrAPL’s scope is broad. GrAPL 2020 is the second edition of the merger of two successful workshop series at IPDPS: GABB and GraML. GABB started at IPDPS’14 with a program of invited talks and panel discussions; GraML was held at IPDPS in 2017 and 2018.
{"title":"Message from the workshop chairs","authors":"Scott McMillan, Manoj Kumar, Danai Koutra, M. Halappanavar, T. Mattson, Antonino Tumeo","doi":"10.1109/cahpc.2018.8645919","DOIUrl":"https://doi.org/10.1109/cahpc.2018.8645919","url":null,"abstract":"GrAPL 2020: Workshop on Graphs, Architectures, Programming, and Learning, brings together two closely related topics - how the synthesis (representation) and analysis of graphs is supported in hardware and software, and the ways graph algorithms interact with machine learning. Driven by the natural outgrowth of a wide range of methods used in large-scale data analytics workflows, GrAPL’s scope is broad. GrAPL’2020 is the second edition of the merger between two successful workshop series at IPDPS: GABB and GraML. GABB started at IPDPS’14 with a program of invited-talks and panel discussions. GraML was held at IPDPS in 2017 and 2018.","PeriodicalId":398819,"journal":{"name":"2020 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW)","volume":"28 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124980019","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Development of Parallel CFD Applications on Distributed Memory with Chapel
Pub Date: 2020-05-01 DOI: 10.1109/ipdpsw50202.2020.00110
M. Parenteau, Simon Bourgault-Cote, Frédéric Plante, E. Laurendeau
Traditionally, Computational Fluid Dynamics (CFD) software uses the Message Passing Interface (MPI) to handle parallelism over distributed-memory systems. For a new developer, such as a student or a new employee, the barrier to entry can be high, and more training is required for each particular software package, which slows down research on the actual science. The Chapel programming language offers an interesting alternative for research and development of CFD applications.

In this paper, the development of two CFD applications is presented: the first an experiment in rewriting a 2D structured flow solver, and the second a research 3D unstructured multi-physics simulation software written from scratch. Details are given on both applications, with emphasis on the Chapel features that benefited the code design, in particular by improving flexibility and extending to distributed memory. Some performance pitfalls are discussed along with solutions to avoid them.

The performance of the unstructured software is then studied and compared to a traditional open-source CFD package programmed in C++ with MPI for communication (SU2). The results show that our Chapel implementation achieves performance similar to other CFD software written in C and C++, confirming that Chapel is a viable language for high-performance CFD applications.
{"title":"Development of Parallel CFD Applications on Distributed Memory with Chapel","authors":"M. Parenteau, Simon Bourgault-Cote, Frédéric Plante, E. Laurendeau","doi":"10.1109/ipdpsw50202.2020.00110","DOIUrl":"https://doi.org/10.1109/ipdpsw50202.2020.00110","url":null,"abstract":"Traditionally, Computational Fluid Dynamics (CFD) software uses Message Passing Interface (MPI) to handle the parallelism over distributed memory systems. For a new developer, such as a student or a new employee, the barrier of entry can be high and more training is required for each particular software package, which slows down the research process on actual science. The Chapel programming language offers an interesting alternative for research and development of CFD applications.In this paper, the developments of two CFD applications are presented: the first one as an experiment by re-writing a 2D structured flow solver and the second one as writing from scratch a research 3D unstructured multi-physics simulation software. Details are given on both applications with emphasis on the Chapel features which were used positively in the code design, in particular to improve flexibility and extend to distributed memory. Some performance pitfalls are discussed with solutions to avoid them.The performance of the unstructured software is then studied and compared to a traditional open-source CFD software package programmed in C++ with MPI for communication (SU2). The results show that our Chapel implementation achieves performances similar to other CFD software written in C and C++, thus confirming that Chapel is a viable language for high-performance CFD applications.","PeriodicalId":398819,"journal":{"name":"2020 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW)","volume":"32 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128701708","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Acceleration of Structural Analysis Simulations using CNN-based Auto-Tuning of Solver Tolerance
Pub Date: 2020-05-01 DOI: 10.1109/IPDPSW50202.2020.00134
Amir Haderbache, Koichi Shirahata, T. Yamamoto, Y. Tomita, H. Okuda
With the emergence of AI, we observe a surge of interest in applying machine learning to traditional HPC workloads. An example is the use of surrogate models that approximate the output of scientific simulations at very low latency. However, such a black-box approach usually suffers from significant accuracy loss. An alternative is to leverage the large amount of data generated at simulation runtime to improve the efficiency of numerical methods. However, there is still no clear solution for applying AI inside HPC simulations. We therefore propose to incorporate AI into structural analysis simulations and develop an auto-tuning scheme for the iterative solver tolerance used in the Newton-Raphson method. We leverage residual data to train a performance model that is aware of the time-accuracy trade-off. By controlling the tuning using AI softmax probability values, we achieve a 1.58x acceleration over traditional simulations while maintaining accuracy to a precision of 1e-02.
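The control loop behind such auto-tuning can be sketched as follows. The conjugate-gradient inner solver, the 0.9 confidence threshold, and the `confidence` callable (a stand-in for the paper's CNN over residual data) are all illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def solve_to_tol(A, b, tol, max_iter=500):
    """Conjugate gradient stopped at relative residual `tol` (a generic
    stand-in for the simulation's iterative solver; assumes SPD A)."""
    x = np.zeros_like(b)
    r = b - A @ x
    p = r.copy()
    rs = r @ r
    b_norm = np.linalg.norm(b) + 1e-30
    for _ in range(max_iter):
        if np.sqrt(rs) / b_norm < tol:
            break
        Ap = A @ p
        alpha = rs / (p @ Ap)
        x += alpha * p
        r -= alpha * Ap
        rs, rs_old = r @ r, rs
        p = r + (rs / rs_old) * p
    return x

def newton_autotuned(residual, jacobian, u0, confidence,
                     loose=1e-2, tight=1e-8, max_newton=50):
    """Newton-Raphson whose inner-solver tolerance is relaxed whenever a
    learned model says a cheap, loose solve will not hurt convergence."""
    u = u0.astype(float).copy()
    for _ in range(max_newton):
        r = residual(u)
        if np.linalg.norm(r) < 1e-10:
            break
        tol = loose if confidence(u, r) > 0.9 else tight
        u = u + solve_to_tol(jacobian(u), -r, tol)
    return u
```

The speedup comes from the loose inner solves: each Newton step with a relaxed tolerance does far fewer solver iterations, and the learned model's job is to predict when that shortcut is safe for overall accuracy.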
{"title":"Acceleration of Structural Analysis Simulations using CNN-based Auto-Tuning of Solver Tolerance","authors":"Amir Haderbache, Koichi Shirahata, T. Yamamoto, Y. Tomita, H. Okuda","doi":"10.1109/IPDPSW50202.2020.00134","DOIUrl":"https://doi.org/10.1109/IPDPSW50202.2020.00134","url":null,"abstract":"With the emergence of AI, we observe a surge of interest in applying machine learning to traditional HPC workloads. An example is the use of surrogate models that approximate the output of scientific simulations at very low latency. However, such a black-box approach usually suffers from significant accuracy loss. An alternative method is to leverage the large amount of data generated at simulations’ runtime to improve the efficiency of numerical methods. However, there is still no clear solution to apply AI inside HPC simulations. Thus, we propose to incorporate AI into structural analysis simulations and develop an auto-tuning of the iterative solver tolerance used in the Newton-Raphson method. We leverage residual data to train a performance model that is aware of the time-accuracy trade-off. By controlling the tuning using AI softmax probability values, we achieve 1.58x acceleration compared to traditional simulations and maintain accuracy with 1e-02 precision.","PeriodicalId":398819,"journal":{"name":"2020 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW)","volume":"22 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129313483","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Predicting near-optimal skin distance in Verlet buffer approach for Discrete Element Method
Pub Date: 2020-05-01 DOI: 10.1109/IPDPSW50202.2020.00093
Abdoul Wahid Mainassara Checkaraou, Xavier Besseron, A. Rousset, Emmanuel Kieffer, B. Peters
The Verlet list method is a well-known bookkeeping technique for the interaction list used in both Molecular Dynamics (MD) and the Discrete Element Method (DEM). The Verlet buffer technique is an enhancement of the Verlet list that extends the interaction radius of each particle by an extra margin, so that more particles are taken into account in the interaction list. The extra margin is based on the local flow regime of each particle, to account for the different flow regimes that can coexist in the domain. However, the choice of the near-optimal extra margin (the one ensuring the best performance) for each particle, and the related parameters, remains unexplored in DEM, unlike in MD.

In this study, we demonstrate that the near-optimal extra margin can fairly be characterised by four parameters that describe each particle's local flow regime: the particle velocity, the ratio of the containing cell size to the particle size, the containing cell solid fraction, and the total number of particles in the system. For this purpose, we model the near-optimal extra margin as a quadratic polynomial function of these parameters. We use the DAKOTA software to carry out the Design and Analysis of Computer Experiments (DACE) and the sampling of the parameters for the simulations. For a given instance of the parameter set, a global optimisation method is used to find the near-optimal extra margin, which is then required to construct the quadratic polynomial model. The numerous simulations generated by the parameter sampling were performed in a High-Performance Computing (HPC) environment allowing parallel and concurrent executions.

This work provides a better understanding of the Verlet buffer method in DEM simulations by analysing its performance and behaviour in various configurations. The near-optimal extra margin can reasonably be predicted by two of the four chosen parameters using the quadratic polynomial model. The model has been integrated in XDEM to choose the extra margin automatically, without any input from the user. Evaluations on real industrial-level test cases show up to a 26% reduction in execution time.
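A minimal Python sketch of the two ingredients follows: a Verlet list built with an extra skin, and a quadratic polynomial predicting the skin from the two most predictive parameters. The feature choice and `coeffs` are placeholders; the actual fitted coefficients come from the DACE study and are not reproduced here.

```python
import numpy as np
from itertools import combinations

def build_verlet_lists(pos, cutoff, skin):
    """Neighbor lists with radius cutoff + skin. The lists stay valid
    until some particle has moved more than skin/2 since the rebuild.
    (O(n^2) pair loop for clarity; real codes use cell lists.)"""
    r2 = (cutoff + skin) ** 2
    lists = [[] for _ in range(len(pos))]
    for i, j in combinations(range(len(pos)), 2):
        if np.sum((pos[i] - pos[j]) ** 2) < r2:
            lists[i].append(j)
            lists[j].append(i)
    return lists

def predict_skin(velocity, cell_to_particle_ratio, coeffs):
    """Quadratic model of the near-optimal skin in two parameters
    (hypothetical form; coefficients would come from the DACE fit)."""
    v, c = velocity, cell_to_particle_ratio
    return float(coeffs @ np.array([1.0, v, c, v * v, v * c, c * c]))
```

A larger skin means fewer rebuilds but more candidate pairs per force evaluation; the near-optimal margin balances the two costs, which is what the fitted polynomial chooses per particle.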
{"title":"Predicting near-optimal skin distance in Verlet buffer approach for Discrete Element Method","authors":"Abdoul Wahid Mainassara Checkaraou, Xavier Besseron, A. Rousset, Emmanuel Kieffer, B. Peters","doi":"10.1109/IPDPSW50202.2020.00093","DOIUrl":"https://doi.org/10.1109/IPDPSW50202.2020.00093","url":null,"abstract":"The Verlet list method is a well-known bookkeeping technique of the interaction list used both in Molecular Dynamic (MD) and Discrete Element Method (DEM). The Verlet butter technique is an enhancement of the Verlet list that consists of extending the interaction radius of each particle by an extra margin to take into account more particles in the interaction list. The extra margin is based on the local flow regime of each particle to account for the different flow regimes that can coexist in the domain. However, the choice of the near-optimal extra margin (which ensures the best performance) for each particle and the related parameters remains unexplored in DEM unlike in MD.In this study, we demonstrate that the near-optimal extra margin can fairly be characterised by four parameters that describe each particle local flow regime: the particle velocity, the ratio of the containing cell size to particle size, the containing cell solid fraction, and the total number of particles in the system. For this purpose, we model the near-optimal extra margin as a function of these parameters using a quadratic polynomial function. We use the DAKOTA SOFTWARE to carry out the Design and Analysis of Computer Experiments (DACE) and the sampling of the parameters for the simulations. For a given instance of the set of parameters, a global optimisation method is considered to find the near-optimal extra margin. The latter is required for the construction of the quadratic polynomial model. The numerous simulations generated by the sampling of the parameter were performed on a High-Performance Computing (HPC) environment granting parallel and concurrent executions.This work provides a better understanding of the Verlet butter method in DEM simulations by analysing its performances and behaviour in various configurations. The near-optimal extra margin can reasonably be predicted by two out of the four chosen parameters using the quadratic polynomial model. This model has been integrated in XDEM in order to automatically choose the extra margin without any input from the user. Evaluations on real industrial-level test-cases show up to 26% of reduction of the execution time.","PeriodicalId":398819,"journal":{"name":"2020 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129324949","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Recorder 2.0: Efficient Parallel I/O Tracing and Analysis
Pub Date: 2020-05-01 DOI: 10.1109/IPDPSW50202.2020.00176
Chen Wang, Jinghan Sun, M. Snir, K. Mohror, Elsa Gonsiorowski
Recorder is a multi-level I/O tracing tool that captures HDF5, MPI-I/O, and POSIX I/O calls. In this paper, we present a new version of Recorder that adds support for most metadata-related POSIX calls, such as stat, link, and rename. We also introduce a compressed tracing format that reduces both the trace file size and the runtime overhead incurred in collecting the trace data. Moreover, the new version adds a set of post-mortem analysis and visualization routines that manage the compressed trace data for users. Our experiments with four HPC applications show a file size reduction of over 2× and a 20% reduction in post-processing time when using the new compressed trace format.
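Recorder's actual on-disk format is not reproduced here, but the flavor of such compression can be shown with a toy encoder combining two standard tricks: interning repeated call signatures and delta-encoding timestamps. All names and the byte layout below are invented for illustration.

```python
import struct

class ToyTraceWriter:
    """Toy binary trace encoder (NOT Recorder's real format): a repeated
    call signature is written out once and then referenced by a 1-byte
    id, and timestamps are stored as float32 deltas. Assumes fewer than
    255 distinct signatures so id 0xFF can mark definition records."""
    def __init__(self, path):
        self.f = open(path, "wb")
        self.ids = {}          # (func, args) -> small integer id
        self.last_ts = 0.0

    def record(self, ts, func, args):
        sig = (func, args)
        if sig not in self.ids:                    # first occurrence:
            self.ids[sig] = len(self.ids)          # define it once, in full
            text = repr(sig).encode()
            self.f.write(struct.pack("<BH", 0xFF, len(text)) + text)
        # every occurrence costs 5 bytes: id + delta timestamp
        self.f.write(struct.pack("<Bf", self.ids[sig], ts - self.last_ts))
        self.last_ts = ts

    def close(self):
        self.f.close()
```

HPC I/O traces are highly repetitive (the same call with the same arguments from every rank, every timestep), which is why signature interning of this kind can shrink a trace by integer factors.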
{"title":"Recorder 2.0: Efficient Parallel I/O Tracing and Analysis","authors":"Chen Wang, Jinghan Sun, M. Snir, K. Mohror, Elsa Gonsiorowski","doi":"10.1109/IPDPSW50202.2020.00176","DOIUrl":"https://doi.org/10.1109/IPDPSW50202.2020.00176","url":null,"abstract":"Recorder is a multi-level I/O tracing tool that captures HDF5, MPI-I/O, and POSIX I/O calls. In this paper, we present a new version of Recorder that adds support for most metadata POSIX calls such as stat, link, and rename. We also introduce a compressed tracing format to reduce trace file size and run time overhead incurred from collecting the trace data. Moreover, we add a set of post-mortem and visualization routines to our new version of Recorder that manage the compressed trace data for users. Our experiments with four HPC applications show a file size reduction of over 2× and reduced post-processing time by 20% when using our new compressed trace file format.","PeriodicalId":398819,"journal":{"name":"2020 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW)","volume":"74 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130632915","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A Work-Time Optimal Parallel Exhaustive Search Algorithm for the QUBO and the Ising model, with GPU implementation
Pub Date: 2020-05-01 DOI: 10.1109/IPDPSW50202.2020.00098
Masaki Tao, K. Nakano, Yasuaki Ito, Ryota Yasudo, Masaru Tatekawa, Ryota Katsuki, Takashi Yazane, Yoko Inaba
The main contribution of this paper is to present a simple exhaustive search algorithm for the quadratic unconstrained binary optimization (QUBO) problem. It computes the value of the objective function $E(X)$ for every $n$-bit input vector $X$ in $O(2^n)$ time. Since $\Omega(2^n)$ time is necessary just to output $E(X)$ for all $2^n$ vectors $X$, this sequential algorithm is optimal. We also present a work-time optimal parallel algorithm running in $O(\log n)$ time using $2^n/\log n$ processors on the CREW-PRAM. This parallel algorithm is work optimal because its total number of computational operations equals the running time of the optimal sequential algorithm, and it is time optimal because any parallel algorithm, no matter how many processors it uses, takes at least $\Omega(\log n)$ time to evaluate $E(X)$. Further, we have implemented this parallel algorithm on the GPU. Experimental results on an NVIDIA GeForce RTX 2080Ti GPU show that our GPU implementation runs more than 1000 times faster than the sequential algorithm on an Intel Core i7-8700K CPU (3.70 GHz) for QUBO instances with $n$-bit vectors whenever $n \geq 33$. We also compare our exhaustive search parallel algorithm with several non-exhaustive-search approaches to the QUBO, including the D-Wave 2000Q quantum annealer, a simulated annealing algorithm, and the Gurobi optimizer.
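The sequential idea can be illustrated with a Gray-code walk over all bit vectors, where consecutive vectors differ in exactly one bit and the energy is updated incrementally. One hedge on complexity: this sketch spends $O(n)$ per flip, i.e. $O(n\,2^n)$ in total, whereas the paper's algorithm reaches the optimal $O(2^n)$ with a more refined update scheme.

```python
import numpy as np

def qubo_exhaustive(W):
    """Evaluate E(X) = X^T W X for every n-bit vector X by walking a
    Gray code, so each step flips exactly one bit and updates E
    incrementally. W is assumed symmetric. energies[k] is E at the
    k-th Gray-code vector, whose bit pattern is k ^ (k >> 1)."""
    n = W.shape[0]
    x = np.zeros(n)
    e = 0.0
    energies = np.empty(1 << n)
    energies[0] = e
    for k in range(1, 1 << n):
        i = (k & -k).bit_length() - 1     # bit that flips at step k
        s = 1.0 - 2.0 * x[i]              # +1 for 0->1, -1 for 1->0
        # Delta of X^T W X when bit i flips (symmetric W):
        e += s * (W[i, i] + 2.0 * (W[i] @ x - W[i, i] * x[i]))
        x[i] = 1.0 - x[i]
        energies[k] = e
    return energies

# Tiny check against direct evaluation:
rng = np.random.default_rng(0)
Q = rng.random((5, 5)); Q = (Q + Q.T) / 2
e = qubo_exhaustive(Q)
x3 = np.array([(3 ^ 1) >> b & 1 for b in range(5)], dtype=float)  # gray(3) = 2
assert np.isclose(e[3], x3 @ Q @ x3)
```

The same one-bit-per-step structure is what makes the problem parallelize well: disjoint subcubes of the hypercube can be walked independently, one per processor or GPU thread.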
{"title":"A Work-Time Optimal Parallel Exhaustive Search Algorithm for the QUBO and the Ising model, with GPU implementation","authors":"Masaki Tao, K. Nakano, Yasuaki Ito, Ryota Yasudo, Masaru Tatekawa, Ryota Katsuki, Takashi Yazane, Yoko Inaba","doi":"10.1109/IPDPSW50202.2020.00098","DOIUrl":"https://doi.org/10.1109/IPDPSW50202.2020.00098","url":null,"abstract":"The main contribution of this paper is to present a simple exhaustive search algorithm for the quadratic un-constraint binary optimization (QUBO) problem. It computes the values of the objective function $E(X)$ for all n-bit input vector X in $O(2^{n})$ time. Since $Omega(2^{n})$ time is necessary to output $E(X)$ for all 2n vectors X, this sequential algorithm is optimal. We also present a work-time optimal parallel algorithm running $O(log n)$ time using $2^{n}/log n$ processors on the CREW-PRAM. This parallel algorithm is work optimal, because the total number of computational operations is equal to the running time of the optimal sequential algorithm. Also, it is time optimal because any parallel algorithm using any large number of processors takes at least $Omega(log n)$ time for evaluating E(X). Further, we have implemented this parallel algorithm to run on the GPU. The experimental results on NVIDIA GeForce RTX 2080Ti GPU show that our GPU implementation runs more than 1000 times faster than the sequential algorithm running on Intel Corei7-8700K CPU(3.70GHz) for the QUBO with n-bit vector whenever n$geq$33. We also compare our exhaustive search parallel algorithm with several non-exhaustive search approaches for solving the QUBO including D-Wave 2000Q quantum annealer, simulated annealing algorithm, and Gurobi optimizer.","PeriodicalId":398819,"journal":{"name":"2020 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW)","volume":"29 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123798679","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
I/O Performance of the SX-Aurora TSUBASA
Pub Date: 2020-05-01 DOI: 10.1109/IPDPSW50202.2020.00014
M. Yokokawa, Ayano Nakai, K. Komatsu, Yuta Watanabe, Yasuhisa Masaoka, Yoko Isobe, Hiroaki Kobayashi
File outputs or checkpoints of intermediate results appear at regular intervals in large-scale time-advancement numerical simulations, where they are used for post-processing and/or for restarting subsequent runs. However, file input/output (I/O) for large-scale data often takes excessive time due to bandwidth limitations between processors and secondary storage systems such as hard disk drives (HDDs) and solid state drives (SSDs). Accordingly, efforts are ongoing to reduce the time required for file I/O operations in order to speed up such simulations, which makes detailed knowledge of the I/O performance of the high-performance computing systems in use essential.

In this study, I/O performance with respect to the connection bandwidth between the vector host (VH) server and the vector engines (VEs) was measured and evaluated for three configurations of the SX-Aurora TSUBASA supercomputer system, specifically the A300-2, A300-4, and A300-8. The accelerated I/O function, a distinctive feature of the SX-Aurora TSUBASA I/O system, was demonstrated to deliver excellent performance compared to the normal I/O function.
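As a generic illustration of the kind of measurement involved (not the paper's benchmark, whose accelerated I/O path is specific to SX-Aurora TSUBASA), a sequential write-bandwidth probe can be as simple as:

```python
import os
import time

def write_bandwidth_mb_s(path, total_bytes=1 << 30, block=8 << 20):
    """Time a sequential write of total_bytes in block-sized chunks,
    with fsync included so the flush to the device is measured too."""
    buf = os.urandom(block)
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    t0 = time.perf_counter()
    written = 0
    while written < total_bytes:
        written += os.write(fd, buf)
    os.fsync(fd)
    os.close(fd)
    return written / (time.perf_counter() - t0) / 1e6
```

Varying the block size and the number of concurrent writers is how such probes expose the bandwidth limits between compute elements and storage that the abstract describes.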
{"title":"I/O Performance of the SX-Aurora TSUBASA","authors":"M. Yokokawa, Ayano Nakai, K. Komatsu, Yuta Watanabe, Yasuhisa Masaoka, Yoko Isobe, Hiroaki Kobayashi","doi":"10.1109/IPDPSW50202.2020.00014","DOIUrl":"https://doi.org/10.1109/IPDPSW50202.2020.00014","url":null,"abstract":"File outputs or checkpoints for intermediate results frequently appear at appropriate time intervals in large-scale time-advancement numerical simulations where they are utilized for simulation post-processing and/or for restarting consecutive simulations. However, file input/output (I/O) for large-scale data often takes excessive time due to bandwidth limitations between processors and/or secondary storage systems like hard disk drives (HDDs) and solid state drives (SSDs). Accordingly, efforts are ongoing to reduce the time required for file I/O operations in order to speed up such simulations, which means it is necessary to acquire advanced I/O performance knowledge related to high-performance computing systems used.In this study, I/O performance with respect to the connection bandwidth between the vector host (VH) server and the vector engines (VEs) for three configurations of the SX-Aurora TSUB-ASA supercomputer system, specifically the A300–2, A300–4, and A300–8 configurations, were measured and evaluated. The accelerated I/O function, which is a distinctive feature of the SX-Aurora TSUBASA I/O system, was demonstrated to have excellent performance compared to its normal I/O function.","PeriodicalId":398819,"journal":{"name":"2020 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW)","volume":"140 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123309484","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Multiperspective Automotive Labeling
Pub Date: 2020-05-01 DOI: 10.1109/ipdpsw50202.2020.00155
Luke Jacobs, Akhil Kodumuri, Jim James, Seongha Park, Yongho Kim
Supervised machine learning techniques inherently rely on datasets for training. With image datasets traditionally annotated by humans, many advances in image annotation tools have been made to ensure the creation of rich datasets with accurate labels. Nevertheless, users still find it challenging to create and use their own datasets with labels that reflect their problem domain. We propose a streamlined labeling process that aligns multiperspective images and allows labels to be transferred from one labeled perspective to the others. The main goal of this work is to reduce the human effort required for labeling vehicle images under favorable conditions, where the image perspectives are correlated and one or more perspectives are known. A case study is described and analyzed to show the effectiveness of the process, as well as the constraints and limitations that arise when it is applied to other cases.
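When the correlated views are related by a planar mapping, the transfer step can be realized with a homography; this is an illustrative assumption, since the abstract does not specify the paper's alignment procedure.

```python
import numpy as np
import cv2

def transfer_box(box_xyxy, H):
    """Map an axis-aligned bounding box from the labeled view into
    another view via a known 3x3 homography H (hypothetical setup)."""
    x1, y1, x2, y2 = box_xyxy
    corners = np.float32([[x1, y1], [x2, y1],
                          [x2, y2], [x1, y2]]).reshape(-1, 1, 2)
    warped = cv2.perspectiveTransform(corners, H).reshape(-1, 2)
    # Re-fit an axis-aligned box around the warped quadrilateral.
    (nx1, ny1), (nx2, ny2) = warped.min(axis=0), warped.max(axis=0)
    return float(nx1), float(ny1), float(nx2), float(ny2)
```

One labeled perspective then seeds the others, which is the effort reduction the process aims for.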
{"title":"Multiperspective Automotive Labeling","authors":"Luke Jacobs, Akhil Kodumuri, Jim James, Seongha Park, Yongho Kim","doi":"10.1109/ipdpsw50202.2020.00155","DOIUrl":"https://doi.org/10.1109/ipdpsw50202.2020.00155","url":null,"abstract":"Supervised machine learning techniques inherently rely on datasets to be trained. With image datasets traditionally being annotated by humans, many advancements in image annotation tools have been made to ensure creation of rich datasets with accurate labels. Nevertheless, users still find it challenging to create and use their own datasets with labels that reflect their problem domain. We propose a streamlined labeling process that aligns multiperspective images and allows a transition from a labeled perspective to other perspectives. The main goal of this work is to reduce the human effort required for labeling vehicle images under favorable conditions where the image perspectives are correlated and one or more perspectives are known. A case study is described and analyzed to show the effectiveness of the process, as well as constraints and limitations when applied to other cases.","PeriodicalId":398819,"journal":{"name":"2020 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW)","volume":"7 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126468737","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}