Pub Date : 2022-11-01DOI: 10.1109/Correctness56720.2022.00012
Lechen Yu, Feiyang Jin, Joachim Protze, Vivek Sarkar
OpenMP provides a rich set of constructs to support multiple paradigms of parallelization, e.g., single program multiple data (SPMD) and task parallelism. Integrating these disparate paradigms in a single execution model increases the complexity of OpenMP, making OpenMP programs prone to concurrency bugs such as data races. Inspired by the task-oriented execution model in the OpenMP spec, we extended SPD3, a data race detection algorithm designed for async-finish task parallelism to support OpenMP programs. We found that by extending SPD3's key data structure, the Dynamic Program Structure Tree (DPST), SPD3 can support the majority of OpenMP constructs. We have implemented a prototype, TSan-spd3, on top of Google's ThreadSanitizer (TSan). To conduct an apples-to-apples comparison with Archer, the state-of-the-art dynamic race detector for OpenMP programs, we compared TSan-spd3 with an Archer implementation that executes on the same version of TSan. In addition, we evaluated Archer in two modes, the default mode using the original TSan and the accelerated mode enabling the use of SIMD instructions in TSan. The evaluation was conducted on nine benchmarks from the BOTS and SPEC OMP2012 benchmark suites. The evaluation results show that in eight out of nine benchmarks TSan-spd3 achieved similar overhead with Archer, while TSan-spd3 can identify more potential races than Archer.
{"title":"Leveraging the Dynamic Program Structure Tree to Detect Data Races in OpenMP Programs","authors":"Lechen Yu, Feiyang Jin, Joachim Protze, Vivek Sarkar","doi":"10.1109/Correctness56720.2022.00012","DOIUrl":"https://doi.org/10.1109/Correctness56720.2022.00012","url":null,"abstract":"OpenMP provides a rich set of constructs to support multiple paradigms of parallelization, e.g., single program multiple data (SPMD) and task parallelism. Integrating these disparate paradigms in a single execution model increases the complexity of OpenMP, making OpenMP programs prone to concurrency bugs such as data races. Inspired by the task-oriented execution model in the OpenMP spec, we extended SPD3, a data race detection algorithm designed for async-finish task parallelism to support OpenMP programs. We found that by extending SPD3's key data structure, the Dynamic Program Structure Tree (DPST), SPD3 can support the majority of OpenMP constructs. We have implemented a prototype, TSan-spd3, on top of Google's ThreadSanitizer (TSan). To conduct an apples-to-apples comparison with Archer, the state-of-the-art dynamic race detector for OpenMP programs, we compared TSan-spd3 with an Archer implementation that executes on the same version of TSan. In addition, we evaluated Archer in two modes, the default mode using the original TSan and the accelerated mode enabling the use of SIMD instructions in TSan. The evaluation was conducted on nine benchmarks from the BOTS and SPEC OMP2012 benchmark suites. The evaluation results show that in eight out of nine benchmarks TSan-spd3 achieved similar overhead with Archer, while TSan-spd3 can identify more potential races than Archer.","PeriodicalId":211482,"journal":{"name":"2022 IEEE/ACM Sixth International Workshop on Software Correctness for HPC Applications (Correctness)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130256976","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2022-11-01DOI: 10.1109/Correctness56720.2022.00008
Emmanuelle Saillard, Marc Sergent, Célia Tassadit Ait Kaci, Denis Barthou
Communications are a critical part of HPC simulations, and one of the main focuses of application developers when scaling on supercomputers. While classical message passing (also called two-sided communications) is the dominant communication paradigm, one-sided communications are often praised to be efficient to overlap communications with computations, but challenging to program. Their usage is then generally abstracted through languages and memory abstractions to ease programming (e.g. PGAS). Therefore, little work has been done to help programmers use intermediate runtime layers, such as MPI-RMA, that is often reserved to expert programmers. Indeed, programming with MPI - RMA presents several challenges that require handling the asynchronous nature of one-sided communications to ensure the proper semantics of the program while ensuring its memory consistency. To help programmers detect memory errors such as race conditions as early as possible, this paper proposes a new static analysis of MPI - RMA codes that shows to the programmer the errors that can be detected at compile time. The detection is based on a novel local concurrency errors detection algorithm that tracks accesses through BFS searches on the Control Flow Graphs of a program. We show on several tests and an MPI-RMA variant of the GUPS benchmark that the static analysis allows to detect such errors on user codes. The error codes are integrated in the MPI Bugs Initiative open-source test suite.
{"title":"Static Local Concurrency Errors Detection in MPI-RMA Programs","authors":"Emmanuelle Saillard, Marc Sergent, Célia Tassadit Ait Kaci, Denis Barthou","doi":"10.1109/Correctness56720.2022.00008","DOIUrl":"https://doi.org/10.1109/Correctness56720.2022.00008","url":null,"abstract":"Communications are a critical part of HPC simulations, and one of the main focuses of application developers when scaling on supercomputers. While classical message passing (also called two-sided communications) is the dominant communication paradigm, one-sided communications are often praised to be efficient to overlap communications with computations, but challenging to program. Their usage is then generally abstracted through languages and memory abstractions to ease programming (e.g. PGAS). Therefore, little work has been done to help programmers use intermediate runtime layers, such as MPI-RMA, that is often reserved to expert programmers. Indeed, programming with MPI - RMA presents several challenges that require handling the asynchronous nature of one-sided communications to ensure the proper semantics of the program while ensuring its memory consistency. To help programmers detect memory errors such as race conditions as early as possible, this paper proposes a new static analysis of MPI - RMA codes that shows to the programmer the errors that can be detected at compile time. The detection is based on a novel local concurrency errors detection algorithm that tracks accesses through BFS searches on the Control Flow Graphs of a program. We show on several tests and an MPI-RMA variant of the GUPS benchmark that the static analysis allows to detect such errors on user codes. The error codes are integrated in the MPI Bugs Initiative open-source test suite.","PeriodicalId":211482,"journal":{"name":"2022 IEEE/ACM Sixth International Workshop on Software Correctness for HPC Applications (Correctness)","volume":"16 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125702705","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2022-11-01DOI: 10.1109/Correctness56720.2022.00010
Feiyang Jin, J. Jacobson, Samuel D. Pollard, Vivek Sarkar
Kokkos is a C++ library and ecosystem for writing parallel programs on heterogeneous systems. One of the primary goals of Kokkos is portability: programs in Kokkos are expressed through general parallel constructs which can enable the same code to compile and execute on different parallel architectures. However, there is no known formal model of Kokkos's semantics, which must be generic enough to support current and future CPU and accelerator architectures. As a first step of formalizing Kokkos, We introduce MiniKokkos: a small language capturing the main features of Kokkos, and then prove that MiniKokkos ensures portability across all possible parallel executions. We also provide a case study of how MiniKokkos can help reason about Kokkos programs and help find a bug in the Kokkos implementation.
{"title":"MiniKokkos: A Calculus of Portable Parallelism","authors":"Feiyang Jin, J. Jacobson, Samuel D. Pollard, Vivek Sarkar","doi":"10.1109/Correctness56720.2022.00010","DOIUrl":"https://doi.org/10.1109/Correctness56720.2022.00010","url":null,"abstract":"Kokkos is a C++ library and ecosystem for writing parallel programs on heterogeneous systems. One of the primary goals of Kokkos is portability: programs in Kokkos are expressed through general parallel constructs which can enable the same code to compile and execute on different parallel architectures. However, there is no known formal model of Kokkos's semantics, which must be generic enough to support current and future CPU and accelerator architectures. As a first step of formalizing Kokkos, We introduce MiniKokkos: a small language capturing the main features of Kokkos, and then prove that MiniKokkos ensures portability across all possible parallel executions. We also provide a case study of how MiniKokkos can help reason about Kokkos programs and help find a bug in the Kokkos implementation.","PeriodicalId":211482,"journal":{"name":"2022 IEEE/ACM Sixth International Workshop on Software Correctness for HPC Applications (Correctness)","volume":"2 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"117332849","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2022-11-01DOI: 10.1109/Correctness56720.2022.00007
Ariel E. Kellison, Mohit Tekriwal, Jean-Baptiste Jeannin, G. Hulette
Iterative methods for solving linear systems serve as a basic building block for computational science. The computational cost of these methods can be significantly influenced by the round-off errors that accumulate as a result of their implementation in finite precision. In the extreme case, round-off errors that occur in practice can completely prevent an implementation from satisfying the accuracy and convergence behavior prescribed by its underlying algorithm. In the exascale era where cost is paramount, a thorough and rigorous analysis of the delay of convergence due to round-off should not be ignored. In this paper, we use a small model problem and the Jacobi iterative method to demonstrate how the Coq proof assistant can be used to formally specify the floating-point behavior of iterative methods, and to rigorously prove the accuracy of these methods.
{"title":"Towards Verified Rounding Error Analysis for Stationary Iterative Methods","authors":"Ariel E. Kellison, Mohit Tekriwal, Jean-Baptiste Jeannin, G. Hulette","doi":"10.1109/Correctness56720.2022.00007","DOIUrl":"https://doi.org/10.1109/Correctness56720.2022.00007","url":null,"abstract":"Iterative methods for solving linear systems serve as a basic building block for computational science. The computational cost of these methods can be significantly influenced by the round-off errors that accumulate as a result of their implementation in finite precision. In the extreme case, round-off errors that occur in practice can completely prevent an implementation from satisfying the accuracy and convergence behavior prescribed by its underlying algorithm. In the exascale era where cost is paramount, a thorough and rigorous analysis of the delay of convergence due to round-off should not be ignored. In this paper, we use a small model problem and the Jacobi iterative method to demonstrate how the Coq proof assistant can be used to formally specify the floating-point behavior of iterative methods, and to rigorously prove the accuracy of these methods.","PeriodicalId":211482,"journal":{"name":"2022 IEEE/ACM Sixth International Workshop on Software Correctness for HPC Applications (Correctness)","volume":"138 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114999513","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2022-11-01DOI: 10.1109/correctness56720.2022.00004
{"title":"Message from the Correctness 2022 Workshop Chairs","authors":"","doi":"10.1109/correctness56720.2022.00004","DOIUrl":"https://doi.org/10.1109/correctness56720.2022.00004","url":null,"abstract":"","PeriodicalId":211482,"journal":{"name":"2022 IEEE/ACM Sixth International Workshop on Software Correctness for HPC Applications (Correctness)","volume":"183 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133685081","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2022-11-01DOI: 10.1109/Correctness56720.2022.00009
Simon Schwitanski, Joachim Jenke, Felix Tomski, C. Terboven, Matthias S. Müller
MPI Remote Memory Access (RMA) provides a one-sided communication model for MPI applications. Ensuring consistency between RMA operations with synchronization calls is a key requirement when writing correct RMA codes. Wrong API usage may lead to concurrent modifications of the same memory location without proper synchronization resulting in data races across processes. Due to their non-deterministic nature, such data races are hard to detect. This paper presents MUST-RMA, an on-the-fly data race detector for MPI RMA applications. MUST-RMA uses a race detection model based on happened-before and consistency analysis. It combines the MPI correctness tool MUST with the race detector ThreadSanitizer to detect races across processes in RMA applications. A classification quality study on MUST-RMA with different test cases shows a precision and recall of 0.95. An overhead study on a stencil and a matrix transpose kernel shows runtime slowdowns of 3x to 20x for up to 192 processes.
{"title":"On-the-Fly Data Race Detection for MPI RMA Programs with MUST","authors":"Simon Schwitanski, Joachim Jenke, Felix Tomski, C. Terboven, Matthias S. Müller","doi":"10.1109/Correctness56720.2022.00009","DOIUrl":"https://doi.org/10.1109/Correctness56720.2022.00009","url":null,"abstract":"MPI Remote Memory Access (RMA) provides a one-sided communication model for MPI applications. Ensuring consistency between RMA operations with synchronization calls is a key requirement when writing correct RMA codes. Wrong API usage may lead to concurrent modifications of the same memory location without proper synchronization resulting in data races across processes. Due to their non-deterministic nature, such data races are hard to detect. This paper presents MUST-RMA, an on-the-fly data race detector for MPI RMA applications. MUST-RMA uses a race detection model based on happened-before and consistency analysis. It combines the MPI correctness tool MUST with the race detector ThreadSanitizer to detect races across processes in RMA applications. A classification quality study on MUST-RMA with different test cases shows a precision and recall of 0.95. An overhead study on a stencil and a matrix transpose kernel shows runtime slowdowns of 3x to 20x for up to 192 processes.","PeriodicalId":211482,"journal":{"name":"2022 IEEE/ACM Sixth International Workshop on Software Correctness for HPC Applications (Correctness)","volume":"871 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133293949","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2022-11-01DOI: 10.1109/Correctness56720.2022.00011
Winson X. Chen, T. Vanderbruggen, Pei-Hung Lin, C. Liao, M. Emani
DataRaceBench (DRB) is a dedicated benchmark suite to evaluate tools aimed to find data race bugs in OpenMP programs. Using microbenchmarks with or without data races, DRB is able to generate standard quality metrics and provide systematic and quantitative assessments of data race detection tools. However, as the number of microbenchmarks grows, it is challenging to manually identify similar code patterns for DRB, within the context of identifying duplicated kernels or guiding the additions of new kernels. In this paper, we experiment with a transformer-based, deep learning approach to similarity analysis. A state-of-the-art transformer model, CodeBERT, has been adapted to find similar OpenMP code regions. We explore the challenges and the solutions when applying transformer-based similarity analysis to new source codes which are unseen by pre-trained transformers. Using comparative experiments of different variants of similarity analysis, we comment on the strengths and limitations of the transformer-based approach and point out future research directions.
{"title":"Early Experience with Transformer-Based Similarity Analysis for DataRaceBench","authors":"Winson X. Chen, T. Vanderbruggen, Pei-Hung Lin, C. Liao, M. Emani","doi":"10.1109/Correctness56720.2022.00011","DOIUrl":"https://doi.org/10.1109/Correctness56720.2022.00011","url":null,"abstract":"DataRaceBench (DRB) is a dedicated benchmark suite to evaluate tools aimed to find data race bugs in OpenMP programs. Using microbenchmarks with or without data races, DRB is able to generate standard quality metrics and provide systematic and quantitative assessments of data race detection tools. However, as the number of microbenchmarks grows, it is challenging to manually identify similar code patterns for DRB, within the context of identifying duplicated kernels or guiding the additions of new kernels. In this paper, we experiment with a transformer-based, deep learning approach to similarity analysis. A state-of-the-art transformer model, CodeBERT, has been adapted to find similar OpenMP code regions. We explore the challenges and the solutions when applying transformer-based similarity analysis to new source codes which are unseen by pre-trained transformers. Using comparative experiments of different variants of similarity analysis, we comment on the strengths and limitations of the transformer-based approach and point out future research directions.","PeriodicalId":211482,"journal":{"name":"2022 IEEE/ACM Sixth International Workshop on Software Correctness for HPC Applications (Correctness)","volume":"11 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115427911","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2022-07-19DOI: 10.1109/Correctness56720.2022.00006
J. Demmel, J. Dongarra, M. Gates, G. Henry, J. Langou, Xiaoye Li, P. Luszczek, W. Pereira, Jason Riedy, Cindy Rubio-Gonz'alez
Numerical exceptions, which may be caused by overflow, operations like division by 0 or sqrt(−1), or conver-gence failures, are unavoidable in many cases, in particular when software is used on unforeseen and difficult inputs. As more aspects of society become automated e.g., self-driving cars, health monitors, and cyber-physical systems more generally, it is becoming increasingly important to design software that is resilient to exceptions, and that responds to them in a consistent way. Consistency is needed to allow users to build higher-level software that is also resilient and consistent (and so on recursively). In this paper we explore the design space of consistent exception handling for the widely used BLAS and LAPACK linear algebra libraries, pointing out a variety of instances of inconsistent exception handling in the current versions, and propose a new design that balances consistency, complexity, ease of use, and performance. Some compromises are needed, because there are preexisting inconsistencies that are outside our control, including in or between existing vendor BLAS implementations, different programming languages, and even compilers for the same programming language. And user requests from our surveys are quite diverse. We also propose our design as a possible model for other numerical software, and welcome comments on our design choices.
{"title":"Proposed Consistent Exception Handling for the BLAS and LAPACK","authors":"J. Demmel, J. Dongarra, M. Gates, G. Henry, J. Langou, Xiaoye Li, P. Luszczek, W. Pereira, Jason Riedy, Cindy Rubio-Gonz'alez","doi":"10.1109/Correctness56720.2022.00006","DOIUrl":"https://doi.org/10.1109/Correctness56720.2022.00006","url":null,"abstract":"Numerical exceptions, which may be caused by overflow, operations like division by 0 or sqrt(−1), or conver-gence failures, are unavoidable in many cases, in particular when software is used on unforeseen and difficult inputs. As more aspects of society become automated e.g., self-driving cars, health monitors, and cyber-physical systems more generally, it is becoming increasingly important to design software that is resilient to exceptions, and that responds to them in a consistent way. Consistency is needed to allow users to build higher-level software that is also resilient and consistent (and so on recursively). In this paper we explore the design space of consistent exception handling for the widely used BLAS and LAPACK linear algebra libraries, pointing out a variety of instances of inconsistent exception handling in the current versions, and propose a new design that balances consistency, complexity, ease of use, and performance. Some compromises are needed, because there are preexisting inconsistencies that are outside our control, including in or between existing vendor BLAS implementations, different programming languages, and even compilers for the same programming language. And user requests from our surveys are quite diverse. We also propose our design as a possible model for other numerical software, and welcome comments on our design choices.","PeriodicalId":211482,"journal":{"name":"2022 IEEE/ACM Sixth International Workshop on Software Correctness for HPC Applications (Correctness)","volume":"164 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-07-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131887508","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}