We define several techniques to extend gradual typing with semantic subtyping, specifically targeting dynamic languages. Focusing on the Elixir programming language, we provide the theoretical foundations for its type system. Our approach demonstrates how to achieve type soundness for gradual typing in existing dynamic languages without modifying their compilation, while still maintaining high precision. This is accomplished through the static detection of "strong functions", which leverage runtime checks inserted by the programmer or performed by the virtual machine, and through a fine-grained type analysis of pattern-matching expressions with guards.
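To make the notion of a "strong function" concrete, here is a minimal sketch in Python (not Elixir, and not the paper's formal system): a function whose own runtime check guarantees that its result matches its declared type, so a checker can trust the annotation even when callers pass dynamically typed values.

```python
def double(x) -> int:
    # The isinstance check plays the role of an Elixir guard such as
    # `when is_integer(x)`: if execution gets past it, the body is guaranteed
    # to return an int, so the annotation can be trusted without inserting
    # extra runtime casts at call sites ("safe erasure").
    if not isinstance(x, int):
        raise TypeError("expected an integer")
    return x * 2
```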
{"title":"Guard Analysis and Safe Erasure Gradual Typing: a Type System for Elixir","authors":"Giuseppe Castagna, Guillaume Duboc","doi":"arxiv-2408.14345","DOIUrl":"https://doi.org/arxiv-2408.14345","url":null,"abstract":"We define several techniques to extend gradual typing with semantic\u0000subtyping, specifically targeting dynamic languages. Focusing on the Elixir\u0000programming language, we provide the theoretical foundations for its type\u0000system. Our approach demonstrates how to achieve type soundness for gradual\u0000typing in existing dynamic languages without modifying their compilation, while\u0000still maintaining high precision. This is accomplished through the static\u0000detection of \"strong functions\", which leverage runtime checks inserted by the\u0000programmer or performed by the virtual machine, and through a fine-grained type\u0000analysis of pattern-matching expressions with guards.","PeriodicalId":501197,"journal":{"name":"arXiv - CS - Programming Languages","volume":"17 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-08-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142179558","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Callista Le, Kiran Gopinathan, Koon Wen Lee, Seth Gilbert, Ilya Sergey
The design of an efficient thread-safe concurrent data structure is a balancing act between its implementation complexity and performance. Lock-based concurrent data structures, which are relatively easy to derive from their sequential counterparts and to prove thread-safe, suffer from poor throughput under even light multi-threaded workloads. At the same time, lock-free concurrent structures allow for high throughput, but are notoriously difficult to get right and require careful reasoning to formally establish their correctness. We explore a solution to this conundrum based on batch parallelism, an approach for designing concurrent data structures via a simple insight: efficiently processing a batch of a priori known operations in parallel is easier than optimising performance for a stream of arbitrary asynchronous requests. Alas, batch-parallel structures have not seen wide practical adoption due to (i) the inconvenience of having to structure multi-threaded programs to explicitly group operations and (ii) the lack of a systematic methodology to implement batch-parallel structures as simply as lock-based ones. We present OBatcher, an OCaml library that streamlines the design, implementation, and usage of batch-parallel structures. It solves the first challenge (how to use) by suggesting a new lightweight implicit batching design that is built on top of generic asynchronous programming mechanisms. The second challenge (how to implement) is addressed by identifying a family of strategies for converting common sequential structures into efficient batch-parallel ones. We showcase OBatcher with a diverse set of benchmarks. Our evaluation of all the implementations on large asynchronous workloads shows that (a) they consistently outperform the corresponding coarse-grained lock-based implementations and that (b) their throughput scales reasonably with the number of processors.
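As a rough illustration of implicit batching (a Python sketch under our own simplifications, not OBatcher's OCaml API): callers submit single operations as usual, and whichever caller wins a try-lock drains everything queued so far and processes it as one batch, so users never have to group operations explicitly.

```python
import threading, queue

class ImplicitBatcher:
    def __init__(self, process_batch):
        self.process_batch = process_batch    # processes a list of operations at once
        self.pending = queue.SimpleQueue()
        self.lock = threading.Lock()

    def submit(self, op):
        done = threading.Event()
        box = {}
        self.pending.put((op, box, done))
        while not done.is_set():
            if self.lock.acquire(blocking=False):
                try:
                    self._run_pending_batch()   # this caller becomes the batch runner
                finally:
                    self.lock.release()
            else:
                done.wait(timeout=0.001)        # another caller is running a batch
        return box["result"]

    def _run_pending_batch(self):
        batch = []
        while True:
            try:
                batch.append(self.pending.get_nowait())
            except queue.Empty:
                break
        if not batch:
            return
        results = self.process_batch([op for op, _, _ in batch])
        for (_, box, done), result in zip(batch, results):
            box["result"] = result
            done.set()

# All concurrently submitted operations reach the (hypothetical) structure as one batch.
batcher = ImplicitBatcher(lambda ops: [op + 1 for op in ops])
print(batcher.submit(41))   # 42
```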
{"title":"Concurrent Data Structures Made Easy (Extended Version)","authors":"Callista Le, Kiran Gopinathan, Koon Wen Lee, Seth Gilbert, Ilya Sergey","doi":"arxiv-2408.13779","DOIUrl":"https://doi.org/arxiv-2408.13779","url":null,"abstract":"Design of an efficient thread-safe concurrent data structure is a balancing\u0000act between its implementation complexity and performance. Lock-based\u0000concurrent data structures, which are relatively easy to derive from their\u0000sequential counterparts and to prove thread-safe, suffer from poor throughput\u0000under even light multi-threaded workload. At the same time, lock-free\u0000concurrent structures allow for high throughput, but are notoriously difficult\u0000to get right and require careful reasoning to formally establish their\u0000correctness. We explore a solution to this conundrum based on batch parallelism, an\u0000approach for designing concurrent data structures via a simple insight:\u0000efficiently processing a batch of a priori known operations in parallel is\u0000easier than optimising performance for a stream of arbitrary asynchronous\u0000requests. Alas, batch-parallel structures have not seen wide practical adoption\u0000due to (i) the inconvenience of having to structure multi-threaded programs to\u0000explicitly group operations and (ii) the lack of a systematic methodology to\u0000implement batch-parallel structures as simply as lock-based ones. We present OBatcher-an OCaml library that streamlines the design,\u0000implementation, and usage of batch-parallel structures. It solves the first\u0000challenge (how to use) by suggesting a new lightweight implicit batching design\u0000that is built on top of generic asynchronous programming mechanisms. The second\u0000challenge (how to implement) is addressed by identifying a family of strategies\u0000for converting common sequential structures into efficient batch-parallel ones.\u0000We showcase OBatcher with a diverse set of benchmarks. Our evaluation of all\u0000the implementations on large asynchronous workloads shows that (a) they\u0000consistently outperform the corresponding coarse-grained lock-based\u0000implementations and that (b) their throughput scales reasonably with the number\u0000of processors.","PeriodicalId":501197,"journal":{"name":"arXiv - CS - Programming Languages","volume":"58 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-08-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142179569","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Haau-Sing Li, Patrick Fernandes, Iryna Gurevych, André F. T. Martins
Recently, a diverse set of decoding and reranking procedures has been shown effective for LLM-based code generation. However, a comprehensive framework that links and experimentally compares these methods is missing. We address this by proposing Decoding Objectives for Code Execution, a comprehensive framework that includes candidate generation, $n$-best reranking, minimum Bayes risk (MBR) decoding, and self-debugging as the core components. We then study the contributions of these components through execution-based evaluation metrics. Our findings highlight the importance of execution-based methods and the gap between execution-based and execution-free methods. Furthermore, we assess the impact of filtering based on trial unit tests, a simple and effective strategy that has often been overlooked in prior work. We also propose self-debugging on multiple candidates, obtaining state-of-the-art performance on reranking for code generation. We expect our framework to provide a solid guideline for future research on code generation.
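As one concrete reading of execution-based MBR decoding (a hypothetical Python sketch, not the DOCE implementation): among the generated candidates, pick the one whose execution behaviour on a set of trial inputs agrees most with the other candidates.

```python
# Minimum Bayes risk selection over code candidates, using execution agreement
# as the similarity metric.
def mbr_select(candidates, run, trial_inputs):
    """candidates: list of callables; run(f, x) returns f's output on input x."""
    outputs = [[run(c, x) for x in trial_inputs] for c in candidates]
    def agreement(i, j):
        return sum(a == b for a, b in zip(outputs[i], outputs[j])) / len(trial_inputs)
    scores = [sum(agreement(i, j) for j in range(len(candidates)) if j != i)
              for i in range(len(candidates))]
    return candidates[max(range(len(candidates)), key=scores.__getitem__)]

# Example: three candidate implementations of absolute value, one buggy.
cands = [lambda x: abs(x), lambda x: x if x > 0 else -x, lambda x: x]
best = mbr_select(cands, lambda f, x: f(x), trial_inputs=[-2, -1, 0, 3])
print(best(-5))   # 5: the consensus (correct) behaviour wins
```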
{"title":"DOCE: Finding the Sweet Spot for Execution-Based Code Generation","authors":"Haau-Sing Li, Patrick Fernandes, Iryna Gurevych, André F. T. Martins","doi":"arxiv-2408.13745","DOIUrl":"https://doi.org/arxiv-2408.13745","url":null,"abstract":"Recently, a diverse set of decoding and reranking procedures have been shown\u0000effective for LLM-based code generation. However, a comprehensive framework\u0000that links and experimentally compares these methods is missing. We address\u0000this by proposing Decoding Objectives for Code Execution, a comprehensive\u0000framework that includes candidate generation, $n$-best reranking, minimum Bayes\u0000risk (MBR) decoding, and self-debugging as the core components. We then study\u0000the contributions of these components through execution-based evaluation\u0000metrics. Our findings highlight the importance of execution-based methods and\u0000the difference gap between execution-based and execution-free methods.\u0000Furthermore, we assess the impact of filtering based on trial unit tests, a\u0000simple and effective strategy that has been often overlooked in prior works. We\u0000also propose self-debugging on multiple candidates, obtaining state-of-the-art\u0000performance on reranking for code generation. We expect our framework to\u0000provide a solid guideline for future research on code generation.","PeriodicalId":501197,"journal":{"name":"arXiv - CS - Programming Languages","volume":"60 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-08-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142223616","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
With the growing sizes of data structures allocated on the heap, understanding the actual use of heap memory is critically important for minimizing cache misses and reclaiming unused memory. A static analysis aimed at this is difficult because heap locations are unnamed. Using allocation sites to name them creates very few distinctions, making it difficult to identify allocated heap locations that are not used. Heap liveness analysis using access graphs solves this problem by (a) using a storeless model of heap memory that names locations with access paths, and (b) representing the unbounded sets of access paths (which are regular languages) as finite automata. We improve the scalability and efficiency of heap liveness analysis, and reduce the amount of computed heap liveness information, by using deterministic automata and by minimizing the inclusion of aliased access paths in the language. Practically, our field-, flow-, and context-sensitive liveness analysis on SPEC CPU2006 benchmarks scales to 36 kLoC (the existing analysis scales to 10.5 kLoC) and improves efficiency by up to 99%. For some of the benchmarks, our technique shows a multifold reduction in the computed liveness information, ranging from 2 to 100 times (in terms of the number of live access paths), without compromising soundness.
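The representation at the core of this line of work can be pictured with a toy example (a hypothetical Python sketch, not the analysis itself): an unbounded set of live access paths forms a regular language over field names and is stored as a deterministic automaton, so liveness queries become membership tests instead of path enumeration.

```python
# state -> {field -> next state}; accepts x, x.next, x.next.next, ... and x.data
DFA = {
    "q0": {"x": "q1"},
    "q1": {"next": "q1", "data": "q2"},
    "q2": {},
}
ACCEPTING = {"q1", "q2"}

def is_live(access_path):
    state = "q0"
    for field in access_path:
        state = DFA[state].get(field)
        if state is None:
            return False
    return state in ACCEPTING

print(is_live(["x"]))                  # True: the root pointer is live
print(is_live(["x", "next", "next"]))  # True: arbitrarily deep tails are live
print(is_live(["x", "data", "next"]))  # False: nothing past .data is used
```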
{"title":"Which Part of the Heap is Useful? Improving Heap Liveness Analysis","authors":"Vini Kanvar, Uday P. Khedker","doi":"arxiv-2408.12947","DOIUrl":"https://doi.org/arxiv-2408.12947","url":null,"abstract":"With the growing sizes of data structures allocated in heap, understanding\u0000the actual use of heap memory is critically important for minimizing cache\u0000misses and reclaiming unused memory. A static analysis aimed at this is\u0000difficult because the heap locations are unnamed. Using allocation sites to\u0000name them creates very few distinctions making it difficult to identify\u0000allocated heap locations that are not used. Heap liveness analysis using access\u0000graphs solves this problem by (a) using a storeless model of heap memory by\u0000naming the locations with access paths, and (b) representing the unbounded sets\u0000of access paths (which are regular languages) as finite automata. We improve the scalability and efficiency of heap liveness analysis, and\u0000reduce the amount of computed heap liveness information by using deterministic\u0000automata and by minimizing the inclusion of aliased access paths in the\u0000language. Practically, our field-, flow-, context-sensitive liveness analysis\u0000on SPEC CPU2006 benchmarks scales to 36 kLoC (existing analysis scales to 10.5\u0000kLoC) and improves efficiency even up to 99%. For some of the benchmarks, our\u0000technique shows multifold reduction in the computed liveness information,\u0000ranging from 2 to 100 times (in terms of the number of live access paths),\u0000without compromising on soundness.","PeriodicalId":501197,"journal":{"name":"arXiv - CS - Programming Languages","volume":"146 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-08-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142179570","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Specifications allow us to formally state and understand what programs are intended to do. To help one extract useful properties from code, Park et al. recently proposed a framework that, given (i) a quantifier-free query posed about a set of function definitions, and (ii) a domain-specific language L in which each extracted property is to be expressed (we call properties in the language L-properties), synthesizes a set of L-properties such that each property is a strongest L-consequence for the query: the property is an over-approximation of the query, and there is no other L-property that over-approximates the query and is strictly more precise than it. The framework by Park et al. has two key limitations. First, it only supports quantifier-free query formulas and thus cannot synthesize specifications for queries involving nondeterminism, concurrency, etc. Second, it can only compute L-consequences, i.e., over-approximations of the program behavior. This paper addresses these two limitations and presents a framework, Loud, for synthesizing strongest L-consequences and weakest L-implicants (i.e., under-approximations of the query) for function definitions that can involve existential quantifiers. We implemented a solver, Aspire, for problems expressed in Loud, which can be used to describe and identify sources of bugs in both deterministic and nondeterministic programs, extract properties from concurrent programs, and synthesize winning strategies in two-player games.
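The specification notions can be illustrated with a tiny, testing-based sketch in Python (hypothetical, and much weaker than the symbolic solver Aspire): a candidate L-property is an L-consequence of the query if it holds on every observed behaviour of the function; the strongest ones are those not strictly implied by another surviving candidate.

```python
from itertools import product

def max2(x, y):             # the "function definition" being specified
    return x if x >= y else y

# A tiny property DSL L: predicates over (x, y, result).
L = {
    "res >= x":             lambda x, y, r: r >= x,
    "res >= y":             lambda x, y, r: r >= y,
    "res == x or res == y": lambda x, y, r: r == x or r == y,
    "res == x":             lambda x, y, r: r == x,   # not a consequence
}

samples = [(x, y, max2(x, y)) for x, y in product(range(-3, 4), repeat=2)]
consequences = {name: p for name, p in L.items() if all(p(*s) for s in samples)}
print(sorted(consequences))  # the L-properties that over-approximate max2
```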
{"title":"LOUD: Synthesizing Strongest and Weakest Specifications","authors":"Kanghee Park, Xuanyu Peng, Loris D'Antoni","doi":"arxiv-2408.12539","DOIUrl":"https://doi.org/arxiv-2408.12539","url":null,"abstract":"Specifications allow us to formally state and understand what programs are\u0000intended to do. To help one extract useful properties from code, Park et al.\u0000recently proposed a framework that given (i) a quantifier-free query posed\u0000about a set of function definitions, and (ii) a domain-specific language L in\u0000which each extracted property is to be expressed (we call properties in the\u0000language L-properties), synthesizes a set of L-properties such that each of the\u0000property is a strongest L-consequence for the query: the property is an\u0000over-approximation of query and there is no other L-property that\u0000over-approximates query and is strictly more precise than each property. The framework by Park et al. has two key limitations. First, it only supports\u0000quantifier-free query formulas and thus cannot synthesize specifications for\u0000queries involving nondeterminism, concurrency, etc. Second, it can only compute\u0000L-consequences, i.e., over-approximations of the program behavior. This paper addresses these two limitations and presents a framework, Loud,\u0000for synthesizing strongest L-consequences and weakest L-implicants (i.e.,\u0000under-approximations of the query) for function definitions that can involve\u0000existential quantifiers. We implemented a solver, Aspire, for problems expressed in Loud which can be\u0000used to describe and identify sources of bugs in both deterministic and\u0000nondeterministic programs, extract properties from concurrent programs, and\u0000synthesize winning strategies in two-player games.","PeriodicalId":501197,"journal":{"name":"arXiv - CS - Programming Languages","volume":"113 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-08-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142223617","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Zachariah Sollenberger, Jay Patel, Christian Munley, Aaron Jarmusch, Sunita Chandrasekaran
Large Language Models (LLMs) are evolving and have significantly revolutionized the landscape of software development. If used well, they can significantly accelerate the software development cycle. At the same time, the community is very cautious of models being trained on biased or sensitive data, which can lead to biased outputs along with the inadvertent release of confidential information. Additionally, the carbon footprint and the lack of explainability of these black-box models continue to raise questions about the usability of LLMs. With the abundance of opportunities LLMs have to offer, this paper explores the idea of judging tests used to evaluate compiler implementations of directive-based programming models, as well as probing into the black box of LLMs. Based on our results, utilizing an agent-based prompting approach and setting up a validation pipeline structure drastically increased the quality of DeepSeek Coder, the LLM chosen for evaluation purposes.
{"title":"LLM4VV: Exploring LLM-as-a-Judge for Validation and Verification Testsuites","authors":"Zachariah Sollenberger, Jay Patel, Christian Munley, Aaron Jarmusch, Sunita Chandrasekaran","doi":"arxiv-2408.11729","DOIUrl":"https://doi.org/arxiv-2408.11729","url":null,"abstract":"Large Language Models (LLM) are evolving and have significantly\u0000revolutionized the landscape of software development. If used well, they can\u0000significantly accelerate the software development cycle. At the same time, the\u0000community is very cautious of the models being trained on biased or sensitive\u0000data, which can lead to biased outputs along with the inadvertent release of\u0000confidential information. Additionally, the carbon footprints and the\u0000un-explainability of these black box models continue to raise questions about\u0000the usability of LLMs. With the abundance of opportunities LLMs have to offer, this paper explores\u0000the idea of judging tests used to evaluate compiler implementations of\u0000directive-based programming models as well as probe into the black box of LLMs.\u0000Based on our results, utilizing an agent-based prompting approach and setting\u0000up a validation pipeline structure drastically increased the quality of\u0000DeepSeek Coder, the LLM chosen for the evaluation purposes.","PeriodicalId":501197,"journal":{"name":"arXiv - CS - Programming Languages","volume":"10 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-08-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142179571","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Ellie Y. Cheng, Eric Atkinson, Guillaume Baudart, Louis Mandel, Michael Carbin
Advanced probabilistic programming languages (PPLs) use hybrid inference systems to combine symbolic exact inference and Monte Carlo methods to improve inference performance. These systems use heuristics to partition random variables within the program into variables that are encoded symbolically and variables that are encoded with sampled values, and the heuristics are not necessarily aligned with the performance evaluation metrics used by the developer. In this work, we present inference plans, a programming interface that enables developers to control the partitioning of random variables during hybrid particle filtering. We further present Siren, a new PPL that enables developers to use annotations to specify inference plans the inference system must implement. To assist developers with statically reasoning about whether an inference plan can be implemented, we present an abstract-interpretation-based static analysis for Siren for determining inference plan satisfiability. We prove the analysis is sound with respect to Siren's semantics. Our evaluation applies inference plans to three different hybrid particle filtering algorithms on a suite of benchmarks and shows that the control provided by inference plans enables speed ups of 1.76x on average and up to 206x to reach target accuracy, compared to the inference plans implemented by default heuristics; the results also show that inference plans improve accuracy by 1.83x on average and up to 595x with less or equal runtime, compared to the default inference plans. We further show that the static analysis is precise in practice, identifying all satisfiable inference plans in 27 out of the 33 benchmark-algorithm combinations.
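To give a flavour of what an inference plan controls (a hypothetical Python sketch on a conjugate toy model, not Siren itself): an annotation per random variable decides whether it is carried symbolically, via an exact closed-form update, or as a population of weighted samples.

```python
import random

def infer_bias(data, plan, n_particles=1000):
    """Beta-Bernoulli toy model: estimate a coin's bias from observed flips."""
    if plan["bias"] == "symbolic":
        # Exact conjugate update: keep the Beta(a, b) posterior in closed form.
        a, b = 1.0, 1.0
        for x in data:
            a, b = a + x, b + (1 - x)
        return a / (a + b)                          # exact posterior mean
    else:
        # Sampled representation: weight prior draws by the likelihood of the data.
        particles = [random.random() for _ in range(n_particles)]
        weights = []
        for p in particles:
            w = 1.0
            for x in data:
                w *= p if x == 1 else 1.0 - p
            weights.append(w)
        total = sum(weights)
        return sum(p * w for p, w in zip(particles, weights)) / total

data = [1, 1, 0, 1, 0, 1, 1, 1]
print(infer_bias(data, {"bias": "symbolic"}))   # exact answer, no sampling noise
print(infer_bias(data, {"bias": "sampled"}))    # approximate, varies run to run
```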
{"title":"Inference Plans for Hybrid Particle Filtering","authors":"Ellie Y. Cheng, Eric Atkinson, Guillaume Baudart, Louis Mandel, Michael Carbin","doi":"arxiv-2408.11283","DOIUrl":"https://doi.org/arxiv-2408.11283","url":null,"abstract":"Advanced probabilistic programming languages (PPLs) use hybrid inference\u0000systems to combine symbolic exact inference and Monte Carlo methods to improve\u0000inference performance. These systems use heuristics to partition random\u0000variables within the program into variables that are encoded symbolically and\u0000variables that are encoded with sampled values, and the heuristics are not\u0000necessarily aligned with the performance evaluation metrics used by the\u0000developer. In this work, we present inference plans, a programming interface\u0000that enables developers to control the partitioning of random variables during\u0000hybrid particle filtering. We further present Siren, a new PPL that enables\u0000developers to use annotations to specify inference plans the inference system\u0000must implement. To assist developers with statically reasoning about whether an\u0000inference plan can be implemented, we present an abstract-interpretation-based\u0000static analysis for Siren for determining inference plan satisfiability. We\u0000prove the analysis is sound with respect to Siren's semantics. Our evaluation\u0000applies inference plans to three different hybrid particle filtering algorithms\u0000on a suite of benchmarks and shows that the control provided by inference plans\u0000enables speed ups of 1.76x on average and up to 206x to reach target accuracy,\u0000compared to the inference plans implemented by default heuristics; the results\u0000also show that inference plans improve accuracy by 1.83x on average and up to\u0000595x with less or equal runtime, compared to the default inference plans. We\u0000further show that the static analysis is precise in practice, identifying all\u0000satisfiable inference plans in 27 out of the 33 benchmark-algorithm\u0000combinations.","PeriodicalId":501197,"journal":{"name":"arXiv - CS - Programming Languages","volume":"54 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-08-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142179572","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
ReScript introduces a strongly typed language that targets JavaScript, as an alternative to gradually typed languages such as TypeScript. In this paper, we present a type system for data-flow analysis for a subset of the ReScript language, specifically a lambda calculus with mutability and pattern matching. The type system is a local analysis that collects information about which variables are used, as well as alias information.
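As a loose illustration of the kind of information such an analysis collects (a hypothetical Python sketch over a toy AST, not the paper's type system): a local pass that records which variables an expression uses and which bindings may alias one another.

```python
def analyse(expr):
    """expr: nested tuples, e.g. ("let", name, bound, body), ("var", x),
    ("ref", e), ("deref", e), ("assign", target, value), ("const", c)."""
    used, aliases = set(), set()
    def go(e):
        kind = e[0]
        if kind == "var":
            used.add(e[1])
        elif kind == "let":
            _, name, bound, body = e
            if bound[0] == "var":            # let x = y  =>  x and y may alias
                aliases.add(frozenset({name, bound[1]}))
            go(bound); go(body)
        elif kind in ("ref", "deref"):
            go(e[1])
        elif kind == "assign":
            go(e[1]); go(e[2])
    go(expr)
    return used, aliases

# let r = ref 0 in let s = r in s := !r
prog = ("let", "r", ("ref", ("const", 0)),
         ("let", "s", ("var", "r"),
           ("assign", ("var", "s"), ("deref", ("var", "r")))))
print(analyse(prog))   # ({'r', 's'}, {frozenset({'r', 's'})})
```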
{"title":"A type system for data flow and alias analysis in ReScript","authors":"Nicky Ask Lund, Hans Hüttel","doi":"arxiv-2408.11954","DOIUrl":"https://doi.org/arxiv-2408.11954","url":null,"abstract":"ReScript introduces a strongly typed language that targets JavaScript, as an\u0000alternative to gradually typed languages, such as TypeScript. In this paper, we\u0000present a type system for data-flow analysis for a subset of the ReScript\u0000language, more specific for a lambda-calculus with mutability and pattern\u0000matching. The type system is a local analysis that collects information about\u0000what variables are used and alias information.","PeriodicalId":501197,"journal":{"name":"arXiv - CS - Programming Languages","volume":"9 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-08-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142223622","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Aaron Jarmusch, Felipe Cabarcas, Swaroop Pophale, Andrew Kallai, Johannes Doerfert, Luke Peyralans, Seyong Lee, Joel Denny, Sunita Chandrasekaran
Software developers must adapt to keep up with the changing capabilities of platforms so that they can utilize the power of High-Performance Computers (HPC), including exascale systems. OpenMP, a directive-based parallel programming model, allows developers to add directives to existing C, C++, or Fortran code to enable node-level parallelism without compromising performance. This paper describes our CI/CD efforts to provide easy evaluation of the support of OpenMP across different compilers using existing testsuites and benchmark suites on HPC platforms. Our main contributions include (1) the setup of a Continuous Integration (CI) and Continuous Development (CD) workflow that captures bugs and provides faster feedback to compiler developers, (2) an evaluation of OpenMP (offloading) implementations supported by AMD, HPE, GNU, LLVM, and Intel, and (3) an evaluation of the quality of compilers across different heterogeneous HPC platforms. With the comprehensive testing through the CI/CD workflow, we aim to provide a comprehensive understanding of the current state of OpenMP (offloading) support in different compilers and heterogeneous platforms consisting of CPUs and GPUs from NVIDIA, AMD, and Intel.
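The core loop such a CI job repeats over a whole testsuite can be sketched as follows (a hypothetical Python harness with stand-in compiler names and a made-up test path, not the paper's workflow): compile one test with each compiler, run it, and record the outcome.

```python
import os, subprocess, tempfile

COMPILERS = ["gcc", "clang"]          # stand-ins for the vendor compilers under test
FLAGS = ["-fopenmp", "-O2"]

def check(compiler, source_file):
    exe = os.path.join(tempfile.mkdtemp(), "test.bin")
    build = subprocess.run([compiler, *FLAGS, source_file, "-o", exe],
                           capture_output=True, text=True)
    if build.returncode != 0:
        return "compile-error"
    run = subprocess.run([exe], capture_output=True, text=True, timeout=60)
    return "pass" if run.returncode == 0 else "runtime-fail"

# "tests/target_teams_distribute.c" is a placeholder for one testsuite entry.
for cc in COMPILERS:
    print(cc, check(cc, "tests/target_teams_distribute.c"))
```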
{"title":"CI/CD Efforts for Validation, Verification and Benchmarking OpenMP Implementations","authors":"Aaron Jarmusch, Felipe Cabarcas, Swaroop Pophale, Andrew Kallai, Johannes Doerfert, Luke Peyralans, Seyong Lee, Joel Denny, Sunita Chandrasekaran","doi":"arxiv-2408.11777","DOIUrl":"https://doi.org/arxiv-2408.11777","url":null,"abstract":"Software developers must adapt to keep up with the changing capabilities of\u0000platforms so that they can utilize the power of High- Performance Computers\u0000(HPC), including exascale systems. OpenMP, a directive-based parallel\u0000programming model, allows developers to include directives to existing C, C++,\u0000or Fortran code to allow node level parallelism without compromising\u0000performance. This paper describes our CI/CD efforts to provide easy evaluation\u0000of the support of OpenMP across different compilers using existing testsuites\u0000and benchmark suites on HPC platforms. Our main contributions include (1) the\u0000set of a Continuous Integration (CI) and Continuous Development (CD) workflow\u0000that captures bugs and provides faster feedback to compiler developers, (2) an\u0000evaluation of OpenMP (offloading) implementations supported by AMD, HPE, GNU,\u0000LLVM, and Intel, and (3) evaluation of the quality of compilers across\u0000different heterogeneous HPC platforms. With the comprehensive testing through\u0000the CI/CD workflow, we aim to provide a comprehensive understanding of the\u0000current state of OpenMP (offloading) support in different compilers and\u0000heterogeneous platforms consisting of CPUs and GPUs from NVIDIA, AMD, and\u0000Intel.","PeriodicalId":501197,"journal":{"name":"arXiv - CS - Programming Languages","volume":"9 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-08-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142179574","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Martin Fink, Dimitrios Stavrakakis, Dennis Sprokholt, Soham Chakraborty, Jan-Erik Ekberg, Pramod Bhatotia
WebAssembly (WASM) is an immensely versatile and increasingly popular compilation target. It executes applications written in several languages (e.g., C/C++) with near-native performance in various domains (e.g., mobile, edge, cloud). Despite WASM's sandboxing feature, which isolates applications from other instances and the host platform, WASM does not inherently provide any memory safety guarantees for applications written in low-level, unsafe languages. To this end, we propose Cage, a hardware-accelerated toolchain for WASM that supports unmodified applications compiled to WASM and utilizes diverse Arm hardware features aiming to enrich the memory safety properties of WASM. Precisely, Cage leverages Arm's Memory Tagging Extension (MTE) to (i) provide spatial and temporal memory safety for heap and stack allocations and (ii) improve the performance of WASM's sandboxing mechanism. Cage further employs Arm's Pointer Authentication (PAC) to prevent leaked pointers from being reused by other WASM instances, thus enhancing WASM's security properties. We implement our system based on 64-bit WASM. We provide a WASM compiler and runtime with support for Arm's MTE and PAC. On top of that, Cage's LLVM-based compiler toolchain transforms unmodified applications to provide spatial and temporal memory safety for stack and heap allocations and prevent function pointer reuse. Our evaluation on real hardware shows that Cage incurs minimal runtime (<5.8%) and memory (<3.7%) overheads and can improve the performance of WASM's sandboxing mechanism, achieving a speedup of over 5.1%, while offering efficient memory safety guarantees.
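The tag-checking idea behind MTE can be modelled in a few lines (a hypothetical Python sketch of the mechanism, not of Cage or of Arm's actual hardware): allocations and pointers carry tags, every access compares them, and freeing retags memory so stale pointers trap.

```python
import random

class TaggedHeap:
    def __init__(self):
        self.mem_tag = {}        # address -> current tag
        self.mem_val = {}        # address -> stored value

    def alloc(self, addr, size):
        tag = random.randrange(1, 16)             # 4-bit nonzero tag, as in Arm MTE
        for a in range(addr, addr + size):
            self.mem_tag[a] = tag
            self.mem_val[a] = 0
        return (addr, tag)                         # a "pointer" carries its tag

    def store(self, ptr, offset, value):
        addr, tag = ptr
        if self.mem_tag.get(addr + offset) != tag:
            raise MemoryError("tag check failed (out-of-bounds or use-after-free)")
        self.mem_val[addr + offset] = value

    def free(self, ptr, size):
        addr, _ = ptr
        for a in range(addr, addr + size):
            self.mem_tag[a] = 0                    # retag so stale pointers trap

heap = TaggedHeap()
p = heap.alloc(0x1000, 4)
heap.store(p, 0, 42)             # ok: tags match
heap.free(p, 4)
try:
    heap.store(p, 0, 7)          # use-after-free: tag mismatch
except MemoryError as e:
    print("caught:", e)
```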
{"title":"Cage: Hardware-Accelerated Safe WebAssembly","authors":"Martin Fink, Dimitrios Stavrakakis, Dennis Sprokholt, Soham Chakraborty, Jan-Erik Ekberg, Pramod Bhatotia","doi":"arxiv-2408.11456","DOIUrl":"https://doi.org/arxiv-2408.11456","url":null,"abstract":"WebAssembly (WASM) is an immensely versatile and increasingly popular\u0000compilation target. It executes applications written in several languages\u0000(e.g., C/C++) with near-native performance in various domains (e.g., mobile,\u0000edge, cloud). Despite WASM's sandboxing feature, which isolates applications\u0000from other instances and the host platform, WASM does not inherently provide\u0000any memory safety guarantees for applications written in low-level, unsafe\u0000languages. To this end, we propose Cage, a hardware-accelerated toolchain for WASM that\u0000supports unmodified applications compiled to WASM and utilizes diverse Arm\u0000hardware features aiming to enrich the memory safety properties of WASM.\u0000Precisely, Cage leverages Arm's Memory Tagging Extension (MTE) to (i)~provide\u0000spatial and temporal memory safety for heap and stack allocations and\u0000(ii)~improve the performance of WASM's sandboxing mechanism. Cage further\u0000employs Arm's Pointer Authentication (PAC) to prevent leaked pointers from\u0000being reused by other WASM instances, thus enhancing WASM's security\u0000properties. We implement our system based on 64-bit WASM. We provide a WASM compiler and\u0000runtime with support for Arm's MTE and PAC. On top of that, Cage's LLVM-based\u0000compiler toolchain transforms unmodified applications to provide spatial and\u0000temporal memory safety for stack and heap allocations and prevent function\u0000pointer reuse. Our evaluation on real hardware shows that Cage incurs minimal\u0000runtime ($<5.8,%$) and memory ($<3.7,%$) overheads and can improve the\u0000performance of WASM's sandboxing mechanism, achieving a speedup of over\u0000$5.1,%$, while offering efficient memory safety guarantees.","PeriodicalId":501197,"journal":{"name":"arXiv - CS - Programming Languages","volume":"9 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-08-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142227633","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}