Pattern matching is an expressive way of matching data and extracting pieces of information from it. The recent inclusion of pattern matching in the Java and Python languages highlights that such a facility is increasingly adopted by developers for everyday development. Other mainstream programming languages also offer pattern matching capabilities as part of the language (Rust, Scala, Haskell, and OCaml), with different degrees of expressivity in what can be matched. Meanwhile, pattern matching over graphs takes a slightly different turn: it enhances the expressivity of the patterns that can be defined. Smalltalk currently offers little pattern matching capability for finding specific objects inside a large graph of objects using a declarative pattern. In Pharo, the closest library to classical pattern matching is the RBParseTreeSearcher, which allows expressing specialized patterns over a Pharo Abstract Syntax Tree to find some inner node. The question arises of what features a flexible pattern matching language should have. In this paper, we review the features found in different existing pattern matching languages, both in general-purpose languages (like Java) and in declarative graph pattern matching languages. We then describe MoTion, a new pattern matching engine for Pharo Smalltalk that combines all these features. We discuss some aspects of MoTion’s implementation and illustrate its use with real-world examples.
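As a minimal illustration of the declarative matching mentioned above (not drawn from the paper, using Python, one of the languages cited; the node structure is hypothetical), a pattern both checks the shape of a value and extracts pieces of it:

    # Hypothetical AST-like node encoded as a dictionary; the patterns below
    # match its shape and bind the parts of interest.
    def describe(node):
        match node:
            case {"op": "+", "left": 0, "right": right}:
                return f"addition with a neutral left operand, right = {right}"
            case {"op": op, "left": left, "right": right}:
                return f"binary {op} applied to {left} and {right}"
            case _:
                return "unrecognized node"

    print(describe({"op": "+", "left": 0, "right": 42}))  # first pattern matches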
In solving binary code similarity detection, many approaches choose to operate on certain unified intermediate representations (IRs), such as Low Level Virtual Machine (LLVM) IR, to overcome the cross-architecture analysis challenge induced by the significant morphological and syntactic gaps across the diverse instruction set architectures (ISAs). However, the LLVM IRs of the same program can be affected by diverse factors, such as the acquisition source, i.e., whether the IR is compiled from source code or disassembled and lifted from binary code. While the impact of compilation settings on binary code has been explored, the specific differences between LLVM IRs from varied sources remain underexamined. To this end, we pioneer an in-depth empirical study to assess the discrepancies in LLVM IRs derived from different sources. Correspondingly, an extensive dataset containing nearly 98 million LLVM IR instructions distributed across 808,431 functions is curated with respect to these potential IR-influencing factors. On this basis, three types of code metrics detailing the syntactic, structural, and semantic aspects of the IR samples are devised and leveraged to assess the divergence of the IRs across different origins. The findings offer insights into how and to what extent the various factors affect the IRs, providing valuable guidance for assembling a training corpus aimed at developing robust LLVM IR-oriented pre-training models, as well as facilitating relevant program analysis studies that operate on the LLVM IRs.
Fault localization aims to automatically identify the cause of an error in a program by localizing the error to a relatively small part of the program. In this paper, we present a novel technique for automated fault localization via error invariants inferred by abstract interpretation. An error invariant for a location in an error program over-approximates the reachable states at the given location that may produce the error, if the execution of the program is continued from that location. Error invariants can be used for statement-wise semantic slicing of error programs and for obtaining concise error explanations. We use an iterative refinement sequence of backward–forward static analyses by abstract interpretation to compute error invariants, which are designed to explain why an error program violates a particular assertion.
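As a small, hypothetical illustration of the notion of error invariant described above (not an example from the paper), consider a tiny program whose final assertion can fail; the comments give, at each location, an invariant that over-approximates the states from which continuing the execution can still produce the error:

    def scale(x):
        # error invariant: x <= 0   (only these inputs can still reach the failure)
        y = x + 1
        # error invariant: y <= 1
        z = 2 * y
        # error invariant: z <= 2
        assert z > 2  # violated, e.g., for x = 0
        return z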
Furthermore, we present a practical application of the fault localization technique for automatic program repair. Given an erroneous program, we first use fault localization to automatically identify statements relevant to the error, and then repeatedly mutate the expressions in those relevant statements until a correct program that satisfies all assertions is found. All other statements, classified by the fault localization as irrelevant to the error, are not mutated in the program repair process. This way, we significantly reduce the search space of mutated programs without losing any potentially correct program, and thus locate a repaired program much faster than program repair without fault localization.
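A schematic sketch of this search strategy (with hypothetical helper names; the paper’s actual mutation operators and checker are not shown) could look as follows: candidate replacements are generated only for the statements flagged as relevant, which shrinks the space of mutated programs:

    from itertools import product

    def repair(program, relevant_indices, mutate, check):
        """Search for a mutant of `program` that satisfies all assertions.

        program          -- list of statements
        relevant_indices -- statement indices reported by fault localization
        mutate           -- maps a statement to an iterable of mutated variants
        check            -- returns True if a candidate passes all assertions
        """
        # Mutations are considered only for the relevant statements; keeping the
        # original statement as one of the choices allows partial mutation.
        choices = [list(mutate(program[i])) + [program[i]] for i in relevant_indices]
        for combination in product(*choices):
            candidate = list(program)
            for index, statement in zip(relevant_indices, combination):
                candidate[index] = statement
            if check(candidate):
                return candidate  # repaired program found
        return None  # no repair within the considered mutation space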
We have developed a prototype tool for automatic fault localization and repair of C programs. We demonstrate the effectiveness of our approach in localizing errors in realistic C programs and subsequently repairing them. Moreover, we show that our approach, based on combining fault localization and code mutations, is significantly faster than the previous program repair approach without fault localization.
Systematic reviews (SRs) provide valuable evidence for guiding new research directions. However, the manual effort involved in selecting articles for inclusion in an SR is error-prone and time-consuming. While screening articles has traditionally been considered challenging to automate, the advent of large language models offers new possibilities. In this paper, we discuss the effect of using ChatGPT on the SR process. In particular, we investigate the effectiveness of different prompt strategies for automating the article screening process using five real SR datasets. Our results show that ChatGPT can reach up to 82% accuracy. The best-performing prompts specify exclusion criteria and avoid negative shots. However, prompts should be adapted to different corpus characteristics.
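As a hedged illustration of the kind of screening prompt discussed above (the criteria and wording are hypothetical, not the exact prompts evaluated in the paper), a zero-shot prompt that states the exclusion criteria explicitly could be assembled as follows:

    # Illustrative exclusion criteria; a real SR would use its own protocol.
    EXCLUSION_CRITERIA = [
        "the study is not peer-reviewed",
        "the study does not report an empirical evaluation",
    ]

    def build_screening_prompt(title: str, abstract: str) -> str:
        criteria = "\n".join(f"- {c}" for c in EXCLUSION_CRITERIA)
        return (
            "You are screening articles for a systematic review.\n"
            f"Exclusion criteria:\n{criteria}\n\n"
            f"Title: {title}\nAbstract: {abstract}\n\n"
            "Answer 'include' or 'exclude' and give a one-sentence reason."
        )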
Systems engineering is fundamental to the performance, functionality, and value of modern systems and products. Systems model development and the usage of modeling tools in systems engineering can be complex. Systems engineering is concerned not only with the modeling process but also with many other aspects of development, such as hardware, data, and modeling tools. Because systems modeling practitioners are versatile and have expertise in specifying, building, maintaining, and supporting technical infrastructure, modeling practices and challenges vary. We conducted an empirical study from a systems modeler’s perspective to better understand the challenges faced by systems engineers in the LabVIEW community. Our study consists of two investigations. First, with the help of machine learning techniques, we mined online discussion forum posts to identify what questions engineers ask regarding modeling challenges. The results show that the most challenging part is related to coding practices during development. Inspired by this finding, we conducted a second empirical study: we surveyed systems engineers in the LabVIEW community using an online questionnaire to learn what they report about coding challenges in their development practice. Our paper also provides observations and concrete suggestions to both systems engineers and tool developers on how to improve system model quality and facilitate the modeling process when using LabVIEW.
At Thales Defense Mission Systems (DMS), software products first go through an industrial prototyping phase. Prototypes are serious applications that we evaluate with our end-users during demonstrations. End-users have a central role in the design process of our products. They often ask for software modifications during demonstrations to experiment with new ideas or to focus the existing design on their needs.
In this paper, we present how we combined Smalltalk’s live-programming capabilities with software component models to obtain flexible and modular software designs in our context of live prototyping. We present Molecule, an open-source implementation of the Lightweight CORBA Component Model in Pharo. We use Molecule to build HMI system prototypes, and we benefit from the dynamic run-time modification capabilities of Pharo during demonstrations with our end-users, where we explore software designs in a live fashion.
Molecule is an industrial contribution to Smalltalk, as it capitalizes on 20 years of usage and maturation in our prototyping activity. The Molecule framework and tools are now mature, and we have started building end-user software used in production at Thales DMS. We present two such end-user applications and analyze their component architectures, which are representative of how we (learned to) build HMI prototypes. Finally, we analyze our technological decisions with regard to the benefits we sought for our industrial activity.