As data volume and complexity grow at an unprecedented rate, the performance of data manipulation programs is becoming a major concern for developers. In this article, we study how alternative API choices could improve data manipulation performance while preserving task-specific input/output equivalence. We propose a lightweight approach that leverages the comparative structures in Q&A sites to extracting alternative implementations. On a large dataset of Stack Overflow posts, our approach extracts 5,080 pairs of alternative implementations that invoke different data manipulation APIs to solve the same tasks, with an accuracy of 86%. Experiments show that for 15% of the extracted pairs, the faster implementation achieved >10x speedup over its slower alternative. We also characterize 68 recurring alternative API pairs from the extraction results to understand the type of APIs that can be used alternatively. To put these findings into practice, we implement a tool, AlterApi7, to automatically optimize real-world data manipulation programs. In the 1,267 optimization attempts on the Kaggle dataset, 76% achieved desirable performance improvements with up to orders-of-magnitude speedup. Finally, we discuss notable challenges of using alternative APIs for optimizing data manipulation programs. We hope that our study offers a new perspective on API recommendation and automatic performance optimization.
{"title":"Speeding Up Data Manipulation Tasks with Alternative Implementations","authors":"Yida Tao, Shan Tang, Yepang Liu, Zhiwu Xu, S. Qin","doi":"10.1145/3456873","DOIUrl":"https://doi.org/10.1145/3456873","url":null,"abstract":"As data volume and complexity grow at an unprecedented rate, the performance of data manipulation programs is becoming a major concern for developers. In this article, we study how alternative API choices could improve data manipulation performance while preserving task-specific input/output equivalence. We propose a lightweight approach that leverages the comparative structures in Q&A sites to extracting alternative implementations. On a large dataset of Stack Overflow posts, our approach extracts 5,080 pairs of alternative implementations that invoke different data manipulation APIs to solve the same tasks, with an accuracy of 86%. Experiments show that for 15% of the extracted pairs, the faster implementation achieved >10x speedup over its slower alternative. We also characterize 68 recurring alternative API pairs from the extraction results to understand the type of APIs that can be used alternatively. To put these findings into practice, we implement a tool, AlterApi7, to automatically optimize real-world data manipulation programs. In the 1,267 optimization attempts on the Kaggle dataset, 76% achieved desirable performance improvements with up to orders-of-magnitude speedup. Finally, we discuss notable challenges of using alternative APIs for optimizing data manipulation programs. We hope that our study offers a new perspective on API recommendation and automatic performance optimization.","PeriodicalId":7398,"journal":{"name":"ACM Transactions on Software Engineering and Methodology (TOSEM)","volume":"124 4 1","pages":"1 - 28"},"PeriodicalIF":0.0,"publicationDate":"2021-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"85581124","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Chris Bogart, Christian Kästner, J. Herbsleb, Ferdian Thung
Open source software projects often rely on package management systems that help projects discover, incorporate, and maintain dependencies on other packages, maintained by other people. Such systems save a great deal of effort over ad hoc ways of advertising, packaging, and transmitting useful libraries, but coordination among project teams is still needed when one package makes a breaking change affecting other packages. Ecosystems differ in their approaches to breaking changes, and there is no general theory to explain the relationships between features, behavioral norms, ecosystem outcomes, and motivating values. We address this through two empirical studies. In an interview case study, we contrast Eclipse, NPM, and CRAN, demonstrating that these different norms for coordination of breaking changes shift the costs of using and maintaining the software among stakeholders, appropriate to each ecosystem’s mission. In a second study, we combine a survey, repository mining, and document analysis to broaden and systematize these observations across 18 ecosystems. We find that all ecosystems share values such as stability and compatibility, but differ in other values. Ecosystems’ practices often support their espoused values, but in surprisingly diverse ways. The data provides counterevidence against easy generalizations about why ecosystem communities do what they do.
{"title":"When and How to Make Breaking Changes","authors":"Chris Bogart, Christian Kästner, J. Herbsleb, Ferdian Thung","doi":"10.1145/3447245","DOIUrl":"https://doi.org/10.1145/3447245","url":null,"abstract":"Open source software projects often rely on package management systems that help projects discover, incorporate, and maintain dependencies on other packages, maintained by other people. Such systems save a great deal of effort over ad hoc ways of advertising, packaging, and transmitting useful libraries, but coordination among project teams is still needed when one package makes a breaking change affecting other packages. Ecosystems differ in their approaches to breaking changes, and there is no general theory to explain the relationships between features, behavioral norms, ecosystem outcomes, and motivating values. We address this through two empirical studies. In an interview case study, we contrast Eclipse, NPM, and CRAN, demonstrating that these different norms for coordination of breaking changes shift the costs of using and maintaining the software among stakeholders, appropriate to each ecosystem’s mission. In a second study, we combine a survey, repository mining, and document analysis to broaden and systematize these observations across 18 ecosystems. We find that all ecosystems share values such as stability and compatibility, but differ in other values. Ecosystems’ practices often support their espoused values, but in surprisingly diverse ways. The data provides counterevidence against easy generalizations about why ecosystem communities do what they do.","PeriodicalId":7398,"journal":{"name":"ACM Transactions on Software Engineering and Methodology (TOSEM)","volume":"47 1","pages":"1 - 56"},"PeriodicalIF":0.0,"publicationDate":"2021-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"85532701","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Preetha Chatterjee, Kostadin Damevski, Nicholas A. Kraft, L. Pollock
Software engineers are crowdsourcing answers to their everyday challenges on Q&A forums (e.g., Stack Overflow) and more recently in public chat communities such as Slack, IRC, and Gitter. Many software-related chat conversations contain valuable expert knowledge that is useful for both mining to improve programming support tools and for readers who did not participate in the original chat conversations. However, most chat platforms and communities do not contain built-in quality indicators (e.g., accepted answers, vote counts). Therefore, it is difficult to identify conversations that contain useful information for mining or reading, i.e., conversations of post hoc quality. In this article, we investigate automatically detecting developer conversations of post hoc quality from public chat channels. We first describe an analysis of 400 developer conversations that indicate potential characteristics of post hoc quality, followed by a machine learning-based approach for automatically identifying conversations of post hoc quality. Our evaluation of 2,000 annotated Slack conversations in four programming communities (python, clojure, elm, and racket) indicates that our approach can achieve precision of 0.82, recall of 0.90, F-measure of 0.86, and MCC of 0.57. To our knowledge, this is the first automated technique for detecting developer conversations of post hoc quality.
{"title":"Automatically Identifying the Quality of Developer Chats for Post Hoc Use","authors":"Preetha Chatterjee, Kostadin Damevski, Nicholas A. Kraft, L. Pollock","doi":"10.1145/3450503","DOIUrl":"https://doi.org/10.1145/3450503","url":null,"abstract":"Software engineers are crowdsourcing answers to their everyday challenges on Q&A forums (e.g., Stack Overflow) and more recently in public chat communities such as Slack, IRC, and Gitter. Many software-related chat conversations contain valuable expert knowledge that is useful for both mining to improve programming support tools and for readers who did not participate in the original chat conversations. However, most chat platforms and communities do not contain built-in quality indicators (e.g., accepted answers, vote counts). Therefore, it is difficult to identify conversations that contain useful information for mining or reading, i.e., conversations of post hoc quality. In this article, we investigate automatically detecting developer conversations of post hoc quality from public chat channels. We first describe an analysis of 400 developer conversations that indicate potential characteristics of post hoc quality, followed by a machine learning-based approach for automatically identifying conversations of post hoc quality. Our evaluation of 2,000 annotated Slack conversations in four programming communities (python, clojure, elm, and racket) indicates that our approach can achieve precision of 0.82, recall of 0.90, F-measure of 0.86, and MCC of 0.57. To our knowledge, this is the first automated technique for detecting developer conversations of post hoc quality.","PeriodicalId":7398,"journal":{"name":"ACM Transactions on Software Engineering and Methodology (TOSEM)","volume":"58 1","pages":"1 - 28"},"PeriodicalIF":0.0,"publicationDate":"2021-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"82980682","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Dalia Sobhy, R. Bahsoon, Leandro L. Minku, R. Kazman
Context: Evaluating software architectures in uncertain environments raises new challenges, which require continuous approaches. We define continuous evaluation as multiple evaluations of the software architecture that begins at the early stages of the development and is periodically and repeatedly performed throughout the lifetime of the software system. Numerous approaches have been developed for continuous evaluation; to handle dynamics and uncertainties at run-time, over the past years, these approaches are still very few, limited, and lack maturity. Objective: This review surveys efforts on architecture evaluation and provides a unified terminology and perspective on the subject. Method: We conducted a systematic literature review to identify and analyse architecture evaluation approaches for uncertainty including continuous and non-continuous, covering work published between 1990–2020. We examined each approach and provided a classification framework for this field. We present an analysis of the results and provide insights regarding open challenges. Major results and conclusions: The survey reveals that most of the existing architecture evaluation approaches typically lack an explicit linkage between design-time and run-time. Additionally, there is a general lack of systematic approaches on how continuous architecture evaluation can be realised or conducted. To remedy this lack, we present a set of necessary requirements for continuous evaluation and describe some examples.
{"title":"Evaluation of Software Architectures under Uncertainty","authors":"Dalia Sobhy, R. Bahsoon, Leandro L. Minku, R. Kazman","doi":"10.1145/3464305","DOIUrl":"https://doi.org/10.1145/3464305","url":null,"abstract":"Context: Evaluating software architectures in uncertain environments raises new challenges, which require continuous approaches. We define continuous evaluation as multiple evaluations of the software architecture that begins at the early stages of the development and is periodically and repeatedly performed throughout the lifetime of the software system. Numerous approaches have been developed for continuous evaluation; to handle dynamics and uncertainties at run-time, over the past years, these approaches are still very few, limited, and lack maturity. Objective: This review surveys efforts on architecture evaluation and provides a unified terminology and perspective on the subject. Method: We conducted a systematic literature review to identify and analyse architecture evaluation approaches for uncertainty including continuous and non-continuous, covering work published between 1990–2020. We examined each approach and provided a classification framework for this field. We present an analysis of the results and provide insights regarding open challenges. Major results and conclusions: The survey reveals that most of the existing architecture evaluation approaches typically lack an explicit linkage between design-time and run-time. Additionally, there is a general lack of systematic approaches on how continuous architecture evaluation can be realised or conducted. To remedy this lack, we present a set of necessary requirements for continuous evaluation and describe some examples.","PeriodicalId":7398,"journal":{"name":"ACM Transactions on Software Engineering and Methodology (TOSEM)","volume":"49 1","pages":"1 - 50"},"PeriodicalIF":0.0,"publicationDate":"2021-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"73324937","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Modern systems, such as cyber-physical systems, often consist of multiple products within/across product lines communicating with each other through information networks. Consequently, their runtime behaviors are influenced by product configurations and networks. Such systems play a vital role in our daily life; thus, ensuring their correctness by thorough testing becomes essential. However, testing these systems is particularly challenging due to a large number of possible configurations and limited available resources. Therefore, it is important and practically useful to test these systems with specific configurations under which products will most likely fail to communicate with each other. Motivated by this, we present a search-based configuration recommendation (SBCR) approach to recommend faulty configurations for the system under test (SUT) based on cross-product line (CPL) rules. CPL rules are soft constraints, constraining product configurations while indicating the most probable system states with a certain degree of confidence. In SBCR, we defined four search objectives based on CPL rules and combined them with six commonly applied search algorithms. To evaluate SBCR (i.e., SBCRNSGA-II, SBCRIBEA, SBCRMoCell, SBCRSPEA2, SBCRPAES, and SBCRSMPSO), we performed two case studies (Cisco and Jitsi) and conducted difference analyses. Results show that for both of the case studies, SBCR significantly outperformed random search-based configuration recommendation (RBCR) for 86% of the total comparisons based on six quality indicators, and 100% of the total comparisons based on the percentage of faulty configurations (PFC). Among the six variants of SBCR, SBCRSPEA2 outperformed the others in 85% of the total comparisons based on six quality indicators and 100% of the total comparisons based on PFC.
{"title":"Recommending Faulty Configurations for Interacting Systems Under Test Using Multi-objective Search","authors":"Safdar Aqeel Safdar, T. Yue, Shaukat Ali","doi":"10.1145/3464939","DOIUrl":"https://doi.org/10.1145/3464939","url":null,"abstract":"Modern systems, such as cyber-physical systems, often consist of multiple products within/across product lines communicating with each other through information networks. Consequently, their runtime behaviors are influenced by product configurations and networks. Such systems play a vital role in our daily life; thus, ensuring their correctness by thorough testing becomes essential. However, testing these systems is particularly challenging due to a large number of possible configurations and limited available resources. Therefore, it is important and practically useful to test these systems with specific configurations under which products will most likely fail to communicate with each other. Motivated by this, we present a search-based configuration recommendation (SBCR) approach to recommend faulty configurations for the system under test (SUT) based on cross-product line (CPL) rules. CPL rules are soft constraints, constraining product configurations while indicating the most probable system states with a certain degree of confidence. In SBCR, we defined four search objectives based on CPL rules and combined them with six commonly applied search algorithms. To evaluate SBCR (i.e., SBCRNSGA-II, SBCRIBEA, SBCRMoCell, SBCRSPEA2, SBCRPAES, and SBCRSMPSO), we performed two case studies (Cisco and Jitsi) and conducted difference analyses. Results show that for both of the case studies, SBCR significantly outperformed random search-based configuration recommendation (RBCR) for 86% of the total comparisons based on six quality indicators, and 100% of the total comparisons based on the percentage of faulty configurations (PFC). Among the six variants of SBCR, SBCRSPEA2 outperformed the others in 85% of the total comparisons based on six quality indicators and 100% of the total comparisons based on PFC.","PeriodicalId":7398,"journal":{"name":"ACM Transactions on Software Engineering and Methodology (TOSEM)","volume":"5 1","pages":"1 - 36"},"PeriodicalIF":0.0,"publicationDate":"2021-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"73713786","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Haoye Wang, Xin Xia, D. Lo, Qiang He, Xinyu Wang, J. Grundy
Commit messages recorded in version control systems contain valuable information for software development, maintenance, and comprehension. Unfortunately, developers often commit code with empty or poor quality commit messages. To address this issue, several studies have proposed approaches to generate commit messages from commit diffs. Recent studies make use of neural machine translation algorithms to try and translate git diffs into commit messages and have achieved some promising results. However, these learning-based methods tend to generate high-frequency words but ignore low-frequency ones. In addition, they suffer from exposure bias issues, which leads to a gap between training phase and testing phase. In this article, we propose CoRec to address the above two limitations. Specifically, we first train a context-aware encoder-decoder model that randomly selects the previous output of the decoder or the embedding vector of a ground truth word as context to make the model gradually aware of previous alignment choices. Given a diff for testing, the trained model is reused to retrieve the most similar diff from the training set. Finally, we use the retrieval diff to guide the probability distribution for the final generated vocabulary. Our method combines the advantages of both information retrieval and neural machine translation. We evaluate CoRec on a dataset from Liu et al. and a large-scale dataset crawled from 10K popular Java repositories in Github. Our experimental results show that CoRec significantly outperforms the state-of-the-art method NNGen by 19% on average in terms of BLEU.
{"title":"Context-aware Retrieval-based Deep Commit Message Generation","authors":"Haoye Wang, Xin Xia, D. Lo, Qiang He, Xinyu Wang, J. Grundy","doi":"10.1145/3464689","DOIUrl":"https://doi.org/10.1145/3464689","url":null,"abstract":"Commit messages recorded in version control systems contain valuable information for software development, maintenance, and comprehension. Unfortunately, developers often commit code with empty or poor quality commit messages. To address this issue, several studies have proposed approaches to generate commit messages from commit diffs. Recent studies make use of neural machine translation algorithms to try and translate git diffs into commit messages and have achieved some promising results. However, these learning-based methods tend to generate high-frequency words but ignore low-frequency ones. In addition, they suffer from exposure bias issues, which leads to a gap between training phase and testing phase. In this article, we propose CoRec to address the above two limitations. Specifically, we first train a context-aware encoder-decoder model that randomly selects the previous output of the decoder or the embedding vector of a ground truth word as context to make the model gradually aware of previous alignment choices. Given a diff for testing, the trained model is reused to retrieve the most similar diff from the training set. Finally, we use the retrieval diff to guide the probability distribution for the final generated vocabulary. Our method combines the advantages of both information retrieval and neural machine translation. We evaluate CoRec on a dataset from Liu et al. and a large-scale dataset crawled from 10K popular Java repositories in Github. Our experimental results show that CoRec significantly outperforms the state-of-the-art method NNGen by 19% on average in terms of BLEU.","PeriodicalId":7398,"journal":{"name":"ACM Transactions on Software Engineering and Methodology (TOSEM)","volume":"106 1","pages":"1 - 30"},"PeriodicalIF":0.0,"publicationDate":"2021-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"76119076","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
AIOps (Artificial Intelligence for IT Operations) leverages machine learning models to help practitioners handle the massive data produced during the operations of large-scale systems. However, due to the nature of the operation data, AIOps modeling faces several data splitting-related challenges, such as imbalanced data, data leakage, and concept drift. In this work, we study the data leakage and concept drift challenges in the context of AIOps and evaluate the impact of different modeling decisions on such challenges. Specifically, we perform a case study on two commonly studied AIOps applications: (1) predicting job failures based on trace data from a large-scale cluster environment and (2) predicting disk failures based on disk monitoring data from a large-scale cloud storage environment. First, we observe that the data leakage issue exists in AIOps solutions. Using a time-based splitting of training and validation datasets can significantly reduce such data leakage, making it more appropriate than using a random splitting in the AIOps context. Second, we show that AIOps solutions suffer from concept drift. Periodically updating AIOps models can help mitigate the impact of such concept drift, while the performance benefit and the modeling cost of increasing the update frequency depend largely on the application data and the used models. Our findings encourage future studies and practices on developing AIOps solutions to pay attention to their data-splitting decisions to handle the data leakage and concept drift challenges.
{"title":"An Empirical Study of the Impact of Data Splitting Decisions on the Performance of AIOps Solutions","authors":"A. Hassan","doi":"10.1145/3447876","DOIUrl":"https://doi.org/10.1145/3447876","url":null,"abstract":"AIOps (Artificial Intelligence for IT Operations) leverages machine learning models to help practitioners handle the massive data produced during the operations of large-scale systems. However, due to the nature of the operation data, AIOps modeling faces several data splitting-related challenges, such as imbalanced data, data leakage, and concept drift. In this work, we study the data leakage and concept drift challenges in the context of AIOps and evaluate the impact of different modeling decisions on such challenges. Specifically, we perform a case study on two commonly studied AIOps applications: (1) predicting job failures based on trace data from a large-scale cluster environment and (2) predicting disk failures based on disk monitoring data from a large-scale cloud storage environment. First, we observe that the data leakage issue exists in AIOps solutions. Using a time-based splitting of training and validation datasets can significantly reduce such data leakage, making it more appropriate than using a random splitting in the AIOps context. Second, we show that AIOps solutions suffer from concept drift. Periodically updating AIOps models can help mitigate the impact of such concept drift, while the performance benefit and the modeling cost of increasing the update frequency depend largely on the application data and the used models. Our findings encourage future studies and practices on developing AIOps solutions to pay attention to their data-splitting decisions to handle the data leakage and concept drift challenges.","PeriodicalId":7398,"journal":{"name":"ACM Transactions on Software Engineering and Methodology (TOSEM)","volume":"27 1","pages":"1 - 38"},"PeriodicalIF":0.0,"publicationDate":"2021-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"87536632","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Organizations are increasingly adopting Agile frameworks for their internal software development. Cost reduction, rapid deployment, requirements and mental model alignment are typical reasons for an Agile transformation. This article presents an in-depth field study of a large-scale Agile transformation in a mission-critical environment, where stakeholders’ commitment was a critical success factor. The goal of such a transformation was to implement mission-oriented features, reducing costs and time to operate in critical scenarios. The project lasted several years and involved over 40 professionals. We report how a hierarchical and plan-driven organization exploited Agile methods to develop a Command & Control (C2) system. Accordingly, we first abstract our experience, inducing a success model of general use for other comparable organizations by performing a post-mortem study. The goal of the inductive research process was to identify critical success factors and their relations. Finally, we validated and generalized our model through Partial Least Squares - Structural Equation Modelling, surveying 200 software engineers involved in similar projects. We conclude the article with data-driven recommendations concerning the management of Agile projects.
{"title":"The Agile Success Model","authors":"Daniel Russo","doi":"10.1145/3464938","DOIUrl":"https://doi.org/10.1145/3464938","url":null,"abstract":"Organizations are increasingly adopting Agile frameworks for their internal software development. Cost reduction, rapid deployment, requirements and mental model alignment are typical reasons for an Agile transformation. This article presents an in-depth field study of a large-scale Agile transformation in a mission-critical environment, where stakeholders’ commitment was a critical success factor. The goal of such a transformation was to implement mission-oriented features, reducing costs and time to operate in critical scenarios. The project lasted several years and involved over 40 professionals. We report how a hierarchical and plan-driven organization exploited Agile methods to develop a Command & Control (C2) system. Accordingly, we first abstract our experience, inducing a success model of general use for other comparable organizations by performing a post-mortem study. The goal of the inductive research process was to identify critical success factors and their relations. Finally, we validated and generalized our model through Partial Least Squares - Structural Equation Modelling, surveying 200 software engineers involved in similar projects. We conclude the article with data-driven recommendations concerning the management of Agile projects.","PeriodicalId":7398,"journal":{"name":"ACM Transactions on Software Engineering and Methodology (TOSEM)","volume":"49 1","pages":"1 - 46"},"PeriodicalIF":0.0,"publicationDate":"2021-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"87143779","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Abstract State Machine (ASM) theory is a well-known state-based formal method. As in other state-based formal methods, the proposed specification languages for ASMs still lack easy-to-comprehend abstractions to express structural and behavioral aspects of specifications. Our goal is to investigate object-oriented abstractions such as interfaces and traits for ASM-based specification languages. We report on a controlled experiment with 98 participants to study the specification efficiency and effectiveness in which participants needed to comprehend an informal specification as problem (stimulus) in form of a textual description and express a corresponding solution in form of a textual ASM specification using either interface or trait syntax extensions. The study was carried out with a completely randomized design and one alternative (interface or trait) per experimental group. The results indicate that specification effectiveness of the traits experiment group shows a better performance compared to the interfaces experiment group, but specification efficiency shows no statistically significant differences. To the best of our knowledge, this is the first empirical study studying the specification effectiveness and efficiency of object-oriented abstractions in the context of formal methods.
{"title":"Specifying with Interface and Trait Abstractions in Abstract State Machines: A Controlled Experiment","authors":"P. Paulweber, Georg Simhandl, Uwe Zdun","doi":"10.1145/3450968","DOIUrl":"https://doi.org/10.1145/3450968","url":null,"abstract":"Abstract State Machine (ASM) theory is a well-known state-based formal method. As in other state-based formal methods, the proposed specification languages for ASMs still lack easy-to-comprehend abstractions to express structural and behavioral aspects of specifications. Our goal is to investigate object-oriented abstractions such as interfaces and traits for ASM-based specification languages. We report on a controlled experiment with 98 participants to study the specification efficiency and effectiveness in which participants needed to comprehend an informal specification as problem (stimulus) in form of a textual description and express a corresponding solution in form of a textual ASM specification using either interface or trait syntax extensions. The study was carried out with a completely randomized design and one alternative (interface or trait) per experimental group. The results indicate that specification effectiveness of the traits experiment group shows a better performance compared to the interfaces experiment group, but specification efficiency shows no statistically significant differences. To the best of our knowledge, this is the first empirical study studying the specification effectiveness and efficiency of object-oriented abstractions in the context of formal methods.","PeriodicalId":7398,"journal":{"name":"ACM Transactions on Software Engineering and Methodology (TOSEM)","volume":"6 1","pages":"1 - 29"},"PeriodicalIF":0.0,"publicationDate":"2021-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"82121347","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Umair Z. Ahmed, Zhiyu Fan, Jooyong Yi, Omar I. Al-Bataineh, Abhik Roychoudhury
Automated feedback generation for introductory programming assignments is useful for programming education. Most works try to generate feedback to correct a student program by comparing its behavior with an instructor’s reference program on selected tests. In this work, our aim is to generate verifiably correct program repairs as student feedback. A student-submitted program is aligned and composed with a reference solution in terms of control flow, and the variables of the two programs are automatically aligned via predicates describing the relationship between the variables. When verification attempt for the obtained aligned program fails, we turn a verification problem into a MaxSMT problem whose solution leads to a minimal repair. We have conducted experiments on student assignments curated from a widely deployed intelligent tutoring system. Our results show that generating verified repair without sacrificing the overall repair rate is possible. In fact, our implementation, Verifix, is shown to outperform Clara, a state-of-the-art tool, in terms of repair rate. This shows the promise of using verified repair to generate high confidence feedback in programming pedagogy settings.
{"title":"Verifix: Verified Repair of Programming Assignments","authors":"Umair Z. Ahmed, Zhiyu Fan, Jooyong Yi, Omar I. Al-Bataineh, Abhik Roychoudhury","doi":"10.1145/3510418","DOIUrl":"https://doi.org/10.1145/3510418","url":null,"abstract":"Automated feedback generation for introductory programming assignments is useful for programming education. Most works try to generate feedback to correct a student program by comparing its behavior with an instructor’s reference program on selected tests. In this work, our aim is to generate verifiably correct program repairs as student feedback. A student-submitted program is aligned and composed with a reference solution in terms of control flow, and the variables of the two programs are automatically aligned via predicates describing the relationship between the variables. When verification attempt for the obtained aligned program fails, we turn a verification problem into a MaxSMT problem whose solution leads to a minimal repair. We have conducted experiments on student assignments curated from a widely deployed intelligent tutoring system. Our results show that generating verified repair without sacrificing the overall repair rate is possible. In fact, our implementation, Verifix, is shown to outperform Clara, a state-of-the-art tool, in terms of repair rate. This shows the promise of using verified repair to generate high confidence feedback in programming pedagogy settings.","PeriodicalId":7398,"journal":{"name":"ACM Transactions on Software Engineering and Methodology (TOSEM)","volume":"12 1","pages":"1 - 31"},"PeriodicalIF":0.0,"publicationDate":"2021-06-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"81916847","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}