ACM Transactions on Software Engineering and Methodology (TOSEM): Latest Articles, Page 6

Speeding Up Data Manipulation Tasks with Alternative Implementations

ACM Transactions on Software Engineering and Methodology (TOSEM)

Pub Date : 2021-07-01 DOI: 10.1145/3456873

Yida Tao, Shan Tang, Yepang Liu, Zhiwu Xu, S. Qin

As data volume and complexity grow at an unprecedented rate, the performance of data manipulation programs is becoming a major concern for developers. In this article, we study how alternative API choices could improve data manipulation performance while preserving task-specific input/output equivalence. We propose a lightweight approach that leverages the comparative structures in Q&A sites to extract alternative implementations. On a large dataset of Stack Overflow posts, our approach extracts 5,080 pairs of alternative implementations that invoke different data manipulation APIs to solve the same tasks, with an accuracy of 86%. Experiments show that for 15% of the extracted pairs, the faster implementation achieved >10x speedup over its slower alternative. We also characterize 68 recurring alternative API pairs from the extraction results to understand the types of APIs that can be used alternatively. To put these findings into practice, we implement a tool, AlterApi7, to automatically optimize real-world data manipulation programs. In the 1,267 optimization attempts on the Kaggle dataset, 76% achieved desirable performance improvements with up to orders-of-magnitude speedup. Finally, we discuss notable challenges of using alternative APIs for optimizing data manipulation programs. We hope that our study offers a new perspective on API recommendation and automatic performance optimization.
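To make "task-specific input/output equivalence" concrete, here is a minimal Python sketch. The two functions, the sample inputs, and the timing harness are illustrative assumptions, not pairs extracted by the authors' approach; they only show what two alternative implementations of the same task can look like and how one might check agreement on sample inputs before comparing speed.

```python
import timeit


def dedupe_with_list(items):
    """Remove duplicates while preserving order, using list membership (quadratic)."""
    seen = []
    for item in items:
        if item not in seen:
            seen.append(item)
    return seen


def dedupe_with_dict(items):
    """Same task via dict.fromkeys, which preserves insertion order (Python 3.7+)."""
    return list(dict.fromkeys(items))


def equivalent_on(inputs, impl_a, impl_b):
    """Task-specific check: both implementations agree on every sample input."""
    return all(impl_a(x) == impl_b(x) for x in inputs)


if __name__ == "__main__":
    data = [i % 500 for i in range(20_000)]
    samples = [data, [], [3, 1, 3, 2, 1]]
    assert equivalent_on(samples, dedupe_with_list, dedupe_with_dict)
    slow = timeit.timeit(lambda: dedupe_with_list(data), number=5)
    fast = timeit.timeit(lambda: dedupe_with_dict(data), number=5)
    print(f"list membership: {slow:.4f}s, dict.fromkeys: {fast:.4f}s")
```

Agreement on a handful of inputs is weaker than full semantic equivalence, which is why the check is phrased per task and per input set.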


Pages: 1-28

Citations: 0

When and How to Make Breaking Changes

ACM Transactions on Software Engineering and Methodology (TOSEM)

Pub Date : 2021-07-01 DOI: 10.1145/3447245

Chris Bogart, Christian Kästner, J. Herbsleb, Ferdian Thung

Open source software projects often rely on package management systems that help projects discover, incorporate, and maintain dependencies on other packages, maintained by other people. Such systems save a great deal of effort over ad hoc ways of advertising, packaging, and transmitting useful libraries, but coordination among project teams is still needed when one package makes a breaking change affecting other packages. Ecosystems differ in their approaches to breaking changes, and there is no general theory to explain the relationships between features, behavioral norms, ecosystem outcomes, and motivating values. We address this through two empirical studies. In an interview case study, we contrast Eclipse, NPM, and CRAN, demonstrating that these different norms for coordination of breaking changes shift the costs of using and maintaining the software among stakeholders, appropriate to each ecosystem’s mission. In a second study, we combine a survey, repository mining, and document analysis to broaden and systematize these observations across 18 ecosystems. We find that all ecosystems share values such as stability and compatibility, but differ in other values. Ecosystems’ practices often support their espoused values, but in surprisingly diverse ways. The data provides counterevidence against easy generalizations about why ecosystem communities do what they do.


Pages: 1-56

Citations: 17

Automatically Identifying the Quality of Developer Chats for Post Hoc Use

ACM Transactions on Software Engineering and Methodology (TOSEM)

Pub Date : 2021-07-01 DOI: 10.1145/3450503

Preetha Chatterjee, Kostadin Damevski, Nicholas A. Kraft, L. Pollock

Software engineers are crowdsourcing answers to their everyday challenges on Q&A forums (e.g., Stack Overflow) and more recently in public chat communities such as Slack, IRC, and Gitter. Many software-related chat conversations contain valuable expert knowledge that is useful both for mining to improve programming support tools and for readers who did not participate in the original chat conversations. However, most chat platforms and communities do not contain built-in quality indicators (e.g., accepted answers, vote counts). Therefore, it is difficult to identify conversations that contain useful information for mining or reading, i.e., conversations of post hoc quality. In this article, we investigate automatically detecting developer conversations of post hoc quality from public chat channels. We first describe an analysis of 400 developer conversations that indicates potential characteristics of post hoc quality, followed by a machine learning-based approach for automatically identifying conversations of post hoc quality. Our evaluation of 2,000 annotated Slack conversations in four programming communities (python, clojure, elm, and racket) indicates that our approach can achieve precision of 0.82, recall of 0.90, F-measure of 0.86, and MCC of 0.57. To our knowledge, this is the first automated technique for detecting developer conversations of post hoc quality.
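The reported metrics can be related to one another with a short Python sketch. The function names are illustrative, and the formulas are the standard definitions rather than code from the article. The sketch shows that the stated precision and recall are consistent with the stated F-measure, and how MCC is computed from a confusion matrix.

```python
import math


def precision(tp, fp):
    """Fraction of predicted positives that are truly positive."""
    return tp / (tp + fp)


def recall(tp, fn):
    """Fraction of true positives that were found."""
    return tp / (tp + fn)


def f_measure(p, r):
    """Harmonic mean of precision and recall (F1)."""
    return 2 * p * r / (p + r)


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient; returns 0.0 when any marginal is zero."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


if __name__ == "__main__":
    # Precision and recall as reported in the abstract.
    print(round(f_measure(0.82, 0.90), 2))  # 0.86
```

Unlike the F-measure, MCC also takes true negatives into account, so the two values need not be close to each other.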


Pages: 1-28

Citations: 3

Evaluation of Software Architectures under Uncertainty

ACM Transactions on Software Engineering and Methodology (TOSEM)

Pub Date : 2021-07-01 DOI: 10.1145/3464305

Dalia Sobhy, R. Bahsoon, Leandro L. Minku, R. Kazman

Context: Evaluating software architectures in uncertain environments raises new challenges, which require continuous approaches. We define continuous evaluation as multiple evaluations of the software architecture that begin at the early stages of development and are performed periodically and repeatedly throughout the lifetime of the software system. Although approaches have been developed over the past years for continuous evaluation to handle dynamics and uncertainties at run-time, these approaches are still very few, limited, and lack maturity. Objective: This review surveys efforts on architecture evaluation and provides a unified terminology and perspective on the subject. Method: We conducted a systematic literature review to identify and analyse architecture evaluation approaches for uncertainty, both continuous and non-continuous, covering work published between 1990 and 2020. We examined each approach and provided a classification framework for this field. We present an analysis of the results and provide insights regarding open challenges. Major results and conclusions: The survey reveals that most of the existing architecture evaluation approaches typically lack an explicit linkage between design-time and run-time. Additionally, there is a general lack of systematic approaches on how continuous architecture evaluation can be realised or conducted. To remedy this lack, we present a set of necessary requirements for continuous evaluation and describe some examples.


Pages: 1-50

Citations: 18

Recommending Faulty Configurations for Interacting Systems Under Test Using Multi-objective Search

ACM Transactions on Software Engineering and Methodology (TOSEM)

Pub Date : 2021-07-01 DOI: 10.1145/3464939

Safdar Aqeel Safdar, T. Yue, Shaukat Ali

Modern systems, such as cyber-physical systems, often consist of multiple products within/across product lines communicating with each other through information networks. Consequently, their runtime behaviors are influenced by product configurations and networks. Such systems play a vital role in our daily life; thus, ensuring their correctness by thorough testing becomes essential. However, testing these systems is particularly challenging due to a large number of possible configurations and limited available resources. Therefore, it is important and practically useful to test these systems with specific configurations under which products will most likely fail to communicate with each other. Motivated by this, we present a search-based configuration recommendation (SBCR) approach to recommend faulty configurations for the system under test (SUT) based on cross-product line (CPL) rules. CPL rules are soft constraints, constraining product configurations while indicating the most probable system states with a certain degree of confidence. In SBCR, we defined four search objectives based on CPL rules and combined them with six commonly applied search algorithms. To evaluate SBCR (i.e., SBCR_NSGA-II, SBCR_IBEA, SBCR_MoCell, SBCR_SPEA2, SBCR_PAES, and SBCR_SMPSO), we performed two case studies (Cisco and Jitsi) and conducted difference analyses. Results show that for both of the case studies, SBCR significantly outperformed random search-based configuration recommendation (RBCR) for 86% of the total comparisons based on six quality indicators, and 100% of the total comparisons based on the percentage of faulty configurations (PFC). Among the six variants of SBCR, SBCR_SPEA2 outperformed the others in 85% of the total comparisons based on six quality indicators and 100% of the total comparisons based on PFC.
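Multi-objective search algorithms such as those named above compare candidate solutions by Pareto dominance rather than by a single score. The following Python sketch illustrates that comparison only. The configuration names and the four objective values are made up, minimization is assumed for every objective, and none of this reproduces the objectives defined in the article.

```python
def dominates(a, b):
    """True if vector a is no worse than b on every objective and strictly
    better on at least one (all objectives are minimized in this sketch)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def pareto_front(candidates):
    """Return the candidates whose objective vectors no other candidate dominates."""
    return [
        c for c in candidates
        if not any(
            dominates(other["objectives"], c["objectives"])
            for other in candidates
            if other is not c
        )
    ]


if __name__ == "__main__":
    # Hypothetical configurations scored on four made-up objectives (lower is better).
    configs = [
        {"name": "cfg_a", "objectives": (0.2, 0.5, 0.1, 0.9)},
        {"name": "cfg_b", "objectives": (0.3, 0.6, 0.2, 0.9)},  # dominated by cfg_a
        {"name": "cfg_c", "objectives": (0.1, 0.9, 0.3, 0.4)},
    ]
    print([c["name"] for c in pareto_front(configs)])  # ['cfg_a', 'cfg_c']
```

The search algorithms differ mainly in how they select and vary candidates between generations; the dominance test itself is shared.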


Pages: 1-36

Citations: 3

Context-aware Retrieval-based Deep Commit Message Generation

ACM Transactions on Software Engineering and Methodology (TOSEM)

Pub Date : 2021-07-01 DOI: 10.1145/3464689

Haoye Wang, Xin Xia, D. Lo, Qiang He, Xinyu Wang, J. Grundy

Commit messages recorded in version control systems contain valuable information for software development, maintenance, and comprehension. Unfortunately, developers often commit code with empty or poor-quality commit messages. To address this issue, several studies have proposed approaches to generate commit messages from commit diffs. Recent studies make use of neural machine translation algorithms to try to translate git diffs into commit messages and have achieved some promising results. However, these learning-based methods tend to generate high-frequency words but ignore low-frequency ones. In addition, they suffer from exposure bias issues, which lead to a gap between the training phase and the testing phase. In this article, we propose CoRec to address the above two limitations. Specifically, we first train a context-aware encoder-decoder model that randomly selects the previous output of the decoder or the embedding vector of a ground truth word as context to make the model gradually aware of previous alignment choices. Given a diff for testing, the trained model is reused to retrieve the most similar diff from the training set. Finally, we use the retrieved diff to guide the probability distribution for the final generated vocabulary. Our method combines the advantages of both information retrieval and neural machine translation. We evaluate CoRec on a dataset from Liu et al. and a large-scale dataset crawled from 10K popular Java repositories on GitHub. Our experimental results show that CoRec significantly outperforms the state-of-the-art method NNGen by 19% on average in terms of BLEU.
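The retrieval step (finding the training diff most similar to a test diff) can be sketched in Python as follows. This is a simplified stand-in: the abstract says the trained model itself is reused for retrieval, whereas this sketch swaps in cosine similarity over bag-of-token counts. The function names and the toy training pairs are hypothetical.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two token-count vectors (Counter objects)."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def retrieve_most_similar(query_diff, training_pairs):
    """Return the (diff, message) pair whose diff is closest to query_diff."""
    query = Counter(query_diff.split())
    return max(
        training_pairs,
        key=lambda pair: cosine(query, Counter(pair[0].split())),
    )


if __name__ == "__main__":
    # Toy, made-up training set of (diff, commit message) pairs.
    train = [
        ("- return a + b + return a - b", "Fix subtraction in calculator"),
        ("+ import logging + logging.info ( msg )", "Add logging"),
    ]
    _, message = retrieve_most_similar(
        "+ import logging + logging.warning ( msg )", train
    )
    print(message)  # Add logging
```

In a retrieval-guided generator, the message attached to the retrieved diff would then bias the decoder's output distribution rather than being returned verbatim.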


Pages: 1-30

Citations: 23

An Empirical Study of the Impact of Data Splitting Decisions on the Performance of AIOps Solutions

ACM Transactions on Software Engineering and Methodology (TOSEM)

Pub Date : 2021-07-01 DOI: 10.1145/3447876

A. Hassan

AIOps (Artificial Intelligence for IT Operations) leverages machine learning models to help practitioners handle the massive data produced during the operations of large-scale systems. However, due to the nature of the operation data, AIOps modeling faces several data splitting-related challenges, such as imbalanced data, data leakage, and concept drift. In this work, we study the data leakage and concept drift challenges in the context of AIOps and evaluate the impact of different modeling decisions on such challenges. Specifically, we perform a case study on two commonly studied AIOps applications: (1) predicting job failures based on trace data from a large-scale cluster environment and (2) predicting disk failures based on disk monitoring data from a large-scale cloud storage environment. First, we observe that the data leakage issue exists in AIOps solutions. Using a time-based split of training and validation datasets can significantly reduce such data leakage, making it more appropriate than a random split in the AIOps context. Second, we show that AIOps solutions suffer from concept drift. Periodically updating AIOps models can help mitigate the impact of such concept drift, while the performance benefit and the modeling cost of increasing the update frequency depend largely on the application data and the models used. Our findings encourage future studies and practices on developing AIOps solutions to pay attention to their data-splitting decisions to handle the data leakage and concept drift challenges.
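The difference between a time-based split and a random split can be sketched in Python as follows. The record layout (a `timestamp` and a `failed` flag), the 70/30 ratio, and the `leaks_future` helper are assumptions made for illustration; they are not taken from the study's datasets or code.

```python
import random


def time_based_split(records, train_fraction=0.7):
    """Train on the oldest records and validate on the newest ones."""
    ordered = sorted(records, key=lambda r: r["timestamp"])
    cut = int(len(ordered) * train_fraction)
    return ordered[:cut], ordered[cut:]


def random_split(records, train_fraction=0.7, seed=0):
    """Shuffle first, so training and validation periods interleave."""
    shuffled = records[:]
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


def leaks_future(train, validation):
    """True if any training record is newer than the oldest validation record."""
    newest_train = max(r["timestamp"] for r in train)
    oldest_validation = min(r["timestamp"] for r in validation)
    return newest_train > oldest_validation


if __name__ == "__main__":
    # Synthetic monitoring records; timestamps are just increasing integers.
    records = [{"timestamp": t, "failed": t % 7 == 0} for t in range(100)]
    print(leaks_future(*time_based_split(records)))  # False
    print(leaks_future(*random_split(records)))
```

With a random split, the model is typically trained on records that postdate some validation records, which is the kind of leakage the time-based split avoids.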


Pages: 1-38

Citations: 15
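The contrast between random and time-based splitting in the AIOps study above can be illustrated with a minimal sketch. The record layout (`timestamp`, `failed`), the 80/20 ratio, and all function names are assumptions made for illustration; they are not taken from the paper.

```python
import random
from datetime import datetime, timedelta


def random_split(records, train_fraction=0.8, seed=0):
    """Shuffle, then cut: training data may be newer than validation data."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


def time_based_split(records, train_fraction=0.8):
    """Sort by time, then cut: every training record precedes validation."""
    ordered = sorted(records, key=lambda r: r["timestamp"])
    cut = int(len(ordered) * train_fraction)
    return ordered[:cut], ordered[cut:]


def trains_on_future(train, validation):
    """True if some training record is newer than the oldest validation record."""
    newest_train = max(r["timestamp"] for r in train)
    oldest_validation = min(r["timestamp"] for r in validation)
    return newest_train > oldest_validation


# Hypothetical monitoring records, one per hour (synthetic, for illustration only).
start = datetime(2021, 1, 1)
records = [
    {"timestamp": start + timedelta(hours=i), "failed": i % 7 == 0}
    for i in range(100)
]

train, validation = time_based_split(records)
print(len(train), len(validation), trains_on_future(train, validation))
```

With distinct timestamps, the time-based split cannot place a newer record in the training set than in the validation set, which is the leakage the abstract attributes to random splitting. Periodic retraining would repeat the same split over a sliding window.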

The Agile Success Model

ACM Transactions on Software Engineering and Methodology (TOSEM)

Pub Date : 2021-07-01 DOI: 10.1145/3464938

Daniel Russo

Organizations are increasingly adopting Agile frameworks for their internal software development. Cost reduction, rapid deployment, requirements and mental model alignment are typical reasons for an Agile transformation. This article presents an in-depth field study of a large-scale Agile transformation in a mission-critical environment, where stakeholders’ commitment was a critical success factor. The goal of such a transformation was to implement mission-oriented features, reducing costs and time to operate in critical scenarios. The project lasted several years and involved over 40 professionals. We report how a hierarchical and plan-driven organization exploited Agile methods to develop a Command & Control (C2) system. Accordingly, we first abstract our experience, inducing a success model of general use for other comparable organizations by performing a post-mortem study. The goal of the inductive research process was to identify critical success factors and their relations. Finally, we validated and generalized our model through Partial Least Squares - Structural Equation Modelling, surveying 200 software engineers involved in similar projects. We conclude the article with data-driven recommendations concerning the management of Agile projects.



Citations: 18

Specifying with Interface and Trait Abstractions in Abstract State Machines: A Controlled Experiment

ACM Transactions on Software Engineering and Methodology (TOSEM)

Pub Date : 2021-07-01 DOI: 10.1145/3450968

P. Paulweber, Georg Simhandl, Uwe Zdun

Abstract State Machine (ASM) theory is a well-known state-based formal method. As in other state-based formal methods, the proposed specification languages for ASMs still lack easy-to-comprehend abstractions to express structural and behavioral aspects of specifications. Our goal is to investigate object-oriented abstractions such as interfaces and traits for ASM-based specification languages. We report on a controlled experiment with 98 participants that studies specification efficiency and effectiveness. Participants needed to comprehend an informal specification, given as a problem (stimulus) in the form of a textual description, and express a corresponding solution as a textual ASM specification using either interface or trait syntax extensions. The study used a completely randomized design with one alternative (interface or trait) per experimental group. The results indicate that the traits group achieved better specification effectiveness than the interfaces group, while specification efficiency showed no statistically significant difference. To the best of our knowledge, this is the first empirical study of the specification effectiveness and efficiency of object-oriented abstractions in the context of formal methods.



Citations: 2

Verifix: Verified Repair of Programming Assignments

ACM Transactions on Software Engineering and Methodology (TOSEM)

Pub Date : 2021-06-30 DOI: 10.1145/3510418

Umair Z. Ahmed, Zhiyu Fan, Jooyong Yi, Omar I. Al-Bataineh, Abhik Roychoudhury

Automated feedback generation for introductory programming assignments is useful for programming education. Most existing work generates feedback to correct a student program by comparing its behavior with an instructor’s reference program on selected tests. In this work, our aim is to generate verifiably correct program repairs as student feedback. A student-submitted program is aligned and composed with a reference solution in terms of control flow, and the variables of the two programs are automatically aligned via predicates describing the relationship between the variables. When the verification attempt for the aligned program fails, we turn the verification problem into a MaxSMT problem whose solution leads to a minimal repair. We have conducted experiments on student assignments curated from a widely deployed intelligent tutoring system. Our results show that generating verified repairs without sacrificing the overall repair rate is possible. In fact, our implementation, Verifix, is shown to outperform Clara, a state-of-the-art tool, in terms of repair rate. This shows the promise of using verified repair to generate high-confidence feedback in programming pedagogy settings.
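The idea of a minimal repair can be sketched on a toy example. This is not Verifix's algorithm: brute-force enumeration stands in for the MaxSMT solver, an exhaustive check over a small bounded domain stands in for verification, and the `STUDENT`/`REFERENCE` programs, the slot names, and `DOMAIN` are all hypothetical. The objective is the same in spirit, though: keep as many of the student's statements as possible while making the program agree with the reference.

```python
from itertools import combinations

# Hypothetical toy programs: each "slot" holds one expression of an abs(x) routine.
STUDENT = {
    "cond": lambda x: x > 0,
    "then": lambda x: x,
    "else": lambda x: x,  # bug: the negative branch should negate x
}
REFERENCE = {
    "cond": lambda x: x >= 0,
    "then": lambda x: x,
    "else": lambda x: -x,
}
DOMAIN = range(-50, 51)  # bounded input domain assumed for the equivalence check


def run(program, x):
    """Evaluate the if/else skeleton shared by both programs."""
    branch = "then" if program["cond"](x) else "else"
    return program[branch](x)


def equivalent(program):
    """Exhaustively compare against the reference on the bounded domain."""
    return all(run(program, x) == run(REFERENCE, x) for x in DOMAIN)


def minimal_repair(student):
    """Return a smallest set of slots to replace with the reference's expressions.

    Subsets are tried in order of increasing size, so the first equivalent
    candidate keeps the maximum number of student slots unchanged.
    """
    slots = sorted(student)
    for size in range(len(slots) + 1):
        for changed in combinations(slots, size):
            candidate = {
                s: (REFERENCE[s] if s in changed else student[s]) for s in slots
            }
            if equivalent(candidate):
                return set(changed)
    return None


print(minimal_repair(STUDENT))
```

Here only the `else` slot needs to change: the student's condition `x > 0` differs from the reference's `x >= 0`, but both yield 0 for input 0 once the negative branch is fixed, so the condition is left untouched.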



Citations: 15