ACM Transactions on Software Engineering and Methodology (TOSEM): Latest Articles, Page 8

Predicting Performance Anomalies in Software Systems at Run-time

ACM Transactions on Software Engineering and Methodology (TOSEM)

Pub Date : 2021-04-23 DOI: 10.1145/3440757

Guoliang Zhao, Safwat Hassan, Ying Zou, Derek Truong, Toby Corbin

High performance is critical to achieving and maintaining the success of a software system. Performance anomalies are performance degradation issues (e.g., slowdowns in system response times) that software systems exhibit at run-time, and they can have a dramatically negative impact on users’ satisfaction. Prior studies propose different approaches to detect anomalies by analyzing execution logs and resource utilization metrics after the anomalies have happened. However, these detection approaches cannot predict anomalies ahead of time; this limitation causes an inevitable delay in taking corrective actions to prevent performance anomalies from happening. We propose an approach that can predict performance anomalies in software systems and raise anomaly warnings in advance. Our approach uses a Long Short-Term Memory neural network to capture the normal behaviors of a software system. Then, our approach predicts performance anomalies by identifying early deviations from the captured normal system behaviors. We conduct extensive experiments to evaluate our approach using two real-world software systems (i.e., Elasticsearch and Hadoop). We compare the performance of our approach with two baselines. The first is a state-of-the-art baseline called Unsupervised Behavior Learning. The second baseline predicts performance anomalies by checking whether resource utilization exceeds pre-defined thresholds. Our results show that our approach can predict various performance anomalies with high precision (i.e., 97–100%) and recall (i.e., 80–100%), while the baselines achieve 25–97% precision and 93–100% recall. For a range of performance anomalies, our approach can achieve sufficient lead times that vary from 20 to 1,403 s (i.e., 23.4 min). We also demonstrate the ability of our approach to predict performance anomalies that are caused by real-world performance bugs. For such anomalies, our approach achieves 95–100% precision and 87–100% recall, while the baselines achieve 49–83% precision and 100% recall. The obtained results show that our approach outperforms the existing anomaly prediction approaches and is able to predict performance anomalies in real-world systems.
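
The idea of raising a warning on early deviations from learned normal behavior, and of measuring the resulting lead time, can be illustrated with a minimal sketch. This is not the authors' implementation: the `predicted` series stands in for the output of a trained model (the abstract uses an LSTM), and the function names, the `tolerance` and `patience` parameters, and the sample values are all assumptions made for illustration.

```python
from typing import List, Optional


def first_warning_index(observed: List[float], predicted: List[float],
                        tolerance: float, patience: int = 3) -> Optional[int]:
    """Return the index at which a warning is raised, or None.

    A warning is raised once the observed metric has deviated from the
    predicted "normal" value by more than `tolerance` for `patience`
    consecutive samples (both parameters are hypothetical knobs).
    """
    streak = 0
    for i, (obs, pred) in enumerate(zip(observed, predicted)):
        if abs(obs - pred) > tolerance:
            streak += 1
            if streak >= patience:
                return i
        else:
            streak = 0  # deviation did not persist; start counting again
    return None


def lead_time_seconds(warning_index: int, anomaly_index: int,
                      interval_s: float) -> float:
    """Time between the warning and the anomaly, given the sampling interval."""
    return (anomaly_index - warning_index) * interval_s


# Hypothetical CPU utilization (%) sampled every 10 s.
predicted = [50.0] * 10
observed = [50.0, 51.0, 49.0, 50.0, 58.0, 61.0, 64.0, 70.0, 80.0, 95.0]

warning = first_warning_index(observed, predicted, tolerance=5.0, patience=3)
print(warning)                             # index of the first warning
print(lead_time_seconds(warning, 9, 10))   # anomaly assumed to occur at index 9
```

Requiring several consecutive deviations is one simple way to trade lead time against false warnings; a larger `patience` gives fewer false alarms but later warnings.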

Citations: 15

Predictive Mutation Analysis via the Natural Language Channel in Source Code

ACM Transactions on Software Engineering and Methodology (TOSEM)

Pub Date : 2021-04-22 DOI: 10.1145/3510417

Jinhan Kim, Juyoung Jeon, Shin Hong, S. Yoo

Mutation analysis can provide valuable insights into both the system under test and its test suite. However, it is not scalable due to the cost of building and testing a large number of mutants. Predictive Mutation Testing (PMT) has been proposed to reduce the cost of mutation testing, but it can only provide statistical inference about whether a mutant will be killed or not by the entire test suite. We propose Seshat, a Predictive Mutation Analysis (PMA) technique that can accurately predict the entire kill matrix, not just the Mutation Score (MS) of the given test suite. Seshat exploits the natural language channel in code, and learns the relationship between the syntactic and semantic concepts of each test case and the mutants it can kill, from a given kill matrix. The learnt model can later be used to predict the kill matrices for subsequent versions of the program, even after both the source and test code have changed significantly. Empirical evaluation using the programs in Defects4J shows that Seshat can predict kill matrices with an average F-score of 0.83 for versions that are up to years apart. This is an improvement in F-score of 0.14 and 0.45 points over the state-of-the-art PMT technique and a simple coverage-based heuristic, respectively. Seshat also performs as well as PMT when predicting the MS only. When applied to a mutant-based fault localisation technique, the kill matrix predicted by Seshat is successfully used to locate faults within the top 10 positions, showing its usefulness beyond prediction of MS. Once Seshat trains its model using a concrete mutation analysis, the subsequent predictions made by Seshat are on average 39 times faster than actual test-based analysis. We also show, with an experiment using EvoSuite, that Seshat can be successfully applied to automatically generated test cases.
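
To make the distinction between a kill matrix and a mutation score concrete, the following sketch computes both, together with the F-score of a predicted kill matrix against the actual one. The representation (rows as mutants, columns as test cases), the function names, and the example matrices are assumptions for illustration; they are not taken from Seshat.

```python
from typing import List

KillMatrix = List[List[bool]]  # rows: mutants, columns: test cases


def mutation_score(kill_matrix: KillMatrix) -> float:
    """Fraction of mutants killed by at least one test case."""
    if not kill_matrix:
        return 0.0
    killed = sum(1 for row in kill_matrix if any(row))
    return killed / len(kill_matrix)


def f_score(actual: KillMatrix, predicted: KillMatrix) -> float:
    """F1 over individual (mutant, test) cells, with 'kills' as the positive class."""
    tp = fp = fn = 0
    for a_row, p_row in zip(actual, predicted):
        for a, p in zip(a_row, p_row):
            if a and p:
                tp += 1
            elif p and not a:
                fp += 1
            elif a and not p:
                fn += 1
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical matrices: 4 mutants, 3 test cases.
actual = [
    [True, False, False],
    [False, False, False],   # this mutant survives the whole suite
    [True, True, False],
    [False, False, True],
]
predicted = [
    [True, False, False],
    [False, True, False],    # false positive: predicts a kill that does not happen
    [True, False, False],    # false negative: misses the second test's kill
    [False, False, True],
]

print(mutation_score(actual))      # 0.75
print(mutation_score(predicted))   # 1.0
print(f_score(actual, predicted))
```

The example shows why the two views differ: a single wrong cell changes the predicted mutation score from 0.75 to 1.0, while the cell-level F-score records both the false positive and the false negative.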

Citations: 7

Diversifying Focused Testing for Unit Testing

ACM Transactions on Software Engineering and Methodology (TOSEM)

Pub Date : 2021-04-01 DOI: 10.1145/3447265

Héctor D. Menéndez, Gunel Jahangirova, Federica Sarro, P. Tonella, David Clark

Software changes constantly, because developers add new features or modifications. This directly affects the effectiveness of the test suite associated with that software, especially when these new modifications are in a specific area that no test case covers. This article tackles the problem of generating a high-quality test suite to cover repeatedly a given point in a program, with the ultimate goal of exposing faults possibly affecting the given program point. Both search-based software testing and constraint solving offer ready, but low-quality, solutions to this: Ideally, a maximally diverse covering test set is required, whereas search and constraint solving tend to generate test sets with biased distributions. Our approach, Diversified Focused Testing (DFT), uses a search strategy inspired by GödelTest. We artificially inject parameters into the code branching conditions and use a bi-objective search algorithm to find diverse inputs by perturbing the injected parameters, while keeping the path conditions still satisfiable. Our results demonstrate that our technique, DFT, is able to cover a desired point in the code at least 90% of the time. Moreover, adding diversity improves the bug detection and the mutation killing abilities of the test suites. We show that DFT achieves better results than focused testing, symbolic execution, and random testing by achieving from 3% to 70% improvement in mutation score and up to 100% improvement in fault detection across 105 software subjects.
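
The goal of a maximally diverse set of inputs that all reach one program point can be sketched with a much simpler stand-in than DFT's bi-objective search. The code below filters candidate inputs by a reachability predicate and then greedily picks inputs that are far apart (farthest-point selection). The predicate, the integer input domain, and the function name are hypothetical, and this greedy heuristic is explicitly not the algorithm from the article.

```python
from typing import Callable, List


def diverse_covering_inputs(reaches_target: Callable[[int], bool],
                            candidates: List[int], k: int) -> List[int]:
    """Pick up to k inputs that reach the target and are spread far apart.

    Greedy farthest-point selection: each new input maximizes its minimum
    distance to the inputs already selected.
    """
    covering = [x for x in candidates if reaches_target(x)]
    if not covering:
        return []
    selected = [covering[0]]
    while len(selected) < min(k, len(covering)):
        remaining = (x for x in covering if x not in selected)
        best = max(remaining, key=lambda x: min(abs(x - s) for s in selected))
        selected.append(best)
    return selected


# Hypothetical path condition guarding the program point of interest.
def reaches_target(x: int) -> bool:
    return x > 10 and x % 2 == 0


inputs = diverse_covering_inputs(reaches_target, list(range(-50, 51)), k=3)
print(inputs)
```

Every selected input satisfies the path condition, so each test still exercises the target point; the spread between inputs is what distinguishes this from repeatedly sampling near one solution.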

Citations: 5

Accessibility in Software Practice: A Practitioner’s Perspective

ACM Transactions on Software Engineering and Methodology (TOSEM)

Pub Date : 2021-03-16 DOI: 10.1145/3503508

Tingting Bi, Xin Xia, David Lo, J. Grundy, Thomas Zimmermann, Denae Ford

Being able to access software in daily life is vital for everyone, and thus accessibility is a fundamental challenge for software development. However, given the number of accessibility issues reported by many users, e.g., in app reviews, it is not clear whether accessibility is widely integrated into current software projects or how software projects address accessibility issues. In this article, we report a study of the critical challenges and benefits of incorporating accessibility into software development and design. We applied a mixed qualitative and quantitative approach, gathering data from 15 interviews and 365 survey respondents from 26 countries across five continents, to understand how practitioners perceive accessibility development and design in practice. We derived 44 statements, grouped into eight topics, on accessibility from practitioners’ viewpoints and different software development stages. Our statistical analysis reveals substantial gaps between groups, e.g., between practitioners with Direct vs. Indirect accessibility-relevant work experience when they reviewed the summarized statements. These gaps might hinder the quality of accessibility development and design, and we use our findings to establish a set of guidelines to help practitioners be aware of accessibility challenges and benefit factors. We suggest that development teams treat accessibility as a first-class consideration throughout the software development process, and we also propose some remedies to resolve the gaps between groups and highlight key future research directions for incorporating accessibility into software design and development.

Citations: 21

Interpreting Deep Learning-based Vulnerability Detector Predictions Based on Heuristic Searching

ACM Transactions on Software Engineering and Methodology (TOSEM)

Pub Date : 2021-03-10 DOI: 10.1145/3429444

Deqing Zou, Yawei Zhu, Shouhuai Xu, Zhen Li, Hai Jin, Hengkai Ye

Detecting software vulnerabilities is an important problem, and a recent development in tackling it is the use of deep learning models. While such models are effective, their black-box nature makes it hard to explain why a model predicts a piece of code as vulnerable or not. Indeed, the interpretability of deep learning models is a daunting open problem. In this article, we make a significant step toward tackling the interpretability of deep learning models in vulnerability detection. Specifically, we introduce a high-fidelity explanation framework, which aims to identify a small number of tokens that make significant contributions to a detector’s prediction with respect to an example. Systematic experiments show that the framework indeed has a higher fidelity than existing methods, especially when features are not independent of each other (which often occurs in the real world). In particular, the framework can produce some vulnerability rules that can be understood by domain experts for accepting a detector’s outputs (i.e., true positives) or rejecting a detector’s outputs (i.e., false-positives and false-negatives). We also discuss limitations of the present study, which indicate interesting open problems for future research.
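
What it means to identify a small number of tokens that contribute most to a detector's prediction can be shown with a leave-one-out sketch: remove each token in turn and record how much the detector's score drops. This is a deliberately simple perturbation baseline, not the heuristic-search framework of the article, and it treats tokens independently, which is exactly the limitation the abstract points at. The `toy_detector` function and its scores are invented for illustration.

```python
from typing import Callable, List, Tuple


def top_tokens(tokens: List[str], score: Callable[[List[str]], float],
               k: int = 2) -> List[Tuple[str, float]]:
    """Rank tokens by how much removing each one lowers the detector's score."""
    base = score(tokens)
    drops = []
    for i, tok in enumerate(tokens):
        reduced = tokens[:i] + tokens[i + 1:]   # the example without token i
        drops.append((base - score(reduced), i, tok))
    drops.sort(key=lambda d: (-d[0], d[1]))     # largest drop first, stable by position
    return [(tok, round(drop, 3)) for drop, _, tok in drops[:k]]


def toy_detector(tokens: List[str]) -> float:
    """Hypothetical stand-in for a trained model: returns a 'vulnerable' score."""
    s = 0.2
    if "strcpy" in tokens:
        s += 0.5   # unbounded copy
    if "input" in tokens:
        s += 0.2   # untrusted source
    return s


code_tokens = ["char", "buf", "[", "8", "]", ";",
               "strcpy", "(", "buf", ",", "input", ")"]
print(top_tokens(code_tokens, toy_detector, k=2))
```

A real detector would replace `toy_detector`, and when tokens interact (for example, a bounds check that neutralizes the `strcpy`), single-token removal can misattribute importance, which motivates searching over token subsets instead.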

Citations: 21

Why an Android App Is Classified as Malware

ACM Transactions on Software Engineering and Methodology (TOSEM)

Pub Date : 2021-03-10 DOI: 10.1145/3423096

Bozhi Wu, Sen Chen, Cuiyun Gao, Lingling Fan, Yang Liu, Weiping Wen, Michael R. Lyu

Machine learning (ML)-based approaches are considered among the most promising techniques for Android malware detection and have achieved high accuracy by leveraging commonly used features. In practice, most ML classifiers provide only a binary label to mobile users and app security analysts. However, stakeholders in both academia and industry are more interested in the reason why apps are classified as malicious. This belongs to the research area of interpretable ML, but in a specific research domain (i.e., mobile malware detection). Although several interpretable ML methods have been used to explain the final classification results in many cutting-edge Artificial Intelligence-based research fields, until now there has been no study interpreting why an app is classified as malware or unveiling the domain-specific challenges. In this article, to fill this gap, we propose a novel and interpretable ML-based approach (named XMal) that classifies malware with high accuracy and explains the classification result at the same time. (1) The first phase of XMal, classification, hinges on a multi-layer perceptron and an attention mechanism, and also pinpoints the key features most related to the classification result. (2) The second phase, interpretation, aims at automatically producing natural language descriptions to interpret the core malicious behaviors within apps. We evaluate the behavior description results by leveraging a human study and an in-depth quantitative analysis. Moreover, we further compare XMal with the existing interpretable ML-based methods (i.e., Drebin and LIME) to demonstrate the effectiveness of XMal. We find that XMal is able to reveal the malicious behaviors more accurately. Additionally, our experiments show that XMal can also interpret the reason why some samples are misclassified by ML classifiers. Our study peeks into interpretable ML through the research of Android malware detection and analysis.
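
How an attention mechanism can pinpoint key features may be easier to see in a small sketch: raw per-feature scores are normalized with a softmax into attention weights, and the features with the largest weights are reported. In a real model these scores are learned jointly with the classifier; here the scores, the shortened feature names, and the function names are hypothetical, and nothing below reproduces XMal's architecture.

```python
import math
from typing import Dict, List


def attention_weights(scores: Dict[str, float]) -> Dict[str, float]:
    """Softmax over raw feature scores; the weights are positive and sum to 1."""
    m = max(scores.values())                      # subtract the max for numerical stability
    exps = {name: math.exp(s - m) for name, s in scores.items()}
    total = sum(exps.values())
    return {name: e / total for name, e in exps.items()}


def key_features(scores: Dict[str, float], k: int) -> List[str]:
    """Names of the k features that receive the most attention."""
    weights = attention_weights(scores)
    return sorted(weights, key=weights.get, reverse=True)[:k]


# Hypothetical raw scores for features extracted from one app.
raw_scores = {
    "SEND_SMS": 2.0,
    "READ_CONTACTS": 1.0,
    "INTERNET": 0.1,
    "VIBRATE": -1.0,
}

print(key_features(raw_scores, k=2))
print(round(sum(attention_weights(raw_scores).values()), 6))
```

The selected features are the natural input to a description step, for example mapping `SEND_SMS` plus `READ_CONTACTS` to a sentence about sending messages to the user's contacts.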

Citations: 5

Toward an Objective Measure of Developers’ Cognitive Activities

ACM Transactions on Software Engineering and Methodology (TOSEM)

Pub Date : 2021-03-01 DOI: 10.1145/3434643

Zohreh Sharafi, Yu Huang, Kevin Leach, Westley Weimer

Understanding how developers carry out different computer science activities with objective measures can help to improve productivity and guide the use and development of supporting tools in software engineering. In this article, we present two controlled experiments involving 112 students to explore multiple computing activities (code comprehension, code review, and data structure manipulations) using three different objective measures including neuroimaging (functional near-infrared spectroscopy (fNIRS) and functional magnetic resonance imaging (fMRI)) and eye tracking. By examining code review and prose review using fMRI, we find that the neural representations of programming languages vs. natural languages are distinct. We can classify which task a participant is undertaking based solely on brain activity, and those task distinctions are modulated by expertise. We leverage insights from the psychological notion of spatial ability to decode the neural representations of several fundamental data structures and their manipulations using fMRI, fNIRS, and eye tracking. We examine list, array, tree, and mental rotation tasks and find that data structure and spatial operations use the same focal regions of the brain but to different degrees: they are related but distinct neural tasks. We demonstrate best practices and describe the implication and tradeoffs between fMRI, fNIRS, eye tracking, and self-reporting for software engineering research.

Pages: 1-40

Citations: 9

Developing Cost-Effective Blockchain-Powered Applications

ACM Transactions on Software Engineering and Methodology (TOSEM)

Pub Date : 2021-03-01 DOI: 10.1145/3431726

A. A. Zarir, G. Oliva, Z. Jiang, Ahmed E. Hassan

Ethereum is a blockchain platform that hosts and executes smart contracts. Executing a function of a smart contract burns a certain amount of gas units (a.k.a., gas usage). The total gas usage depends on how much computing power is necessary to carry out the execution of the function. Ethereum follows a free-market policy for deciding the transaction fee for executing a transaction. More specifically, transaction issuers choose how much they are willing to pay for each unit of gas (a.k.a., gas price). The final transaction fee corresponds to the gas price times the gas usage. Miners process transactions to gain mining rewards, which come directly from these transaction fees. The flexibility and the inherent complexity of the gas system pose challenges to the development of blockchain-powered applications. Developers of blockchain-powered applications need to translate requests received in the frontend of their application into one or more smart contract transactions. Yet, it is unclear how developers should set the gas parameters of these transactions given that (i) miners are free to prioritize transactions whichever way they wish and (ii) the gas usage of a contract transaction is only known after the transaction is processed and included in a new block. In this article, we analyze the gas usage of Ethereum transactions that were processed between Oct. 2017 and Feb. 2019 (the Byzantium era). We discover that (i) most miners prioritize transactions based on their gas price only, (ii) 25% of the functions that received at least 10 transactions have an unstable gas usage (coefficient of variation = 19%), and (iii) a simple prediction model that operates on the recent gas usage of a function achieves an R-Squared of 0.76 and a median absolute percentage error of 3.3%. 
We conclude that (i) blockchain-powered application developers should be aware that transaction prioritization in Ethereum is frequently done based solely on the gas price of transactions (e.g., a higher transaction fee does not necessarily imply a higher transaction priority) and act accordingly and (ii) blockchain-powered application developers can leverage gas usage prediction models similar to ours to make more informed decisions to set the gas price of their transactions. Lastly, based on our findings, we list and discuss promising avenues for future research.
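
The quantities named in this abstract can be sketched in a few lines of Python. This is a minimal illustration, not the authors' model: the abstract only says the predictor "operates on the recent gas usage of a function", so the median-of-last-`window` rule, the function names, and the sample numbers below are all assumptions made for the example.

```python
from statistics import mean, median, pstdev


def transaction_fee(gas_price, gas_usage):
    """Final fee = price paid per unit of gas times the gas units burned."""
    return gas_price * gas_usage


def coefficient_of_variation(values):
    """Population standard deviation divided by the mean (0 means perfectly stable)."""
    return pstdev(values) / mean(values)


def predict_next_usage(history, window=3):
    """Hypothetical predictor: median of the last `window` observed gas usages."""
    return median(history[-window:])


def median_ape(history, window=3):
    """Median absolute percentage error of one-step-ahead predictions."""
    errors = []
    for i in range(window, len(history)):
        predicted = predict_next_usage(history[:i], window)
        errors.append(abs(predicted - history[i]) / history[i])
    return median(errors)


# Illustrative (made-up) gas usages of one contract function, oldest first.
usage = [21000, 21000, 52000, 21000, 21000, 21000]
estimated_fee = transaction_fee(gas_price=2, gas_usage=predict_next_usage(usage))
```

Because the gas usage is only known after a transaction is processed, a developer would plug a predicted usage into `transaction_fee` to estimate the cost before sending the transaction.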

Pages: 1-38

Citations: 36

Automatic API Usage Scenario Documentation from Technical Q&A Sites

ACM Transactions on Software Engineering and Methodology (TOSEM)

Pub Date : 2021-02-16 DOI: 10.1145/3439769

Gias Uddin, Foutse Khomh, C. Roy

The online technical Q&A site Stack Overflow (SO) is popular among developers to support their coding and diverse development needs. To address shortcomings in API official documentation resources, several research works have thus focused on augmenting official API documentation with insights (e.g., code examples) from SO. These techniques propose adding code examples and insights about APIs to the official documentation. Recently, surveys of software developers find that developers on SO consider the combination of code examples and reviews about APIs as a form of API documentation, and that they consider such a combination to be more useful than official API documentation when the official resources can be incomplete, ambiguous, incorrect, and outdated. Reviews are opinionated sentences with positive/negative sentiments. However, we are aware of no previous research that attempts to automatically produce API documentation from SO by considering both API code examples and reviews. In this article, we present two novel algorithms that can be used to automatically produce API documentation from SO by combining code examples and reviews towards those examples. The first algorithm is called statistical documentation, which shows the distribution of positivity and negativity around the code examples of an API using different metrics (e.g., star ratings). The second algorithm is called concept-based documentation, which clusters similar and conceptually relevant usage scenarios. An API usage scenario contains a code example, a textual description of the underlying task addressed by the code example, and the reviews (i.e., opinions with positive and negative sentiments) from other developers towards the code example. We deployed the algorithms in Opiner, a web-based platform to aggregate information about APIs from online forums. 
We evaluated the algorithms by mining all Java JSON-based posts in SO and by conducting three user studies based on the documentation produced from the posts. The first study is a survey, where we asked the participants to compare our proposed algorithms against a Javadoc-style documentation format (called Type-based documentation in Opiner). The participants were asked to compare along four development scenarios (e.g., selection, documentation). The participants preferred our two proposed algorithms over type-based documentation. In our second user study, we asked the participants to complete four coding tasks using Opiner and the API official and informal documentation resources. The participants were more effective and accurate while using Opiner. In a subsequent survey, more than 80% of participants asked for the Opiner documentation platform to be integrated into the formal API documentation to complement and improve the API official documentation.
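
The usage-scenario structure and the idea behind statistical documentation can be illustrated with a small Python sketch. The abstract does not give the actual metric definitions used in Opiner, so the `UsageScenario` class, the polarity labels, and the proportional `star_rating` formula below are hypothetical stand-ins chosen only to show the shape of the data.

```python
from dataclasses import dataclass, field


@dataclass
class UsageScenario:
    """A code example, the task it addresses, and reviews from other developers."""
    code_example: str
    description: str
    # Each review is a (text, polarity) pair; polarity is "positive" or "negative".
    reviews: list = field(default_factory=list)


def sentiment_distribution(scenario):
    """Count positive and negative reviews attached to one code example."""
    positive = sum(1 for _, polarity in scenario.reviews if polarity == "positive")
    negative = sum(1 for _, polarity in scenario.reviews if polarity == "negative")
    return positive, negative


def star_rating(scenario, max_stars=5):
    """Hypothetical metric: share of positive reviews scaled to `max_stars`."""
    positive, negative = sentiment_distribution(scenario)
    total = positive + negative
    if total == 0:
        return None  # no opinions yet, so no rating
    return max_stars * positive / total


scenario = UsageScenario(
    code_example='new ObjectMapper().readValue(json, Foo.class);',
    description="Parse a JSON string into a Java object",
    reviews=[
        ("Works well for nested objects", "positive"),
        ("Fast and simple", "positive"),
        ("Clear API", "positive"),
        ("Fails on unknown fields by default", "negative"),
    ],
)
```

A concept-based view would additionally group such scenarios by similarity; that clustering step is not sketched here because the abstract does not describe how similarity is computed.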

Pages: 1-45

Citations: 9

Facet-oriented Modelling

ACM Transactions on Software Engineering and Methodology (TOSEM)

Pub Date : 2021-02-11 DOI: 10.1145/3428076

J. Lara, E. Guerra, J. Kienzle

Models are the central assets in model-driven engineering (MDE), as they are actively used in all phases of software development. Models are built using metamodel-based languages, and so objects in models are typed by a metamodel class. This typing is static, established at creation time, and cannot be changed later. Therefore, objects in MDE are closed and fixed with respect to the class they conform to, the fields they have, and the well-formedness constraints they must comply with. This hampers many MDE activities, like the reuse of model-related artefacts such as transformations, the opportunistic or dynamic combination of metamodels, or the dynamic reconfiguration of models. To alleviate this rigidity, we propose making model objects open so that they can acquire or drop so-called facets. These contribute a type, fields, and constraints to the objects holding them. Facets are defined by regular metamodels, and are thus a lightweight extension of standard metamodelling. Facet metamodels may declare usage interfaces, as well as laws that govern the assignment of facets to objects (or classes). This article describes our proposal, reporting on a theory, analysis techniques, and an implementation. The benefits of the approach are validated on the basis of five case studies dealing with annotation models, transformation reuse, multi-view modelling, multi-level modelling, and language product lines.
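
The idea of open objects that acquire and drop facets can be sketched in plain Python. This is only an illustration under stated assumptions: `Facet`, `ModelObject`, and their methods are hypothetical names, and the sketch omits the usage interfaces and facet laws that the article describes.

```python
class Facet:
    """Contributes a type, fields, and constraints to the object holding it."""

    def __init__(self, type_name, fields, constraints=()):
        self.type_name = type_name
        self.fields = dict(fields)
        self.constraints = list(constraints)  # callables taking the host object


class ModelObject:
    """An object with one static creation-time type plus a dynamic set of facets."""

    def __init__(self, type_name, **fields):
        self.type_name = type_name
        self.fields = dict(fields)
        self.facets = {}

    def acquire(self, facet):
        self.facets[facet.type_name] = facet

    def drop(self, type_name):
        self.facets.pop(type_name, None)

    def types(self):
        """The creation type together with the types of all held facets."""
        return {self.type_name, *self.facets}

    def get(self, name):
        """Look up a field on the object first, then on its facets."""
        if name in self.fields:
            return self.fields[name]
        for facet in self.facets.values():
            if name in facet.fields:
                return facet.fields[name]
        raise AttributeError(name)

    def is_valid(self):
        """Check every constraint contributed by the facets currently held."""
        return all(check(self)
                   for facet in self.facets.values()
                   for check in facet.constraints)


person = ModelObject("Person", name="Ada")
employee = Facet("Employee", {"salary": 1000},
                 constraints=[lambda obj: obj.get("salary") >= 0])
person.acquire(employee)
```

After `person.acquire(employee)`, the object also counts as an `Employee` and exposes `salary`; calling `person.drop("Employee")` returns it to its original type and fields.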

Pages: 1-59

Citations: 2