
Latest Publications from ACM Transactions on Software Engineering and Methodology (TOSEM)

Why My Code Summarization Model Does Not Work
Pub Date : 2021-02-10 DOI: 10.1145/3434280
Qiuyuan Chen, Xin Xia, Han Hu, D. Lo, Shanping Li
Code summarization aims at generating a code comment given a block of source code, and it is normally performed by training machine learning algorithms on existing code block-comment pairs. Code comments in practice have different intentions. For example, some code comments might explain how the methods work, while others explain why some methods are written. Previous works have shown that a relationship exists between a code block and the category of the comment associated with it. In this article, we aim to investigate to what extent we can exploit this relationship to improve code summarization performance. We first classify comments into six intention categories and manually label 20,000 code-comment pairs. These categories include “what,” “why,” “how-to-use,” “how-it-is-done,” “property,” and “others.” Based on this dataset, we conduct an experiment to investigate the performance of different state-of-the-art code summarization approaches on these categories. We find that the performance of different code summarization approaches varies substantially across the categories. Moreover, the category on which a code summarization model performs best differs from model to model. In particular, no model performs best for “why” and “property” comments among the six categories. We design a composite approach to demonstrate that comment category prediction can boost code summarization to reach better results. The approach leverages the category-labeled data to train a classifier that infers categories. It then selects the most suitable model for each inferred category and outputs the composite results. Our composite approach outperforms other approaches that do not consider comment categories, obtaining a relative improvement of 8.57% and 16.34% in terms of ROUGE-L and BLEU-4 score, respectively.
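The composite pipeline lends itself to a simple classify-then-route architecture. The sketch below shows that idea under stated assumptions: the category list comes from the abstract, but the function names, the model registry, and the fallback to the “others” model are illustrative, not the authors' released code.

```python
# A minimal sketch of the composite approach: classify the comment
# intention first, then route the code block to the summarizer that
# performs best on that category. All names here are hypothetical.
from typing import Callable, Dict

CATEGORIES = ["what", "why", "how-to-use", "how-it-is-done", "property", "others"]

def composite_summarize(code: str,
                        classify: Callable[[str], str],
                        models: Dict[str, Callable[[str], str]]) -> str:
    category = classify(code)                            # step 1: infer the intention category
    summarizer = models.get(category, models["others"])  # step 2: pick the best model for it
    return summarizer(code)                              # step 3: generate the comment
```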
Citations: 41
Are Comments on Stack Overflow Well Organized for Easy Retrieval by Developers?
Pub Date : 2021-02-10 DOI: 10.1145/3434279
Haoxiang Zhang, Shaowei Wang, T. Chen, Ahmed E. Hassan
Many Stack Overflow answers have associated informative comments that can strengthen them and assist developers. A prior study found that comments can provide additional information that points out issues in the associated answer, such as its obsolescence. By showing more informative comments (e.g., the ones with higher scores) and hiding less informative ones, developers can more effectively retrieve information from the comments that are associated with an answer. Currently, Stack Overflow prioritizes the display of comments, and, as a result, 4.4 million comments (possibly including informative ones) are hidden by default from developers. In this study, we investigate whether this mechanism effectively organizes informative comments. We find that (1) the current comment organization mechanism does not work well due to the large number of tie-scored comments (e.g., 87% of the comments have a score of 0) and (2) in 97.3% of answers with hidden comments, at least one possibly informative comment is hidden while another comment with the same score is shown (i.e., unfairly hidden comments). The longest unfairly hidden comment is more likely to be informative than the shortest one. Our findings highlight that Stack Overflow should consider adjusting the comment organization mechanism to help developers effectively retrieve informative comments. Furthermore, we build a classifier that can effectively distinguish informative comments from uninformative ones. We also evaluate two alternative comment organization mechanisms (i.e., the Length mechanism and the Random mechanism) based on text similarity and the prediction of our classifier.
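The “unfairly hidden comment” condition has a direct operational reading. Below is a hedged sketch, assuming Stack Overflow's default of displaying the top five comments per answer; the data layout and function name are ours, not the paper's.

```python
# Flag an answer whose hidden comments include one that ties the score
# of a displayed comment -- the paper's "unfairly hidden" situation.
def has_unfairly_hidden(comment_scores, shown=5):
    ranked = sorted(comment_scores, reverse=True)
    if len(ranked) <= shown:
        return False                      # nothing is hidden at all
    lowest_shown = ranked[shown - 1]      # score of the last displayed comment
    return any(score == lowest_shown for score in ranked[shown:])

# Seven comments, five of them tie-scored at 0: three 0-score comments
# are shown while two more with the same score are hidden.
print(has_unfairly_hidden([3, 1, 0, 0, 0, 0, 0]))  # True
```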
Citations: 12
An Empirical Study on Type Annotations
Pub Date : 2021-02-10 DOI: 10.1145/3439775
J. Ore, Carrick Detweiler, Sebastian G. Elbaum
Type annotations connect variables to domain-specific types. They enable the power of type checking and can detect faults early. In practice, type annotations have a reputation of being burdensome to developers. We lack, however, an empirical understanding of how and why they are burdensome. Hence, we seek to measure the baseline accuracy and speed for developers making type annotations to previously unseen code. We also study the impact of one or more type suggestions. We conduct an empirical study of 97 developers using 20 randomly selected code artifacts from the robotics domain containing physical unit types. We find that subjects select the correct physical type with just 51% accuracy, and a single correct annotation takes about 2 minutes on average. Showing subjects a single suggestion has a strong and significant impact on accuracy both when correct and incorrect, while showing three suggestions retains the significant benefits without the negative effects. We also find that suggestions do not come with a time penalty. We require subjects to explain their annotation choices, and we qualitatively analyze their explanations. We find that identifier names and reasoning about code operations are the primary clues for selecting a type. We also examine two state-of-the-art automated type annotation systems and find opportunities for their improvement.
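To make the object of study concrete, the sketch below shows the kind of domain-specific (physical unit) type annotation the participants were asked to write, rendered here with Python's NewType; the unit wrappers are an illustrative assumption, not the annotation system used in the study.

```python
# Physical-unit annotations let a type checker catch unit mix-ups early,
# e.g., passing a duration where a distance is expected.
from typing import NewType

Meters = NewType("Meters", float)
Seconds = NewType("Seconds", float)
MetersPerSecond = NewType("MetersPerSecond", float)

def speed(distance: Meters, duration: Seconds) -> MetersPerSecond:
    return MetersPerSecond(distance / duration)

v = speed(Meters(100.0), Seconds(9.58))   # OK
# speed(Seconds(9.58), Meters(100.0))     # a static checker rejects this call
```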
Citations: 2
Beyond Tests
Pub Date : 2021-02-10 DOI: 10.1145/3418461
Xiang Gao
Automated program repair is an emerging technology that seeks to automatically rectify program errors and vulnerabilities. Repair techniques are driven by a correctness criterion that is often in the form of a test suite. Such test-based repair may produce overfitting patches, where the patches produced fail on tests outside the test suite driving the repair. In this work, we present a repair method that fixes program vulnerabilities without the need for a voluminous test suite. Given a vulnerability as evidenced by an exploit, the technique extracts a constraint representing the vulnerability with the help of sanitizers. The extracted constraint serves as a proof obligation that our synthesized patch should satisfy. The proof obligation is met by propagating the extracted constraint to locations that are deemed to be “suitable” fix locations. An implementation of our approach (ExtractFix) on top of the KLEE symbolic execution engine shows its efficacy in fixing a wide range of vulnerabilities taken from the ManyBugs benchmark, real-world CVEs and Google’s OSS-Fuzz framework. We believe that our work presents a way forward for the overfitting problem in program repair by generalizing observable hazards/vulnerabilities (as constraint) from a single failing test or exploit.
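The core idea, turning a single exploit into a proof obligation and enforcing it at the fix location, can be pictured on a classic out-of-bounds bug. The Python sketch below is a conceptual illustration under our own simplifications; ExtractFix itself works on C programs via sanitizers and symbolic execution, not through this API.

```python
# Conceptual sketch: the sanitizer report for an out-of-bounds write at
# buf[i] yields the constraint 0 <= i < len(buf); the synthesized patch
# must satisfy it on all inputs, not merely on the exploit that found it.
def extract_constraint(buf_len: int):
    return lambda i: 0 <= i < buf_len      # the extracted proof obligation

def patched_write(buf: list, i: int, value) -> None:
    # Propagating the constraint to the chosen fix location produces a
    # guard, which generalizes beyond the single failing input.
    if 0 <= i < len(buf):
        buf[i] = value

buf = [0] * 4
patched_write(buf, 7, 42)                  # the exploit's index is now rejected
assert extract_constraint(len(buf))(2)     # in-bounds accesses still satisfy it
```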
Citations: 31
Mutant Reduction Evaluation: What is There and What is Missing?
Pub Date : 2021-02-05 DOI: 10.1145/3522578
Peng Zhang, Yang Wang, Xutong Liu, Yanhui Li, Yibao Yang, Ziyuan Wang, Xiaoyu Zhou, Lin Chen, Yuming Zhou
Background. Mutation testing is a commonly used defect injection technique for evaluating the effectiveness of a test suite. However, it is usually computationally expensive. Therefore, many mutation reduction strategies, which aim to reduce the number of mutants, have been proposed. Problem. It is important to measure the ability of a mutation reduction strategy to maintain test suite effectiveness evaluation. However, existing evaluation indicators are unable to measure the “order-preserving ability,” i.e., to what extent the mutation score order among test suites is maintained before and after mutation reduction. As a result, misleading conclusions can be drawn when using existing indicators to evaluate the reduction effectiveness. Objective. We aim to propose evaluation indicators that measure the “order-preserving ability” of a mutation reduction strategy, which is important but missing in our community. Method. Given a test suite on a Software Under Test (SUT) with a set of original mutants, we leverage the test suite to generate a group of test suites that have a partial order relationship in defect detecting ability. When evaluating a reduction strategy, we first construct two partial order relationships among the generated test suites in terms of mutation score, one with the original mutants and another with the reduced mutants. Then, we measure the extent to which the partial order under the original mutants remains unchanged under the reduced mutants. The more of the partial order that is unchanged, the stronger the Order Preservation (OP) of the mutation reduction strategy, and the more effective the reduction strategy. Furthermore, we propose Effort-aware Relative Order Preservation (EROP) to measure how much gain a mutation reduction strategy provides over a random reduction strategy. Result. The experimental results show that OP and EROP are able to efficiently measure the “order-preserving ability” of a mutation reduction strategy. As a result, they are better able to distinguish various mutation reduction strategies than the existing evaluation indicators. In addition, we find that Subsuming Mutant Selection (SMS) and Clustering Mutant Selection (CMS) are more effective than the other strategies under OP and EROP. Conclusion. We suggest that researchers use OP and EROP to measure the effectiveness of a mutant reduction strategy, and that practitioners give priority to SMS and CMS in practice.
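The OP indicator has a natural pairwise formulation: over all pairs of generated test suites whose original mutation scores differ, count how many keep their ordering after reduction. The sketch below is our reading of that idea, not the paper's exact definition.

```python
# Order Preservation as a fraction of preserved pairwise orderings.
from itertools import combinations

def order_preservation(orig_scores, reduced_scores):
    pairs = combinations(range(len(orig_scores)), 2)
    ordered = [(i, j) for i, j in pairs if orig_scores[i] != orig_scores[j]]
    kept = sum(1 for i, j in ordered
               if (orig_scores[i] < orig_scores[j]) == (reduced_scores[i] < reduced_scores[j]))
    return kept / len(ordered) if ordered else 1.0

# Suites A, B, C: the reduction flips B and C, so 2 of 3 orderings survive.
print(order_preservation([0.4, 0.6, 0.8], [0.5, 0.9, 0.7]))  # 0.666...
```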
Citations: 2
Applying Bayesian Analysis Guidelines to Empirical Software Engineering Data: The Case of Programming Languages and Code Quality
Pub Date : 2021-01-29 DOI: 10.1145/3490953
Carlo A. Furia, R. Torkar, R. Feldt
Statistical analysis is the tool of choice to turn data into information and then information into empirical knowledge. However, the process that goes from data to knowledge is long, uncertain, and riddled with pitfalls. To be valid, it should be supported by detailed, rigorous guidelines that help ferret out issues with the data or model and lead to qualified results that strike a reasonable balance between generality and practical relevance. Such guidelines are being developed by statisticians to support the latest techniques for Bayesian data analysis. In this article, we frame these guidelines in a way that is apt to empirical research in software engineering. To demonstrate the guidelines in practice, we apply them to reanalyze a GitHub dataset about code quality in different programming languages. The dataset’s original analysis [Ray et al. 55] and a critical reanalysis [Berger et al. 6] have attracted considerable attention—in no small part because they target a topic (the impact of different programming languages) on which strong opinions abound. The goals of our reanalysis are largely orthogonal to this previous work, as we are concerned with demonstrating, on data in an interesting domain, how to build a principled Bayesian data analysis and to showcase its benefits. In the process, we will also shed light on some critical aspects of the analyzed data and of the relationship between programming languages and code quality—such as the impact of project-specific characteristics other than the used programming language. The high-level conclusions of our exercise will be that Bayesian statistical techniques can be applied to analyze software engineering data in a way that is principled, flexible, and leads to convincing results that inform the state-of-the-art while highlighting the boundaries of its validity. The guidelines can support building solid statistical analyses and connecting their results. Thus, they can help buttress continued progress in empirical software engineering research.
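For a flavor of what a single principled Bayesian step looks like on such data, here is a minimal grid-approximation sketch for one language's defect rate with a weakly informative prior. The counts are invented for illustration; the article's reanalysis builds full multilevel models, not this toy.

```python
# Posterior over a per-language defect rate via grid approximation.
import numpy as np

bug_commits, total_commits = 27, 400          # hypothetical counts
theta = np.linspace(1e-4, 1 - 1e-4, 1000)     # candidate defect rates
prior = theta ** 1 * (1 - theta) ** 9         # Beta(2, 10): defects are fairly rare
likelihood = theta ** bug_commits * (1 - theta) ** (total_commits - bug_commits)
posterior = prior * likelihood
posterior /= posterior.sum() * (theta[1] - theta[0])   # normalize to a density

mean = np.sum(theta * posterior) * (theta[1] - theta[0])
print(f"posterior mean defect rate ≈ {mean:.3f}")      # ≈ 29 / 412 ≈ 0.070
```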
Citations: 13
In-IDE Code Generation from Natural Language: Promise and Challenges
Pub Date : 2021-01-27 DOI: 10.1145/3487569
Frank F. Xu, Bogdan Vasilescu, Graham Neubig
A great part of software development involves conceptualizing or communicating the underlying procedures and logic that needs to be expressed in programs. One major difficulty of programming is turning concept into code, especially when dealing with the APIs of unfamiliar libraries. Recently, there has been a proliferation of machine learning methods for code generation and retrieval from natural language queries, but these have primarily been evaluated purely based on retrieval accuracy or overlap of generated code with developer-written code, and the actual effect of these methods on the developer workflow is surprisingly unattested. In this article, we perform the first comprehensive investigation of the promise and challenges of using such technology inside the PyCharm IDE, asking, “At the current state of technology does it improve developer productivity or accuracy, how does it affect the developer experience, and what are the remaining gaps and challenges?” To facilitate the study, we first develop a plugin for the PyCharm IDE that implements a hybrid of code generation and code retrieval functionality, and we orchestrate virtual environments to enable collection of many user events (e.g., web browsing, keystrokes, fine-grained code edits). We ask developers with various backgrounds to complete 7 varieties of 14 Python programming tasks ranging from basic file manipulation to machine learning or data visualization, with or without the help of the plugin. While qualitative surveys of developer experience are largely positive, quantitative results with regards to increased productivity, code quality, or program correctness are inconclusive. Further analysis identifies several pain points that could improve the effectiveness of future machine learning-based code generation/retrieval developer assistants and demonstrates when developers prefer code generation over code retrieval and vice versa. We release all data and software to pave the road for future empirical studies on this topic, as well as development of better code generation models.
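The plugin's hybrid behavior, retrieving when a close match exists and generating otherwise, can be sketched as a simple routing function. Everything below (the threshold, corpus layout, and names) is an assumption for illustration, not the plugin's implementation.

```python
# Route a natural-language query to retrieval or generation.
from difflib import SequenceMatcher

def hybrid_answer(query: str, corpus: dict, generate, threshold: float = 0.6) -> str:
    """corpus maps natural-language descriptions to known code snippets."""
    best_desc, best_sim = None, 0.0
    for desc in corpus:
        sim = SequenceMatcher(None, query.lower(), desc.lower()).ratio()
        if sim > best_sim:
            best_desc, best_sim = desc, sim
    if best_desc is not None and best_sim >= threshold:
        return corpus[best_desc]     # retrieval: reuse an existing snippet verbatim
    return generate(query)           # generation: synthesize new code for the query
```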
Citations: 68
Emoji-powered Sentiment and Emotion Detection from Software Developers’ Communication Data
Pub Date : 2021-01-27 DOI: 10.1145/3424308
Zhenpeng Chen, Yanbin Cao, Huihan Yao, Xuan Lu, Xin Peng, Hong Mei, Xuanzhe Liu
Sentiment and emotion detection from textual communication records of developers has various application scenarios in software engineering (SE). However, commonly used off-the-shelf sentiment/emotion detection tools cannot obtain reliable results on SE tasks, and misunderstanding of technical knowledge has been shown to be the main reason. Researchers have therefore started to create labeled SE-related datasets manually and customize SE-specific methods. However, the scarce labeled data cover only a very limited lexicon and set of expressions. In this article, we employ emojis as an instrument to address this problem. Different from manual labels that are provided by annotators, emojis are self-reported labels provided by the authors themselves to intentionally convey affective states and thus are suitable indications of sentiment and emotion in texts. Since emojis have been widely adopted in online communication, a large amount of emoji-labeled text can easily be accessed to help tackle the scarcity of manually labeled data. Specifically, we leverage Tweets and GitHub posts containing emojis to learn representations of SE-related texts through emoji prediction. By predicting the emojis contained in each text, texts that tend to surround the same emoji are represented with similar vectors, which transfers the sentiment knowledge contained in emoji usage to the representations of texts. Then we leverage the sentiment-aware representations as well as manually labeled data to learn the final sentiment/emotion classifier via transfer learning. Compared to existing approaches, our approach achieves significant improvement on representative benchmark datasets, with an average increase of 0.036 and 0.049 in macro-F1 in sentiment and emotion detection, respectively. Further investigation reveals that the large-scale Tweets make a key contribution to the power of our approach. This finding informs future research not to unilaterally pursue domain-specific resources but to try to transfer knowledge from the open domain through ubiquitous signals such as emojis. Finally, we present the open challenges of sentiment and emotion detection in SE through a qualitative analysis of texts misclassified by our approach.
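Distant supervision with emojis can be demonstrated end to end on toy data: treat the emoji an author attached as the label, strip it from the input, and train a classifier. This sketch uses a bag-of-words model for brevity; the article learns neural representations from large-scale Tweets and GitHub posts.

```python
# Emojis as self-reported affect labels for pretraining a sentiment model.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

posts = ["finally fixed the build 😄", "this API keeps crashing 😡",
         "merged the PR, great work 😄", "tests fail again 😡"]
labels = [p[-1] for p in posts]                 # the emoji the author chose
texts = [p[:-1].strip() for p in posts]         # emoji removed from the input

model = make_pipeline(TfidfVectorizer(), LogisticRegression())
model.fit(texts, labels)
print(model.predict(["tests fail on CI"]))      # expected: ['😡']
```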
Citations: 31
Verification of Program Transformations with Inductive Refinement Types
Pub Date : 2021-01-20 DOI: 10.1145/3409805
Ahmad Salim Al-Sibahi, T. Jensen, Aleksandar S. Dimovski, A. Wąsowski
High-level transformation languages like Rascal include expressive features for manipulating large abstract syntax trees: first-class traversals, expressive pattern matching, backtracking, and generalized iterators. We present the design and implementation of an abstract interpretation tool, Rabit, for verifying inductive type and shape properties for transformations written in such languages. We describe how to perform abstract interpretation based on operational semantics, specifically focusing on the challenges arising when analyzing the expressive traversals and pattern matching. Finally, we evaluate Rabit on a series of transformations (normalization, desugaring, refactoring, code generators, type inference, etc.) showing that we can effectively verify stated properties.
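To ground the kind of property involved: a desugaring transformation should leave no sugar node anywhere in the output tree, an inductive shape property over the AST. The toy below checks it dynamically by traversal; Rabit's contribution is proving such properties statically by abstract interpretation, which this sketch does not attempt.

```python
# A transformation and the inductive shape property it must establish.
from dataclasses import dataclass

@dataclass
class Lit:
    value: int

@dataclass
class Add:
    left: object
    right: object

@dataclass
class Sugar:
    args: list                      # n-ary addition, to be desugared away

def desugar(node):
    if isinstance(node, Sugar):     # fold n-ary addition into nested Add nodes
        out = desugar(node.args[0])
        for arg in node.args[1:]:
            out = Add(out, desugar(arg))
        return out
    if isinstance(node, Add):
        return Add(desugar(node.left), desugar(node.right))
    return node

def sugar_free(node) -> bool:       # the shape property: no Sugar node remains
    if isinstance(node, Sugar):
        return False
    if isinstance(node, Add):
        return sugar_free(node.left) and sugar_free(node.right)
    return True

print(sugar_free(desugar(Sugar([Lit(1), Lit(2), Lit(3)]))))  # True
```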
Citations: 0
Security Smells in Ansible and Chef Scripts
Pub Date : 2021-01-20 DOI: 10.1145/3408897
A. Rahman, Md. Rayhanur Rahman, Chris Parnin, L. Williams
Context: Security smells are recurring coding patterns that are indicative of security weakness and require further inspection. As infrastructure as code (IaC) scripts, such as Ansible and Chef scripts, are used to provision cloud-based servers and systems at scale, security smells in IaC scripts could enable malicious users to exploit vulnerabilities in the provisioned systems. Goal: The goal of this article is to help practitioners avoid insecure coding practices while developing infrastructure as code scripts through an empirical study of security smells in Ansible and Chef scripts. Methodology: We conduct a replication study in which we apply qualitative analysis to 1,956 IaC scripts to identify security smells for IaC scripts written in two languages: Ansible and Chef. We construct a static analysis tool called Security Linter for Ansible and Chef scripts (SLAC) to automatically identify security smells in 50,323 scripts collected from 813 open source software repositories. We also submit bug reports for 1,000 randomly selected smell occurrences. Results: We identify two security smells not reported in prior work: missing default in case statement and no integrity check. By applying SLAC, we identify 46,600 occurrences of security smells, including 7,849 hard-coded passwords. We observe agreement for 65 of the 94 bug reports that received a response, which suggests the relevance of these security smells for Ansible and Chef scripts among practitioners. Conclusion: We observe security smells to be prevalent in Ansible and Chef scripts, similar to what was found for Puppet scripts. We recommend that practitioners rigorously inspect the presence of the identified security smells in Ansible and Chef scripts using (i) code review and (ii) static analysis tools.
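A single SLAC-style rule is easy to picture. The sketch below flags one smell the study reports, hard-coded passwords, in Ansible-style YAML using a regular expression; the pattern and the templated-variable exemption are our assumptions, not SLAC's actual rule set.

```python
# A toy linter rule for the "hard-coded password" smell in IaC scripts.
import re

HARDCODED_PASSWORD = re.compile(
    r"^\s*\w*(password|passwd|pwd)\w*\s*:\s*\S+", re.IGNORECASE)

def lint(script: str):
    smells = []
    for lineno, line in enumerate(script.splitlines(), start=1):
        # Skip Jinja2-templated values like "{{ vault_db_password }}".
        if HARDCODED_PASSWORD.search(line) and "{{" not in line:
            smells.append((lineno, "hard-coded password"))
    return smells

task = ("- name: create db user\n"
        "  mysql_user:\n"
        "    name: app\n"
        "    password: hunter2\n")
print(lint(task))   # [(4, 'hard-coded password')]
```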
Citations: 22