Proceedings of the 10th ACM/IEEE International Symposium on Empirical Software Engineering and Measurement

Building an Ensemble for Software Defect Prediction Based on Diversity Selection

Pub Date : 2016-09-08 DOI: 10.1145/2961111.2962610

Jean Petrić, David Bowes, T. Hall, B. Christianson, Nathan Baddoo

Background: Ensemble techniques have gained attention in various scientific fields. Defect prediction researchers have investigated many state-of-the-art ensemble models and concluded that in many cases these outperform standard single classifier techniques. Almost all previous work using ensemble techniques in defect prediction relies on the majority voting scheme for combining prediction outputs, and on the implicit diversity among single classifiers. Aim: Investigate whether defect prediction can be improved using an explicit diversity technique with a stacking ensemble, given that different classifiers identify different sets of defects. Method: We used classifiers from four different families and the weighted accuracy diversity (WAD) technique to exploit diversity amongst classifiers. To combine individual predictions, we used the stacking ensemble technique. We used state-of-the-art knowledge in software defect prediction to build our ensemble models, and tested their prediction abilities against 8 publicly available data sets. Conclusion: The results show performance improvement using stacking ensembles compared to other defect prediction models. Diversity amongst classifiers used for building ensembles is essential to achieving these performance improvements.
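The contrast between majority voting and stacking can be illustrated with a small sketch. It is not the authors' implementation and it leaves out the WAD diversity selection entirely; the three base classifiers, their held-out predictions, and the table-based meta-learner are all assumptions chosen to keep the example self-contained. The point it shows is that a meta-learner trained on base predictions can learn to trust a single classifier that finds defects the others miss, which a majority vote cannot do.

```python
from collections import Counter, defaultdict


def majority_vote(preds):
    """Combine binary base predictions (1 = defective) by simple majority."""
    return int(sum(preds) * 2 > len(preds))


def fit_stacker(base_outputs, labels):
    """Train a toy meta-learner on held-out base predictions.

    For every observed pattern of base predictions, remember the true
    label that occurred most often with that pattern.
    """
    table = defaultdict(Counter)
    for pattern, label in zip(base_outputs, labels):
        table[tuple(pattern)][label] += 1
    return {p: c.most_common(1)[0][0] for p, c in table.items()}


def stacked_predict(stacker, preds):
    """Use the learned table; fall back to majority voting for unseen patterns."""
    return stacker.get(tuple(preds), majority_vote(preds))


# Hypothetical held-out predictions of three base classifiers and true labels.
held_out_outputs = [(1, 0, 0), (1, 0, 0), (0, 1, 1), (0, 0, 0), (1, 1, 1)]
held_out_labels = [1, 1, 0, 0, 1]

stacker = fit_stacker(held_out_outputs, held_out_labels)
new_module = (1, 0, 0)  # only the first classifier flags this module
print(majority_vote(new_module), stacked_predict(stacker, new_module))
```

In this toy data the first classifier alone is right about a class of defects, so the stacked prediction follows it while the majority vote discards it. A real stacking ensemble would use a proper learner as the meta-classifier and cross-validated base predictions.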

Citations: 52

Worse Than Spam: Issues In Sampling Software Developers

Pub Date : 2016-09-08 DOI: 10.1145/2961111.2962628

Sebastian Baltes, S. Diehl

Background: Reaching out to professional software developers is a crucial part of empirical software engineering research. One important method to investigate the state of practice is survey research. As drawing a random sample of professional software developers for a survey is rarely possible, researchers rely on various sampling strategies. Objective: In this paper, we report on our experience with the different sampling strategies we employed, highlight ethical issues, and motivate the need to maintain a collection of key demographics about software developers to ease the assessment of the external validity of studies. Method: Our report is based on data from two studies we conducted in the past. Results: Contacting developers over public media proved to be the most effective and efficient sampling strategy. However, we not only describe the perspective of researchers who are interested in reaching goals like a large number of participants or a high response rate, but also shed light on the ethical implications of different sampling strategies. We present one specific ethical guideline and point to debates in other research communities to start a discussion in the software engineering research community about which sampling strategies should be considered ethical.

Citations: 43

Function Point Analysis for Software Maintenance

Pub Date : 2016-09-08 DOI: 10.1145/2961111.2962613

Anandi Hira, B. Boehm

Context: Software maintenance is required to fix defects, adapt to changes in the environment, and meet new or changed user requirements. The effort of these tasks needs to be estimated to track progress, manage resources, and make decisions. Most widely used cost models use source lines of code (SLOC) as the software size input measure, due to its quantifiability and high correlation with effort. Estimating the SLOC of a project is very difficult in the early stages of the software lifecycle. Function Points (FPs) represent software size by functions or modifications to functions, making them easier to calculate early in the lifecycle for new development projects or maintenance tasks. Several cost estimators use FPs to estimate the SLOC of a project to take advantage of existing cost models. Goal: Through empirical analysis, the authors want to determine whether FPs can effectively estimate maintenance tasks, as a better alternative to using SLOC as a software size metric. Additionally, the authors will demonstrate that FP-to-SLOC ratios add uncertainty to effort estimates. Method: The empirical analysis will be run on the dataset of Unified Code Count (UCC), a software tool maintained by the University of Southern California (USC). Results: The analyses found that separating projects adding new functions from those modifying existing functions resulted in improved estimation models using FPs. The effort estimation model for projects adding functions to UCC had high prediction accuracy statistics, but the results were less impressive for projects modifying existing functions in UCC. The effort estimation accuracy became unsatisfactorily low when using an FP-to-SLOC ratio. Conclusions: Cost estimators should not use FP-to-SLOC ratios for effort estimation due to low prediction accuracy. FPs are an effective size measure for only a portion of UCC's maintenance tasks, specifically the projects adding new functions to UCC. Another size measure may need to be considered that might be more effective, independently or in conjunction with FPs, for all of UCC's maintenance tasks.

Citations: 22

Semantic Coupling Between Classes: Corpora or Identifiers?

Pub Date : 2016-09-08 DOI: 10.1145/2961111.2962622

N. Ajienka, A. Capiluppi

Context: Conceptual coupling is a measure of how loosely or closely related two software artifacts are, based on the semantic information embedded in their comments and identifiers. This type of coupling is typically evaluated by extracting the semantic information from source code into a word corpus. The extraction of word corpora can be lengthy, especially when systems are large and many classes are involved. Goal: This study investigates whether the class identifiers alone (e.g., the class names) can be used to evaluate the conceptual coupling between classes, as opposed to the word corpora of the entire classes. Method: In this study, we analyze two Java systems and extract the conceptual coupling between pairs of classes, using (i) a corpus-based approach; and (ii) two identifier-based tools. Results: Our results show that measuring the semantic similarity between classes using (only) their identifiers is similar to using the class corpora. Additionally, using the identifiers is more efficient in terms of precision, recall, and computation time. Conclusions: Using only class identifiers to measure their semantic similarity can save time on program comprehension tasks for large software projects. The findings of this paper support this hypothesis for the systems that were used in the evaluation, and they can also be used to guide researchers developing future generations of tools supporting program comprehension.
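To make the identifier-based idea concrete, here is a minimal sketch of one way class names alone could be compared. The abstract does not name the two identifier-based tools, so everything below is an assumption: the camel-case splitting rule, the use of Jaccard similarity, and the example class names are illustrative stand-ins rather than the authors' method.

```python
import re


def identifier_terms(name):
    """Split a class name such as 'HttpRequestParser' into lowercase terms."""
    parts = re.findall(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|\d+", name)
    return {p.lower() for p in parts}


def jaccard(a, b):
    """Jaccard similarity of two term sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def identifier_coupling(class_a, class_b):
    """Similarity of two classes judged only by the terms in their names."""
    return jaccard(identifier_terms(class_a), identifier_terms(class_b))


# Hypothetical class names.
print(identifier_coupling("HttpRequestParser", "HttpResponseParser"))
print(identifier_coupling("HttpRequestParser", "InvoicePrinter"))
```

Because only the names are tokenized, the cost per class pair is independent of class size, which is the kind of saving in computation time the abstract points to. A corpus-based approach would instead have to process the full text of each class.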

Citations: 9

The Intersection of Continuous Deployment and Architecting Process: Practitioners' Perspectives

Pub Date : 2016-09-08 DOI: 10.1145/2961111.2962587

Mojtaba Shahin, M. Babar, Liming Zhu

Context: Development and Operations (DevOps) is an emerging software industry movement to bridge the gap between software development and operations teams. DevOps supports frequently and reliably releasing new features and products, thus subsuming the Continuous Deployment (CD) practice. Goal: This research aims at empirically exploring the potential impact of CD practice on the architecting process. Method: We carried out a case study involving interviews with 16 software practitioners. Results: We have identified (1) a range of recurring architectural challenges (i.e., highly coupled monolithic architecture, team dependencies, and ever-changing operational environments and tools) and (2) five main architectural principles (i.e., small and independent deployment units, not too much focus on reusability, aggregating logs, isolating changes, and testability inside the architecture) that should be considered when an application is (re-)architected for CD practice. This study also supports the view that software architecture can better support operations if an operations team is engaged at an early stage of software development so that operational aspects are taken into consideration. Conclusion: These findings provide evidence that software architecture plays a significant role in successfully and efficiently adopting continuous deployment. The findings contribute to establishing an evidential body of knowledge about the state of the art of architecting for CD practice.

Citations: 36

Bottom-up Adoption of Continuous Delivery in a Stage-Gate Managed Software Organization

Pub Date : 2016-09-08 DOI: 10.1145/2961111.2962608

E. Laukkanen, Timo O. A. Lehtinen, Juha Itkonen, M. Paasivaara, C. Lassenius

Context: Continuous delivery (CD) is a development practice for decreasing the time-to-market by keeping software releasable all the time. Adopting CD within a stage-gate managed development process might be useful, although scientific evidence of such adoption is not available. In a stage-gate process, new releases pass through stages, and gates prevent low-quality output from progressing. Large organizations with stage-gate processes are often hierarchical, and the adoption can be either top-down, driven by the management, or bottom-up, driven by the development unit. Goal: We investigate the perceived problems of bottom-up CD adoption in a large global software development unit at Nokia Networks. Our goal is to understand how the stage-gate development process used by the unit affects the adoption. Method: The overall research approach is a qualitative single case study on one of the several geographical sites of the development unit. We organized two 2-hour workshops with altogether 15 participants to discover how the stage-gate process affected the adoption. Results: The stage-gate development process caused tight schedules for development and process overhead because of the gate requirements. Moreover, the process required using multiple version control branches for different stages in the process, which increased development complexity and caused additional branch overhead. Together, the tight schedule, process overhead, and branch overhead caused a lack of time to adopt CD. In addition, the use of multiple branches limited the available hardware resources and caused delayed integration. Conclusions: Adopting CD in a development organization that needs to conform to a stage-gate development process is challenging. Practitioners should either gain support from the management to relax the required process or reduce their expectations of what can be achieved while conforming to the process. To simplify the development process, the use of multiple version control branches could be replaced with feature toggles.
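The closing suggestion, replacing per-stage branches with feature toggles, can be sketched as follows. The abstract only names the idea, so this is a generic illustration under assumptions: the `FeatureToggles` class, the `new_csv_export` feature name, and the `export_report` function are all hypothetical.

```python
class FeatureToggles:
    """Minimal in-memory toggle registry; unknown features default to off."""

    def __init__(self, enabled=()):
        self._enabled = set(enabled)

    def is_enabled(self, name):
        return name in self._enabled


def export_report(data, toggles):
    """Format a report; the new format is guarded by a toggle."""
    # Unfinished work lives on the main line but stays dark until toggled on.
    if toggles.is_enabled("new_csv_export"):  # hypothetical feature name
        return ",".join(str(x) for x in data)
    return " ".join(str(x) for x in data)


release = FeatureToggles()                    # configuration for a gated release
staging = FeatureToggles({"new_csv_export"})  # same code, feature switched on
print(export_report([1, 2, 3], release))
print(export_report([1, 2, 3], staging))
```

With this arrangement every stage builds from the same code line and differs only in configuration, which is how toggles can stand in for separate version control branches.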

Citations: 14

The Obscure Process of Innovation Assessment: A Report of an Industrial Survey

Pub Date : 2016-09-08 DOI: 10.1145/2961111.2962634

A. C. A. França, E. Peixoto, Bruno Falcão, Cleviton V. F. Monteiro

Context - Software companies should track innovation as rigorously as core business operations. To that end, the assessment of innovation projects is a critical process, in particular for getting innovation initiatives funded. Objective - In this article, we aim to evaluate the need for more practical measurement tools by checking the agreement among very experienced industry analysts about the innovation degree of four actual software projects. Method - We conducted a survey with eight business analysts, using a combination of the Three Horizons Model and the Gartner Hype Cycle for emerging technologies as a frame of reference. Results - In general, the level of agreement about the innovation degree of the projects was very low. Looking at the cases in isolation, it is possible to suggest reasons for the low level of agreement between the evaluators. Conclusions - Our data support the view that innovation is an activity that is difficult to characterize and even more difficult to measure; the need for practices that achieve better intersubjective agreement in innovation assessment became evident in this work.
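One simple way to quantify a "level of agreement" among evaluators is the fraction of rater pairs that chose the same category. The abstract does not say which agreement statistic the authors used, so the measure below and the ratings in it are assumptions for illustration only.

```python
from itertools import combinations


def pairwise_agreement(ratings):
    """Fraction of rater pairs that assigned the same category to one project."""
    pairs = list(combinations(ratings, 2))
    return sum(a == b for a, b in pairs) / len(pairs)


# Hypothetical ratings of one project by eight analysts (horizon 1, 2, or 3).
project_ratings = [1, 1, 2, 3, 2, 1, 3, 2]
print(pairwise_agreement(project_ratings))
```

Raw pairwise agreement does not correct for agreement expected by chance; chance-corrected statistics exist for that purpose and would be the more rigorous choice in an actual study.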

Citations: 0

An External Replication on the Effects of Test-driven Development Using a Multi-site Blind Analysis Approach

Proceedings of the 10th ACM/IEEE International Symposium on Empirical Software Engineering and Measurement

Pub Date : 2016-09-08 DOI: 10.1145/2961111.2962592

D. Fucci, G. Scanniello, Simone Romano, M. Shepperd, Boyce Sigweni, F. Uyaguari, Burak Turhan, Natalia Juristo Juzgado, M. Oivo

Context: Test-driven development (TDD) is an agile practice claimed to improve the quality of a software product, as well as the productivity of its developers. A previous study (i.e., the baseline experiment) at the University of Oulu (Finland) compared TDD to a test-last development (TLD) approach through a randomized controlled trial. The results failed to support the claims. Goal: We want to validate the original study results by replicating the study at the University of Basilicata (Italy), using a different design. Method: We replicated the baseline experiment, using a crossover design, with 21 graduate students. We kept the settings and context as close as possible to the baseline experiment. In order to limit researchers' bias, we involved two other sites (UPM, Spain, and Brunel, UK) to conduct blind analysis of the data. Results: The Kruskal-Wallis tests did not show any significant difference between TDD and TLD in terms of testing effort (p-value = .27), external code quality (p-value = .82), and developers' productivity (p-value = .83). Nevertheless, our data revealed a difference based on the order in which TDD and TLD were applied, though no carry-over effect. Conclusions: We verify the baseline study results, yet our results raise concerns regarding the selection of experimental objects, particularly with respect to their interaction with the order in which treatments are applied. We recommend that future studies survey the tasks used in experiments evaluating TDD. Finally, to lower the cost of replication studies and reduce researchers' bias, we encourage other research groups to adopt a multi-site blind analysis approach similar to the one described in this paper.
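
To make the reported test concrete, here is a minimal sketch of a two-group Kruskal-Wallis test written with the Python standard library only. It ranks the pooled observations, computes the tie-corrected H statistic, and derives the p-value from the chi-square distribution with one degree of freedom. The function names and the scores are hypothetical; the study's actual data and analysis scripts are not reproduced here.

```python
import math


def average_ranks(values):
    """Return 1-based ranks, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # mean of positions i..j, 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def kruskal_wallis_two_groups(a, b):
    """Return (H, p) for two independent samples a and b."""
    pooled = list(a) + list(b)
    n = len(pooled)
    ranks = average_ranks(pooled)
    rank_sum_a = sum(ranks[:len(a)])
    rank_sum_b = sum(ranks[len(a):])

    h = 12 / (n * (n + 1)) * (
        rank_sum_a ** 2 / len(a) + rank_sum_b ** 2 / len(b)
    ) - 3 * (n + 1)

    # Tie correction: divide H by 1 - sum(t^3 - t) / (n^3 - n).
    tie_counts = {}
    for v in pooled:
        tie_counts[v] = tie_counts.get(v, 0) + 1
    correction = 1 - sum(t ** 3 - t for t in tie_counts.values()) / (n ** 3 - n)
    if correction == 0:
        return 0.0, 1.0  # all observations identical: no evidence of a difference
    h = max(h / correction, 0.0)  # guard against tiny negative rounding error

    # For 1 degree of freedom, the chi-square survival function is erfc(sqrt(H / 2)).
    p = math.erfc(math.sqrt(h / 2))
    return h, p


# Hypothetical quality scores for two treatments (invented for illustration).
tdd_scores = [62, 71, 55, 80, 67, 74]
tld_scores = [64, 69, 58, 77, 70, 73]
h_stat, p_value = kruskal_wallis_two_groups(tdd_scores, tld_scores)
print(f"H = {h_stat:.3f}, p = {p_value:.3f}")
```

A large p-value, like those reported in the abstract, means the ranks of the two treatments are too intermixed to reject the hypothesis that both samples come from the same distribution. It does not by itself show that the treatments are equivalent.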

Citations: 33

A Study of Documentation in Agile Software Projects

Proceedings of the 10th ACM/IEEE International Symposium on Empirical Software Engineering and Measurement

Pub Date : 2016-09-08 DOI: 10.1145/2961111.2962616

Stefan Voigt, Jörg von Garrel, Julia Müller, D. Wirth

Although agile methods have become established in software engineering, documentation in projects is rare. Employing a theoretical model of information and documentation, our paper analyzes documentation practices in agile software projects in their entirety. Our analysis uses method triangulation: partly structured interviews, observation, and an online survey. As an example, we demonstrate a correlation between satisfaction with information searches and the amount of documentation that exists for most types of information. In addition, digital searches demand nearly twice as much time as documentation. In the conclusion, we provide recommendations on the use of supporting methods or tools to shape agile documentation.

Citations: 21

Is Newer Always Better?: The Case of Vulnerability Prediction Models

Proceedings of the 10th ACM/IEEE International Symposium on Empirical Software Engineering and Measurement

Pub Date : 2016-09-08 DOI: 10.1145/2961111.2962612

A. Hovsepyan, R. Scandariato, W. Joosen

Finding security vulnerabilities in source code as early as possible is becoming more and more essential. In this respect, vulnerability prediction models have the potential to help security assurance activities by identifying the code locations that deserve the most attention. In this paper, we investigate whether prediction models behave like milk (i.e., they turn with time) or wine (i.e., they improve with time) when used to predict future vulnerabilities. Our findings indicate that the recall values are largely in favor of predictors based on older versions. However, the better recall comes at the price of much higher file inspection ratio values.
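
The trade-off in the last two sentences can be illustrated with a small sketch. It assumes that recall is the share of truly vulnerable files the model flags, and that the file inspection ratio is the share of all files the model flags for review; the abstract does not define either metric, so both definitions, the function name, and the labels below are assumptions for illustration.

```python
def recall_and_inspection_ratio(actual, predicted):
    """Return (recall, file_inspection_ratio) for per-file boolean labels.

    actual[i]    is True if file i really is vulnerable.
    predicted[i] is True if the model flags file i for inspection.
    """
    if len(actual) != len(predicted) or not actual:
        raise ValueError("label lists must be non-empty and of equal length")

    true_positives = sum(1 for a, p in zip(actual, predicted) if a and p)
    vulnerable = sum(1 for a in actual if a)
    flagged = sum(1 for p in predicted if p)

    recall = true_positives / vulnerable if vulnerable else 0.0
    inspection_ratio = flagged / len(predicted)
    return recall, inspection_ratio


# Hypothetical release with 10 files, 3 of them vulnerable.
actual = [True, True, True, False, False, False, False, False, False, False]

# A cautious model flags few files; an eager one flags many.
cautious = [True, False, False, True, False, False, False, False, False, False]
eager = [True, True, True, True, True, True, True, False, False, False]

for name, predicted in (("cautious", cautious), ("eager", eager)):
    recall, ratio = recall_and_inspection_ratio(actual, predicted)
    print(f"{name}: recall = {recall:.2f}, file inspection ratio = {ratio:.2f}")
```

In this made-up example the eager model finds every vulnerable file but asks reviewers to inspect 70% of the code base, which mirrors the cost the abstract attaches to the higher recall of predictors built on older versions.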

Citations: 17