B. Yost, Michael J. Coblenz, B. Myers, Joshua Sunshine, Jonathan Aldrich, Sam Weber, M. Patron, M. Heeren, S. Krueger, M. Pfaff
Context: Critical software systems developed for the government continue to be of lower quality than expected, despite extensive literature describing best practices in software engineering. Goal: We wanted to better understand the extent of certain issues in the field and the relationship to software quality. Method: We surveyed fifty software development professionals and asked about practices and barriers in the field and the resulting software quality. Results: There is evidence of certain problematic issues for developers and specific quality characteristics that seem to be affected. Conclusions: This motivates future work to address the most problematic barriers and issues impacting software quality.
"Software Development Practices, Barriers in the Field and the Relationship to Software Quality." B. Yost, Michael J. Coblenz, B. Myers, Joshua Sunshine, Jonathan Aldrich, Sam Weber, M. Patron, M. Heeren, S. Krueger, M. Pfaff. Proceedings of the 10th ACM/IEEE International Symposium on Empirical Software Engineering and Measurement, 2016-09-08. DOI: https://doi.org/10.1145/2961111.2962614
Software energy consumption has emerged as a growing concern in recent years. Managing the energy consumed by software is, however, a difficult challenge due to the large number of factors affecting it -- namely, features of the processor, memory, cache, and other hardware components; characteristics of the program and the workload running; OS routines; and compiler optimisations, among others. In this paper we study the relevance of numerous architectural and program features (static and dynamic) to the energy consumed by software. The motivation behind the study is to gain an understanding of the features affecting software energy and to provide recommendations on features to optimise for energy efficiency. In our study we used 58 subject desktop programs, each with its own workload, drawn from different application domains. We collected over 100 hardware and software metrics, statically and dynamically, using existing tools for program analysis, instrumentation, and run-time monitoring. We then performed statistical feature selection to extract the features relevant to energy consumption, and we discuss potential optimisations for the selected features. We also examine whether the energy-relevant features differ from those known to affect software performance. The features commonly selected in our experiments were execution time, cache accesses, memory instructions, context switches, CPU migrations, and program length (the Halstead metric). All of these features are known to affect software performance, in terms of running time, power consumed, and latency.
"A Study on the Influence of Software and Hardware Features on Program Energy." A. Rajan, Adel Noureddine, Panagiotis Stratis. Proceedings of the 10th ACM/IEEE International Symposium on Empirical Software Engineering and Measurement, 2016-09-08. DOI: https://doi.org/10.1145/2961111.2962593
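The abstract does not detail the study's exact feature-selection procedure, so as a rough illustration of the idea, the sketch below ranks candidate hardware/software metrics by the strength of their correlation with measured energy. All metric names and values are invented for illustration, not taken from the study.

```python
# Correlation-based feature-selection sketch (illustrative only).

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy) if sx and sy else 0.0

def select_features(metrics, energy, k=3):
    """Rank candidate metrics by |correlation| with measured energy."""
    ranked = sorted(metrics, key=lambda name: -abs(pearson(metrics[name], energy)))
    return ranked[:k]

# Toy profile of 5 program runs: each metric is a list of per-run values.
metrics = {
    "exec_time":      [1.0, 2.0, 3.0, 4.0, 5.0],
    "cache_accesses": [10, 19, 31, 42, 48],
    "ctx_switches":   [5, 5, 4, 6, 5],        # nearly constant -> weak signal
}
energy = [2.1, 4.0, 6.2, 8.1, 9.9]            # joules, roughly linear in time

top = select_features(metrics, energy, k=2)
print(top)
```

A real pipeline would also need to handle correlated predictors (execution time and cache accesses move together here), which is one reason statistical feature selection rather than simple ranking is used in practice.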
The software engineering community has always sought to build great software and continues to seek out ways and approaches for doing this. The UX movement emphasizes the usability of the developed product. Agile approaches like scrum focus on aligning the functionality and features of the final product more closely with user/customer/market requirements. The recent interest in DevOps has brought to the fore the need to address the challenges once software goes into production. Despite this, in an enterprise environment, great software does not necessarily translate into real business benefits; few investments fail because the software didn't work [1], [2]. The overwhelming evidence points to the need to actively manage to achieve the business benefits being sought [3], [4], [5], [6]. This keynote presentation introduces the concepts and practices of benefits management and benefits realization that have emerged over the last 25 years. It highlights the issues and challenges in deploying software to deliver expected business outcomes. It suggests that this is a missing perspective in software engineering. Suggestions for how this perspective might be more closely integrated with software engineering are proposed.
"What about the Benefits?: A Missing Perspective in Software Engineering." J. Peppard. Proceedings of the 10th ACM/IEEE International Symposium on Empirical Software Engineering and Measurement, 2016-09-08. DOI: https://doi.org/10.1145/2961111.2962642
Background. When estimating whether a software module is faulty based on the value of a measure X for a software internal attribute (e.g., size, structural complexity, cohesion, coupling), it is sensible to set a threshold on fault-proneness first and then induce a threshold on X by using a fault-proneness model where X plays the role of the independent variable. However, some modules cannot be estimated as either faulty or non-faulty with confidence: they belong to a "grey zone," and estimating them either way would be largely arbitrary and may result in several erroneous decisions. Objective. We propose and evaluate an approach to setting thresholds on X to identify which modules can be confidently estimated faulty or non-faulty, and which ones cannot be estimated either way. Method. Suppose that we do not know whether the modules belonging to a subset of a set of modules are faulty, as happens in practical cases with the modules whose faultiness needs to be estimated. We build two fault-proneness models by using the set of modules as the training set. The "pessimistic" model is built by assuming that all modules whose faultiness is unknown are actually faulty, and the "optimistic" model by assuming that they are actually non-faulty. The optimistic and pessimistic models can be used to set two thresholds, an optimistic and a pessimistic one. A module is estimated faulty by the optimistic (resp., pessimistic) model with the optimistic (resp., pessimistic) threshold if its fault-proneness is above the threshold, and non-faulty otherwise. A module that is estimated faulty (resp., non-faulty) by both the optimistic model with the optimistic threshold and the pessimistic model with the pessimistic threshold is estimated faulty (resp., non-faulty). Modules for which the estimates of the two models with their associated thresholds conflict are in the "grey zone," i.e., no reliable faultiness estimation can be made for them. Results. We applied our approach to datasets from the PROMISE repository, carried out cross-validations, and assessed accuracy via commonly used indicators. We also compared our results with those obtained with the conventional approach that uses one Binary Logistic Regression model. Our results show that our approach is effective in identifying the grey zone of values of X in which modules cannot be reliably estimated as either faulty or non-faulty and, conversely, the intervals in which modules can be estimated faulty or non-faulty. Our approach turns out to be more accurate, in terms of F-measure, than the conventional one in the majority of cases. In addition, it provides F-measure values that are very concentrated, i.e., it consistently identifies the intervals in which modules can be estimated faulty or non-faulty. Conclusions. Our method can be practically used for identifying "grey zones" in which it does not make much sense to estimate modules' faultiness based on measure X and, therefore, the zones in which modules' faultiness can be estimated.
"Identifying Thresholds for Software Faultiness via Optimistic and Pessimistic Estimations." L. Lavazza, S. Morasca. Proceedings of the 10th ACM/IEEE International Symposium on Empirical Software Engineering and Measurement, 2016-09-08. DOI: https://doi.org/10.1145/2961111.2962595
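The agreement rule described above, classifying a module only when the optimistic and pessimistic threshold models agree, can be sketched as follows. The threshold values are illustrative, not taken from the paper; the pessimistic model (trained assuming unknown modules are faulty) would typically induce a lower threshold on X than the optimistic one.

```python
# Grey-zone decision rule: classify only when both threshold models agree.

def classify(x, t_optimistic, t_pessimistic):
    """Classify a module by measure X against the two induced thresholds."""
    above_opt = x > t_optimistic
    above_pes = x > t_pessimistic
    if above_opt and above_pes:
        return "faulty"          # both models say faulty
    if not above_opt and not above_pes:
        return "non-faulty"      # both models say non-faulty
    return "grey zone"           # the two models disagree

# Example: X could be, say, structural complexity; thresholds are invented.
t_pes, t_opt = 10.0, 18.0
for x in (5.0, 14.0, 25.0):
    print(x, classify(x, t_opt, t_pes))
```

Values between the two thresholds land in the grey zone, which is exactly the interval the method deliberately refuses to classify.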
Background: A survey is a research method that aims to gather data from a large population of interest. Despite being extensively used in software engineering, survey-based research faces several challenges, such as selecting a representative population sample and designing the data collection instruments. Objective: This article aims to summarize the existing guidelines, supporting instruments, and recommendations on how to conduct and evaluate survey-based research. Methods: A systematic search using manual search and snowballing techniques was used to identify primary studies supporting survey research in software engineering. We used an annotated review to present the findings, describing the references of interest in the research topic. Results: The summary describes 15 available articles addressing survey methodology, based upon which we derived a set of recommendations on how to conduct survey research, and their impact on the community. Conclusion: Survey-based research in software engineering has its particular challenges, as illustrated by several articles in this review. The annotated review can contribute by raising awareness of such challenges and by presenting recommendations to overcome them.
"Survey Guidelines in Software Engineering: An Annotated Review." J. Molléri, K. Petersen, E. Mendes. Proceedings of the 10th ACM/IEEE International Symposium on Empirical Software Engineering and Measurement, 2016-09-08. DOI: https://doi.org/10.1145/2961111.2962619
Context: To survive in a highly competitive software market, product managers strive for frequent, incremental releases in ever shorter cycles. Release decisions are characterized by high complexity and have a high impact on project success. Under such conditions, using the experience from past releases could help product managers make more informed decisions. Goal and research objectives: To make decisions about when to release more operational, we formulated release readiness (RR) as a binary classification problem. The goal of the research presented in this paper is twofold: (i) to propose a machine learning approach called RC* (Release readiness Classification applying predictive techniques) with two approaches for defining the training set, called incremental and sliding window, and (ii) to empirically evaluate the applicability of RC* for varying project characteristics. Methodology: In the form of explorative case study research, we applied the RC* method to four OSS projects under the Apache Software Foundation, retrospectively covering a period of 82 months, 90 releases, and 3722 issues. We use Random Forest as the classification technique, along with eight independent variables, to classify release readiness in individual weeks. Predictive performance was measured in terms of precision, recall, F-measure, and accuracy. Results: The incremental and sliding-window approaches achieve an overall accuracy of 76% and 79%, respectively, in classifying RR for the four analyzed projects. The incremental approach outperforms the sliding-window approach in terms of the stability of the predictive performance. Predictive performance for both approaches is significantly influenced by three project characteristics: (i) release duration, (ii) the number of issues in a release, and (iii) the size of the initial training dataset. Conclusion: Our initial observation is that the incremental approach achieves higher accuracy when releases have long durations and few issues and when classifiers are trained with a large training set. The sliding-window approach, on the other hand, achieves higher accuracy when releases have short durations and classifiers are trained with a small training set.
"Release Readiness Classification: An Explorative Case Study." S. Alam, Dietmar Pfahl, G. Ruhe. Proceedings of the 10th ACM/IEEE International Symposium on Empirical Software Engineering and Measurement, 2016-09-08. DOI: https://doi.org/10.1145/2961111.2962629
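The two training-set policies compared in the study can be sketched as simple slicing functions over the weekly history; the integer week list below is a stand-in for the per-week feature vectors the classifier would actually be trained on.

```python
# The two training-set policies for weekly release-readiness classification.

def incremental_window(history, current):
    """Incremental policy: all weeks before `current` form the training set."""
    return history[:current]

def sliding_window(history, current, w):
    """Sliding-window policy: only the last `w` weeks before `current`."""
    return history[max(0, current - w):current]

weeks = list(range(10))                   # stand-in for weekly feature vectors
print(incremental_window(weeks, 6))       # trains on weeks 0..5
print(sliding_window(weeks, 6, 3))        # trains on weeks 3..5
```

The trade-off the study reports follows directly from the slicing: the incremental set keeps growing (more data, but potentially stale), while the sliding window stays small and recent.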
A. Al-Subaihin, Federica Sarro, S. Black, L. Capra, M. Harman, Yue Jia, Yuanyuan Zhang
Context: Categorising software systems according to their functionality yields many benefits to both users and developers. Goal: In order to uncover the latent clustering of mobile apps in app stores, we propose a novel technique that measures app similarity based on claimed behaviour. Method: Features are extracted using information retrieval augmented with ontological analysis and used as attributes to characterise apps. These attributes are then used to cluster the apps using agglomerative hierarchical clustering. We empirically evaluate our approach on 17,877 apps mined from the BlackBerry and Google app stores in 2014. Results: The results show that our approach dramatically improves the existing categorisation quality for both the BlackBerry (from 0.02 to 0.41 on average) and Google (from 0.03 to 0.21 on average) stores. We also find a strong Spearman rank correlation (ρ = 0.96 for Google and ρ = 0.99 for BlackBerry) between the number of apps and the ideal granularity within each category, indicating that ideal granularity increases with category size, as expected. Conclusions: The current categorisations in the app stores studied do not exhibit good classification quality in terms of the claimed feature space. However, better quality can be achieved using a good feature extraction technique and a traditional clustering method.
"Clustering Mobile Apps Based on Mined Textual Features." A. Al-Subaihin, Federica Sarro, S. Black, L. Capra, M. Harman, Yue Jia, Yuanyuan Zhang. Proceedings of the 10th ACM/IEEE International Symposium on Empirical Software Engineering and Measurement, 2016-09-08. DOI: https://doi.org/10.1145/2961111.2962600
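As a minimal sketch of the clustering step, the code below runs agglomerative clustering over sets of claimed features using Jaccard distance and single linkage. The linkage choice, stopping distance, app names, and feature sets are assumptions for illustration; the paper extracts its features from app descriptions via information retrieval and ontological analysis.

```python
# Single-linkage agglomerative clustering over claimed-feature sets
# with Jaccard distance (illustrative sketch).

def jaccard_dist(a, b):
    """Jaccard distance between two feature sets."""
    return 1.0 - len(a & b) / len(a | b)

def agglomerate(items, stop_dist):
    """Repeatedly merge the two closest clusters (single linkage) until
    the closest remaining pair is farther apart than stop_dist."""
    clusters = [[name] for name in items]
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = min(jaccard_dist(items[x], items[y])
                        for x in clusters[i] for y in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        if d > stop_dist:
            break                          # no pair close enough to merge
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters

# Invented apps with invented claimed features.
apps = {
    "RouteFinder":  {"map", "gps", "navigation"},
    "CityNavigate": {"map", "gps", "traffic"},
    "PhotoFilterX": {"camera", "filter", "share"},
}
print(agglomerate(apps, stop_dist=0.6))
```

The two navigation apps merge (Jaccard distance 0.5) while the photo app stays apart, which is the kind of functionality-based grouping the paper evaluates at app-store scale.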
Modern computer systems are highly configurable, which complicates the testing and debugging process. The sheer size of the configuration space makes software quality even harder to achieve. Performance is one of the key non-functional qualities: performance bugs can cause significant performance degradation and lead to poor user experience. However, performance bugs are difficult to expose, primarily because detecting them requires specific inputs as well as a specific execution environment (e.g., configurations). While researchers have developed techniques to analyze, quantify, detect, and fix performance bugs, we conjecture that many of these techniques may not be effective in highly configurable systems. In this paper, we study the challenges that configurability creates for handling performance bugs. We study 113 real-world performance bugs, randomly sampled from three highly configurable open-source projects: Apache, MySQL, and Firefox. The findings of this study provide a set of lessons learned and guidance to aid practitioners and researchers in better handling performance bugs in highly configurable software systems.
"An Empirical Study on Performance Bugs for Highly Configurable Software Systems." Xue Han, Tingting Yu. Proceedings of the 10th ACM/IEEE International Symposium on Empirical Software Engineering and Measurement, 2016-09-08. DOI: https://doi.org/10.1145/2961111.2962602
{"title":"Is there a Future for Empirical Software Engineering?","authors":"C. Wohlin","doi":"10.1145/2961111.2962641","DOIUrl":"https://doi.org/10.1145/2961111.2962641","url":null,"abstract":"Empirical studies of different kinds are nowadays regularly published in software engineering journals and conferences. Many empirical studies have been published, but is this sufficient? Individual studies are important, but their full potential for evidence-based software engineering [1] is not yet exploited. As a discipline we must go further to make our individual studies more useful. Other research should be able to build on these studies, and industry should be able to make informed decisions based on the empirical research. There are several challenges in making individual empirical studies useful in a broader context. Anyone who has conducted a systematic literature review [2] has most likely experienced the difficulty of synthesizing the relevant studies. In all too many cases, we end up with a systematic mapping study [3], or at best something on the borderline between a review and a mapping study. This illustrates the need to write for synthesis [4], and in particular to include sufficient contextual information to allow for synthesis [4]. Evidence-based software engineering [1] through the use of systematic literature studies (reviews and maps) has emerged. Methodological support and guidelines (e.g., [2], [3], [6] and [7]) for conducting systematic literature studies have been formulated, and they should be carefully followed. However, more is needed; we still need to improve. The keynote focuses on the needs for the future as seen by the presenter. Synthesis has proven hard, and improvements are needed in both primary and secondary studies. It has been shown that the reliability of secondary studies can be challenged [8]. However, if we do manage to publish high-quality primary studies, and we truly manage to conduct strong systematic literature reviews, we have a good basis both for building theories in software engineering and for enabling industry to make informed decisions using scientific evidence. Unfortunately, this is not the situation today. Theories are mostly based on our own research, as exemplified by [9]. This is fine, but much more can be done if we can more easily build on the research done by others to develop theories. Furthermore, industry often makes decisions related to processes, methods, techniques and tools before we manage to obtain sufficient evidence for recommendations. The points made above are highlighted using personal experiences from conducting systematic literature studies, collaborating with industry, and research on developing an empirically based software engineering theory.","PeriodicalId":208212,"journal":{"name":"Proceedings of the 10th ACM/IEEE International Symposium on Empirical Software Engineering and Measurement","volume":"55 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-09-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127198735","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Towards a Substantive Theory of Decision-Making in Software Project Management: Preliminary Findings from a Qualitative Study","authors":"J. A. O. G. Cunha, F. Silva, H. Moura, Francisco J. S. Vasconcellos","doi":"10.1145/2961111.2962604","DOIUrl":"https://doi.org/10.1145/2961111.2962604","url":null,"abstract":"Context: In software project management, decision-making is a complex set of tasks based largely on human relations and on individual knowledge and cultural background. The factors that affect the decisions of software project managers (SPMs), as well as their potential consequences, deserve attention because project delays and failures may stem from a series of poor decisions. Goals: To understand how SPMs make decisions based on how they interpret their experiences in the workplace, and to identify antecedents and consequences of those decisions in order to increase the effectiveness of project management. We also aim to refine the research design for future investigations. Method: Semi-structured interviews were carried out with SPMs at a large Brazilian governmental organization and a large Brazilian private organization. Results: We found that decision-making in software project management is based on knowledge sharing, in which the SPM acts as a facilitator. This phenomenon is influenced by individual factors, such as experience, knowledge, personality, organizational ability, communication, negotiation, interpersonal relationships and a systemic vision of the project, and by situational factors such as the autonomy of the SPM, constant feedback and team members' technical competence. Conclusions: Due to the uncertainty and dynamism inherent in software projects, SPMs focus on making, monitoring and adjusting decisions in an argument-driven way. Based on the initial relationships among the identified factors, the research design was refined.","PeriodicalId":208212,"journal":{"name":"Proceedings of the 10th ACM/IEEE International Symposium on Empirical Software Engineering and Measurement","volume":"55 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-09-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124937201","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}