Jean Petrić, David Bowes, T. Hall, B. Christianson, Nathan Baddoo
Background: Ensemble techniques have gained attention in various scientific fields. Defect prediction researchers have investigated many state-of-the-art ensemble models and concluded that in many cases these outperform standard single-classifier techniques. Almost all previous work using ensemble techniques in defect prediction relies on the majority voting scheme for combining prediction outputs, and on the implicit diversity among single classifiers. Aim: Investigate whether defect prediction can be improved using an explicit diversity technique with a stacking ensemble, given that different classifiers identify different sets of defects. Method: We used classifiers from four different families and the weighted accuracy diversity (WAD) technique to exploit diversity amongst classifiers. To combine individual predictions, we used the stacking ensemble technique. We used state-of-the-art knowledge in software defect prediction to build our ensemble models, and tested their prediction abilities on 8 publicly available data sets. Conclusion: The results show a performance improvement using stacking ensembles compared to other defect prediction models. Diversity amongst the classifiers used for building ensembles is essential to achieving these performance improvements.
{"title":"Building an Ensemble for Software Defect Prediction Based on Diversity Selection","authors":"Jean Petrić, David Bowes, T. Hall, B. Christianson, Nathan Baddoo","doi":"10.1145/2961111.2962610","DOIUrl":"https://doi.org/10.1145/2961111.2962610","journal":{"name":"Proceedings of the 10th ACM/IEEE International Symposium on Empirical Software Engineering and Measurement"},"publicationDate":"2016-09-08","paperid":"130753401"}
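The premise that different classifiers identify different sets of defects can be illustrated with a simple pairwise disagreement measure. This is a minimal sketch and not the WAD technique named in the abstract; the classifier names and prediction vectors are hypothetical and chosen only for illustration.

```python
from itertools import combinations


def disagreement(preds_a, preds_b):
    """Fraction of modules on which two classifiers give different labels."""
    if len(preds_a) != len(preds_b):
        raise ValueError("prediction vectors must have equal length")
    differing = sum(1 for a, b in zip(preds_a, preds_b) if a != b)
    return differing / len(preds_a)


def mean_pairwise_disagreement(predictions):
    """Average disagreement over all classifier pairs (higher = more diverse)."""
    pairs = list(combinations(predictions.values(), 2))
    return sum(disagreement(a, b) for a, b in pairs) / len(pairs)


# Hypothetical predictions for six modules (1 = defective, 0 = clean).
predictions = {
    "naive_bayes": [1, 0, 1, 0, 0, 1],
    "decision_tree": [1, 1, 0, 0, 0, 1],
    "svm": [0, 0, 1, 0, 1, 1],
}
diversity = mean_pairwise_disagreement(predictions)
print(f"mean pairwise disagreement: {diversity:.3f}")
```

A set of classifiers with a higher value flags more distinct modules, which is the kind of explicit diversity a stacking ensemble could then exploit when combining predictions.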
Background: Reaching out to professional software developers is a crucial part of empirical software engineering research. One important method to investigate the state of practice is survey research. As drawing a random sample of professional software developers for a survey is rarely possible, researchers rely on various sampling strategies. Objective: In this paper, we report on our experience with the different sampling strategies we employed, highlight ethical issues, and motivate the need to maintain a collection of key demographics about software developers to ease the assessment of the external validity of studies. Method: Our report is based on data from two studies we conducted in the past. Results: Contacting developers over public media proved to be the most effective and efficient sampling strategy. However, we not only describe the perspective of researchers who are interested in reaching goals like a large number of participants or a high response rate, but we also shed light on the ethical implications of different sampling strategies. We present one specific ethical guideline and point to debates in other research communities to start a discussion in the software engineering research community about which sampling strategies should be considered ethical.
{"title":"Worse Than Spam: Issues In Sampling Software Developers","authors":"Sebastian Baltes, S. Diehl","doi":"10.1145/2961111.2962628","DOIUrl":"https://doi.org/10.1145/2961111.2962628","journal":{"name":"Proceedings of the 10th ACM/IEEE International Symposium on Empirical Software Engineering and Measurement"},"publicationDate":"2016-09-08","paperid":"133506027"}
Context: Software maintenance is required to fix defects, adapt to changes in the environment, and meet new or changed user requirements. The effort of these tasks needs to be estimated to track progress, manage resources, and make decisions. Most widely used cost models use source lines of code (SLOC) as the software size input measure, due to its quantifiability and high correlation with effort. Estimating the SLOC of a project is very difficult in the early stages of the software lifecycle. Function Points (FPs) represent software size by functions or modifications to functions, making them easier to calculate early in the lifecycle for new development projects or maintenance tasks. Several cost estimators use FPs to estimate the SLOC of a project to take advantage of existing cost models. Goal: Through empirical analysis, the authors want to determine whether FPs can effectively estimate maintenance tasks, as a better alternative to using SLOC as a software size metric. Additionally, the authors will demonstrate that FPs to SLOC ratios add uncertainty to effort estimates. Method: The empirical analysis will be run on the dataset of Unified Code Count (UCC), a software tool maintained by the University of Southern California (USC). Results: The analyses found that separating projects adding new functions from those modifying existing functions resulted in improved estimation models using FPs. The effort estimation model for projects adding functions to UCC had high prediction accuracy statistics, but the model for projects modifying existing functions in UCC gave less impressive results. The effort estimation accuracy became unsatisfactorily low when using a FPs to SLOC ratio. Conclusions: Cost estimators should not use FPs to SLOC ratios for effort estimation due to low prediction accuracy. FPs are an effective size measure for only a portion of UCC's maintenance tasks, specifically the projects adding new functions to UCC. 
Another size measure may need to be considered that might be more effective independently or in conjunction with FPs for all of UCC's maintenance tasks.
{"title":"Function Point Analysis for Software Maintenance","authors":"Anandi Hira, B. Boehm","doi":"10.1145/2961111.2962613","DOIUrl":"https://doi.org/10.1145/2961111.2962613","journal":{"name":"Proceedings of the 10th ACM/IEEE International Symposium on Empirical Software Engineering and Measurement"},"publicationDate":"2016-09-08","paperid":"131290990"}
Context: Conceptual coupling is a measure of how loosely or closely related two software artifacts are, based on the semantic information embedded in their comments and identifiers. This type of coupling is typically evaluated by extracting the semantic information from source code into a word corpus. The extraction of word corpora can be lengthy, especially when systems are large and many classes are involved. Goal: This study investigates whether the class identifiers alone (e.g., the class names) can be used to evaluate the conceptual coupling between classes, as opposed to the word corpora of the entire classes. Method: In this study, we analyze two Java systems and extract the conceptual coupling between pairs of classes, using (i) a corpus-based approach; and (ii) two identifier-based tools. Results: Our results show that measuring the semantic similarity between classes using (only) their identifiers is similar to using the class corpora. Additionally, using the identifiers is more efficient in terms of precision, recall, and computation time. Conclusions: Using only class identifiers to measure their semantic similarity can save time on program comprehension tasks for large software projects. The findings of this paper support this hypothesis for the systems that were used in the evaluation, and can also be used to guide researchers developing future generations of tools supporting program comprehension.
{"title":"Semantic Coupling Between Classes: Corpora or Identifiers?","authors":"N. Ajienka, A. Capiluppi","doi":"10.1145/2961111.2962622","DOIUrl":"https://doi.org/10.1145/2961111.2962622","journal":{"name":"Proceedings of the 10th ACM/IEEE International Symposium on Empirical Software Engineering and Measurement"},"publicationDate":"2016-09-08","paperid":"131849046"}
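The idea of comparing classes by their identifiers alone can be sketched as follows. This is an illustrative stand-in, not either of the identifier-based tools used in the study: it splits class names into word tokens and scores the overlap with Jaccard similarity. The class names and both function names are hypothetical.

```python
import re


def split_identifier(name):
    """Split a camelCase or PascalCase class name into lowercase word tokens."""
    # Acronym runs (e.g. "XML"), capitalised or lowercase words, and digit runs.
    tokens = re.findall(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|\d+", name)
    return {token.lower() for token in tokens}


def jaccard_similarity(name_a, name_b):
    """Share of word tokens that two class names have in common (0.0 to 1.0)."""
    tokens_a, tokens_b = split_identifier(name_a), split_identifier(name_b)
    if not tokens_a and not tokens_b:
        return 0.0
    return len(tokens_a & tokens_b) / len(tokens_a | tokens_b)


# Hypothetical class names.
score = jaccard_similarity("HttpRequestParser", "HttpResponseParser")
print(f"similarity: {score:.2f}")
```

Because only the names are tokenized, no corpus has to be extracted from method bodies or comments, which is where the computation-time saving reported in the abstract would come from.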
Context: Development and Operations (DevOps) is an emerging software industry movement to bridge the gap between software development and operations teams. DevOps supports frequently and reliably releasing new features and products, thus subsuming the Continuous Deployment (CD) practice. Goal: This research aims at empirically exploring the potential impact of CD practice on the architecting process. Method: We carried out a case study involving interviews with 16 software practitioners. Results: We have identified (1) a range of recurring architectural challenges (i.e., highly coupled monolithic architecture, team dependencies, and ever-changing operational environments and tools) and (2) five main architectural principles (i.e., small and independent deployment units, not too much focus on reusability, aggregating logs, isolating changes, and testability inside the architecture) that should be considered when an application is (re-)architected for CD practice. This study also supports the view that software architecture can better support operations if an operations team is engaged at an early stage of software development so that operational aspects are taken into consideration. Conclusion: These findings provide evidence that software architecture plays a significant role in successfully and efficiently adopting continuous deployment. The findings contribute to establishing an evidential body of knowledge about the state of the art of architecting for CD practice.
{"title":"The Intersection of Continuous Deployment and Architecting Process: Practitioners' Perspectives","authors":"Mojtaba Shahin, M. Babar, Liming Zhu","doi":"10.1145/2961111.2962587","DOIUrl":"https://doi.org/10.1145/2961111.2962587","journal":{"name":"Proceedings of the 10th ACM/IEEE International Symposium on Empirical Software Engineering and Measurement"},"publicationDate":"2016-09-08","paperid":"130953436"}
E. Laukkanen, Timo O. A. Lehtinen, Juha Itkonen, M. Paasivaara, C. Lassenius
Context: Continuous delivery (CD) is a development practice for decreasing the time-to-market by keeping software releasable all the time. Adopting CD within a stage-gate managed development process might be useful, although scientific evidence of such adoption is not available. In a stage-gate process, new releases pass through stages, and gates prevent low-quality output from progressing. Large organizations with stage-gate processes are often hierarchical, and the adoption can be either top-down, driven by the management, or bottom-up, driven by the development unit. Goal: We investigate the perceived problems of bottom-up CD adoption in a large global software development unit at Nokia Networks. Our goal is to understand how the stage-gate development process used by the unit affects the adoption. Method: The overall research approach is a qualitative single case study on one of the several geographical sites of the development unit. We organized two 2-hour workshops with 15 participants in total to discover how the stage-gate process affected the adoption. Results: The stage-gate development process caused tight schedules for development and process overhead because of the gate requirements. Moreover, the process required using multiple version control branches for different stages in the process, which increased development complexity and caused additional branch overhead. Together, the tight schedule, process overhead, and branch overhead caused a lack of time to adopt CD. In addition, the use of multiple branches limited the available hardware resources and caused delayed integration. Conclusions: Adopting CD in a development organization that needs to conform to a stage-gate development process is challenging. Practitioners should either gain support from the management to relax the required process or reduce their expectations on what can be achieved while conforming to the process. 
To simplify the development process, the use of multiple version control branches could be replaced with feature toggles.
{"title":"Bottom-up Adoption of Continuous Delivery in a Stage-Gate Managed Software Organization","authors":"E. Laukkanen, Timo O. A. Lehtinen, Juha Itkonen, M. Paasivaara, C. Lassenius","doi":"10.1145/2961111.2962608","DOIUrl":"https://doi.org/10.1145/2961111.2962608","journal":{"name":"Proceedings of the 10th ACM/IEEE International Symposium on Empirical Software Engineering and Measurement"},"publicationDate":"2016-09-08","paperid":"131251071"}
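The suggestion to replace multiple version control branches with feature toggles can be sketched as follows. This is a minimal in-memory illustration under assumed names; `FeatureToggles`, `export_report`, and the `csv_export_v2` toggle are hypothetical and do not come from the study.

```python
class FeatureToggles:
    """In-memory toggle registry; features that were never enabled are off."""

    def __init__(self, enabled=None):
        self._enabled = set(enabled or [])

    def is_enabled(self, feature):
        return feature in self._enabled


def export_report(rows, toggles):
    """Both code paths live on one branch; the toggle picks one at run time."""
    if toggles.is_enabled("csv_export_v2"):
        # Unfinished or unreleased behaviour, hidden until the toggle is on.
        return "\n".join(",".join(str(value) for value in row) for row in rows)
    # Current released behaviour.
    return "\n".join(" ".join(str(value) for value in row) for row in rows)


rows = [[1, 2], [3, 4]]
released = export_report(rows, FeatureToggles())
preview = export_report(rows, FeatureToggles(enabled=["csv_export_v2"]))
```

With this arrangement, work for a later stage can be merged into the main line early and stay switched off, rather than waiting on a separate branch per stage.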
A. C. A. França, E. Peixoto, Bruno Falcão, Cleviton V. F. Monteiro
Context - Software companies should track innovation as rigorously as core business operations. For that, the assessment of innovation projects is a critical process, in particular to get innovation initiatives funded. Objective - In this article, we aim to evaluate the need for more practical measurement tools by checking the agreement among very experienced industry analysts about the innovation degree of four actual software projects. Method - We conducted a survey with eight business analysts, using a combination of the Three Horizons Model and the Gartner Hype Cycle for emerging technologies as a frame of reference. Results - In general, the level of agreement about the innovation degree of the projects was very low. Looking at the cases in isolation, it is possible to suggest reasons for the low level of agreement between the evaluators. Conclusions - Our data support the view that innovation is an activity that is difficult to characterize and even more difficult to measure, and the need for practices to achieve better intersubjective agreement in innovation assessment became evident in this work.
{"title":"The Obscure Process of Innovation Assessment: A Report of an Industrial Survey","authors":"A. C. A. França, E. Peixoto, Bruno Falcão, Cleviton V. F. Monteiro","doi":"10.1145/2961111.2962634","DOIUrl":"https://doi.org/10.1145/2961111.2962634","journal":{"name":"Proceedings of the 10th ACM/IEEE International Symposium on Empirical Software Engineering and Measurement"},"publicationDate":"2016-09-08","paperid":"114718939"}
D. Fucci, G. Scanniello, Simone Romano, M. Shepperd, Boyce Sigweni, F. Uyaguari, Burak Turhan, Natalia Juristo Juzgado, M. Oivo
Context: Test-driven development (TDD) is an agile practice claimed to improve the quality of a software product, as well as the productivity of its developers. A previous study (i.e., baseline experiment) at the University of Oulu (Finland) compared TDD to a test-last development (TLD) approach through a randomized controlled trial. The results failed to support the claims. Goal: We want to validate the original study results by replicating it at the University of Basilicata (Italy), using a different design. Method: We replicated the baseline experiment, using a crossover design, with 21 graduate students. We kept the settings and context as close as possible to the baseline experiment. In order to limit researcher bias, we involved two other sites (UPM, Spain, and Brunel, UK) to conduct a blind analysis of the data. Results: The Kruskal-Wallis tests did not show any significant difference between TDD and TLD in terms of testing effort (p-value = .27), external code quality (p-value = .82), and developers' productivity (p-value = .83). Nevertheless, our data revealed a difference based on the order in which TDD and TLD were applied, though no carryover effect. Conclusions: We confirm the baseline study results, yet our results raise concerns regarding the selection of experimental objects, particularly with respect to their interaction with the order in which treatments are applied. We recommend that future studies survey the tasks used in experiments evaluating TDD. Finally, to lower the cost of replication studies and reduce researcher bias, we encourage other research groups to adopt a multi-site blind analysis approach similar to the one described in this paper.
{"title":"An External Replication on the Effects of Test-driven Development Using a Multi-site Blind Analysis Approach","authors":"D. Fucci, G. Scanniello, Simone Romano, M. Shepperd, Boyce Sigweni, F. Uyaguari, Burak Turhan, Natalia Juristo Juzgado, M. Oivo","doi":"10.1145/2961111.2962592","DOIUrl":"https://doi.org/10.1145/2961111.2962592","journal":{"name":"Proceedings of the 10th ACM/IEEE International Symposium on Empirical Software Engineering and Measurement"},"publicationDate":"2016-09-08","paperid":"123497351"}
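The kind of Kruskal-Wallis comparison reported in the results can be sketched for two groups as follows. This is a minimal illustration, not the authors' analysis script: it omits the tie correction, relies on the chi-square approximation with one degree of freedom for the p-value, and uses hypothetical scores rather than data from the experiment.

```python
import math


def average_ranks(values):
    """Rank values from 1 to N, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def kruskal_wallis_two_groups(group_a, group_b):
    """Return (H, p) for two independent samples, without tie correction."""
    pooled = list(group_a) + list(group_b)
    n = len(pooled)
    ranks = average_ranks(pooled)
    rank_sum_a = sum(ranks[:len(group_a)])
    rank_sum_b = sum(ranks[len(group_a):])
    h = 12 / (n * (n + 1)) * (
        rank_sum_a ** 2 / len(group_a) + rank_sum_b ** 2 / len(group_b)
    ) - 3 * (n + 1)
    # Survival function of the chi-square distribution with 1 degree of freedom.
    p = math.erfc(math.sqrt(max(h, 0.0) / 2))
    return h, p


# Hypothetical external-quality scores for the two treatments.
tdd_scores = [62.0, 71.5, 55.0, 80.0]
tld_scores = [65.0, 70.0, 58.5, 77.0]
h_stat, p_value = kruskal_wallis_two_groups(tdd_scores, tld_scores)
print(f"H = {h_stat:.3f}, p = {p_value:.3f}")
```

A p-value above the chosen significance level, as with the values of .27, .82, and .83 reported above, means the test found no evidence that the two groups differ.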
Stefan Voigt, Jörg von Garrel, Julia Müller, D. Wirth
Although agile methods have become established in software engineering, documentation in projects is rare. Employing a theoretical model of information and documentation, our paper analyzes documentation practices in agile software projects in their entirety. Our analysis uses method triangulation: partly structured interviews, observation, and an online survey. As an example, we demonstrate the correlation between satisfaction with information searches and the amount of documentation that exists for most types of information. In addition, digital searches demand nearly twice as much time as documentation. In the conclusion, we provide recommendations on the use of supporting methods or tools to shape agile documentation.
{"title":"A Study of Documentation in Agile Software Projects","authors":"Stefan Voigt, Jörg von Garrel, Julia Müller, D. Wirth","doi":"10.1145/2961111.2962616","DOIUrl":"https://doi.org/10.1145/2961111.2962616","url":null,"abstract":"Although agile methods have become established in software engineering, documentation in projects is rare. Employing a theoretical model of information and documentation, our paper analyzes documentation practices in agile software projects in their entirety. Our analysis uses method triangulation: partly structured interviews, observation, and an online survey. As an example, we demonstrate the correlation between satisfaction with information searches and the amount of documentation that exists for most types of information. In addition, digital searches demand nearly twice as much time as documentation. In the conclusion, we provide recommendations on the use of supporting methods or tools to shape agile documentation.","PeriodicalId":208212,"journal":{"name":"Proceedings of the 10th ACM/IEEE International Symposium on Empirical Software Engineering and Measurement","volume":"25 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-09-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125029216","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Anurag Goswami, G. Walia, M. McCourt, Ganesh Padmanabhan
Background -- Inspecting requirements and design artifacts to find faults significantly reduces rework effort. While inspections are effective, overall team performance relies on inspectors' ability to detect and report faults. Our previous research showed that individual inspectors have varying learning styles (LSs), i.e., they vary in their ability to process information recorded in a requirements document. To extend the results of our previous LS research, this paper utilizes eye tracking (to record the eye movements of inspectors) along with their LSs to detect the reading patterns of inspectors during requirements inspections. Aim -- The objective of this research is to analyze the reading trends of effective and efficient inspectors using eye movement and LS data of individual inspectors and virtual inspection teams. Method -- The current research uses data (LS, eye tracking, and inspection) from thirteen inspectors to find their impact on inspection effectiveness and efficiency. Results -- Results from this study show that inspectors who detect more faults during inspection focus significantly more on the fault region to find and report faults, as opposed to comprehending requirements information. Results also showed that inspection teams with diverse inspectors outperform teams of similar inspectors and spend more time comprehending information at the fault region. Additionally, results showed that inspectors with SEQ LS tend to focus significantly more on fault locations and are preferred for inspection. Conclusion -- These results can aid the selection of inspectors during the inspection process, thus improving software quality.
{"title":"Using Eye Tracking to Investigate Reading Patterns and Learning Styles of Software Requirement Inspectors to Enhance Inspection Team Outcome","authors":"Anurag Goswami, G. Walia, M. McCourt, Ganesh Padmanabhan","doi":"10.1145/2961111.2962598","DOIUrl":"https://doi.org/10.1145/2961111.2962598","url":null,"abstract":"Background -- Inspecting requirements and design artifacts to find faults significantly reduces rework effort. While inspections are effective, overall team performance relies on inspectors' ability to detect and report faults. Our previous research showed that individual inspectors have varying learning styles (LSs), i.e., they vary in their ability to process information recorded in a requirements document. To extend the results of our previous LS research, this paper utilizes eye tracking (to record the eye movements of inspectors) along with their LSs to detect the reading patterns of inspectors during requirements inspections. Aim -- The objective of this research is to analyze the reading trends of effective and efficient inspectors using eye movement and LS data of individual inspectors and virtual inspection teams. Method -- The current research uses data (LS, eye tracking, and inspection) from thirteen inspectors to find their impact on inspection effectiveness and efficiency. Results -- Results from this study show that inspectors who detect more faults during inspection focus significantly more on the fault region to find and report faults, as opposed to comprehending requirements information. Results also showed that inspection teams with diverse inspectors outperform teams of similar inspectors and spend more time comprehending information at the fault region. Additionally, results showed that inspectors with SEQ LS tend to focus significantly more on fault locations and are preferred for inspection. 
Conclusion -- These results can aid the selection of inspectors during the inspection process, thus improving software quality.","PeriodicalId":208212,"journal":{"name":"Proceedings of the 10th ACM/IEEE International Symposium on Empirical Software Engineering and Measurement","volume":"22 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-09-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126448582","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}