{"title":"Value-Based Empirical Research Plan Evaluation","authors":"S. Biffl, D. Winkler","doi":"10.1109/ESEM.2007.50","DOIUrl":"https://doi.org/10.1109/ESEM.2007.50","url":null,"abstract":"The planning phase of empirical studies is a success-critical key activity in empirical research to achieve the best benefits for contributing stakeholders, e.g., researchers and industry partners, and to reduce study risks, e.g., insufficient validity and unaddressed stakeholder win conditions. The design of empirical studies typically covers issues of empirical methodology, but seldom explicitly discusses tradeoffs between conflicting study goals. This work proposes a value-based empirical research planning framework for eliciting and reconciling stakeholder win conditions in order to compare the benefits and risks of empirical study variants, and reports on findings from an initial feasibility study in an ISERN meeting of empirical research experts.","PeriodicalId":124420,"journal":{"name":"First International Symposium on Empirical Software Engineering and Measurement (ESEM 2007)","volume":"19 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2007-09-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121782826","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Requirement and Design Trade-offs in Hackystat: An In-Process Software Engineering Measurement and Analysis System","authors":"Philip M. Johnson","doi":"10.1109/ESEM.2007.36","DOIUrl":"https://doi.org/10.1109/ESEM.2007.36","url":null,"abstract":"For five years, the Hackystat project has incrementally developed and evaluated a generic framework for in-process software engineering measurement and analysis (ISEMA). At least five other independent ISEMA system development projects have been initiated during this time, indicating growing interest and investment in this approach by the software engineering community. This paper presents 12 important requirement and design trade-offs made in the Hackystat system, some of their implications for organizations wishing to introduce ISEMA, and six directions for future research and development. The three goals of this paper are to: (1) help potential users of ISEMA systems to better evaluate the relative strengths and weaknesses of current and future systems, (2) help potential developers of ISEMA systems to better understand some of the important requirement and design trade-offs that they must make, and (3) help accelerate progress in ISEMA by identifying promising directions for future research and development.","PeriodicalId":124420,"journal":{"name":"First International Symposium on Empirical Software Engineering and Measurement (ESEM 2007)","volume":"13 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2007-09-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128448997","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Evidence relating to Object-Oriented software design: A survey","authors":"John Bailey, D. Budgen, M. Turner, B. Kitchenham, P. Brereton, S. Linkman","doi":"10.1109/ESEM.2007.58","DOIUrl":"https://doi.org/10.1109/ESEM.2007.58","url":null,"abstract":"There is little empirical knowledge of the effectiveness of the object-oriented paradigm. Our aim was to conduct a systematic review of the literature describing empirical studies of this paradigm. We undertook a Mapping Study of the literature. 138 papers were identified and classified by topic, form of study involved, and source. The majority of empirical studies of OO concentrate on metrics; relatively few consider effectiveness.","PeriodicalId":124420,"journal":{"name":"First International Symposium on Empirical Software Engineering and Measurement (ESEM 2007)","volume":"3 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2007-09-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134443274","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Effort Prediction in Iterative Software Development Processes -- Incremental Versus Global Prediction Models","authors":"Pekka Abrahamsson, Raimund Moser, W. Pedrycz, A. Sillitti, G. Succi","doi":"10.1109/ESEM.2007.16","DOIUrl":"https://doi.org/10.1109/ESEM.2007.16","url":null,"abstract":"Estimation of development effort without imposing overhead on the project and the development team is of paramount importance for any software company. This study proposes a new effort estimation methodology aimed at agile and iterative development environments, which traditional prediction methods do not describe well. We propose a detailed development methodology, discuss a number of architectures of such models (including a wealth of augmented regression models and neural networks), and include a thorough case study of Extreme Programming (XP) in two semi-industrial projects. The results of this research show that in the XP environment under study the proposed incremental model outperforms traditional estimation techniques, most notably in early phases of development. Moreover, when dealing with new projects, the incremental model can be developed from scratch without resorting to historical data.","PeriodicalId":124420,"journal":{"name":"First International Symposium on Empirical Software Engineering and Measurement (ESEM 2007)","volume":"6 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2007-09-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133023738","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
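The incremental-versus-global distinction in the record above can be illustrated with a toy model: after every iteration, refit a simple prediction model on all data seen so far and predict the next iteration's effort. The no-intercept ratio model and the iteration data below are invented for illustration and are not the paper's actual method or dataset.

```python
def fit_ratio(sizes, efforts):
    """Least-squares fit of the ratio model effort ~= k * size (no intercept)."""
    num = sum(s * e for s, e in zip(sizes, efforts))
    den = sum(s * s for s in sizes)
    return num / den

def incremental_predictions(sizes, efforts):
    """Predict each iteration's effort from the iterations before it only."""
    preds = []
    for i in range(1, len(sizes)):
        k = fit_ratio(sizes[:i], efforts[:i])  # refit on history so far
        preds.append(k * sizes[i])
    return preds

# Invented per-iteration data: story points and person-hours.
sizes = [10, 12, 8, 15, 11]
efforts = [52, 61, 40, 74, 57]
print(incremental_predictions(sizes, efforts))
```

A global model would instead fit once on all iterations; the incremental variant needs no historical data from earlier projects, which is the advantage the abstract highlights.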
{"title":"Automated Information Extraction from Empirical Software Engineering Literature: Is that possible?","authors":"D. Cruzes, V. Basili, F. Shull, M. Jino","doi":"10.1109/ESEM.2007.62","DOIUrl":"https://doi.org/10.1109/ESEM.2007.62","url":null,"abstract":"The number of scientific publications is constantly increasing, and the results published on empirical software engineering are growing even faster. Some software engineering publishers have begun to collaborate with research groups to make available repositories of software engineering empirical data. However, these initiatives are limited due to data ownership and privacy issues. As a result, many researchers in the area have adopted systematic reviews as a means to extract empirical evidence from published material. Systematic reviews are labor intensive and costly. In this paper, we argue that the use of information extraction tools can support systematic reviews and significantly speed up the creation of repositories of SE empirical evidence.","PeriodicalId":124420,"journal":{"name":"First International Symposium on Empirical Software Engineering and Measurement (ESEM 2007)","volume":"39 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2007-09-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115529443","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
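As a toy illustration of the kind of information extraction argued for above, even a few regular expressions can pull simple empirical facts (paper counts, subject counts) out of abstract text. The patterns and field names below are assumptions for demonstration; a real extraction tool would need far more robust NLP.

```python
import re

# Hypothetical patterns mapping a field name to a regex over abstract text.
PATTERNS = {
    "papers_reviewed": re.compile(r"(\d+)\s+papers", re.IGNORECASE),
    "subjects": re.compile(r"(\d+)\s+subjects", re.IGNORECASE),
}

def extract_facts(abstract):
    """Return whichever numeric facts the patterns can find in the text."""
    facts = {}
    for name, pattern in PATTERNS.items():
        match = pattern.search(abstract)
        if match:
            facts[name] = int(match.group(1))
    return facts

print(extract_facts("138 papers have been identified and classified by topic"))
```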
{"title":"Mining Software Evolution to Predict Refactoring","authors":"J. Ratzinger, Thomas Sigmund, P. Vorburger, H. Gall","doi":"10.1109/ESEM.2007.9","DOIUrl":"https://doi.org/10.1109/ESEM.2007.9","url":null,"abstract":"Can we predict locations of future refactoring based on the development history? In an empirical study of open source projects we found that attributes of software evolution data can be used to predict the need for refactoring in the following two months of development. Information systems utilized in software projects provide a broad range of data for decision support. Versioning systems log each activity during development, which we use to extract data mining features such as growth measures, relationships between classes, the number of authors working on a particular piece of code, etc. We use this information as input into classification algorithms to create prediction models for future refactoring activities. Different state-of-the-art classifiers are investigated, such as decision trees, logistic model trees, propositional rule learners, and nearest neighbor algorithms. With both high precision and high recall we can assess the refactoring proneness of object-oriented systems. Although we investigate different domains, we discovered critical factors within the development life cycle leading to refactoring, which are common among all studied projects.","PeriodicalId":124420,"journal":{"name":"First International Symposium on Empirical Software Engineering and Measurement (ESEM 2007)","volume":"442 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2007-09-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125778886","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
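One of the classifier families named in the record above is the nearest-neighbor algorithm. A minimal sketch of that setup might look like the following, with invented version-history features (lines added, distinct authors) labelled by whether the class was later refactored; none of the numbers or labels come from the study.

```python
def predict_1nn(train, labels, x):
    """Classify x by the label of its closest training point (squared Euclidean)."""
    dists = [sum((a - b) ** 2 for a, b in zip(t, x)) for t in train]
    return labels[dists.index(min(dists))]

# Hypothetical per-class features mined from a versioning system:
# (lines added in period, number of distinct authors).
train = [(120, 4), (15, 1), (200, 6), (10, 2)]
labels = ["refactor", "stable", "refactor", "stable"]

# A fast-growing, many-author class lands near the "refactor" examples.
print(predict_1nn(train, labels, (150, 5)))  # -> refactor
```

In practice the paper's pipeline would train such classifiers on many projects and evaluate precision and recall, rather than eyeballing single points.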
{"title":"The Effects of Over and Under Sampling on Fault-prone Module Detection","authors":"Yasutaka Kamei, Akito Monden, S. Matsumoto, Takeshi Kakimoto, Ken-ichi Matsumoto","doi":"10.1109/ESEM.2007.28","DOIUrl":"https://doi.org/10.1109/ESEM.2007.28","url":null,"abstract":"The goal of this paper is to improve the prediction performance of fault-prone module prediction models (fault-proneness models) by employing over/under sampling methods, which are preprocessing procedures for a fit dataset. The sampling methods are expected to improve prediction performance when the fit dataset is unbalanced, i.e. there exists a large difference between the number of fault-prone modules and non-fault-prone modules. So far, there has been no research reporting the effects of applying sampling methods to fault-proneness models. In this paper, we experimentally evaluated the effects of four sampling methods (random over sampling, synthetic minority over sampling, random under sampling and one-sided selection) applied to four fault-proneness models (linear discriminant analysis, logistic regression analysis, neural network and classification tree) by using two module sets of industry legacy software. All four sampling methods improved the prediction performance of the linear and logistic models, while neural network and classification tree models did not benefit from the sampling methods. The improvements of F1-values in linear and logistic models were 0.078 at minimum, 0.224 at maximum, and 0.121 on average.","PeriodicalId":124420,"journal":{"name":"First International Symposium on Empirical Software Engineering and Measurement (ESEM 2007)","volume":"110 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2007-09-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125246272","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
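Of the four sampling methods evaluated above, random over-sampling is the simplest: duplicate randomly chosen minority-class (fault-prone) modules until both classes are the same size. The sketch below is a generic illustration of that preprocessing step with invented data, not the authors' implementation.

```python
import random

def random_oversample(samples, labels, minority, seed=0):
    """Duplicate random minority-class samples until the classes balance."""
    rng = random.Random(seed)  # fixed seed for reproducibility
    minority_idx = [i for i, y in enumerate(labels) if y == minority]
    majority_n = len(labels) - len(minority_idx)
    extra = [rng.choice(minority_idx)
             for _ in range(majority_n - len(minority_idx))]
    new_samples = samples + [samples[i] for i in extra]
    new_labels = labels + [minority] * len(extra)
    return new_samples, new_labels

# Hypothetical unbalanced fit dataset: one fault-prone module among five.
X = [[0.1], [0.2], [0.3], [0.4], [0.9]]
y = [0, 0, 0, 0, 1]
X2, y2 = random_oversample(X, y, minority=1)
print(y2.count(0), y2.count(1))  # balanced: 4 4
```

The balanced (X2, y2) would then serve as the fit dataset for a model such as logistic regression, which is where the paper observed the F1 gains.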
{"title":"The Impact of Group Size on Software Architecture Evaluation: A Controlled Experiment","authors":"M. Babar, B. Kitchenham","doi":"10.1109/esem.2007.38","DOIUrl":"https://doi.org/10.1109/esem.2007.38","url":null,"abstract":"An important element in scenario-based architecture evaluation is the development of scenario profiles by stakeholders working in groups. In practice groups can vary in size from 2 to 20 people. Currently, there is no empirical evidence about the impact of group size on the scenario development activity. Our experimental goal was to investigate the impact of group size on the quality of scenario profiles developed by groups of different sizes. We had 165 subjects, who were randomly assigned to 10 groups of size 3, 13 groups of size 5, and 10 groups of size 7. Participants were asked to develop scenario profiles. After the experiment each participant completed a questionnaire aimed at identifying their opinion of the group activity. The average quality score for group scenario profiles was 362.4 for 3-person groups, 534.23 for 5-person groups, and 444.5 for 7-person groups. The quality of scenario profiles for groups of size 5 was significantly greater than the quality of scenario profiles for groups of size 3 (p=0.025), but there was no difference between the size 3 and size 7 groups. However, participants in groups of size 3 had a significantly better opinion of the group activity outcome and their personal interaction with their group than participants in groups of size 5 or 7. Our results suggest that the quality of the output from a group does not increase linearly with group size. However, individual participants prefer small groups. This means there is a trade-off between group output quality and the personal experience of group members.","PeriodicalId":124420,"journal":{"name":"First International Symposium on Empirical Software Engineering and Measurement (ESEM 2007)","volume":"24 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2007-09-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122032748","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Investigating Test Teams' Defect Detection in Function test","authors":"Carina Andersson, P. Runeson","doi":"10.1109/ESEM.2007.68","DOIUrl":"https://doi.org/10.1109/ESEM.2007.68","url":null,"abstract":"This case study investigates defect detection by functional test teams. The study shows that the test teams not only discover defects in the features under test that they are responsible for, but also defects in interacting components belonging to other test teams' features. The paper presents the metrics collected and the results of the study, which give insights into a complex development environment and highlight the need for coordination between test teams in function test.","PeriodicalId":124420,"journal":{"name":"First International Symposium on Empirical Software Engineering and Measurement (ESEM 2007)","volume":"33 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2007-09-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128124069","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Proposal of a Complete Life Cycle In-Process Measurement Model Based on Evaluation of an In-Process Measurement Experiment Using a Standardized Requirement Definition Process","authors":"Y. Mitani, Tomoko Matsumura, Mike Barker, S. Tsuruho, Katsuro Inoue, Ken-ichi Matsumoto","doi":"10.1109/ESEM.2007.27","DOIUrl":"https://doi.org/10.1109/ESEM.2007.27","url":null,"abstract":"This paper focuses on in-process measurements during requirements definition, where measurements of processes and products are relatively difficult. However, development processes in Japan based on the enterprise architecture method provide standardized formats for such upstream processes and products, allowing in-process measurements. Based on previous work, on this examination of in-process measurements of requirements definition with the enterprise architecture method, and on previous results of empirical studies of in-process measurement and empirical validation of later development processes, this paper proposes a new measurement model, the \"full in-process process and product (I-PAP) measurement model,\" which includes the complete software development process from requirements to maintenance. Standardization of the requirements definition phase using the enterprise architecture method in Japan allows in-process measurement across the complete development lifecycle. Combining this with collaborative filtering and a project benchmark database will support project evaluation, estimation, and prediction.","PeriodicalId":124420,"journal":{"name":"First International Symposium on Empirical Software Engineering and Measurement (ESEM 2007)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2007-09-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134561045","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}