Human-level Ordinal Maintainability Prediction Based on Static Code Metrics

Markus Schnappinger, Arnaud Fietzke, A. Pretschner

Proceedings of the 25th International Conference on Evaluation and Assessment in Software Engineering (EASE 2021), 18 June 2021. DOI: 10.1145/3463274.3463315
One of the greatest challenges in software quality control is the efficient and effective measurement of maintainability. Thorough expert assessments are precise yet slow and expensive, whereas automated static analysis yields rapid yet imprecise feedback. Several machine learning approaches aim to combine the advantages of both. However, most prior studies either did not rely on expert judgment, predicting the number of changed lines as a proxy for maintainability instead, or were biased towards a small group of experts. In contrast, the present study builds on a manually labeled and validated dataset. Prediction uses static code metrics, among which we found simple structural metrics, such as the size of a class and of its methods, to yield the highest predictive power for maintainability. Using just a small set of these metrics, our models distinguish easy-to-maintain from hard-to-maintain code with an F-score of 91.3% and an AUC of 82.3%. In addition, we perform a more fine-grained ordinal classification and compare its quality with the performance of human experts, using the deviations between each individual expert's ratings and the consensus eventually reached by all experts. In sum, our models achieve the same level of performance as an average human expert; in terms of accuracy and mean squared error, they in fact outperform the human baseline. We hence argue that our models provide an automated and trustworthy prediction of software maintainability.
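The abstract outlines a pipeline: size-oriented static metrics feed a classifier, a binary easy/hard split is scored with F-score and AUC, and a more detailed ordinal prediction is compared against experts via accuracy and mean squared error. The following is a minimal sketch of such a setup using scikit-learn. The synthetic data, the random-forest model, the particular feature set, the four-level label scale, and the simulated expert baseline are all illustrative assumptions, not the authors' actual dataset or models.

```python
# Illustrative sketch only: synthetic metrics and labels stand in for the
# paper's manually labeled dataset; the model choice is an assumption.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import (f1_score, roc_auc_score,
                             accuracy_score, mean_squared_error)
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 1000

# Synthetic size-oriented static metrics per class (assumed feature set):
# class size in LOC, mean method length in LOC, number of methods.
X = np.column_stack([
    rng.lognormal(5, 1, n),
    rng.lognormal(3, 1, n),
    rng.integers(1, 40, n),
])

# Synthetic ordinal maintainability labels (0 = easy ... 3 = hard),
# loosely driven by size so the example has learnable structure.
score = 0.004 * X[:, 0] + 0.05 * X[:, 1] + rng.normal(0, 0.5, n)
y_ordinal = np.digitize(score, [1.0, 2.0, 3.0])

X_tr, X_te, y_tr, y_te = train_test_split(X, y_ordinal, random_state=0)

# Binary task: distinguish easy (levels 0-1) from hard (levels 2-3).
clf = RandomForestClassifier(random_state=0).fit(X_tr, y_tr >= 2)
proba = clf.predict_proba(X_te)[:, 1]
print("F-score:", f1_score(y_te >= 2, proba >= 0.5))
print("AUC:    ", roc_auc_score(y_te >= 2, proba))

# Ordinal task: predict the four-level label and evaluate with the same
# measures the abstract uses to compare against experts.
ord_clf = RandomForestClassifier(random_state=0).fit(X_tr, y_tr)
y_pred = ord_clf.predict(X_te)
print("Accuracy:", accuracy_score(y_te, y_pred))
print("MSE:     ", mean_squared_error(y_te, y_pred))

# Human baseline (simulated): one expert's ratings deviating from the
# consensus labels by at most one level, scored with the same MSE.
expert = np.clip(y_te + rng.integers(-1, 2, y_te.shape), 0, 3)
print("Expert MSE:", mean_squared_error(y_te, expert))
```

On real data, the ordinal step would be trained on the expert-labeled dataset, and the human baseline would come from each actual expert's deviation from the group consensus rather than a simulation; the sketch only shows how per-expert MSE against consensus yields the baseline the models are compared to.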