Identifying Thresholds for Software Faultiness via Optimistic and Pessimistic Estimations

Proceedings of the 10th ACM/IEEE International Symposium on Empirical Software Engineering and Measurement
Pub Date: 2016-09-08
DOI: 10.1145/2961111.2962595

L. Lavazza, S. Morasca

Citations: 6

Abstract

Identifying Thresholds for Software Faultiness via Optimistic and Pessimistic Estimations

Background. When estimating whether a software module is faulty based on the value of a measure X for a software internal attribute (e.g., size, structural complexity, cohesion, coupling), it is sensible to set a threshold on fault-proneness first and then induce a threshold on X by using a fault-proneness model where X plays the role of independent variable. However, some modules cannot be estimated as either faulty or non-faulty with confidence: they belong to a "grey zone" and estimating them as either would be quite aleatory and may result in several erroneous decisions.

Objective. We propose and evaluate an approach to setting thresholds on X to identify which modules can be confidently estimated faulty or non-faulty, and which ones cannot be estimated either way.

Method. Suppose that we do not know if the modules belonging to a subset of a set of modules are faulty or not, as happens in practical cases with the modules whose faultiness needs to be estimated. We build two fault-proneness models by using the set of modules as the training set. The "pessimistic" model is built by assuming that all modules whose faultiness is unknown are actually faulty and the "optimistic" model by assuming that they are actually non-faulty. The optimistic and pessimistic models can be used to set two thresholds, an optimistic and a pessimistic one. A module is estimated faulty by the optimistic (resp., pessimistic) model with optimistic (resp., pessimistic) threshold if its fault-proneness is above the threshold, and non-faulty otherwise. A module that is estimated faulty (resp., non-faulty) by both the optimistic model with optimistic threshold and the pessimistic model with the pessimistic threshold is estimated faulty (resp., non-faulty). Modules for which the estimates of the two models with associated thresholds conflict are in the "grey zone," i.e., no reliable faultiness estimation can be made for them.

Results. We applied our approach to datasets from the PROMISE repository, we carried out cross-validations, and we assessed accuracy via commonly used indicators. We also compared our results with those obtained with the conventional approach that uses one Binary Logistic Regression model. Our results show that our approach is effective in identifying the grey zone of values of X in which modules cannot be reliably estimated as either faulty or non-faulty and, conversely, the intervals in which modules can be estimated faulty or non-faulty. Our approach turns out to be more accurate, in terms of F-measure, than the conventional one in the majority of cases. In addition, it provides F-measure values that are very concentrated, i.e., it consistently identifies the intervals in which modules can be estimated faulty or non-faulty.

Conclusions. Our method can be practically used for identifying "grey zones" in which it does not make much sense to estimate modules' faultiness based on measure X and, therefore, the zones in which modules' faultiness can be estimated with confidence.
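The procedure in the Method paragraph can be illustrated with a small sketch. It is not the authors' implementation, and several details are assumptions that the abstract does not state: both models are univariate binary logistic regressions fitted by Newton-Raphson, the fault-proneness threshold of each model is taken to be the share of modules labelled faulty in its training set, and the data set (`KNOWN`, `UNKNOWN`) is made up for illustration. Under a logistic model with coefficients b0 and b1, a fault-proneness threshold p_t induces the threshold (logit(p_t) - b0) / b1 on X.

```python
import math

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def fit_logistic(xs, ys, iters=25):
    """Fit P(faulty | x) = sigmoid(b0 + b1 * x) by Newton-Raphson."""
    b0 = b1 = 0.0
    for _ in range(iters):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for x, y in zip(xs, ys):
            p = sigmoid(b0 + b1 * x)
            w = p * (1.0 - p)
            g0 += y - p            # gradient w.r.t. b0
            g1 += (y - p) * x      # gradient w.r.t. b1
            h00 += w               # entries of X^T W X
            h01 += w * x
            h11 += w * x * x
        det = h00 * h11 - h01 * h01
        if abs(det) < 1e-12:
            break
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1

def build_model(known, unknown_xs, assumed_label):
    """Train on known modules plus unknown ones carrying an assumed label."""
    xs = [x for x, _ in known] + list(unknown_xs)
    ys = [y for _, y in known] + [assumed_label] * len(unknown_xs)
    b0, b1 = fit_logistic(xs, ys)
    # Assumption: fault-proneness threshold = proportion of faulty modules.
    p_t = sum(ys) / len(ys)
    # Threshold on X induced by the fault-proneness threshold.
    x_t = (math.log(p_t / (1.0 - p_t)) - b0) / b1
    return b0, b1, p_t, x_t

def classify(x, optimistic, pessimistic):
    """Return 'faulty', 'non-faulty', or 'grey' when the two models conflict."""
    votes = [sigmoid(b0 + b1 * x) > p_t
             for b0, b1, p_t, _ in (optimistic, pessimistic)]
    if all(votes):
        return "faulty"
    if not any(votes):
        return "non-faulty"
    return "grey"

# Hypothetical data: (value of measure X, 1 if faulty else 0).
KNOWN = [(1, 0), (2, 0), (3, 0), (4, 0), (5, 1), (6, 0),
         (7, 1), (8, 0), (9, 1), (10, 1), (11, 1), (12, 1)]
UNKNOWN = [3.5, 6.5, 9.5]  # modules whose faultiness is unknown

optimistic = build_model(KNOWN, UNKNOWN, assumed_label=0)
pessimistic = build_model(KNOWN, UNKNOWN, assumed_label=1)

grey_lo, grey_hi = sorted((optimistic[3], pessimistic[3]))
print(f"grey zone on X: ({grey_lo:.2f}, {grey_hi:.2f})")
for x in UNKNOWN:
    print(x, classify(x, optimistic, pessimistic))
```

Values of X between the two induced thresholds form the grey zone in this sketch; outside that interval the two models agree, so the module is estimated faulty or non-faulty.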
