Background. Online Just-In-Time Software Defect Prediction (O-JIT-SDP) uses an online model to predict whether a new software change will introduce a bug. Previous studies have not considered the interaction between Software Quality Assurance (SQA) personnel and the model, and may therefore have missed opportunities to improve prediction accuracy through human feedback. Problem. A recent study introduced HumLa, the first Human-In-The-Loop (HITL) O-JIT-SDP framework. HumLa integrates SQA staff feedback to boost the prediction performance of O-JIT-SDP but does not account for inspection time. Upon a thorough revisit of HumLa, we find that while certain aspects of the HITL O-JIT-SDP system appear feasible under ideal conditions, they prove impractical in real-world contexts. Objective. We aim to reformulate HITL O-JIT-SDP with the enhancements that are crucial for practical application yet absent from prior work. Method. We propose four enhancements that make HITL O-JIT-SDP practical. First, we advocate evaluating online classifiers against observed labels rather than ground-truth labels in real-world settings. Second, we recommend against using the entire data stream to normalize the features of each new instance, as HumLa did. Third, we incorporate non-zero SQA inspection time into the formulation of HITL O-JIT-SDP. Fourth, we introduce real-time statistical classifier comparison into the HITL system. Result. Our replication reveals that the performance of HumLa evaluated under a practical scenario deviates significantly from the performance originally reported under an ideal experimental scenario, which may diminish the promise of HITL O-JIT-SDP. Furthermore, with our enhanced HITL O-JIT-SDP framework, we revisit a fundamental question in O-JIT-SDP: the benefits of HITL integration.
Our experimental findings demonstrate that HITL not only enhances O-JIT-SDP when SQA feedback surpasses Bug-Fixing Commit (BFC) feedback (by providing training commits with better label quality in less time) but also improves O-JIT-SDP even when the SQA feedback delay equals that of BFC feedback (by consistently delivering training commits with better label quality). Real-time statistical analysis reveals that HITL approaches generally outperform non-HITL O-JIT-SDP approaches by a statistically significant margin. Conclusion. Our work strengthens the credibility of model evaluation and has the potential to substantially increase the value of HITL O-JIT-SDP for industrial applications.
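The second enhancement, which normalizes each new commit without looking at the rest of the stream, can be illustrated with a small sketch. It assumes z-score normalization, which the abstract does not specify. Running means and variances are kept with Welford's algorithm over only the commits seen so far, so no information leaks from future commits. The class `OnlineZScore` and its methods are hypothetical names for illustration, not HumLa's or the authors' implementation.

```python
import math
from typing import List


class OnlineZScore:
    """Hypothetical z-score normalizer that only uses previously seen instances."""

    def __init__(self, n_features: int) -> None:
        self.count = 0
        self.mean = [0.0] * n_features
        self.m2 = [0.0] * n_features  # running sum of squared deviations (Welford)

    def normalize(self, x: List[float]) -> List[float]:
        # Statistics come from past commits only; with fewer than two, return zeros.
        if self.count < 2:
            return [0.0] * len(x)
        out = []
        for i, v in enumerate(x):
            std = math.sqrt(self.m2[i] / (self.count - 1))  # sample standard deviation
            out.append((v - self.mean[i]) / std if std > 0 else 0.0)
        return out

    def update(self, x: List[float]) -> None:
        # Welford's incremental update, applied after the commit has been normalized.
        self.count += 1
        for i, v in enumerate(x):
            delta = v - self.mean[i]
            self.mean[i] += delta / self.count
            self.m2[i] += delta * (v - self.mean[i])


# Toy stream of commits with two numeric features (illustrative values only).
stream = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0], [4.0, 40.0]]
norm = OnlineZScore(n_features=2)
normalized = []
for commit in stream:
    normalized.append(norm.normalize(commit))  # uses only earlier commits
    norm.update(commit)
```

Normalizing before updating is the key design choice: each commit is scaled exactly as a deployed model would see it at prediction time, whereas whole-stream normalization would use statistics of commits that have not yet arrived.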
