An empirical study of fault localization in Python programs

IF 3.6 2区计算机科学 Q1 COMPUTER SCIENCE, SOFTWARE ENGINEERING Empirical Software Engineering Pub Date : 2024-06-13 DOI:10.1007/s10664-024-10475-3

Mohammad Rezaalipour, Carlo A. Furia

{"title":"An empirical study of fault localization in Python programs","authors":"Mohammad Rezaalipour, Carlo A. Furia","doi":"10.1007/s10664-024-10475-3","DOIUrl":null,"url":null,"abstract":"Despite its massive popularity as a programming language, especially in novel domains like data science programs, there is comparatively little research about fault localization that targets Python. Even though it is plausible that several findings about programming languages like C/C++ and Java—the most common choices for fault localization research—carry over to other languages, whether the dynamic nature of Python and how the language is used in practice affect the capabilities of classic fault localization approaches remain open questions to investigate. This paper is the first multi-family large-scale empirical study of fault localization on real-world Python programs and faults. Using Zou et al.’s recent large-scale empirical study of fault localization in Java (Zou et al. 2021) as the basis of our study, we investigated the effectiveness (i.e., localization accuracy), efficiency (i.e., runtime performance), and other features (e.g., different entity granularities) of seven well-known fault-localization techniques in four families (spectrum-based, mutation-based, predicate switching, and stack-trace based) on 135 faults from 13 open-source Python projects from the BugsInPy curated collection (Widyasari et al. 2020). The results replicate for Python several results known about Java, and shed light on whether Python’s peculiarities affect the capabilities of fault localization. The replication package that accompanies this paper includes detailed data about our experiments, as well as the tool FauxPy that we implemented to conduct the study.","PeriodicalId":11525,"journal":{"name":"Empirical Software Engineering","volume":"68 1","pages":""},"PeriodicalIF":3.6000,"publicationDate":"2024-06-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Empirical Software Engineering","FirstCategoryId":"94","ListUrlMain":"https://doi.org/10.1007/s10664-024-10475-3","RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, SOFTWARE ENGINEERING","Score":null,"Total":0}

引用次数: 0

Abstract

Despite its massive popularity as a programming language, especially in novel domains like data science programs, there is comparatively little research about fault localization that targets Python. Even though it is plausible that several findings about programming languages like C/C++ and Java—the most common choices for fault localization research—carry over to other languages, whether the dynamic nature of Python and how the language is used in practice affect the capabilities of classic fault localization approaches remain open questions to investigate. This paper is the first multi-family large-scale empirical study of fault localization on real-world Python programs and faults. Using Zou et al.’s recent large-scale empirical study of fault localization in Java (Zou et al. 2021) as the basis of our study, we investigated the effectiveness (i.e., localization accuracy), efficiency (i.e., runtime performance), and other features (e.g., different entity granularities) of seven well-known fault-localization techniques in four families (spectrum-based, mutation-based, predicate switching, and stack-trace based) on 135 faults from 13 open-source Python projects from the BugsInPy curated collection (Widyasari et al. 2020). The results replicate for Python several results known about Java, and shed light on whether Python’s peculiarities affect the capabilities of fault localization. The replication package that accompanies this paper includes detailed data about our experiments, as well as the tool FauxPy that we implemented to conduct the study.

Abstract Image

查看原文

微信好友朋友圈 QQ好友复制链接

本刊更多论文

Python 程序故障定位实证研究

尽管 Python 作为一种编程语言大受欢迎，尤其是在数据科学程序等新领域，但针对 Python 的故障定位研究却相对较少。尽管关于 C/C++ 和 Java 等编程语言（故障定位研究中最常见的选择）的一些发现有可能被其他语言所借鉴，但 Python 的动态特性以及该语言在实践中的使用方式是否会影响经典故障定位方法的能力，仍然是有待研究的开放性问题。本文是对真实 Python 程序和故障进行故障定位的首次多家族大规模实证研究。我们以 Zou 等人最近对 Java 中故障定位的大规模实证研究（Zou et al、基于频谱、基于突变、基于谓词切换和基于堆栈跟踪）的七种著名的故障定位技术的有效性（即定位精度）、效率（即运行时性能）和其他特征（如不同的实体粒度），这些技术分别来自 BugsInPy 精选集（Widyasari 等人，2020 年）中 13 个开源 Python 项目的 135 个故障。这些结果在 Python 上复制了 Java 的一些已知结果，并揭示了 Python 的特殊性是否会影响故障定位的能力。本文附带的复制包中包含了有关我们实验的详细数据，以及我们为开展研究而实施的工具 FauxPy。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

求助全文

约1分钟内获得全文去求助

来源期刊

Empirical Software Engineering 工程技术-计算机：软件工程

CiteScore

8.50

自引率

12.20%

发文量

169

审稿时长

>12 weeks

期刊介绍： Empirical Software Engineering provides a forum for applied software engineering research with a strong empirical component, and a venue for publishing empirical results relevant to both researchers and practitioners. Empirical studies presented here usually involve the collection and analysis of data and experience that can be used to characterize, evaluate and reveal relationships between software development deliverables, practices, and technologies. Over time, it is expected that such empirical results will form a body of knowledge leading to widely accepted and well-formed theories. The journal also offers industrial experience reports detailing the application of software technologies - processes, methods, or tools - and their effectiveness in industrial settings. Empirical Software Engineering promotes the publication of industry-relevant research, to address the significant gap between research and practice.