Pub Date: 2026-03-01; Epub Date: 2025-09-09; DOI: 10.1016/j.scico.2025.103386
Anton Wijs
In 2009, the Simple Language of Communicating Objects (Slco) Domain-Specific Language was designed. Since then, a range of tools have been developed around this language to conduct research on a wide range of topics, all related to the construction of complex, component-based software, with formal verification being applied in every development step. This addresses our vision that formal verification should be seamlessly integrated into Model-Driven Software Engineering, to effectively develop correct software. In this article, we present this range of topics, and draw connections between the various, at first glance disparate, research results. We discuss the current status of the Slco framework, i.e., the language in combination with the tools, related work w.r.t. each of the topics, and plans for future work.
Title: An overview of research with Slco on seamless integration of formal verification into model-driven software engineering. Journal: Science of Computer Programming, volume 248, Article 103386.
Pub Date: 2026-03-01; Epub Date: 2025-08-13; DOI: 10.1016/j.scico.2025.103380
Matteo Cimini, Joan Montas
Language verification is an important part of the programming language development cycle, especially when it establishes properties of programming languages with mathematical proofs. Prior work proposed Lang-n-Prove, a domain-specific language for expressing language-parameterized proofs, that is, proofs that apply to classes of languages rather than to a single language. That work developed language-parameterized proofs of type soundness (excluding the substitution lemmas) for a certain class of functional languages. In this paper, we extend that work to include subtyping. We have added new operations to Lang-n-Prove for expressing subtyping-related proofs more naturally. We provide a semantics of our new system based on a compilation into proofs of the Abella proof assistant. Next, we develop language-parameterized proofs of type soundness (excluding the substitution lemmas) for the class of functional languages mentioned above, and of the equivalence between algorithmic and declarative subtyping. Our extended Lang-n-Prove generates Abella proofs that machine-check the type soundness of a nontrivial class of functional languages with declarative and algorithmic subtyping, when just a few simple lemmas are admitted.
Title: Type soundness of functional languages with subtyping in Lang-n-Prove. Journal: Science of Computer Programming, volume 248, Article 103380.
Pub Date: 2026-03-01; Epub Date: 2025-07-11; DOI: 10.1016/j.scico.2025.103364
Erika Abraham (The Guest Editor), Manuel Mazo Espinosa (The Guest Editor)
Title: Preface Special Issue SCICO on HSCC2024-software. Journal: Science of Computer Programming, volume 248, Article 103364.
Pub Date: 2026-03-01; Epub Date: 2025-09-27; DOI: 10.1016/j.scico.2025.103393
Yunior Pacheco Correa, Coen De Roover, Johannes Härtel
Tool support in software engineering often relies on relationships, regularities, patterns, or rules mined from other users’ code. Examples include approaches to bug prediction, code recommendation, and code autocompletion. Mining is typically performed on samples of code rather than the entirety of available software projects. While sampling is crucial for scaling data analysis, it can affect the generalization of the mined patterns.
This paper focuses on sampling software projects filtered for specific libraries and frameworks, and on mining patterns that connect different libraries. We call these inter-library patterns. We observe that limiting the sample to a specific library may hinder the generalization of inter-library patterns, posing a threat to their use or interpretation. Using a simulation and a real case study, we show this threat for different sampling methods. Our simulation shows that the implication of a pattern generalizes well only when sampling for the disjunction of both libraries involved in that implication. Additionally, we show that real empirical data sampled using the GitHub search API does not behave as expected from our simulation. This identifies a potential threat relevant for many studies that use the GitHub search API for studying inter-library patterns.
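The sampling effect described in this abstract can be illustrated with a small deterministic toy example. This is only a sketch under stated assumptions, not the authors' simulation: the population counts, the library names "A" and "B", and the use of rule confidence P(consequent | antecedent) as the measure of an implication are all hypothetical choices made for illustration.

```python
from fractions import Fraction


def confidence(projects, antecedent, consequent):
    """Confidence of antecedent => consequent, i.e. P(consequent | antecedent)."""
    with_antecedent = [p for p in projects if antecedent in p]
    if not with_antecedent:
        return None  # the implication cannot be evaluated on this sample
    hits = sum(1 for p in with_antecedent if consequent in p)
    return Fraction(hits, len(with_antecedent))


# Hypothetical population of 100 projects; each project is the set of libraries it uses.
population = (
    [frozenset({"A", "B"})] * 40
    + [frozenset({"A"})] * 10
    + [frozenset({"B"})] * 30
    + [frozenset()] * 20
)

# Three ways of filtering the population, plus the unfiltered population as reference.
samples = {
    "population": population,
    "filter on A": [p for p in population if "A" in p],
    "filter on B": [p for p in population if "B" in p],
    "filter on A or B": [p for p in population if p & {"A", "B"}],
}

# For every sample, measure both directions of the inter-library implication.
results = {
    name: (confidence(sample, "A", "B"), confidence(sample, "B", "A"))
    for name, sample in samples.items()
}

for name, (a_to_b, b_to_a) in results.items():
    print(f"{name:18} A=>B: {a_to_b}   B=>A: {b_to_a}")
```

In this toy population, filtering on a single library inflates the confidence of the implication whose consequent is that library to 1, while filtering on the disjunction reproduces the population values for both directions.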
Title: The sampling threat when mining generalizable inter-library usage patterns. Journal: Science of Computer Programming, volume 248, Article 103393.
Pub Date: 2026-03-01; Epub Date: 2025-08-11; DOI: 10.1016/j.scico.2025.103379
Takashi Suwa, Atsushi Igarashi
We propose MetaFM, a novel ML-style module system that enables users to decompose multi-stage programs (i.e., programs written in a typed multi-stage programming language) into loosely coupled components in a manner natural with respect to type abstraction. The distinctive aspect of MetaFM is that it allows values at different stages to be bound in a single structure (i.e., struct ⋯ end). This feature is crucial, for example, for defining a function and a macro that use one abstract type in common without revealing the implementation detail of that type. MetaFM also accommodates staging with full-fledged module-related features such as functors, higher-kinded types, and the "with type" construct. We give two separate formalizations of MetaFM's semantics by employing the technique of elaborations, i.e., type-directed translations to target languages. Specifically, we first define F-ing Modules-based semantics as a set of elaboration rules that convert MetaFM programs into System Fω^⟨⟩, a multi-stage extension of System Fω, and prove that the elaboration preserves typing. The existential quantification offered by System Fω^⟨⟩ demonstrates that a type abstraction mechanism is properly formalized in our language. Then, because our F-ing Modules-based semantics of staging has some issues regarding evaluation order, we give another elaboration by utilizing a method called static interpretation, which flattens nested structures into arrays of bindings and inlines functor applications through type-checking. While our F-ing Modules-based semantics cannot be naturally extended with effectful computations, the static interpretation-based one can easily accommodate effectful features such as mutable references, though for the moment this comes with the limitation that functors must be first-order. As a sideline, we develop a technique that simplifies the correctness proof of the static interpretation for first-order functors. Additionally, our language supports cross-stage persistence (CSP), a feature for code reuse spanning more than one stage, without breaking type safety. We also implemented a module system for a language of real-world use based on the latter semantics to demonstrate the utility of our formalization.
Title: An ML-style module system for cross-stage type abstraction in multi-stage programming. Journal: Science of Computer Programming, volume 248, Article 103379.
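The MetaFM abstract describes static interpretation as flattening nested structures into arrays of bindings. The sketch below shows only that flattening idea on plain data. It is not MetaFM's elaboration: it ignores types, stages, and functor inlining, and the structure contents and names (`Stack`, `Ops`, `limit`) are hypothetical.

```python
def flatten(structure, prefix=()):
    """Flatten a nested structure into a flat list of (qualified name, value) bindings."""
    bindings = []
    for name, value in structure.items():
        path = prefix + (name,)
        if isinstance(value, dict):
            # A nested structure: splice its bindings in under the longer prefix.
            bindings.extend(flatten(value, path))
        else:
            # A leaf binding: record it under its fully qualified name.
            bindings.append((".".join(path), value))
    return bindings


# Hypothetical nested module; the string values stand in for bound expressions.
nested = {
    "Stack": {
        "empty": "[]",
        "Ops": {"push": "fun x s -> x :: s", "pop": "fun s -> List.tl s"},
    },
    "limit": "100",
}

flat = flatten(nested)
for qualified_name, expression in flat:
    print(f"{qualified_name} = {expression}")
```

After flattening, no nested structure remains: every binding is addressed by a single qualified name, so the result can be processed as one sequence of bindings.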
Pub Date: 2026-03-01; Epub Date: 2025-08-25; DOI: 10.1016/j.scico.2025.103384
Paula Herber (Editors of the Special Issue), Muhammad Osama, Anton Wijs
Title: Research software from the integrated Formal Methods (iFM) conference 2023. Journal: Science of Computer Programming, volume 248, Article 103384.
Pub Date: 2026-03-01; Epub Date: 2025-07-24; DOI: 10.1016/j.scico.2025.103366
Zhehao Zhao, Yifeng Chen
Distributed Computing Continuum Systems (DCCS) integrate a large-scale cloud side and a wide-area heterogeneous edge side within the same network environment and are therefore a more general computing paradigm than cloud or edge computing. There are currently no programming technologies suitable for supporting DCCS application development. In this paper, we introduce a programming framework called Wolfs, which uses only basic programming constructs from programming-language theory. The framework is communication-centric in the sense that high-level communication models and lower-level communication protocols are organised in a hierarchy. Data-processing computations are inserted in the hierarchy as actions. This design allows application-level information to influence network routing and connection management. Wolfs programming is demonstrated in two case studies.
Title: A programming framework for distributed computing continuum systems. Journal: Science of Computer Programming, volume 248, Article 103366.
Pub Date: 2026-03-01; Epub Date: 2025-07-15; DOI: 10.1016/j.scico.2025.103362
Yibo Dai, Li Xie, Peng Wu, Shecheng Cui, Linhai Ma
Concurrent data structures or classes are designed to provide safe access and simultaneous updates by multiple threads to shared objects in a concurrent environment, with the goal of enhancing parallelism and throughput. However, testing concurrent objects poses significant challenges due to the potential explosion of concurrency test spaces, the variety of programming vulnerabilities, and the inherent nondeterminism of concurrent test executions. In this paper, we propose an Intrathread Method Orders based Adaptive Concurrency Testing (IMOACT) framework for concurrent objects. IMOACT can capture diverse behaviors of interthread method pairs by characterizing concurrent execution contexts with intrathread method orders. Moreover, IMOACT can adaptively optimize concurrent test executions by generating scheduling sequences based on the key scheduling points visited so far, streamlining test generation and execution across multiple tests. Experimental case studies with typical C/C++ concurrent classes demonstrate that IMOACT outperforms baseline approaches. On average, IMOACT improves the effectiveness of detecting concurrency bugs by 65%, and achieves a speedup of 2.43x compared to the underlying state-of-the-art concurrency testing approach.
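To give a sense of the test-space explosion mentioned in this abstract, the following sketch enumerates every interleaving of two threads' method calls that preserves each thread's own (intrathread) call order. It is a generic illustration and not IMOACT's scheduling algorithm; the method names are hypothetical.

```python
from math import comb


def interleavings(t1, t2):
    """Yield every merge of t1 and t2 that keeps each thread's own call order."""
    if not t1:
        yield tuple(t2)
        return
    if not t2:
        yield tuple(t1)
        return
    # Either thread 1 runs its next call first, or thread 2 does.
    for rest in interleavings(t1[1:], t2):
        yield (t1[0],) + rest
    for rest in interleavings(t1, t2[1:]):
        yield (t2[0],) + rest


# Hypothetical method sequences issued by two threads on one shared object.
thread1 = ["T1.push", "T1.pop"]
thread2 = ["T2.push", "T2.size"]

schedules = list(interleavings(thread1, thread2))
print(len(schedules))  # 6, which equals comb(4, 2)
print(comb(20, 10))    # 184756 schedules for two threads of 10 calls each
```

For two threads with m and n calls, the number of such schedules is the binomial coefficient C(m + n, m), which is why exhaustive exploration quickly becomes impractical and guided selection of schedules is attractive.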
Title: Intrathread method orders based adaptive testing of concurrent objects. Journal: Science of Computer Programming, volume 248, Article 103362.
Pub Date: 2026-03-01; Epub Date: 2025-09-20; DOI: 10.1016/j.scico.2025.103389
Christophe Crochet, John Aoga, Axel Legay
We propose PANTHER, a modular and extensible framework for automated testing and verification of network protocols. PANTHER lets researchers plug in arbitrary protocol implementations, tester scripts, and network topologies to conduct experiments. Internally, it combines Microsoft’s Ivy tool for formal specification with the Shadow network simulator to handle time-varying behavior and real network conditions with reproducibility. Experiments are configured via simple YAML files and executed in Docker containers, ensuring easy deployment. We demonstrate PANTHER with a case study on the Quick UDP Internet Connections (QUIC) protocol. All code and experiment setups are publicly available for replication.
Title: PANTHER: Pluginizable testing environment for network protocols. Journal: Science of Computer Programming, volume 248, Article 103389.
Pub Date: 2026-03-01; Epub Date: 2025-08-13; DOI: 10.1016/j.scico.2025.103381
Yongchang Ding, Wei Han, Zhiqiang Li, Haowen Chen, Linjun Chen, Rong Peng, Xiao-Yuan Jing
In the field of software engineering, defect prediction has long been a popular research direction. Currently, research on traditional software defect prediction mainly focuses on metric features, which are derived from various descriptive rules. Researchers have proposed a large number of defect prediction models based on these metric features and various framework models. However, the problem of data scarcity has severely hindered the development of the field. Therefore, this work proposes a new method, namely the Metric Attention Module (MAM), which mines the correlations within the metric data features, between features, within modules, and between modules. By learning new data representations, MAM guides the model's learning process and ultimately improves the model's performance without changing the network framework structure. Additionally, the method is interpretable.
In this work, experiments were conducted in various task environments and on different datasets, all resulting in varying degrees of improvement. In the context of within-project defect prediction (WPDP), experiments with the MAM data model showed an average improvement of 14.7% in Accuracy, 15.9% in F1 score, 23.7% in AUC, and 65.1% in MCC. In cross-project defect prediction (CPDP), under more complex task environments, the model demonstrated excellent performance across multiple standard datasets. Compared to the baseline models and training results, the F1, Accuracy, and MCC scores improved by approximately 40%, 20%, and 50%, respectively.
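For readers less familiar with the reported measures, the sketch below computes Accuracy, F1, and the Matthews correlation coefficient (MCC) from a binary confusion matrix using their standard definitions. The confusion-matrix counts are hypothetical and are not taken from the paper; AUC is omitted because it needs ranked prediction scores rather than a single confusion matrix.

```python
from math import sqrt


def metrics(tp, fp, fn, tn):
    """Accuracy, F1 and MCC for a binary confusion matrix (defective = positive class)."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0  # 0 by convention when undefined
    return {"accuracy": accuracy, "f1": f1, "mcc": mcc}


# Hypothetical dataset: 100 modules, 20 of them defective.
balanced = metrics(tp=10, fp=5, fn=10, tn=75)  # a model that finds half of the defects
trivial = metrics(tp=0, fp=0, fn=20, tn=80)    # a model that always predicts "clean"

print(balanced)
print(trivial)
```

On such an imbalanced dataset the always-clean predictor still reaches an accuracy of 0.8 while its F1 and MCC are both 0, which is one reason defect prediction studies report F1 and MCC alongside Accuracy.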
Title: Metric information mining with metric attention to boost software defect prediction performance. Journal: Science of Computer Programming, volume 248, Article 103381.