Pub Date: 2026-06-01 | Epub Date: 2026-01-19 | DOI: 10.1016/j.scico.2026.103446
Shaykhah S. Aldosari, Layla S. Aldawsari
Large Language Models (LLMs) represent a significant advancement in artificial intelligence (AI) capabilities, enabling natural and intuitive human-machine interactions. One rapidly evolving AI application is LLM code generation, which can expedite software development by automating code writing, debugging, and optimization. However, despite these enhanced capabilities, essential questions remain regarding the security implications of code generated by these models. This study addresses three key research questions to examine the security risks in LLM-generated code. It examines whether code generated by different open-source LLMs exhibits measurable variation in vulnerability prevalence. It also investigates how the choice of programming language influences the security of LLM-generated code. Finally, it explores the degree to which prompt specificity and construction shape the security of the generated code. Our findings demonstrate differences across all dimensions: LLMs exhibited a variance of up to 136.06, programming languages showed a maximum performance gap of 56%, and prompt engineering achieved up to 77% improvement in security.
Title: Securing LLM code generation: Leveraging prompt engineering to mitigate vulnerabilities across models and languages
Science of Computer Programming, Volume 251, Article 103446.
Pub Date: 2026-06-01 | Epub Date: 2025-12-23 | DOI: 10.1016/j.scico.2025.103433
Xinjie Wei, Chang-Ai Sun, Xiaoyi Zhang, Dave Towey
Context: Log-based anomaly detection (LAD) techniques examine whether continuously generated logs match historically normal patterns, which helps ensure reliability in distributed systems using DevOps. However, complex anomalies can span multiple log-pattern types and thus may only be detected by combining these patterns: relying on any single pattern can cause anomalies to be missed, i.e., false negatives in anomaly detection.
Objective: In this paper, we propose an Anomaly-Detection approach based on Multi-type log-pattern fusion and Multi-model integration (MulAD), which fuses multi-type log patterns into a synthesized representation to detect complex anomalies.
Method: MulAD first rearranges logs by source parameters to decouple interleaved logs and isolate relevant events. It then derives log patterns across five dimensions — semantic, sequential, quantitative, temporal (chronological), and parametric — and fuses them into a unified synthesized pattern. Finally, to detect anomalies, MulAD integrates MABi-LSTM, Transformer, and graph neural network (GNN) models, which are designed to capture temporal and sequential dependencies, contextual information, and structural dependencies, respectively.
Result: We evaluated MulAD on three public datasets (HDFS, BGL, and ThunderBird) and one industrial dataset from the Ray system. Experimental results show that MulAD outperforms all state-of-the-art techniques.
Conclusion: We conclude that MulAD is a promising anomaly-detection technique for complex anomalies in distributed systems.
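Two of the five pattern dimensions above can be illustrated with a minimal sketch: a quantitative pattern (event-count vector) and a sequential pattern (order-preserving index sequence) over a log window. The event vocabulary and window here are hypothetical toy data; MulAD's actual extraction, fusion, and models are far richer.

```python
from collections import Counter

def quantitative_pattern(log_events, vocabulary):
    """Count vector over a fixed event vocabulary (the quantitative dimension)."""
    counts = Counter(log_events)
    return [counts.get(event, 0) for event in vocabulary]

def sequential_pattern(log_events, vocabulary):
    """Index sequence preserving event order (the sequential dimension)."""
    index = {event: i for i, event in enumerate(vocabulary)}
    return [index[e] for e in log_events if e in index]

vocab = ["open", "read", "write", "close"]
window = ["open", "read", "read", "close"]
print(quantitative_pattern(window, vocab))  # [1, 2, 0, 1]
print(sequential_pattern(window, vocab))    # [0, 1, 1, 3]
```

A detector relying only on the count vector would miss order anomalies (e.g. a "read" before "open"), which is the motivation for fusing multiple pattern types.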
Title: MulAD: A log-based anomaly detection approach for distributed systems using multi-pattern and multi-model fusion
Science of Computer Programming, Volume 251, Article 103433.
Bug severity prediction plays a crucial role in software development by enabling timely defect management. Traditional approaches that rely on bug reports are prone to subjective bias, often leading to inaccurate severity assessments. In contrast, source code-based methods can directly learn code representations to more accurately identify potential defects. However, existing source code-based models do not make full use of hierarchical deep semantic information and do not pay enough attention to the intrinsic class imbalance issue. To overcome these challenges, this paper presents the Cost-Adaptive Multi-level sEmantic feature Learning (CAMEL) framework for bug severity prediction. The framework comprises three core modules: the feature extraction module, the Multi-level Semantic Information Fusion (MSIF) module, and the Cost Weight Optimization (CWO) module. Specifically, the feature extraction module leverages CodeBERT to capture multi-level semantic information from source code. The MSIF module then dynamically aggregates layer-specific features from each CodeBERT layer using an LSTM combined with a hierarchical attention mechanism, thereby preserving global semantic integrity. Finally, the CWO module mitigates the class imbalance issue by dynamically adjusting class weight parameters. Experiments conducted on a dataset of 3342 method-level code snippets with varying bug severity levels demonstrate that CAMEL significantly outperforms state-of-the-art methods across key metrics, including F1-Weighted, Precision, Recall, and MCC.
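The cost-weighting idea behind the CWO module can be sketched as inverse-frequency class weights, a static baseline in which rarer severity classes receive larger loss weights. CAMEL adjusts its weights dynamically during training, and the class names below are hypothetical.

```python
from collections import Counter

def class_weights(labels):
    """Inverse-frequency weights: weight = N / (num_classes * class_count)."""
    counts = Counter(labels)
    total = len(labels)
    return {cls: total / (len(counts) * n) for cls, n in counts.items()}

# Imbalanced toy label set: 6 minor, 3 major, 1 critical bug.
labels = ["minor"] * 6 + ["major"] * 3 + ["critical"] * 1
w = class_weights(labels)
# "critical" (1 sample) gets 6x the loss weight of "minor" (6 samples),
# so errors on the rare class dominate the weighted loss.
```

These weights would typically be passed to a weighted cross-entropy loss so that the classifier is not biased toward the majority class.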
Title: Cost-adaptive multi-level semantic feature learning for source code based bug severity prediction
Authors: Xiaoke Zhu, Yufeng Shi, Xiaopan Chen, Caihong Yuan, Fumin Qi, Xiao-Yuan Jing
Science of Computer Programming, Volume 251, Article 103444. DOI: 10.1016/j.scico.2026.103444
Pub Date: 2026-05-01 | Epub Date: 2025-12-05 | DOI: 10.1016/j.scico.2025.103420
Daniel Fernando Gómez-Barrera, Luccas Rojas Becerra, Juan Pinzón Roncancio, David Ortiz Almanza, Juan Arboleda, Mario Linares-Vásquez, Rubén Francisco Manrique
This paper presents CFFitST, a novel strategy for iteratively fine-tuning sentence embeddings using a pre-trained sentence transformer to enhance classification performance in few-shot settings. The method dynamically adjusts the number and composition of training samples based on internal assessments over the training data. CFFitST was evaluated in the “NLBSE 2024” tool competition, which focused on multi-class classification of GitHub issues. The competition required robust few-shot learning models to classify 300 issues across five different repositories. Our approach achieved an F1 score of 84.2 %, a statistically significant improvement of 2.44 % over the SetFit baseline.
Title: CFFitST: Classification few-shot fit sentence transformer
Science of Computer Programming, Volume 250, Article 103420.
Pub Date: 2026-05-01 | Epub Date: 2025-11-21 | DOI: 10.1016/j.scico.2025.103416
Michael Hanus
Unintended failures during a computation are painful but frequent during software development. Failures due to external reasons (e.g., missing files, no permissions, etc.) can be caught by exception handlers. Programming failures, such as calling a partially defined operation with unintended arguments, are often not caught due to the assumption that the software is correct. This paper presents an approach to verify such assumptions. For this purpose, non-failure conditions for operations are inferred and then checked in all uses of partially defined operations. In the positive case, the absence of such failures is ensured. In the negative case, the programmer could adapt the program to handle possibly failing situations and check the program again. Our method is fully automatic and can be applied to larger declarative programs. The results of an implementation for functional logic Curry programs are presented.
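The idea of checking inferred non-failure conditions at call sites can be sketched outside of Curry. In this Python analogue (the function names are illustrative, not the paper's), a partially defined operation is guarded by its inferred condition, so the adapted call site provably cannot fail.

```python
def head(xs):
    # Partially defined operation: fails on the empty list.
    return xs[0]

def nonfail_head(xs):
    # Inferred non-failure condition for head: the argument must be non-empty.
    return len(xs) > 0

def first_or(xs, default):
    # Adapted call site: head is only invoked when its non-failure
    # condition holds, so no IndexError can occur here.
    return head(xs) if nonfail_head(xs) else default
```

In the paper's setting this check is performed statically and automatically for Curry programs; the runtime guard above only mirrors the program adaptation a developer would make in the negative case.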
Title: Inferring non-failure conditions for declarative programs
Science of Computer Programming, Volume 250, Article 103416.
Pub Date: 2026-05-01 | Epub Date: 2025-11-15 | DOI: 10.1016/j.scico.2025.103397
Oleg Kiselyov
MetaOCaml is a superset of OCaml for convenient code generation with static guarantees: the generated code is well-formed, well-typed, and well-scoped, by construction. Not only does the produced code always compile; code fragments with a variable escaping its scope are detected already during code generation. MetaOCaml has been employed for compiling domain-specific languages, generic programming, automating tedious specializations in high-performance computing, generating efficient computational kernels, and embedded programming. It is used in education, and has served as inspiration for several other metaprogramming systems.
Best known in MetaOCaml are the types for values representing generated code and the template-based mechanism to produce such values, a.k.a. brackets and escapes. MetaOCaml also features cross-stage persistence, generation of ordinary and mutually-recursive definitions, first-class pattern-matching, and heterogeneous metaprogramming.
The extant implementation of MetaOCaml, first presented at FLOPS 2014, has been continuously evolving. We describe the current design and implementation, stressing particularly notable additions. Among them is a new and efficient translation from typed code templates to code combinators. Scope extrusion detection unexpectedly brought let-insertion, and a conclusive solution to the 20-year-old vexing problem of cross-stage persistence.
Title: MetaOCaml: ten years later – System description
Science of Computer Programming, Volume 250, Article 103397.
Pub Date: 2026-05-01 | Epub Date: 2025-10-25 | DOI: 10.1016/j.scico.2025.103398
Arwa Hameed Alsubhi, Ornela Dardha, Simon J. Gay
This paper introduces Coconut, a C++ tool that uses templates for defining object behaviours and validates them with typestate checking. Coconut employs the GIMPLE intermediate representation (IR) from the GCC compiler’s middle-end phase for static checks, ensuring objects follow valid state transitions as defined in typestate templates. It supports features such as branching, recursion, aliasing, inheritance, and typestate visualisation. We illustrate Coconut’s application in embedded systems, validating their behaviour pre-deployment. We present an experimental study showing that Coconut improves performance and reduces code complexity with respect to the original code, highlighting the benefits of typestate-based verification.
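A typestate constrains which operations are valid in which object state. The sketch below encodes a small, hypothetical CLOSED → OPEN protocol and enforces it dynamically; Coconut's point is to check such transitions statically on GIMPLE IR, so violations are rejected before deployment rather than raised at runtime as here.

```python
class TypestateError(Exception):
    """Raised when an operation is called in the wrong state."""

class Socket:
    """Typestate protocol: CLOSED --open--> OPEN --close--> CLOSED.
    send() is only valid in the OPEN state."""

    def __init__(self):
        self.state = "CLOSED"

    def _expect(self, state):
        if self.state != state:
            raise TypestateError(f"expected state {state}, was {self.state}")

    def open(self):
        self._expect("CLOSED")
        self.state = "OPEN"

    def send(self, data):
        self._expect("OPEN")
        return len(data)

    def close(self):
        self._expect("OPEN")
        self.state = "CLOSED"
```

Calling `send` on a closed socket raises `TypestateError` at runtime; a static typestate checker would report the same misuse at compile time, with no runtime cost.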
Title: Design and Evaluation of Coconut: Typestates for C++
Science of Computer Programming, Volume 250, Article 103398.
Pub Date: 2026-05-01 | Epub Date: 2025-11-22 | DOI: 10.1016/j.scico.2025.103419
Bin Hu, Lizhi Zheng, Dongjin Yu, Yijian Wu, Jie Chen, Tianyi Hu
Code clones have been a hot topic in software engineering for decades. Thanks to the rapid development of clone detection techniques, finding code clones in software systems is no longer difficult, but managing the vast number of detected clones remains an open problem. Typically, clones should be eliminated through refactoring, thereby mitigating the threat to software maintenance. In some situations, however, a clone group may contain several code variants residing in different locations, making refactoring too complicated, as their differences must be analyzed and reconciled first. We therefore need an approach that recognizes clone groups that are easy to refactor or eliminate. In this paper, we first collected large-scale datasets from three different domains and studied the distribution of four different metrics of code clones. We found that the distribution of each metric follows a certain pattern: inner-file clones account for approximately 50 % of clones, and Type-3 clones account for over 45 %. However, the complexity of clone groups cannot be judged from these metrics alone. Based on our findings, we propose a classification approach to help developers distinguish clone groups that are easy to eliminate by refactoring from those that are hard to refactor. We propose four clone feature entropy measures based on information entropy theory: variant entropy, distribution entropy, relation entropy, and syntactic entropy. We then calculate a fused clone entropy as the weighted summation of these four feature entropies. Finally, we use the four feature entropies and the fused entropy to classify or rank code clone groups. Experiments on three application domains show that the proposed clone feature entropy can help developers identify clone groups that are easy to eliminate by refactoring. Manual validation also reveals that the complexity of clone groups does not depend solely on the number of clone instances. This approach provides a new way to manage code clones and offers useful ideas for future clone maintenance research.
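The entropy machinery here is standard Shannon entropy plus a weighted sum. A minimal sketch (the variant counts and weights below are toy values, not the paper's calibrated ones):

```python
import math

def shannon_entropy(counts):
    """Entropy in bits of a frequency distribution,
    e.g. how clone instances spread over variants in a group."""
    total = sum(counts)
    probs = [c / total for c in counts if c > 0]
    return -sum(p * math.log2(p) for p in probs)

def fused_entropy(feature_entropies, weights):
    """Fused clone entropy: weighted summation of the per-feature entropies
    (variant, distribution, relation, syntactic)."""
    return sum(w * h for w, h in zip(weights, feature_entropies))

# A group of 4 identical clone instances: one variant, zero entropy
# (uniform, easy to refactor).  A group with 4 distinct variants:
# maximal entropy for 4 outcomes, 2 bits (differences must be reconciled).
easy = shannon_entropy([4])        # 0.0
hard = shannon_entropy([1, 1, 1, 1])  # 2.0
```

Groups can then be ranked by fused entropy: lower values indicate more homogeneous groups that are cheaper to eliminate by refactoring.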
Title: Code clone classification based on multi-dimension feature entropy
Science of Computer Programming, Volume 250, Article 103419.
Pub Date: 2026-05-01 | Epub Date: 2025-11-14 | DOI: 10.1016/j.scico.2025.103414
Rafaela Almeida, Sidney Nogueira, Augusto Sampaio
Testing concurrent systems is challenging due to their complex interactions and behaviours, along with the difficulty in reproducing failures. We propose a sound strategy for testing concurrent mobile applications by extracting use cases that capture interleavings of behaviours of existing test cases for individual features. These use cases are then used to create a formal model that is the input for a refinement checking approach to generate test cases that are still sequential but exercise the execution of concurrent features. We introduce a conformance relation, cspioq, which considers quiescent behaviour (absence of output). This relation is based on cspio (which is itself inspired by ioco); cspio does not take quiescence behaviour into account. While ioco as well as cspioco (a denotational semantics for ioco based on CSP) rely on suspension traces, our approach adopts the traces model annotated with a special event to represent quiescence. This allowed us to reuse our previous theory and test case generation strategy for sequential systems in a conservative way. We also analyse the complexity of automatically generating test cases. For implementation efficiency, we optimise the strategy by directly interleaving steps of existing test cases and show that this preserves soundness. Moreover, we provide tool support for every phase of the approach. Finally, we present the results of an empirical evaluation designed to measure the effectiveness of the overall strategy in terms of test coverage and bug detection. The results indicate that our approach yields higher coverage and higher bug detection rates compared to the set of tests originally developed by our industrial partner (Motorola) engineers.
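Directly interleaving the steps of existing sequential test cases, as in the optimised strategy, can be sketched by enumerating all order-preserving interleavings of two step lists. There are C(|a|+|b|, |a|) of them, which is one reason a complexity analysis of test generation matters; this recursion is a naive illustration, not the paper's tool.

```python
def interleavings(a, b):
    """All interleavings of step lists a and b that preserve
    each test case's internal order."""
    if not a:
        return [list(b)]
    if not b:
        return [list(a)]
    # Either the next step comes from a, or it comes from b.
    return [[a[0]] + rest for rest in interleavings(a[1:], b)] + \
           [[b[0]] + rest for rest in interleavings(a, b[1:])]

# Two feature test cases with 2 and 1 steps: C(3, 1) = 3 interleavings.
combined = interleavings(["tapA", "checkA"], ["tapB"])
```

Each resulting sequence is still a sequential test, but it exercises the concurrent execution of both features.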
{"title":"Combining sequential feature test cases to generate sound tests for concurrent features","authors":"Rafaela Almeida , Sidney Nogueira , Augusto Sampaio","doi":"10.1016/j.scico.2025.103414","DOIUrl":"10.1016/j.scico.2025.103414","url":null,"abstract":"<div><div>Testing concurrent systems is challenging due to their complex interactions and behaviours, along with the difficulty in reproducing failures. We propose a sound strategy for testing concurrent mobile applications by extracting use cases that capture interleavings of behaviours of existing test cases for individual features. These use cases are then used to create a formal model that is the input for a refinement checking approach to generate test cases that are still sequential but exercise the execution of concurrent features. We introduce a conformance relation, <strong>cspio</strong><sub><strong>q</strong></sub>, which considers quiescent behaviour (absence of output). This relation is based on <strong>cspio</strong> (which is itself inspired by <strong>ioco</strong>); <strong>cspio</strong> does not take quiescence behaviour into account. While <strong>ioco</strong> as well as <strong>cspioco</strong> (a denotational semantics for <strong>ioco</strong> based on CSP) rely on suspension traces, our approach adopts the traces model annotated with a special event to represent quiescence. This allowed us to reuse our previous theory and test case generation strategy for sequential systems in a conservative way. We also analyse the complexity of automatically generating test cases. For implementation efficiency, we optimise the strategy by directly interleaving steps of existing test cases and show that this preserves soundness. Moreover, we provide tool support for every phase of the approach. Finally, we present the results of an empirical evaluation designed to measure the effectiveness of the overall strategy in terms of test coverage and bug detection. 
The results indicate that our approach yields higher coverage and higher bug detection rates compared to the set of tests originally developed by our industrial partner (Motorola) engineers.</div></div>","PeriodicalId":49561,"journal":{"name":"Science of Computer Programming","volume":"250 ","pages":"Article 103414"},"PeriodicalIF":1.4,"publicationDate":"2026-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145580046","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"Computer Science","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2026-05-01Epub Date: 2025-11-13DOI: 10.1016/j.scico.2025.103411
Yuzhou Liu , Qi Wang , Shuang Jiang , Runze Wu , Hongxu Tian , Peng Zhang
Context: Many researchers have proposed vulnerability detection methods that enhance software reliability by analyzing the program. However, some vulnerabilities are difficult to identify from source code alone, especially those related to execution.
Objectives: To solve this problem, this paper introduces binary code as an additional input and proposes a novel solution for software vulnerability detection based on multimodal information fusion.
Methods: The approach treats source and binary code as different modalities and uses two pre-trained models as feature extractors to analyze them separately. Then, we design an attention-based information fusion strategy that takes the information from the source code as the main body and the information from the binary code as a supplement. This strategy not only captures the correlations among features across modalities but also filters redundancy from the binary code during fusion. In this way, a more comprehensive representation of the software is obtained and taken as the basis for vulnerability detection.
Results: Our method was comprehensively evaluated on three widely used datasets in different languages, namely Reveal (C), Devign (C++), and Code_vulnerability_java (Java): (1) For vulnerability detection performance, the Accuracy reached 86.09%, 84.58%, and 80.43% across the three datasets, with F1-scores of 82.87%, 84.62%, and 79.58%, respectively; (2) Compared with seven state-of-the-art baseline methods, our approach achieved Accuracy improvements of 2.38%-3.01% and F1-score enhancements of 2.32%-8.47% across the datasets; (3) Moreover, the ablation experiment shows that, when combining binary code with source code (versus using source code alone), Accuracy improved by 6.83%-13.76% and F1-score increased by 5.36%-9.86%, demonstrating significant performance gains from multimodal data integration.
Conclusion: The results show that our approach achieves good performance on the task of software vulnerability detection. Meanwhile, ablation experiments confirm the contribution of binary code to detection and indicate the effectiveness of our fusion strategy. We have released the code and datasets (https://github.com/Wangqxn/Vul-detection) to facilitate follow-up research.
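The attention-based fusion strategy described above (source-code features as the main body, binary-code features as a supplement) can be sketched as scaled dot-product attention with a residual connection. This is a minimal, dependency-free illustration under stated assumptions — `attention_fuse`, `src`, and `bin_feats` are hypothetical names, and the real system operates on features from two pre-trained models rather than raw vectors:

```python
import math

def attention_fuse(src, bin_feats):
    """Fuse binary-code feature vectors into source-code feature vectors.
    Each source vector attends over all binary vectors (scaled dot-product
    attention), and the attended 'supplement' is added back residually,
    so the source modality remains the main body.
    NOTE: illustrative sketch only, not the paper's implementation."""
    d_k = len(src[0])
    fused = []
    for q in src:
        # attention scores of this source vector against every binary vector
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k)
                  for k in bin_feats]
        # numerically stable softmax over the binary vectors
        m = max(scores)
        exp = [math.exp(s - m) for s in scores]
        total = sum(exp)
        weights = [e / total for e in exp]
        # weighted sum of binary vectors = the supplement from the binary modality
        supplement = [sum(w * k[i] for w, k in zip(weights, bin_feats))
                      for i in range(d_k)]
        # residual add keeps the source features dominant
        fused.append([qi + si for qi, si in zip(q, supplement)])
    return fused
```

The softmax weighting is one plausible way to realise the abstract's "filter the redundancy" claim: binary vectors that correlate poorly with a source vector receive low weight and contribute little to the fused representation.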
{"title":"Multimodal information fusion for software vulnerability detection based on both source and binary codes","authors":"Yuzhou Liu , Qi Wang , Shuang Jiang , Runze Wu , Hongxu Tian , Peng Zhang","doi":"10.1016/j.scico.2025.103411","DOIUrl":"10.1016/j.scico.2025.103411","url":null,"abstract":"<div><div>Context: Many researchers have proposed vulnerability detection methods that enhance software reliability by analyzing the program. However, some vulnerabilities are difficult to identify from source code alone, especially those related to execution.</div><div>Objectives: To solve this problem, this paper introduces binary code as an additional input and proposes a novel solution for software vulnerability detection based on multimodal information fusion.</div><div>Methods: The approach treats source and binary code as different modalities and uses two pre-trained models as feature extractors to analyze them separately. Then, we design an attention-based information fusion strategy that takes the information from the source code as the main body and the information from the binary code as a supplement. This strategy not only captures the correlations among features across modalities but also filters redundancy from the binary code during fusion. In this way, a more comprehensive representation of the software is obtained and taken as the basis for vulnerability detection.</div><div>Results: Our method was comprehensively evaluated on three widely used datasets in different languages, namely Reveal (C), Devign (C++), and Code_vulnerability_java (Java): (1) For vulnerability detection performance, the Accuracy reached 86.09%, 84.58%, and 80.43% across the three datasets, with F1-scores of 82.87%, 84.62%, and 79.58%, respectively; (2) Compared with seven state-of-the-art baseline methods, our approach achieved Accuracy improvements of 2.38%-3.01% and F1-score enhancements of 2.32%-8.47% across the datasets; (3) Moreover, the ablation experiment shows that, when combining binary code with source code (versus using source code alone), Accuracy improved by 6.83%-13.76% and F1-score increased by 5.36%-9.86%, demonstrating significant performance gains from multimodal data integration.</div><div>Conclusion: The results show that our approach achieves good performance on the task of software vulnerability detection. Meanwhile, ablation experiments confirm the contribution of binary code to detection and indicate the effectiveness of our fusion strategy. 
We have released the code and datasets <span><span>(https://github.com/Wangqxn/Vul-detection)</span></span> to facilitate follow-up research.</div></div>","PeriodicalId":49561,"journal":{"name":"Science of Computer Programming","volume":"250 ","pages":"Article 103411"},"PeriodicalIF":1.4,"publicationDate":"2026-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145624898","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"Computer Science","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}