Title: Mobile application review summarization using chain of density prompting
Authors: Shristi Shrestha, Anas Mahmoud
Pub Date: 2025-06-24 | DOI: 10.1007/s10515-025-00533-5

Mobile app users commonly rely on app store ratings and reviews to find apps that suit their needs. However, the sheer volume of reviews available on app stores can lead to information overload, thus impeding users’ ability to make informed app selection decisions. To overcome this limitation, in this paper, we leverage Large Language Models (LLMs) to summarize mobile app reviews. In particular, we use the Chain of Density (CoD) prompt to guide OpenAI GPT-4 to generate abstractive, semantically dense, and readable summaries of mobile app reviews. The CoD prompt is engineered to iteratively extract salient entities from the source text and fuse them into a fixed-length summary. We evaluate the performance of our approach using a large dataset of mobile app reviews. We further conduct an empirical evaluation with 48 study participants to assess the readability of the generated CoD summaries. Our results show that an altered CoD prompt can correctly identify the main themes in user reviews and consolidate them into a natural language summary that is intended for end-user consumption. The prompt also manages to maintain the readability of the generated summaries while increasing their density. Our work in this paper aims to substantially improve mobile app users’ experience by providing an effective mechanism for summarizing important user feedback in the review stream.
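The iterative extract-and-fuse loop that the abstract describes can be sketched roughly as follows. This is an illustrative outline only, not the authors' implementation: `call_llm` is a stand-in for a real LLM API call (e.g., an OpenAI client), and the prompt wording, round count, and summary length are invented for the sketch.

```python
def call_llm(prompt: str) -> str:
    """Placeholder for a real LLM API call; here it returns a dummy
    string so the sketch runs without network access."""
    return "DUMMY SUMMARY (~80 words)"

def chain_of_density(reviews: list[str], rounds: int = 3,
                     length_words: int = 80) -> list[str]:
    """Produce one summary per densification round: each round asks the
    model to fold missing salient entities into a fixed-length summary."""
    source = "\n".join(reviews)
    summary = call_llm(
        f"Summarize the app reviews below in about {length_words} words:\n{source}"
    )
    summaries = [summary]
    for _ in range(rounds - 1):
        summary = call_llm(
            "Identify 1-3 salient entities in the reviews that are missing from "
            "the current summary, then rewrite the summary to include them "
            f"without exceeding {length_words} words.\n"
            f"Reviews:\n{source}\n\nCurrent summary:\n{summary}"
        )
        summaries.append(summary)
    return summaries  # the last element is the densest summary
```

The fixed length is what forces density: each round must compress existing content to make room for the newly identified entities.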
Title: DC-GAR: detecting vulnerabilities by utilizing graph properties and random walks to uncover richer features
Authors: Meng Wang, Xiao Han, Hong Zhang, Yiran Guo, Jiangfan Guo
Pub Date: 2025-06-14 | DOI: 10.1007/s10515-025-00532-6

Deep learning has become prominent in source code vulnerability detection due to its ability to automatically extract complex feature representations from code, eliminating the need for manually defined rules or patterns. Some methods treat code as text sequences; however, they often overlook its inherent structural information. In contrast, graph-based approaches effectively capture structural relationships, but the sparseness and inconsistency of structures may lead to uneven feature vector extraction, which means that the model may not be able to adequately characterize important nodes or paths. To address this issue, we propose an approach called Dual-channel Graph Neural Network combining Graph properties and Random walks (DC-GAR). This approach integrates graph properties and random walks within a dual-channel graph neural network framework to enhance vulnerability detection. Specifically, graph properties capture global semantic features, while random walks provide context-dependent node structure information. The combination of these features is then leveraged by the dual-channel graph neural network for detection and classification. We have implemented DC-GAR and evaluated it on a dataset of 29,514 functions. Experimental results demonstrate that DC-GAR surpasses state-of-the-art vulnerability detectors, including FlawFinder, SySeVR, Devign, VulCNN, AMPLE, HardVD, CodeBERT, and GraphCodeBERT, in terms of accuracy and F1-Score. Moreover, DC-GAR has proven effective and practical in real-world open-source projects.
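The "context-dependent node structure information" gathered by random walks can be illustrated with a minimal sketch. The graph, walk length, and walk count below are hypothetical toy choices, not parameters from the paper.

```python
import random

def random_walks(graph: dict[str, list[str]], start: str, num_walks: int = 5,
                 walk_len: int = 4, seed: int = 0) -> list[list[str]]:
    """Collect short random walks from `start`; each walk is a node sequence
    that captures the structural context surrounding the start node."""
    rng = random.Random(seed)  # seeded for reproducibility
    walks = []
    for _ in range(num_walks):
        node, walk = start, [start]
        for _ in range(walk_len - 1):
            nbrs = graph.get(node, [])
            if not nbrs:          # dead end: stop this walk early
                break
            node = rng.choice(nbrs)
            walk.append(node)
        walks.append(walk)
    return walks

# Toy control-flow-like graph over a function's statements
g = {"entry": ["decl", "if"], "decl": ["if"], "if": ["call", "ret"],
     "call": ["ret"], "ret": []}
walks = random_walks(g, "entry")
```

In a DC-GAR-style pipeline the walk sequences would then be encoded (e.g., embedded and pooled) into per-node feature vectors, complementing global graph properties in the second channel.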
Title: On-the-fly unfolding with optimal exploration for linear temporal logic model checking of concurrent software and systems
Authors: Shuo Li, Li’ao Zheng, Ru Yang, Zhijun Ding
Pub Date: 2025-06-11 | DOI: 10.1007/s10515-025-00511-x

Linear temporal logic (LTL) model checking faces a significant challenge known as the state-explosion problem. The on-the-fly method is a solution that constructs and checks the state space simultaneously, avoiding generating all states in advance; however, it is not effective for concurrent interleavings. Unfolding based on Petri nets is a succinct structure covering all states that can mitigate the state-explosion problem caused by concurrency. Many state-of-the-art methods optimally explore a complete unfolding structure using a tree-like structure. However, it is difficult to apply such a tree-like structure directly to the traditional on-the-fly method for LTL. At the same time, constructing a complete unfolding structure in advance and then checking LTL is also wasteful. Thus, the existing optimal exploration methods are not applicable to on-the-fly unfolding. To address these challenges, we propose an LTL model-checking method called on-the-fly unfolding with optimal exploration. This method is based on the program dependence net (PDNet) proposed in previous work. First, we define conflict transitions of PDNet and an exploration tree with a novel notion of delayed transitions, which differs from the existing tree-like structure. The tree improves on-the-fly unfolding by exploring each partial-order run only once and avoiding enumerating all possible combinations. Then, we propose an on-the-fly unfolding algorithm that simultaneously constructs the exploration tree and generates the unfolding structure while checking LTL. We implement a tool for verifying LTL properties of concurrent programs. It also improves on traditional unfolding generation and performs better than SPIN and DiVine on the used benchmarks. The core contribution of this paper is an on-the-fly unfolding with optimal exploration method for LTL, which avoids the complete enumeration of concurrent combinations required by traditional unfolding generation.
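The partial-order insight behind unfolding, namely that independent concurrent transitions need not be interleaved, can be shown with a tiny Petri net sketch. This is background illustration only; it is not the PDNet structure or the exploration-tree algorithm from the paper, and the net below is invented.

```python
from collections import Counter

def enabled(marking: Counter, pre: dict[str, Counter]) -> list[str]:
    """Transitions whose input places all carry enough tokens."""
    return [t for t, need in pre.items()
            if all(marking[p] >= n for p, n in need.items())]

def fire(marking: Counter, t: str, pre: dict, post: dict) -> Counter:
    """Fire transition t: consume its input tokens, produce its outputs."""
    m = marking.copy()
    m.subtract(pre[t])
    m.update(post[t])
    return m

# Two independent transitions t1, t2 (disjoint pre-sets): p1 -> p3, p2 -> p4
pre  = {"t1": Counter({"p1": 1}), "t2": Counter({"p2": 1})}
post = {"t1": Counter({"p3": 1}), "t2": Counter({"p4": 1})}
m0 = Counter({"p1": 1, "p2": 1})
# An interleaving-based explorer visits both orders t1;t2 and t2;t1,
# even though they reach the same marking; an unfolding keeps the two
# transitions as one partial-order run and explores it only once.
```

On-the-fly exploration would expand `enabled` transitions lazily from `m0` while checking the property, rather than building the full reachability graph first.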
Pub Date : 2025-06-07DOI: 10.1007/s10515-025-00529-1
Jiayi Wang, Ping Yu, Yi Qin, Yanyan Jiang, Yuan Yao, Xiaoxing Ma
Symbolic execution is a powerful technique for automated test case generation, ensuring comprehensive coverage of potential scenarios. However, it often struggles with complex, deep paths due to path explosion. Conversely, large language models (LLMs) utilize vast training data to generate test cases that can uncover intricate program behaviors that symbolic execution might miss. Despite their complementary strengths, integrating the systematic nature of symbolic execution with the creative capabilities of LLMs presents a significant challenge. We introduce NexuSym, an innovative tool that integrates symbolic execution with LLMs to facilitate the automatic generation of test cases. To effectively bridge the gap between these two approaches, we have developed a test case reducer, which normalizes the LLM-generated test cases to make them compatible with symbolic execution. Additionally, we propose a search space summarizer, which abstracts and condenses the search space explored by symbolic execution, enabling the LLM to focus on the most promising areas for further exploration. We instantiated NexuSym on KLEE and ChatGPT. Our evaluation of NexuSym involved 99 coreutils programs and 9 large GNU programs. The experimental results demonstrate that NexuSym significantly enhances program test coverage, with improvements of up to 20% in certain cases. Furthermore, we conducted an analysis of the monetary costs associated with using the LLM API, revealing that NexuSym is a highly cost-effective solution.
{"title":"NexuSym: Marrying symbolic path finders with large language models","authors":"Jiayi Wang, Ping Yu, Yi Qin, Yanyan Jiang, Yuan Yao, Xiaoxing Ma","doi":"10.1007/s10515-025-00529-1","DOIUrl":"10.1007/s10515-025-00529-1","url":null,"abstract":"<div><p>Symbolic execution is a powerful technique for automated test case generation, ensuring comprehensive coverage of potential scenarios. However, it often struggles with complex, deep paths due to path explosion. Conversely, large language models (LLMs) utilize vast training data to generate test cases that can uncover intricate program behaviors that symbolic execution might miss. Despite their complementary strengths, integrating the systematic nature of symbolic execution with the creative capabilities of LLMs presents a significant challenge. We introduce <span>NexuSym</span>, an innovative tool that integrates symbolic execution with LLMs to facilitate the automatic generation of test cases. To effectively bridge the gap between these two approaches, we have developed a test case reducer, which normalizes the LLM-generated test cases to make them compatible with symbolic execution. Additionally, we propose a search space summarizer, which abstracts and condenses the search space explored by symbolic execution, enabling the LLM to focus on the most promising areas for further exploration. We instantiated <span>NexuSym</span> on KLEE and ChatGPT. Our evaluation of <span>NexuSym</span> involved 99 coreutils programs and 9 large GNU programs. The experimental results demonstrate that <span>NexuSym</span> significantly enhances program test coverage, with improvements of up to 20% in certain cases. 
Furthermore, we conducted an analysis of the monetary costs associated with using the LLM API, revealing that <span>NexuSym</span> is a highly cost-effective solution.</p></div>","PeriodicalId":55414,"journal":{"name":"Automated Software Engineering","volume":"32 2","pages":""},"PeriodicalIF":3.1,"publicationDate":"2025-06-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145163313","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
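The "test case reducer" idea, turning free-form LLM output into concrete inputs a symbolic executor can replay, can be sketched as below. This is a guess at the general shape, not NexuSym's actual reducer; the parsing rules and the `echo` example are invented.

```python
def reduce_test_cases(llm_output: str, program: str) -> list[list[str]]:
    """Normalize free-form LLM output into concrete argv lists for
    `program`: keep only lines that invoke it, and de-duplicate."""
    cases, seen = [], set()
    for line in llm_output.splitlines():
        line = line.strip().lstrip("$ ")   # drop shell-prompt decoration
        if not line.startswith(program):
            continue                        # skip prose and other commands
        argv = tuple(line.split())
        if argv not in seen:
            seen.add(argv)
            cases.append(list(argv))
    return cases

raw = """Here are some tests:
$ echo hello
$ echo -n hello
some explanation
$ echo hello
"""
cases = reduce_test_cases(raw, "echo")
```

Each normalized argv list could then seed the symbolic executor (e.g., as a concrete input driving concolic exploration), which is how the two techniques meet.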
Title: What information contributes to log-based anomaly detection? Insights from a configurable transformer-based approach
Authors: Xingfang Wu, Heng Li, Foutse Khomh
Pub Date: 2025-06-03 | DOI: 10.1007/s10515-025-00527-3

Log data are generated from logging statements in the source code, providing insights into the execution processes of software applications and systems. State-of-the-art log-based anomaly detection approaches typically leverage deep learning models to capture the semantic or sequential information in the log data and detect anomalous runtime behaviors. However, the impacts of these different types of information are not clear. In addition, most existing approaches ignore the timestamps in log data, which can potentially provide fine-grained sequential and temporal information. In this work, we propose a configurable Transformer-based anomaly detection model that can capture the semantic, sequential, and temporal information in the log data and allows us to configure the different types of information as the model’s features. Additionally, we train and evaluate the proposed model using log sequences of different lengths, thus overcoming the constraint of existing methods that rely on fixed-length or time-windowed log sequences as inputs. With the proposed model, we conduct a series of experiments with different combinations of input features to evaluate the roles of different types of information (i.e., sequential, temporal, semantic information) in anomaly detection. The model can attain competitive and consistently stable performance compared to the baselines when presented with log sequences of varying lengths. The results indicate that the event occurrence information plays a key role in identifying anomalies, while the impact of the sequential and temporal information is not significant for anomaly detection on the studied public datasets. On the other hand, the findings also reveal the simplicity of the studied public datasets and highlight the importance of constructing new datasets that contain different types of anomalies to better evaluate the performance of anomaly detection models.
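The "configurable features" idea, toggling which information types (occurrence, sequence, timing) feed the model, can be sketched in a few lines. This is a simplified illustration, not the paper's feature pipeline; the event IDs and field names are hypothetical.

```python
def build_features(events: list[tuple[int, str]],
                   use_sequence: bool = True,
                   use_time: bool = True) -> dict:
    """Turn (timestamp, event_id) pairs into configurable model inputs:
    event-occurrence counts always, event order and inter-event time
    deltas only when enabled."""
    feats: dict = {"occurrence": {}}
    for _, eid in events:
        feats["occurrence"][eid] = feats["occurrence"].get(eid, 0) + 1
    if use_sequence:
        feats["sequence"] = [eid for _, eid in events]
    if use_time:
        ts = [t for t, _ in events]
        feats["time_deltas"] = [b - a for a, b in zip(ts, ts[1:])]
    return feats

log = [(0, "E1"), (5, "E2"), (6, "E1")]
f = build_features(log)
```

Ablating features this way (e.g., `use_sequence=False`) is what lets an experiment isolate how much each information type actually contributes to detection performance.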
Title: Semi-supervised software vulnerability assessment via code lexical and structural information fusion
Authors: Wenlong Pei, Yilin Huang, Xiang Chen, Guilong Lu, Yong Liu, Chao Ni
Pub Date: 2025-06-03 | DOI: 10.1007/s10515-025-00526-4

In recent years, data-driven approaches have become popular for software vulnerability assessment (SVA). However, these approaches need a large amount of labeled SVA data to construct effective SVA models. This process demands security expertise for accurate labeling, incurring significant costs and introducing potential errors. Therefore, collecting training datasets for SVA can be a challenging task. To effectively alleviate the SVA data labeling cost, we propose an approach, SURF, which makes full use of a limited amount of labeled SVA data combined with a large amount of unlabeled SVA data to train the SVA model via semi-supervised learning. Furthermore, SURF incorporates lexical information (i.e., treating the code as plain text) and structural information (i.e., treating the code as a code property graph) as bimodal inputs for SVA model training, which can further improve the performance of SURF. Through extensive experiments, we evaluated the effectiveness of SURF on a dataset that contains C/C++ vulnerable functions from real-world software projects. The results show that by labeling only 30% of the SVA data, SURF can reach or even exceed the performance of state-of-the-art SVA baselines (such as DeepCVA and Func), even if these supervised baselines use 100% of the labeled SVA data. Furthermore, SURF can also exceed the performance of the state-of-the-art positive-unlabeled learning baseline PILOT when both are trained on 30% of the labeled SVA data.
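The semi-supervised principle of leveraging unlabeled data alongside a small labeled set can be illustrated with a toy self-training loop. This is a generic sketch on 1-D features, not SURF's actual training scheme (SURF is bimodal and neural); the data, margin, and nearest-centroid classifier are invented for illustration.

```python
def centroid(xs: list[float]) -> float:
    return sum(xs) / len(xs)

def self_train(labeled: list[tuple[float, int]], unlabeled: list[float],
               rounds: int = 3, margin: float = 0.5) -> dict[float, int]:
    """Self-training sketch: fit class centroids on labeled points, then
    pseudo-label unlabeled points whose distance gap exceeds `margin`."""
    labels = dict(labeled)  # feature value -> class in {0, 1}
    for _ in range(rounds):
        c0 = centroid([x for x, y in labels.items() if y == 0])
        c1 = centroid([x for x, y in labels.items() if y == 1])
        remaining = []
        for x in unlabeled:
            d0, d1 = abs(x - c0), abs(x - c1)
            if abs(d0 - d1) >= margin:      # confident: pseudo-label it
                labels[x] = 0 if d0 < d1 else 1
            else:
                remaining.append(x)         # ambiguous: keep for later
        unlabeled = remaining
    return labels

# Toy severity scores: class 0 = low severity, class 1 = high severity
seed = [(0.0, 0), (1.0, 0), (9.0, 1), (10.0, 1)]
result = self_train(seed, [0.5, 9.5, 5.2])
```

Confident pseudo-labels grow the training set each round, while ambiguous points (like 5.2 here) are deliberately left unlabeled rather than risking label noise.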
Title: Software testing for extended reality applications: a systematic mapping study
Authors: Ruizhen Gu, José Miguel Rojas, Donghwan Shin
Pub Date: 2025-06-03 | DOI: 10.1007/s10515-025-00523-7

Extended Reality (XR) is an emerging technology spanning diverse application domains and offering immersive user experiences. However, its unique characteristics, such as six degrees of freedom interactions, present significant testing challenges distinct from traditional 2D GUI applications, demanding novel testing techniques to build high-quality XR applications. This paper presents the first systematic mapping study on software testing for XR applications. We selected 34 studies focusing on techniques and empirical approaches in XR software testing for detailed examination. The studies are classified and reviewed to address the current research landscape, test facets, and evaluation methodologies in the XR testing domain. Additionally, we provide a repository summarising the mapping study, including datasets and tools referenced in the selected studies, to support future research and practical applications. Our study highlights open challenges in XR testing and proposes actionable future research directions to address the gaps and advance the field of XR software testing.
Title: HGNNLink: recovering requirements-code traceability links with text and dependency-aware heterogeneous graph neural networks
Authors: Bangchao Wang, Zhiyuan Zou, Xuanxuan Liang, Huan Jin, Peng Liang
Pub Date: 2025-05-31 | DOI: 10.1007/s10515-025-00528-2

Manually recovering traceability links between requirements and code artifacts often consumes substantial human resources. To address this, researchers have proposed automated methods based on textual similarity between requirements and code artifacts, such as information retrieval (IR) and pre-trained models, to determine whether traceability links exist between requirements and code artifacts. However, in the same system, developers often follow similar naming conventions and repeatedly use the same frameworks and template code, resulting in high textual similarity between code artifacts that are functionally unrelated. This makes it difficult to accurately identify the corresponding code artifacts for requirements artifacts solely based on textual similarity. Therefore, it is necessary to leverage the dependency relationships between code artifacts to assist in the requirements-code traceability link recovery process. Existing methods often treat dependency relationships as a post-processing step to refine textual similarity, overlooking the joint importance of textual similarity and dependency relationships in generating requirements-code traceability links. To address these limitations, we propose Heterogeneous Graph Neural Network Link (HGNNLink), a requirements traceability approach that uses vectors generated by pre-trained models as node features and considers IR similarity and dependency relationships as edge features. By employing a heterogeneous graph neural network, HGNNLink aggregates and dynamically evaluates the impact of textual similarity and code dependencies on link generation. The experimental results show that HGNNLink improves the average F1 score by 13.36% compared to the current state-of-the-art (SOTA) method GA-XWCoDe on a dataset collected from ten open source software (OSS) projects. HGNNLink can also extend IR methods by using high-similarity candidate links as edges, and the extended HGNNLink achieves a 2.48% improvement in F1 over the original IR method after threshold parameter configuration using a genetic algorithm.
Pub Date : 2025-05-29DOI: 10.1007/s10515-025-00521-9
Manar Mazkatli, David Monschein, Martin Armbruster, Robert Heinrich, Anne Koziolek
The explicit consideration of the software architecture supports system evolution and efficient quality assurance. In particular, Architecture-based Performance Prediction (AbPP) assesses the performance for future scenarios (e.g., alternative workload, design, deployment) without expensive measurements for all such alternatives. However, accurate AbPP requires an up-to-date architectural Performance Model (aPM) that is parameterized over factors impacting the performance (e.g., input data characteristics). Especially in agile development, keeping such a parametric aPM consistent with software artifacts is challenging due to frequent evolutionary, adaptive, and usage-related changes. Existing approaches do not address the impact of all aforementioned changes. Moreover, the extraction of a complete aPM after each impacting change causes unnecessary monitoring overhead and may overwrite previous manual adjustments. In this article, we present the Continuous Integration of architectural Performance Model (CIPM) approach, which automatically updates a parametric aPM after each evolutionary, adaptive, or usage change. To reduce the monitoring overhead, CIPM only calibrates the affected performance parameters (e.g., resource demand) using adaptive monitoring. Moreover, a self-validation process in CIPM validates the accuracy, manages the monitoring to reduce overhead, and recalibrates inaccurate parts. Consequently, CIPM will automatically keep the aPM up-to-date throughout the development and operation, which enables AbPP for a proactive identification of upcoming performance problems and for evaluating alternatives at low costs. We evaluate the applicability of CIPM in terms of accuracy, monitoring overhead, and scalability using six cases (four Java-based open source applications and two industrial Lua-based sensor applications). 
Regarding accuracy, we observed that CIPM correctly keeps an aPM up-to-date and estimates performance parameters well, so that it supports accurate performance predictions. Regarding the monitoring overhead in our experiments, CIPM’s adaptive instrumentation demonstrated a significant reduction in the number of required instrumentation probes, ranging from 12.6% to 83.3%, depending on the specific case evaluated. Finally, we found that CIPM’s execution time is reasonable and scales well with an increasing number of model elements and monitoring data.
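The incremental calibration and self-validation loop described above can be sketched as follows. This is a hypothetical, simplified illustration in which a "model" is just a map from service names to mean resource demand; the function names and the error tolerance are assumptions for exposition, not CIPM's actual implementation.

```python
def update_model(model, changed_services, measurements):
    """Recalibrate only the performance parameters of affected services
    (the adaptive-monitoring idea: untouched parts keep their estimates)."""
    for svc in changed_services:
        samples = measurements.get(svc, [])
        if samples:
            # Estimate resource demand from fresh monitoring samples.
            model[svc] = sum(samples) / len(samples)
    return model

def self_validate(model, observed, tolerance=0.2):
    """Return services whose predicted demand deviates from observations
    by more than `tolerance` (relative error) -- candidates for recalibration."""
    inaccurate = []
    for svc, measured in observed.items():
        predicted = model.get(svc)
        if predicted is None or abs(predicted - measured) / measured > tolerance:
            inaccurate.append(svc)
    return inaccurate
```

In this reading, the self-validation output feeds back into the next `update_model` call, so the model converges toward the observed behaviour while monitoring stays restricted to the flagged parts.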
{"title":"Continuous integration of architectural performance models with parametric dependencies – the CIPM approach","authors":"Manar Mazkatli, David Monschein, Martin Armbruster, Robert Heinrich, Anne Koziolek","doi":"10.1007/s10515-025-00521-9","DOIUrl":"10.1007/s10515-025-00521-9","url":null,"abstract":"<p>The explicit consideration of the software architecture supports system evolution and efficient quality assurance. In particular, Architecture-based Performance Prediction (AbPP) assesses the performance for future scenarios (e.g., alternative workload, design, deployment) without expensive measurements for all such alternatives. However, accurate AbPP requires an up-to-date architectural Performance Model (aPM) that is parameterized over factors impacting the performance (e.g., input data characteristics). Especially in agile development, keeping such a parametric aPM consistent with software artifacts is challenging due to frequent evolutionary, adaptive, and usage-related changes. Existing approaches do not address the impact of all aforementioned changes. Moreover, the extraction of a complete aPM after each impacting change causes unnecessary monitoring overhead and may overwrite previous manual adjustments. In this article, we present the Continuous Integration of architectural Performance Model (CIPM) approach, which automatically updates a parametric aPM after each evolutionary, adaptive, or usage change. To reduce the monitoring overhead, CIPM only calibrates the affected performance parameters (e.g., resource demand) using adaptive monitoring. Moreover, a self-validation process in CIPM validates the accuracy, manages the monitoring to reduce overhead, and recalibrates inaccurate parts. Consequently, CIPM will automatically keep the aPM up-to-date throughout the development and operation, which enables AbPP for a proactive identification of upcoming performance problems and for evaluating alternatives at low costs. 
We evaluate the applicability of CIPM in terms of accuracy, monitoring overhead, and scalability using six cases (four Java-based open source applications and two industrial Lua-based sensor applications). Regarding accuracy, we observed that CIPM correctly keeps an aPM up-to-date and estimates performance parameters well so that it supports accurate performance predictions. Regarding the monitoring overhead in our experiments, CIPM’s adaptive instrumentation demonstrated a significant reduction in the number of required instrumentation probes, ranging from 12.6 % to 83.3 %, depending on the specific cases evaluated. Finally, we found out that CIPM’s execution time is reasonable and scales well with an increasing number of model elements and monitoring data.</p>","PeriodicalId":55414,"journal":{"name":"Automated Software Engineering","volume":"32 2","pages":""},"PeriodicalIF":3.1,"publicationDate":"2025-05-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://link.springer.com/content/pdf/10.1007/s10515-025-00521-9.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145171381","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2025-05-26DOI: 10.1007/s10515-025-00516-6
Xiaolu Zhang, Tahmid Rafi, Yuejun Guan, Shuqing Li, Michael R. Lyu
The Metaverse is a form of next-generation human–computer interaction and social networking based on virtual and augmented reality. Both the research and industry communities have invested heavily in this area to develop useful applications and enhance user experience. Meanwhile, the expanded human–computer interface that enables the immersive experience in the Metaverse will also inevitably expand the surface for potential privacy leaks. This dilemma between immersive user experience and higher privacy risks has not been well studied, and it is unclear how different users would make decisions when facing it. In this research work, we systematically studied this dilemma in different usage scenarios of the Metaverse and conducted a study with 177 users to understand the factors that may affect users’ decision making. From the study, we found that user preferences regarding immersive experience and privacy protection can differ considerably across usage scenarios, and we expect our study results to provide insights and guidance for the design of privacy protection mechanisms in Metaverse platforms and applications.
{"title":"Understanding the privacy-realisticness dilemma of the metaverse","authors":"Xiaolu Zhang, Tahmid Rafi, Yuejun Guan, Shuqing Li, Michael R. Lyu","doi":"10.1007/s10515-025-00516-6","DOIUrl":"10.1007/s10515-025-00516-6","url":null,"abstract":"<div><p>Metaverse is a form of next-generation human–computer interaction and social networks based on virtual and augmented reality. Both the research and industry community have invested much in this area to develop useful applications and enhance user experience. Meanwhile, the expanded human–computer interface which enables the immersive experience in the Metaverse will also inevitably expand the interface of potential privacy leaks. This dilemma between immersive user experience and higher privacy risks has not been well studied and it is not clear how different users would make decisions when facing such a dilemma. In this research work, we systematically studied this dilemma in different usage scenarios of the Metaverse and performed a study on 177 users to understand the factors that may affect users’ decision making. From the study, we found that user preference on immersive experience and privacy protection can be very different in different usage scenarios and we expect our study results can provide some insights and guidance for the design of privacy protection mechanisms in Metaverse platforms and applications.</p></div>","PeriodicalId":55414,"journal":{"name":"Automated Software Engineering","volume":"32 2","pages":""},"PeriodicalIF":2.0,"publicationDate":"2025-05-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144135273","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}