Journal of Software-Evolution and Process: Latest Articles

UCLP: Unsupervised Classification of Key Aspects in Vulnerability Descriptions Through Label Profile
IF 1.8; Zone 4 (Computer Science); Q3 (COMPUTER SCIENCE, SOFTWARE ENGINEERING); Pub Date: 2025-09-13; DOI: 10.1002/smr.70052
Linyi Han, Hang Li, Xiaowang Zhang, Youmeng Li, Zhiyong Feng

Textual vulnerability descriptions (TVDs) in repositories like NVD and IBM X-Force Exchange are essential for security engineers managing vulnerabilities. Engineers typically search for key aspects in TVDs using specific phrases, but with multiple expressions for each aspect, retrieving all relevant records is challenging. We propose a label-based retrieval framework that classifies key aspects and retrieves TVDs by their broader categories. Given the large data volume, manual labeling is infeasible, making unsupervised classification critical. However, short labels and repeated words diminish semantic clarity, affecting classification accuracy. We introduce Unsupervised Classification through Label Profile (UCLP), which expands label semantics through label profiles inspired by recommendation systems. We construct profiles using neural network weights and apply TF-IDF to calculate similarities, smoothing distributions with an arctangent function. Results show that UCLP significantly outperforms four benchmarks, raising accuracy from 68.3% to 78.9% and improving three real-world applications.
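
The similarity step described above (TF-IDF over label profiles, then arctangent smoothing) can be sketched roughly as follows. This is only an illustration under assumptions: the abstract does not say how profiles are derived from neural network weights or exactly where the arctangent is applied, so the profile words, the `scale` parameter, and the function names here are all hypothetical.

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """Return one {term: tf-idf weight} dict per tokenized document."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        # Smoothed idf so that terms present in every document keep a positive weight.
        vectors.append({t: (c / len(doc)) * (math.log((1 + n) / (1 + df[t])) + 1.0)
                        for t, c in tf.items()})
    return vectors

def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def classify(phrase, profiles, scale=5.0):
    """Pick the label whose profile is most similar to the phrase.

    `profiles` maps a label to its (hypothetical) list of profile words.
    `scale` is an assumed knob controlling how strongly arctan compresses scores.
    """
    labels = list(profiles)
    vectors = tfidf_vectors([profiles[label] for label in labels] + [phrase])
    query = vectors[-1]
    sims = [cosine(query, vec) for vec in vectors[:-1]]
    smoothed = [math.atan(scale * s) for s in sims]  # arctan flattens large gaps
    total = sum(smoothed)
    if total == 0:
        dist = {label: 1.0 / len(labels) for label in labels}
    else:
        dist = {label: s / total for label, s in zip(labels, smoothed)}
    return max(dist, key=dist.get), dist

# Hypothetical label profiles for the "attacker type" aspect of a TVD.
profiles = {
    "remote attacker": ["remote", "attacker", "network", "unauthenticated"],
    "local user": ["local", "user", "authenticated", "privileged"],
}
best, dist = classify(["remote", "unauthenticated", "attacker"], profiles)
print(best, dist)
```

Expanding each short label into a longer word list is what gives TF-IDF enough vocabulary to work with; the arctangent only reshapes the resulting scores and does not change which label ranks first.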

Citations: 0
UFR-OSFA: Unified Feature Representation and Oppositional Structure Feature Alignment for Mixed-Project Heterogeneous Defect Prediction
IF 1.8; Zone 4 (Computer Science); Q3 (COMPUTER SCIENCE, SOFTWARE ENGINEERING); Pub Date: 2025-09-10; DOI: 10.1002/smr.70049
Yifan Zou, Huiqiang Wang, Hongwu Lv, Shuai Zhao

Heterogeneous defect prediction (HDP) plays a crucial role in software engineering by enabling the early detection of software defects across projects with heterogeneous feature spaces. Recently, some mixed-project HDP (MP-HDP) methods have been proposed, which have demonstrated modest improvements in HDP performance. Nevertheless, existing MP-HDP approaches fail to address feature redundancy and distribution inconsistency simultaneously. To overcome these limitations, this paper proposes a novel MP-HDP approach, UFR-OSFA, based on unified feature representation and oppositional structural feature alignment. Concretely, UFR-OSFA first unifies the features of the source and target projects, reducing the distribution differences between them by matching common features and applying the Hungarian algorithm based on the Kolmogorov–Smirnov (KS) test. Subsequently, utilizing a generator and two classifiers with oppositional structures, UFR-OSFA separates the features of the source project and clusters those of the target project, addressing the issue of conditional distribution mismatch and enhancing the model's generalization ability in the target project. Extensive experiments on 23 projects from five datasets demonstrate that the proposed approach performs better than or comparably to baseline methods.
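
One way to picture the feature-unification step is as an assignment problem: compute a two-sample KS statistic for every source/target feature pair and let the Hungarian algorithm choose the pairing with the smallest total distance. The sketch below is an assumption about how these two pieces could fit together, not the authors' implementation; the feature names and sample values are invented, and `hungarian` expects no more source features than target features.

```python
from bisect import bisect_right

def ks_statistic(a, b):
    """Two-sample KS statistic: largest gap between the two empirical CDFs."""
    a, b = sorted(a), sorted(b)
    points = sorted(set(a) | set(b))
    return max(abs(bisect_right(a, x) / len(a) - bisect_right(b, x) / len(b))
               for x in points)

def hungarian(cost):
    """Minimum-cost assignment of rows to columns (requires rows <= columns).

    Classic O(n^2 * m) potentials-based formulation; returns {row: column}.
    """
    n, m = len(cost), len(cost[0])
    inf = float("inf")
    u, v = [0.0] * (n + 1), [0.0] * (m + 1)
    p, way = [0] * (m + 1), [0] * (m + 1)   # p[j]: row matched to column j (1-based)
    for i in range(1, n + 1):
        p[0], j0 = i, 0
        minv, used = [inf] * (m + 1), [False] * (m + 1)
        while True:
            used[j0] = True
            i0, delta, j1 = p[j0], inf, 0
            for j in range(1, m + 1):
                if not used[j]:
                    cur = cost[i0 - 1][j - 1] - u[i0] - v[j]
                    if cur < minv[j]:
                        minv[j], way[j] = cur, j0
                    if minv[j] < delta:
                        delta, j1 = minv[j], j
            for j in range(m + 1):
                if used[j]:
                    u[p[j]] += delta
                    v[j] -= delta
                else:
                    minv[j] -= delta
            j0 = j1
            if p[j0] == 0:
                break
        while j0:                            # walk back along the augmenting path
            j1 = way[j0]
            p[j0] = p[j1]
            j0 = j1
    return {p[j] - 1: j - 1 for j in range(1, m + 1) if p[j]}

def match_features(source, target):
    """Pair each source feature with the target feature closest in distribution."""
    s_names, t_names = list(source), list(target)
    cost = [[ks_statistic(source[s], target[t]) for t in t_names] for s in s_names]
    return {s_names[r]: t_names[c] for r, c in hungarian(cost).items()}

# Invented metric samples from two projects with different feature names.
source = {"loc": [10, 20, 30, 40, 50], "cc": [1, 2, 2, 3, 4]}
target = {"complexity": [1, 1, 2, 3, 5], "lines": [12, 18, 33, 41, 55]}
match = match_features(source, target)
print(match)
```

Using the KS statistic as the cost makes the pairing depend only on the shape of each metric's distribution, which is why features with different names in the two projects can still be matched.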

Citations: 0
CyberESP: An Integrated Cybersecurity Framework for SMEs
IF 1.8; Zone 4 (Computer Science); Q3 (COMPUTER SCIENCE, SOFTWARE ENGINEERING); Pub Date: 2025-09-09; DOI: 10.1002/smr.70050
Jose A. Calvo-Manzano, Tomás San Feliu, Ángel Herranz, Julio Mariño, Lars-Åke Fredlund, Ana M. Moreno

Cybersecurity is a critical global concern, particularly for small- and medium-sized enterprises (SMEs) with limited resources and expertise. The authors are developing CyberESP, a tailored cybersecurity framework supported by a semi-automated tool to ensure Spanish SMEs' cybersecurity management. Following the Design Science Research (DSR) methodology and grounded in international standards, the authors identified six requirements to be satisfied by a cybersecurity framework for SMEs, which should support the identification of assets, vulnerabilities, threats, and risks. This paper presents the first part of the CyberESP framework, which deals with asset management, particularly the identification of assets and the analysis of their dimensions and cost. A prototype supporting these activities was developed and validated through a case study in a retail SME, showing the solution's potential and identifying particular improvements. The paper also addresses threats to validity and limitations, noting the framework's focus on hardware, software, and networks. Future work includes vulnerability management and will explore the use of cloud and IoT deployment, positioning CyberESP as a practical solution to enhance SMEs' cybersecurity resilience.

Citations: 0
Engineering MLOps Pipelines With Data Quality: A Case Study on Tabular Datasets in Kaggle
IF 1.8; Zone 4 (Computer Science); Q3 (COMPUTER SCIENCE, SOFTWARE ENGINEERING); Pub Date: 2025-09-08; DOI: 10.1002/smr.70044
Matteo Pancini, Matteo Camilli, Giovanni Quattrocchi, Damian Andrew Tamburri

Ensuring high-quality data is crucial for the successful deployment of machine learning models, thereby sustaining the operational pipelines around such models. However, a significant number of practitioners do not currently use data quality checks or measurements as gateways for their model construction and operationalization, indicating a need for greater awareness and adoption of these tools. In this study, we propose an approach for automating the architecting of machine learning pipelines by means of (semi-)automated data quality checks. We focus on tabular data as a representative of the most widely used structured data formats in said pipelines. Our work is based on a subset of metrics that are particularly relevant in MLOps pipelines, stemming from our engagement with expert practitioners in machine learning operations (MLOps). We selected Deepchecks, a well-known tool for conducting data quality checks, from a cohort of similar tools to evaluate the quality of datasets collected from Kaggle, a widely used platform for machine learning competitions and data science projects. We also analyze the main features used by Kaggle to rank its datasets and use these features to validate the relevance of our approach. Our approach shows the potential for automated data quality checks to improve the efficiency and effectiveness of MLOps pipelines and their operation, by decreasing the risk of introducing errors and biases into machine learning models in production.
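
The idea of a data quality check acting as a gateway in a pipeline can be illustrated with a small standard-library sketch. It deliberately does not use the Deepchecks API; the two metrics (missing-value ratio and duplicate-row ratio), the thresholds, and the function names are assumptions chosen for illustration and are not taken from the study.

```python
def quality_report(rows):
    """Compute two simple quality metrics for tabular data given as a list of dicts."""
    total = len(rows)
    cells = sum(len(row) for row in rows)
    # Treat None and empty strings as missing values.
    missing = sum(1 for row in rows for value in row.values() if value in (None, ""))
    # Rows are duplicates when every column holds the same value.
    unique = {tuple(sorted(row.items(), key=lambda kv: kv[0])) for row in rows}
    return {
        "missing_ratio": missing / cells if cells else 0.0,
        "duplicate_ratio": (total - len(unique)) / total if total else 0.0,
    }

def passes_gate(report, max_missing=0.05, max_duplicates=0.01):
    """Gateway decision: only let the dataset through if both ratios are low enough."""
    return (report["missing_ratio"] <= max_missing
            and report["duplicate_ratio"] <= max_duplicates)

# Invented sample data: one missing age, one empty city, one exact duplicate row.
rows = [
    {"age": 31, "city": "Rome"},
    {"age": None, "city": "Milan"},
    {"age": 31, "city": "Rome"},
    {"age": 45, "city": ""},
]
report = quality_report(rows)
print(report, passes_gate(report))
```

In a pipeline, a failed gate would typically stop training or trigger a data-cleaning step rather than let a flawed dataset reach model construction.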

Citations: 0
Towards Multi-Class Socio-Technical Congruence: Assessing Coordination in Collaborative Software Development Settings
IF 1.8; Zone 4 (Computer Science); Q3 (COMPUTER SCIENCE, SOFTWARE ENGINEERING); Pub Date: 2025-09-08; DOI: 10.1002/smr.70040
Roshan Namal Rajapakse, Claudia Szabo

Effective coordination between contributors with different functional roles is fundamental for the success of collaboration-centric software development paradigms such as DevSecOps. However, quantitatively assessing coordination in such settings has received limited attention. We introduce multi-class socio-technical congruence (MC-STC), an extension of the widely studied socio-technical congruence (STC) framework, to address this gap. Our metric enables the assessment of coordination in a setting where contributors with different functional roles or alignments collaborate. Using a large-scale exploratory case study, we evaluated MC-STC for two classes (i.e., 2C-STC). Specifically, we calculated 2C-STC for 100 systematically selected projects from the TravisTorrent dataset, considering developers (dev) and security-focused developers (sf-devs) as the two types of contributors with different functional alignments (i.e., two classes). We hypothesized that the dev and sf-dev interaction would have a quantifiable impact on the vulnerability score (VS). Our results show a moderate negative correlation between 2C-STC and VS, with a Spearman correlation of −0.427 (p = 0.00000624), suggesting that higher levels of coordination between devs and sf-devs lead to projects with a lower incidence of high-severity vulnerabilities. In addition, 2C-STC shows a stronger negative correlation with VS than STC does, suggesting that it is a more sensitive indicator of this relationship. Thus, the specific instance of our proposed metric, 2C-STC, performs comparatively better than STC for measuring cross-functional coordination in our selected projects. However, further research is needed to explore its broader applicability.
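
The reported relationship is a Spearman rank correlation between a congruence score and a vulnerability score. As a reminder of what that statistic measures, here is a small standard-library sketch; the five data points are invented for illustration and have no connection to the TravisTorrent projects or to the reported value of −0.427.

```python
import math

def average_ranks(values):
    """Rank values from 1..n, giving tied values the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1          # average 1-based rank of the tie group
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks

def pearson(x, y):
    """Pearson correlation coefficient; returns 0.0 if either input is constant."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y) if var_x and var_y else 0.0

def spearman(x, y):
    """Spearman correlation: Pearson correlation computed on the ranks."""
    return pearson(average_ranks(x), average_ranks(y))

# Invented per-project values: congruence score and vulnerability score.
congruence = [0.2, 0.4, 0.5, 0.7, 0.9]
vulnerability = [8.1, 7.5, 7.9, 5.2, 4.0]
rho = spearman(congruence, vulnerability)
print(round(rho, 3))
```

Because the statistic works on ranks, it captures any monotonic trend between the two scores and is insensitive to their scales, which suits metrics measured in different units.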
Citations: 0
Prioritization Method for Crowdsourced Test Report by Integrating Text and Image Information
IF 1.8; Zone 4 (Computer Science); Q3 (COMPUTER SCIENCE, SOFTWARE ENGINEERING); Pub Date: 2025-09-04; DOI: 10.1002/smr.70043
Huijie Tu, Xiangjuan Yao, Dunwei Gong, Yan Yang

Crowdsourced testing has the advantages of efficiency, speed, and reliability, but an excessive number of test reports makes it challenging for report reviewers to select high-quality test reports in a limited time. Test reports submitted by crowd workers often tend to be short textual descriptions with a large number of screenshots attached. Most traditional methods for processing test reports target reports that contain only text information and therefore cannot meet the defect detection requirements of crowdsourced test reports. In view of this, this paper proposes a prioritization method for crowdsourced test reports that integrates text and image information. First, we extract the text and image information from the test reports, based on which the defect detection abilities of the test reports are measured and the similarities between test reports are calculated. Then, a multi-stage prioritization method for the test reports is presented based on the defect detection levels and similarities of the test reports. In the first stage, based on the defect detection levels and the similarities, the test report set is sorted and clustered to obtain the sorting results of partial reports and the similar set for each sorted report; in the second stage, the similar test report set is sorted with the criteria of minimizing the similarity and maximizing the defect detection level; the sorting results of the two stages are combined to form the final priorities of test reports. To validate our approach, we conducted experiments on five crowdsourced test datasets. The results and the analysis show that our approach can detect all faults faster in a limited time. By comprehensively utilizing text and image information to prioritize test reports, better sorting results can be obtained than with state-of-the-art methods.
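
The two-stage ordering described above can be approximated with a greedy sketch. Everything concrete here is an assumption made for illustration: the abstract does not give the clustering rule, the similarity threshold, or the exact second-stage criterion, so `threshold`, the "score minus maximum similarity" objective, and the sample data are hypothetical.

```python
def prioritize(scores, sim, threshold=0.6):
    """Order test reports in two stages.

    scores: {report_id: defect-detection score}, higher is better.
    sim:    function (a, b) -> similarity in [0, 1].
    """
    # Stage 1: walk reports by descending score. A report that is not similar
    # to any earlier representative becomes a representative itself; otherwise
    # it joins the similar set of the first representative it resembles.
    representatives, similar_sets = [], {}
    for report in sorted(scores, key=scores.get, reverse=True):
        owner = next((rep for rep in representatives
                      if sim(rep, report) >= threshold), None)
        if owner is None:
            representatives.append(report)
            similar_sets[report] = []
        else:
            similar_sets[owner].append(report)

    # Stage 2: order the remaining reports, preferring a high score and a low
    # similarity to everything already ranked.
    ranked = list(representatives)
    pool = [r for rep in representatives for r in similar_sets[rep]]
    while pool:
        best = max(pool, key=lambda r: scores[r] - max(sim(r, q) for q in ranked))
        ranked.append(best)
        pool.remove(best)
    return ranked

# Invented example: r1/r2 and r3/r4 describe near-identical defects.
scores = {"r1": 0.9, "r2": 0.85, "r3": 0.5, "r4": 0.4}
pair_sim = {frozenset(("r1", "r2")): 0.9, frozenset(("r3", "r4")): 0.8}

def sim(a, b):
    return pair_sim.get(frozenset((a, b)), 0.1)

ranked = prioritize(scores, sim)
print(ranked)
```

Placing one representative per group first means a reviewer with limited time sees distinct defects early, while near-duplicates such as `r2` are deferred even when their own scores are high.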

Citations: 0
Advances in Software Engineering Research for Systems-of-Systems and Software Ecosystems
IF 1.8; Zone 4 (Computer Science); Q3 (COMPUTER SCIENCE, SOFTWARE ENGINEERING); Pub Date: 2025-09-04; DOI: 10.1002/smr.70046
Rodrigo Santos, Antonia Bertolino, Pablo Antonino, Doo-Hwan Bae

For more than a decade, software engineering for systems-of-systems (SoS) and software ecosystems (SECO) has been widely investigated in order to cope with complexity in software-intensive systems. SoS research addresses several aspects related to software system architecture comprising a set of constituent systems that relate to each other to perform missions. As such, SoS have key characteristics such as operational and managerial independence, distribution, emergent behavior, and evolutionary development. Full interoperability and dynamic architecture become critical challenges in this context. On the other hand, SECO research refers to modeling and analysis of a socio-technical network of actors and artifacts formed on top of common technological platforms, in which business factors directly influence software maintenance and evolution. Software sustainability and diversity, as well as quality attributes that affect the SECO platform health, represent challenges in the field. From the long-running, successful series of the International Workshop on Software Engineering for Systems-of-Systems and Software Ecosystems (SESoS), co-located with the IEEE/ACM International Conference on Software Engineering (ICSE), we present this special issue of the Journal of Software: Evolution and Process on these topics, drawn from SESoS 2023 in Melbourne, Australia. Four articles were accepted and published in this special issue, covering a longitudinal analysis of SoS research, as well as strategic patterns, services, and trust in SECO. These articles provide researchers and practitioners with advances in the state of the art and point out opportunities for further research.

Citations: 0
Towards Explainable Code Readability Classification With Graph Neural Networks
IF 1.8, Zone 4 (Computer Science), Q3 (COMPUTER SCIENCE, SOFTWARE ENGINEERING), Pub Date: 2025-09-03, DOI: 10.1002/smr.70048
Qing Mi, Zhiyou Xiao, Yi Zhan, Liyan Tao, Jiahe Zhang

Code readability is a central concern for developers, as more readable code indicates higher maintainability, reusability, and portability. In recent years, many deep learning–based code readability classification methods have been proposed. Among them, a graph neural network (GNN)–based model has achieved the best performance. However, it is still unclear which aspects of the model's input lead to its decisions, which hinders its practical use in the software industry. To improve the interpretability of existing code readability classification models and identify the key code characteristics that drive their predictions, we propose an explanation framework that uses GNN explainers to make code readability classification transparent and trustworthy. First, we propose a simplified Abstract Syntax Tree (AST)–based code representation method, which transforms Java code snippets into ASTs and discards lower-level nodes with limited information. Then, we retrain the state-of-the-art GNN-based model on our simplified program graphs. Finally, we employ SubgraphX to explain the model's code readability predictions at the subgraph level and visualize the explanation results to further analyze what causes such predictions. The experimental results show that sequential logic, code comments, selection logic, and nested structure are the most influential code characteristics when classifying code snippets as readable or unreadable. Further investigations indicate that the model is proficient at capturing features related to complex logic structures and extensive data flows but is limited in identifying readability issues associated with naming conventions and code formatting. The explainability analysis conducted in this research is a first step towards more transparent and reliable code readability classification. We believe that our findings can provide constructive suggestions for developers to write more readable code and delineate directions for future model improvement.
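
The AST simplification step described above can be illustrated with a minimal sketch. The paper works on Java snippets; this example instead uses Python's built-in `ast` module so that it runs without extra dependencies. The set of node types treated as low-level (`LOW_LEVEL`) and the function name `simplified_edges` are assumptions made for illustration, not the authors' actual pruning rules.

```python
import ast

# Hypothetical choice of "low-level" node types to discard: identifiers,
# literals, expression contexts, and operator tokens.
LOW_LEVEL = (
    ast.Name, ast.Constant, ast.expr_context,
    ast.operator, ast.unaryop, ast.cmpop, ast.boolop,
)

def simplified_edges(source):
    """Parse source and return (parent, child) type-name pairs of the
    pruned AST, skipping every node whose type is in LOW_LEVEL."""
    tree = ast.parse(source)
    edges = []

    def visit(node, parent_label):
        for child in ast.iter_child_nodes(node):
            if isinstance(child, LOW_LEVEL):
                continue  # drop the low-level node and anything below it
            label = type(child).__name__
            edges.append((parent_label, label))
            visit(child, label)

    visit(tree, type(tree).__name__)
    return edges

snippet = "def f(x):\n    if x > 0:\n        return x\n    return 0\n"
edges = simplified_edges(snippet)
for parent, child in edges:
    print(parent, "->", child)
```

The resulting edge list keeps structural nodes such as `If` and `Return`, which is the kind of reduced program graph a GNN could then consume. A real pipeline would keep node identities rather than only type names.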

Citations: 0
Securing Software Development Through People Maturity: A Fuzzy-AHP Decision-Making Framework
IF 1.8, Zone 4 (Computer Science), Q3 (COMPUTER SCIENCE, SOFTWARE ENGINEERING), Pub Date: 2025-09-01, DOI: 10.1002/smr.70045
Rafiq Ahmad Khan, Hussein A. Al Hashimi, Hathal S. Alwageed, Ismail Keshta, Alaa Omran Almagrabi, Sarra Ayouni

The increasing complexity of software development processes has heightened the need for robust security measures. Although technical safeguards are essential, the role of human factors in securing software development remains underexplored. This paper presents a novel approach that integrates people maturity with a fuzzy analytic hierarchy process (Fuzzy-AHP) decision-making framework to enhance security in software development. The framework provides a systematic method for evaluating and prioritizing the human factors that influence an organization's security posture, such as team expertise, communication, and adherence to security protocols. The decision-making model allows project managers and stakeholders to determine the appropriate areas for improvement and to develop the right strategies and actions to nurture a secure and mature development culture. Through a systematic literature review (SLR) and an empirical survey, the paper identifies 24 human success factors (HSFs) and human security vulnerabilities (HSVs) and 38 practices for addressing them. Furthermore, we discuss the local and global ranks of each HSF and HSV practice and group the identified practices into nine categories to determine the rank and weight of each category. Based on the collected data, Fuzzy-AHP prioritized these practices; the category “C4: Skill development and stakeholder engagement” ranks first, with the highest category weight of 0.12435. Similarly, the highest global weight is 0.051506, and the top globally ranked HSF and HSV practice is “P15: Hands-on practice and stakeholder communication.” The proposed approach complements existing technical methods by addressing the human element of security, making it adaptable to diverse organizational environments. Through this integration of people maturity and Fuzzy-AHP, the paper contributes a new dimension to securing software development, emphasizing the critical role of human factors in achieving comprehensive security.
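
To make the weighting step concrete, here is a minimal sketch of one common Fuzzy-AHP variant: triangular fuzzy numbers, the geometric-mean method, and centroid defuzzification. The abstract does not say which variant the authors use, so the method choice, the comparison matrix, and all function names are assumptions for illustration. The global weight of a practice is taken here as its category weight times its local weight, the usual AHP convention.

```python
import math

ONE = (1.0, 1.0, 1.0)  # triangular fuzzy number (lower, middle, upper)

def inverse(t):
    """Reciprocal of a triangular fuzzy number."""
    return (1.0 / t[2], 1.0 / t[1], 1.0 / t[0])

def fuzzy_geometric_mean(row):
    """Component-wise geometric mean of one row of fuzzy comparisons."""
    n = len(row)
    return tuple(math.prod(t[k] for t in row) ** (1.0 / n) for k in range(3))

def fuzzy_ahp_weights(matrix):
    """Return normalized crisp weights for a fuzzy pairwise matrix."""
    means = [fuzzy_geometric_mean(row) for row in matrix]
    total = tuple(sum(g[k] for g in means) for k in range(3))
    # Fuzzy weight = mean * inverse(total); the inverse swaps lower and upper.
    fuzzy = [(g[0] / total[2], g[1] / total[1], g[2] / total[0]) for g in means]
    crisp = [sum(w) / 3.0 for w in fuzzy]  # centroid defuzzification
    norm = sum(crisp)
    return [c / norm for c in crisp]

def global_weights(category_weight, local_weights):
    """Global weight of each practice = category weight * local weight."""
    return [category_weight * w for w in local_weights]

# Hypothetical judgments for three criteria (not data from the paper).
a12, a13, a23 = (2.0, 3.0, 4.0), (4.0, 5.0, 6.0), (1.0, 2.0, 3.0)
matrix = [
    [ONE, a12, a13],
    [inverse(a12), ONE, a23],
    [inverse(a13), inverse(a23), ONE],
]
weights = fuzzy_ahp_weights(matrix)
print([round(w, 4) for w in weights])
```

Ranking the categories or practices then amounts to sorting by these weights; a full study would also check the consistency of the pairwise judgments before trusting the ranking.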

Citations: 0
A Semiautomated Approach for Detecting Ambiguities in Software Requirements Using SpanBERT and Named Entity Recognition
IF 1.8, Zone 4 (Computer Science), Q3 (COMPUTER SCIENCE, SOFTWARE ENGINEERING), Pub Date: 2025-08-10, DOI: 10.1002/smr.70041
Fiza Talha, Touseef Tahir, Talha Nadeem

Ambiguous user requirements present a challenge in software requirements engineering, and handling ambiguity manually is time-consuming. Software requirements are essential inputs to software development processes, including architecture and design, implementation, and testing. Requirement ambiguities lead to project cost overruns, delays in project delivery, and poor software product quality. Timely identification and correction of ambiguity can result in better software systems that meet product objectives and satisfy the needs of all stakeholders. This study explores various natural language processing techniques together with SpanBERT (a variant of BERT) and proposes a semiautomated approach for detecting anaphoric, coordination, and missing-condition ambiguities in functional requirements. The proposed approach is validated on a new, original dataset containing 425 functional requirements from 16 domains. The ambiguities identified through our approach are compared with those detected manually and by ChatGPT. Our approach outperforms ChatGPT in detecting ambiguities. The proposed approach will aid project managers and requirement engineers in identifying ambiguities in requirement specifications, thereby helping to reduce cost overruns and delays in the software development process caused by requirement ambiguities.

Citations: 0