IET Software: Latest Articles

Robust Malware identification via deep temporal convolutional network with symmetric cross entropy learning
IF 1.6 | Zone 4, Computer Science | Q3 COMPUTER SCIENCE, SOFTWARE ENGINEERING | Pub Date: 2023-07-07 | DOI: 10.1049/sfw2.12137
Jiankun Sun, Xiong Luo, Weiping Wang, Yang Gao, Wenbing Zhao

Recent developments in the Internet of Things (IoT) have drawn growing attention to the security of smart devices. In particular, the amount of malicious software (Malware) on IoT systems is increasing. Researchers have devoted considerable effort to supervised machine learning methods for identifying malicious attacks. High-quality labels are of great importance for supervised machine learning, but label noise is widespread because of the non-deterministic production environment. Learning from noisy labels is therefore important for machine learning-enabled Malware identification. In this study, motivated by the noise robustness of symmetric cross entropy, the authors propose a robust Malware identification method using a temporal convolutional network (TCN). Word embedding techniques are commonly used to capture the contextual relationship between the input operation codes (opcodes) and application programming interface function names. Considering the numerous unlabelled samples in real-world intelligent environments, the authors pre-train the TCN model on an unlabelled set using a word embedding method, namely Word2Vec. In the experiments, the proposed method is compared with several traditional statistical methods and more recent neural networks on a synthetic Malware dataset and a real-world dataset. The comparisons demonstrate the better performance and noise robustness of the proposed method, which achieves the best identification accuracy, 98.75%, in real-world scenarios.
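
The abstract does not spell out the loss function. As a minimal sketch, assuming the commonly used formulation of symmetric cross entropy (a weighted sum of the usual cross entropy and a reverse cross entropy, with log 0 replaced by a negative constant for one-hot labels), the per-sample loss could look like the following. The weights `alpha` and `beta`, the constant `log_zero`, and the function name are illustrative assumptions, not values taken from the paper.

```python
import math


def symmetric_cross_entropy(probs, label, alpha=1.0, beta=1.0,
                            log_zero=-4.0, eps=1e-7):
    """Per-sample symmetric cross entropy for a one-hot (possibly noisy) label.

    probs: predicted class probabilities (should sum to 1).
    label: index of the labelled class.
    alpha, beta, log_zero: hypothetical hyperparameters chosen for illustration.
    """
    # Standard cross entropy: -log p(label), clipped to avoid log(0).
    ce = -math.log(max(probs[label], eps))
    # Reverse cross entropy: -sum_k p(k) * log q(k). For a one-hot q,
    # log q(label) = 0 and log q(k) for k != label is replaced by log_zero.
    rce = -sum(p * log_zero for k, p in enumerate(probs) if k != label)
    return alpha * ce + beta * rce
```

Because the reverse term is bounded by `-log_zero`, a confidently predicted sample whose label disagrees with the prediction contributes a limited penalty through that term, which is the usual intuition given for its robustness to label noise.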

IET Software, vol. 17, no. 4, pp. 392-404.
Citations: 0
Proposed ethical framework for software requirements engineering
IF 1.6 | Zone 4, Computer Science | Q3 COMPUTER SCIENCE, SOFTWARE ENGINEERING | Pub Date: 2023-07-06 | DOI: 10.1049/sfw2.12136
Seblewongel E. Biable, Nuno M. Garcia, Dida Midekso

Requirements engineering is a fundamental process in software development. At the same time, it is a difficult phase that is exposed to many ethical violations. The main purpose of this work is to propose an ethical framework for software requirements engineering that addresses the identified concerns. These concerns include problems associated with a knowledge gap, requirements identification, quality-related concerns, unwillingness to give requirements, and practicing forbidden activities. The concerns are grouped into categories, which form the components of the proposed framework. Each category encompasses more than one problem domain. The proposed framework suggests resolving mechanisms as collections of clauses for each of those concerns. An expert evaluation technique is used to validate the proposed framework. The experts are purposefully selected from software industries and institutions. Questionnaires and focus group discussions were used as data-gathering tools for the validation of the proposed framework. The validity (face validity, content validity, and construct validity) and the reliability of the proposed framework were checked. The evaluation results show that the proposed framework has an acceptable range of validity and reliability. The proposed framework can be used as a guideline for software engineers to minimise the occurrence of those identified concerns during the requirements engineering process.

IET Software, vol. 17, no. 4, pp. 526-537.
Citations: 0
CTHP: Selection for adoption of open-source bioinformatics software based on a customised ISO 25010 quality model, three-way decision and Delphi hierarchy process
IF 1.6 | Zone 4, Computer Science | Q3 COMPUTER SCIENCE, SOFTWARE ENGINEERING | Pub Date: 2023-06-27 | DOI: 10.1049/sfw2.12134
Yuqi Li, Yixin Bian, Ziheng Zhang, Song Zhao, Yiqi Liu

The ever-growing number of open-source software tools in different domains increases the difficulty of software selection from the end-user's perspective. The process of evaluating, comparing, and selecting open-source solutions is far from trivial. In particular, when additional requirements need to be considered, existing methodologies fail to adapt to the new tasks. The objective of this study is to present a solution for dealing with this issue. A novel approach, CTHP, is presented for the evaluation and selection of open-source software in the Bioinformatics domain. First, the ISO 25010 quality model is chosen as the basis. This model is customised according to the special characteristics of Bioinformatics applications. The customisation is done by extracting the quality factors from the Bioinformatics applications, weighting these factors from the viewpoints of both developers and end-users, and adding them to the model. After that, Three-way Decision and the Delphi Hierarchy Process are integrated to assist in the selection for adoption. Finally, as a case study, the proposed approach is applied to support the decision-making process for two popular natural language processing frameworks in the Bioinformatics area. The study provides a systematic way to document the decision-making process and helps researchers and practitioners in Bioinformatics make better decisions among the alternatives.
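
The abstract names Three-way Decision without giving its rules. As a rough sketch of the general idea only (accept, reject, or defer depending on where a score falls relative to two thresholds), the following may help. The thresholds `accept_at` and `reject_at`, the score scale, and the function name are hypothetical and are not taken from the paper.

```python
def three_way_decision(score, accept_at=0.7, reject_at=0.3):
    """Map a quality score in [0, 1] to one of three decisions.

    accept_at and reject_at are illustrative thresholds (assumptions).
    """
    if not 0.0 <= reject_at < accept_at <= 1.0:
        raise ValueError("thresholds must satisfy 0 <= reject_at < accept_at <= 1")
    if score >= accept_at:
        return "accept"   # positive region: adopt the candidate
    if score <= reject_at:
        return "reject"   # negative region: discard the candidate
    return "defer"        # boundary region: gather more evidence first
```

The deferred region is what distinguishes this from a plain binary cut-off: candidates that are neither clearly good nor clearly bad are held back for further evaluation.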

IET Software, vol. 17, no. 4, pp. 496-508.
Citations: 0
Revisiting ‘revisiting supervised methods for effort-aware cross-project defect prediction’
IF 1.6 | Zone 4, Computer Science | Q3 COMPUTER SCIENCE, SOFTWARE ENGINEERING | Pub Date: 2023-06-27 | DOI: 10.1049/sfw2.12133
Fuyang Li, Peixin Yang, Jacky Wai Keung, Wenhua Hu, Haoyu Luo, Xiao Yu

Effort-aware cross-project defect prediction (EACPDP), which uses cross-project software modules to build a model that ranks within-project software modules by defect density, has been suggested as a way to allocate limited testing resources efficiently. Recently, Ni et al. proposed an EACPDP method called EASC, which used all cross-project modules to train a model without considering the difference in data distribution between cross-project and within-project data. In addition, Ni et al. employed different defect density calculation strategies when comparing EASC and baseline methods. To explore effective defect density calculation strategies and methods for EACPDP, the authors compare four data filtering methods and five transfer learning methods with EASC using four commonly used defect density calculation strategies. The authors use three classification evaluation metrics and seven effort-aware metrics to comprehensively assess the performance of the methods on 11 PROMISE datasets. The results show that (1) the classification before sorting (CBS+) defect density calculation strategy achieves the best overall performance; (2) using balanced distribution adaptation (BDA) and joint distribution adaptation (JDA) with the K-nearest neighbour classifier to build the EACPDP model can find 15% and 14.3% more defective modules and 11.6% and 8.9% more defects, respectively, while achieving acceptable initial false alarms (IFA); (3) better comprehensive classification performance can bring better EACPDP performance to some extent; and (4) flexibly adjusting the defect threshold λ of the CBS+ strategy contributes to different goals. In summary, the authors recommend that researchers and practitioners use BDA and JDA with the CBS+ strategy to build the EACPDP model.
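
The abstract does not define CBS+ in detail. One plausible reading of "classification before sorting", sketched below under that assumption, is to first split modules by whether their predicted defect probability reaches the threshold λ, and then sort each group by a density score (predicted probability divided by lines of code). The density formula, the tuple layout, and the default threshold are assumptions for illustration.

```python
def cbs_plus_rank(modules, threshold=0.5):
    """Rank modules for inspection: classify first, then sort by density.

    modules: list of (name, defect_probability, lines_of_code) tuples.
    threshold: plays the role of the defect threshold (lambda); assumed value.
    """
    def density(module):
        _, prob, loc = module
        return prob / max(loc, 1)  # guard against zero-length modules

    # Step 1: classification. Modules at or above the threshold go first.
    flagged = [m for m in modules if m[1] >= threshold]
    rest = [m for m in modules if m[1] < threshold]
    # Step 2: sorting. Within each group, higher density is inspected earlier.
    ranked = (sorted(flagged, key=density, reverse=True)
              + sorted(rest, key=density, reverse=True))
    return [name for name, _, _ in ranked]
```

Lowering the threshold moves more small modules into the first group, which illustrates how adjusting λ can trade early false alarms against the number of defects found per unit of effort.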

IET Software, vol. 17, no. 4, pp. 472-495.
Citations: 3
Formal verification of a telerehabilitation system through an abstraction and refinement approach using Uppaal
IF 1.6 | Zone 4, Computer Science | Q3 COMPUTER SCIENCE, SOFTWARE ENGINEERING | Pub Date: 2023-06-22 | DOI: 10.1049/sfw2.12128
Farid Arfi, Anne-Lise Courbis, Thomas Lambolais, François Bughin, Maurice Hayot

Formal methods are proven techniques that provide a rigorous mathematical basis for software development. In particular, they allow the quality of development to be effectively improved through accurate and explicit modelling, so that anomalies such as ambiguities and incompleteness are identified in the early phases of the software development process. Semi-formal UML models and formal Timed Automata models are used to design a telerehabilitation system through a practical approach based on abstraction and refinement. The formal verification of expected properties of the system is performed with the Uppaal tool. The motivation of this work is threefold: (i) showing the usefulness of formal methods in satisfying the validation needs of a medical telerehabilitation system; (ii) demonstrating our approach of system analysis through refinements to guide the development of a complex system; and (iii) highlighting, from a real-life experience, the usefulness of models for involving the stakeholders throughout the design of a system, from requirements to detailed specifications.

IET Software, vol. 17, no. 4, pp. 582-599.
Citations: 1
Just-in-time defect prediction enhanced by the joint method of line label fusion and file filtering
IF 1.6 | Zone 4, Computer Science | Q3 COMPUTER SCIENCE, SOFTWARE ENGINEERING | Pub Date: 2023-06-16 | DOI: 10.1049/sfw2.12131
Huan Zhang, Li Kuang, Aolang Wu, Qiuming Zhao, Xiaoxian Yang

Just-In-Time (JIT) defect prediction aims to predict the defect proneness of software changes when they are initially submitted. It has become a hot topic in software defect prediction due to its timeliness and traceability. Researchers have proposed many JIT defect prediction approaches. However, these approaches cannot effectively utilise line labels representing added or removed lines, and they ignore the noise caused by defect-irrelevant files. Therefore, a JIT defect prediction model enhanced by the joint method of line label Fusion and file Filtering (JIT-FF) is proposed. Firstly, to distinguish added and removed lines while preserving the original software change information, the authors represent the code changes as original, added, and removed code according to line labels. Secondly, to obtain a semantics-enhanced code representation, a cross-attention-based line label fusion method that performs complementary feature enhancement is proposed. Thirdly, to generate code changes containing fewer defect-irrelevant files, the authors formalise file filtering as a sequential decision problem and propose a reinforcement learning-based file filtering method. Finally, based on the generated code changes, CodeBERT-based commit representation and multi-layer perceptron-based defect prediction are performed to identify the defective software changes. The experiments demonstrate that JIT-FF can predict defective software changes more effectively.
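
The first step above, representing a change as original, added, and removed code according to line labels, can be illustrated with a small sketch. It assumes the input is the body of a unified-diff hunk, where `+` marks added lines and `-` marks removed lines; the function name and the choice to keep every line in the "original" view are assumptions, not the authors' implementation.

```python
def split_by_line_label(hunk_lines):
    """Split diff hunk lines into original, added, and removed views.

    hunk_lines: hunk body lines only (no '+++'/'---' file headers assumed).
    Returns three lists of code lines with the label prefix stripped.
    """
    original, added, removed = [], [], []
    for line in hunk_lines:
        if line.startswith("+"):
            text = line[1:]
            added.append(text)
        elif line.startswith("-"):
            text = line[1:]
            removed.append(text)
        else:
            # Context line: drop the leading space used by unified diffs.
            text = line[1:] if line.startswith(" ") else line
        # The original view keeps every line, preserving the full change.
        original.append(text)
    return original, added, removed
```

For example, calling it on `[" int x = 0;", "-x += 1;", "+x += 2;"]` yields all three statements in the original view, with `x += 2;` alone in the added view and `x += 1;` alone in the removed view.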

IET Software, vol. 17, no. 4, pp. 378-391.
Citations: 0
Mining relevant solutions for programming tasks from search engine results
IF 1.6 | Zone 4, Computer Science | Q3 COMPUTER SCIENCE, SOFTWARE ENGINEERING | Pub Date: 2023-06-14 | DOI: 10.1049/sfw2.12127
Adriano M. Rocha, Marcelo A. Maia

Official documentation of software development technologies, for example, APIs, may not be sufficient for all developer needs, so searching the Internet is a usual practice. Nonetheless, finding useful information may be challenging because the best solutions are not always among the top-ranked pages. Developers need to read and discard irrelevant pages, that is, those without code examples or those whose content has little focus on the desired solution. This work proposes an approach to mine relevant solutions for programming tasks from search engine results by removing irrelevant pages. The authors evaluated the top-20 pages returned by the Google search engine for 10 different queries and observed that only 31% of the evaluated pages are relevant to developers. The authors then proposed and evaluated three different approaches to mine the relevant pages returned by the search engine. Using Google's search engine as a baseline, the authors' results show that it returns a substantial number of pages that are irrelevant for developers, and that an effective approach to remove irrelevant pages can be established, suggesting that developers could benefit from a customised web search filter for development content.

IET Software, vol. 17, no. 4, pp. 455-471.
Citations: 0
Automated issue assignment using topic modelling on Jira issue tracking data
IF 1.6 | Zone 4, Computer Science | Q3 COMPUTER SCIENCE, SOFTWARE ENGINEERING | Pub Date: 2023-05-30 | DOI: 10.1049/sfw2.12129
Themistoklis Diamantopoulos, Nikolaos Saoulidis, Andreas Symeonidis

As more software teams use online issue tracking systems to collaborate on projects, accurately assigning new issues to the most suitable contributors can have a significant impact on project success. Several research efforts have therefore aimed to automate this process and save considerable time and effort. However, most approaches focus mainly on software bugs and employ models that do not sufficiently account for the semantics and the non-textual metadata of issues, and/or produce models that may require manual tuning. The authors design a methodology that extracts both textual and non-textual features from different types of issues, providing a Jira dataset that covers not only bugs but also new features, documentation issues, patches, etc. Moreover, the semantics of issue text are captured by a topic modelling technique that is optimised using the assignment result. Finally, the methodology aggregates probabilities from a set of individual models to produce the final assignment. After evaluating the approach in an automated issue assignment setting on a dataset of Jira issues, the authors conclude that it can be effective for automated issue assignment.

IET Software, vol. 17, no. 3, pp. 333-344. PDF: https://onlinelibrary.wiley.com/doi/epdf/10.1049/sfw2.12129
Citations: 0
Effect of requirements specification using native language on external software quality
IF 1.6 Zone 4 Computer Science Q3 COMPUTER SCIENCE, SOFTWARE ENGINEERING Pub Date: 2023-05-29 DOI: 10.1049/sfw2.12124
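The final step described in the abstract above (aggregating probabilities from a set of individual models) can be illustrated with a small sketch. It is not the authors' implementation: the abstract does not state the aggregation rule, so a weighted average (soft voting) is assumed here, and the model names, assignee names, and probability values are all hypothetical.

```python
from collections import defaultdict


def aggregate_assignee_probabilities(model_outputs, weights=None):
    """Combine per-assignee probabilities from several models.

    ASSUMPTION: a weighted average (soft voting); the abstract only says
    that probabilities from individual models are aggregated.
    """
    if not model_outputs:
        raise ValueError("need at least one model output")
    if weights is None:
        weights = [1.0] * len(model_outputs)
    if len(weights) != len(model_outputs):
        raise ValueError("one weight per model is required")

    total_weight = sum(weights)
    combined = defaultdict(float)
    for probabilities, weight in zip(model_outputs, weights):
        for assignee, probability in probabilities.items():
            combined[assignee] += weight * probability / total_weight
    return dict(combined)


def pick_assignee(model_outputs, weights=None):
    """Return the assignee with the highest aggregated probability."""
    combined = aggregate_assignee_probabilities(model_outputs, weights)
    return max(combined, key=combined.get)


# Hypothetical outputs of three individual models for one new issue.
outputs = [
    {"alice": 0.6, "bob": 0.3, "carol": 0.1},  # e.g. a text-based model
    {"alice": 0.2, "bob": 0.5, "carol": 0.3},  # e.g. a topic-based model
    {"alice": 0.3, "bob": 0.4, "carol": 0.3},  # e.g. a metadata-based model
]

best = pick_assignee(outputs)
print(best)  # prints "bob"
```

With equal weights, "bob" wins even though the first model prefers "alice", because the average rewards agreement across models. Passing `weights` would let a more reliable model count for more.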
Fernando Uyaguari, Cathy Guevara-Vega, Antonio Quiña-Mera, Alvaro Uyaguari, Cristina Acosta

In requirements specification for Global Software Development (GSD), differences in culture, language and schedule affect software development teams; however, the effect of these issues is not known. We compare native-language requirements with foreign-language requirements with respect to external quality. We conducted a controlled within-subjects experiment with one factor and two treatments, involving 17 experimental subjects. The Wilcoxon test gives evidence to reject the null hypothesis (p-value = 0.008); the difference is statistically significant. The external quality obtained with native-language requirements is superior to that obtained with foreign-language requirements. The effect size has an absolute value of 0.45, which corresponds to a medium effect. The language used in the requirements specification influences external quality; using the native language significantly increases it. These results should be considered when evaluating the roles and English language skills of GSD team members and their effect on external software quality. We also suggest considering the English language skills of experimental subjects in experimental laboratories, since language could influence the results of experiments.

IET Software, vol. 17, no. 3, pp. 287-300. PDF: https://onlinelibrary.wiley.com/doi/epdf/10.1049/sfw2.12124
Citations: 0
A Case Study on Applications of the Hook Model in Software Products
IF 1.6 Zone 4 Computer Science Q3 COMPUTER SCIENCE, SOFTWARE ENGINEERING Pub Date: 2023-05-16 DOI: 10.3390/software2020014
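The statistics reported in the abstract above (p-value = 0.008, effect size 0.45, 17 subjects) can be related to each other with a short arithmetic check. This is a sketch under stated assumptions, not the authors' analysis: it assumes a two-sided p-value from the normal approximation of the Wilcoxon test, and the common convention r = |Z| / sqrt(N) with N taken as the total number of observations (2 × 17 in a within-subjects design). The abstract states neither choice.

```python
from math import sqrt
from statistics import NormalDist


def z_from_two_sided_p(p_value):
    """Z statistic whose two-sided normal tail probability is p_value."""
    return NormalDist().inv_cdf(1.0 - p_value / 2.0)


def effect_size_r(z, n_observations):
    """Effect size r = |Z| / sqrt(N), one common convention."""
    return abs(z) / sqrt(n_observations)


def cohen_label(r):
    """Cohen's rule-of-thumb labels for r (0.1 small, 0.3 medium, 0.5 large)."""
    if r >= 0.5:
        return "large"
    if r >= 0.3:
        return "medium"
    if r >= 0.1:
        return "small"
    return "negligible"


p_value = 0.008     # reported in the abstract
n_subjects = 17     # reported in the abstract
n_observations = 2 * n_subjects  # ASSUMPTION: both treatments counted

z = z_from_two_sided_p(p_value)
r = effect_size_r(z, n_observations)
print(round(z, 2), round(r, 2), cohen_label(r))  # prints: 2.65 0.45 medium
```

Under these assumptions the reported p-value reproduces the reported effect size of 0.45. Using N = 17 (pairs rather than observations) would give roughly 0.64 instead, so the choice of N matters when comparing effect sizes across studies.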
E. Lukyanchikova, N. Askarbekuly, Hamna Aslam, M. Mazzara
The Hook model is used in digital products to engage and retain users through the mechanism of habit formation. This paper explores the use of Hook model techniques in two mobile applications, one being a popular taxi service (Uber taxi) and the other a social network (Instagram). The goal of this paper is to explore the Hook cycle patterns in the two products, and to identify commonalities and differences in how they are applied. Our results suggest that Hook cycle patterns appear with similar frequency; however, Instagram includes more internal Trigger calls. Uber uses fewer triggers to encourage usage, most probably because users already have a specific need for the application. For the same reason, Uber has less opportunity to fail in the reward delivery, while Instagram can use the failure (in providing a reward) as another trigger if the usage habit is already established. In addition, we introduce two types of Hook cycle patterns: internal (within a single use case) and external (transition between use cases). The insights obtained through the case studies serve as a practical reference for developing engaging and retention-focused applications.
Citations: 0