
Software: Practice and Experience — latest publications

Blockchain and explainable AI for enhanced decision making in cyber threat detection
Pub Date: 2024-02-19 DOI: 10.1002/spe.3319
Prabhat Kumar, Danish Javeed, Randhir Kumar, A.K.M Najmul Islam
Artificial Intelligence (AI) based cyber threat detection tools are widely used to process and analyze large amounts of data for improved intrusion detection performance. However, cybersecurity experts often treat these models as black boxes because the reasoning behind their decisions cannot be comprehended or interpreted. Moreover, AI-based threat hunting is data-driven and is usually modeled using data provided by multiple cloud vendors. This is another critical challenge, as a malicious cloud can provide false information (i.e., mount insider attacks) and degrade the threat-hunting capability. In this paper, we present blockchain-enabled eXplainable AI (XAI) for enhancing the decision-making capability of cyber threat detection in the context of Smart Healthcare Systems. Specifically, we first use blockchain to validate and store data between multiple cloud vendors by implementing a Clique Proof-of-Authority (C-PoA) consensus. Second, a novel deep learning-based threat-hunting model is built by combining Parallel Stacked Long Short Term Memory (PSLSTM) networks with a multi-head attention mechanism for improved attack detection. Extensive experiments confirm its potential as an enhanced decision support system for cybersecurity analysts.
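To make the consensus component concrete: in a Proof-of-Authority scheme, only a fixed set of known validators may seal blocks, typically in rotation. The toy Python ledger below illustrates that turn-taking rule; the class, its fields, and the round-robin schedule are illustrative assumptions, not the paper's C-PoA implementation.

```python
import hashlib
import json

class PoAChain:
    """Toy round-robin Proof-of-Authority ledger (illustration only)."""

    def __init__(self, signers):
        self.signers = list(signers)  # fixed, known authority set
        self.blocks = [{"index": 0, "data": "genesis",
                        "signer": None, "prev": "0" * 64}]

    def _hash(self, block):
        return hashlib.sha256(
            json.dumps(block, sort_keys=True).encode()).hexdigest()

    def expected_signer(self):
        # authorities take turns sealing blocks
        return self.signers[len(self.blocks) % len(self.signers)]

    def append(self, data, signer):
        # reject unauthorized or out-of-turn sealing attempts
        if signer != self.expected_signer():
            return False
        self.blocks.append({"index": len(self.blocks), "data": data,
                            "signer": signer,
                            "prev": self._hash(self.blocks[-1])})
        return True
```

A malicious vendor that is not the scheduled authority simply cannot extend the chain, which is the property the paper relies on to validate data between cloud vendors.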
Citations: 0
On the interaction between the search parameters and the nature of the search problems in search-based model-driven engineering
Pub Date: 2024-02-15 DOI: 10.1002/spe.3320
Isis Roca, Jaime Font, Lorena Arcega, Carlos Cetina
The use of search-based software engineering to address model-driven engineering activities (SBMDE) is becoming more popular. Many maintenance tasks can be reformulated as search problems, and, when those tasks are applied to software models, the search strategy has to retrieve a model fragment. There are no studies on the influence of the search parameters when applied to software models. This article evaluates the impact of different search parameter values on the performance of an evolutionary algorithm whose population is in the form of software models. Our study takes into account the nature of the model fragment location problems (MFLPs) to which the evolutionary algorithm is applied. The evaluation searches 1895 MFLPs (characterized through five measures that define MFLPs) from two industrial case studies and uses 625 different combinations of search parameter values. The results show that varying the population size, the replacement percentage, or the crossover rate produces performance changes of around 30%. With regard to the nature of the problems, the size of the search space has the largest impact. Search parameter values and the nature of the MFLPs influence the performance when applying an evolutionary algorithm to perform fragment location on models. Search parameter values have a greater effect on precision, and the nature of the MFLPs has a greater effect on recall. Our results should raise awareness of the relevance of the search parameters and the nature of the problems for the SBMDE community.
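The three parameters the study varies (population size, replacement percentage, crossover rate) can be made concrete with a minimal generational GA. Everything below (bitstring genomes, one-max fitness, truncation selection, one-point crossover) is an illustrative stand-in, not the article's algorithm over model fragments.

```python
import random

def run_ga(fitness, genome_len=16, pop_size=20, crossover_rate=0.9,
           replacement_pct=0.5, generations=30, seed=0):
    """Minimal generational GA over bitstrings; the keyword arguments
    mirror the search parameters studied in the article."""
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(genome_len)]
           for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)       # best individuals first
        n_new = int(pop_size * replacement_pct)   # how many to replace
        children = []
        while len(children) < n_new:
            p1, p2 = rng.sample(pop[: pop_size // 2], 2)  # truncation selection
            if rng.random() < crossover_rate:             # one-point crossover
                cut = rng.randrange(1, genome_len)
                child = p1[:cut] + p2[cut:]
            else:
                child = p1[:]
            child[rng.randrange(genome_len)] ^= 1         # point mutation
            children.append(child)
        pop[-n_new:] = children                   # replace the worst
    return max(fitness(ind) for ind in pop)

# "one-max" toy problem: fitness is the number of 1-bits
best = run_ga(sum)
```

Sweeping `pop_size`, `replacement_pct`, and `crossover_rate` over a grid of values against a fixed problem set is, in miniature, the experimental design the article applies to 625 parameter combinations.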
Citations: 0
Functions as a service for distributed deep neural network inference over the cloud-to-things continuum
Pub Date: 2024-02-11 DOI: 10.1002/spe.3318
Altair Bueno, Bartolomé Rubio, Cristian Martín, Manuel Díaz
The use of serverless computing has been gaining popularity in recent years as an alternative to traditional Cloud computing. We explore the usability and potential development benefits of three popular open-source serverless platforms in the context of IoT: OpenFaaS, Fission, and OpenWhisk. To address this, we discuss our experience developing a serverless, low-latency Distributed Deep Neural Network (DDNN) application. Our findings indicate that these serverless platforms require significant resources to operate and are not ideal for constrained devices. In addition, we achieved a 55% improvement over the under-load performance of Kafka-ML, a framework without dynamic scaling support, demonstrating the potential of serverless computing for low-latency applications.
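A DDNN over the cloud-to-things continuum typically runs its first layers near the device and forwards to the cloud stage only when the local early exit is not confident. The handler below is a hypothetical OpenFaaS-style sketch of that decision: the fixed weights standing in for the early layers and the 0.5 confidence threshold are invented for illustration, not taken from the paper.

```python
import json

def local_stage(features):
    """Stand-in for the early (on-device) layers of a DDNN: a fixed
    linear scorer squashed to a pseudo-confidence (illustrative only)."""
    score = sum(w * x for w, x in zip([0.8, -0.4, 0.6], features))
    confidence = abs(score) / (1 + abs(score))
    return ("positive" if score > 0 else "negative"), confidence

def handle(req):
    """OpenFaaS-style entry point: classify locally, and ask the caller
    to offload to the cloud stage when the early exit is unsure."""
    features = json.loads(req)["features"]
    label, conf = local_stage(features)
    if conf < 0.5:  # hypothetical threshold for triggering offload
        return json.dumps({"action": "offload",
                           "confidence": round(conf, 3)})
    return json.dumps({"action": "early-exit", "label": label,
                       "confidence": round(conf, 3)})
```

Packaging the cloud stage as a second function behind the same gateway is what lets a FaaS platform scale the expensive part of the network independently of the constrained devices.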
Citations: 0
A distributed tracing pipeline for improving locality awareness of microservices applications
Pub Date: 2024-02-01 DOI: 10.1002/spe.3317
Carmine Colarusso, Assunta De Caro, Ida Falco, Lorenzo Goglia, Eugenio Zimeo
The microservices architectural style aims at improving software maintenance and scalability by decomposing applications into independently deployable components. A common criticism of this style is the risk of increased response times due to communication, especially with very granular entities. Locality-aware placement of microservices onto the underlying hardware can help keep response times low. However, the complex graphs of invocations originating from users' calls largely depend on the specific workload (e.g., the length of an invocation chain could depend on the input parameters). Therefore, many existing approaches are not suitable for modern infrastructures where application components can be dynamically redeployed to take into account user expectations. This paper contributes to overcoming the limitations of static or off-line techniques by presenting a big data pipeline that dynamically collects tracing data from running applications; the data are used to identify a given number of microservice groups whose deployment keeps the response times of the most critical operations low under a defined workload. The results, obtained in different working conditions and with different infrastructure configurations, are presented and discussed to draw the main considerations about the general problem of defining the boundary, granularity, and optimal placement of microservices on the underlying execution environment. In particular, they show that knowing how a specific workload impacts the constituent microservices of an application helps achieve better performance by effectively lowering response time (e.g., up to a reduction), through the exploitation of locality-driven clustering strategies for deploying groups of services.
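One way tracing data can drive a locality-driven clustering strategy is a greedy merge of the chattiest service pairs, so that heavy call edges end up inside a deployment group rather than across the network. The sketch below assumes a simple (caller, callee) call-count map as input; it illustrates the idea only and is not the paper's pipeline.

```python
def colocate(calls, k):
    """Greedy locality-driven grouping: repeatedly merge the two groups
    exchanging the heaviest call volume until only k groups remain.
    `calls` maps (caller, callee) pairs to observed call counts, the kind
    of aggregate a distributed tracing pipeline would report."""
    groups = {s: {s} for pair in calls for s in pair}

    def volume(g1, g2):
        # total traffic crossing the boundary between the two groups
        return sum(c for (a, b), c in calls.items()
                   if (a in groups[g1] and b in groups[g2])
                   or (a in groups[g2] and b in groups[g1]))

    while len(groups) > k:
        keys = list(groups)
        g1, g2 = max(((x, y) for i, x in enumerate(keys)
                      for y in keys[i + 1:]),
                     key=lambda pair: volume(*pair))
        groups[g1] |= groups.pop(g2)  # co-locate the chattiest pair
    return [sorted(g) for g in groups.values()]
```

Because the call counts come from live traces, re-running the grouping when the workload shifts yields a new placement, which is the dynamic behavior the paper argues static techniques miss.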
Citations: 0
A novel instance-based method for cross-project just-in-time defect prediction
Pub Date: 2024-01-24 DOI: 10.1002/spe.3316
Xiaoyan Zhu, Tian Qiu, Jiayin Wang, Xin Lai
Cross-project (CP) just-in-time software defect prediction (JIT-SDP) uses CP data to overcome initial data scarcity for training high-performing JIT-SDP classifiers in the early stages of software projects. The primary challenge faced by JIT-SDP in a cross-project context lies in the distinct distributions between training and test data. To tackle this issue, we select source data instances that closely resemble target data for building classifiers. Software datasets commonly exhibit a class imbalance problem, where the ratio of the defective class to the clean class is notably low. This imbalance typically diminishes classifier performance. In this study, we propose an instance selection method utilizing kernel mean matching (ISKMM) that addresses both knowledge transfer and class imbalance in cross-project defect prediction (CPDP). The method employs the kernel mean matching (KMM) technique to assess the similarity between training and target data. It selects instances with high similarity, retains them, and resamples the data based on similarity weighting to mitigate the class imbalance problem. Our experiments, conducted on 10 open-source projects, reveal that the ISKMM algorithm outperforms existing CP single-source software defect prediction (SDP) algorithms. Moreover, when employing the proposed algorithm, defect predictors constructed from cross-project data demonstrate an overall performance comparable to predictors learned from within-project data.
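The instance-selection idea can be sketched as follows. Note that real kernel mean matching solves a constrained quadratic program for the instance weights; this simplified stand-in keeps only the intuition, ranking source instances by their mean RBF-kernel similarity to the target sample and keeping the closest ones.

```python
import math

def rbf(x, y, gamma=0.5):
    """Gaussian (RBF) kernel between two feature vectors."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y)))

def kmm_weights(source, target, gamma=0.5):
    """Score each source instance by its mean kernel similarity to the
    target sample (a simplification of kernel mean matching)."""
    return [sum(rbf(s, t, gamma) for t in target) / len(target)
            for s in source]

def select_similar(source, target, keep_ratio=0.5):
    """Keep the source instances most similar to the target domain,
    discarding those from a clearly different distribution."""
    w = kmm_weights(source, target)
    ranked = sorted(range(len(source)), key=lambda i: w[i], reverse=True)
    kept = ranked[: max(1, int(len(source) * keep_ratio))]
    return [source[i] for i in kept]
```

In the full method the similarity weights would also drive resampling of the retained instances to counter class imbalance; here they are used only for selection.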
Citations: 0
Privacy-preserving task offloading in mobile edge computing: A deep reinforcement learning approach
Pub Date: 2024-01-23 DOI: 10.1002/spe.3314
Fanglue Xia, Ying Chen, Jiwei Huang
As machine learning (ML) technologies continue to evolve, there is an increasing demand for data. Mobile crowd sensing (MCS) can motivate more users to participate in the data collection process through reasonable compensation, which can enrich the data scale and coverage. However, users are increasingly concerned about their privacy and are unwilling to share their personal data easily. Therefore, protecting privacy has become a crucial issue. In ML, federated learning (FL) is a widely known privacy-preserving technique in which the model training process is performed locally by the data owner, which protects privacy to a large extent. However, as model sizes grow, the weak computing power and battery life of user devices are not sufficient to support training large models locally. With mobile edge computing (MEC), users can offload some of the model training tasks to the edge server for collaborative computation, allowing the edge server to participate in the model training process to improve training efficiency. However, edge servers are not fully trusted, and there is still a risk of privacy leakage if data is uploaded to the edge server directly. To address this issue, we design a local differential privacy (LDP) based data privacy-preserving algorithm and a deep reinforcement learning (DRL) based task offloading algorithm. We also propose a privacy-preserving distributed ML framework for MEC and model the cloud-edge-mobile collaborative training process. These algorithms not only enable effective utilization of edge computing to accelerate machine learning model training but also significantly enhance user privacy and save device battery power. We have conducted experiments to verify the effectiveness of the framework and algorithms.
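A minimal sketch of the LDP building block, assuming the classic Laplace mechanism over a bounded scalar (the paper's concrete mechanism may differ): each user adds calibrated noise locally, so the untrusted edge server never sees the raw value, yet aggregate statistics remain usable.

```python
import math
import random

def laplace_noise(scale, rng):
    # inverse-CDF sampling of a Laplace(0, scale) variate
    u = rng.random() - 0.5
    if u == 0:
        return 0.0
    return -scale * math.copysign(math.log(1 - 2 * abs(u)), u)

def privatize(value, epsilon, lo, hi, rng=None):
    """Laplace-mechanism LDP sketch: perturb a bounded reading on the
    device before upload. For a single value in [lo, hi] the sensitivity
    is (hi - lo); clamping afterwards is harmless post-processing."""
    rng = rng or random.Random()
    noisy = value + laplace_noise((hi - lo) / epsilon, rng)
    return min(hi, max(lo, noisy))
```

With a reasonable privacy budget the per-user values are heavily randomized, but the mean over many users still concentrates near the true mean, which is what lets the edge server contribute to training without seeing raw data.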
Citations: 0
Evolution of internal dimensions in object-oriented software–A time series based approach
Pub Date: 2024-01-21 DOI: 10.1002/spe.3310
Bruno L. Sousa, Mariza A. S. Bigonha, Kecia A. M. Ferreira, Glaura C. Franco
Software evolution is the process of adapting, maintaining, and updating a software system. This process accounts for the most significant part of software costs. Many works have studied software evolution and found relevant insights, such as Lehman's laws. However, there is a gap in understanding how software systems evolve from an internal-dimensions point of view. For instance, the literature has indicated how systems grow, for example, linearly, sub-linearly, super-linearly, or following the Pareto distribution. However, a well-defined pattern of how this phenomenon occurs has not been established. This work aims to define a novel method to analyze and predict software evolution. We base our strategy on time series analysis, linear regression techniques, and trend tests. In this study, we applied the proposed model to investigate how the internal structure of object-oriented software systems evolves in terms of four dimensions: coupling, inheritance hierarchy, cohesion, and class size. Applying the proposed method, we identify the functions that best explain how the analyzed dimensions evolve. Besides, we investigate how the relationships between dimension metrics behave over the systems' evolution and the set of classes existing in the systems that affect the evolution of these dimensions. We mined and analyzed data from 46 Java-based open-source projects. We used eight software metrics regarding the dimensions analyzed in this study. The main results of this study reveal ten software evolution properties, among them: coupling, cohesion, and inheritance evolve linearly; a relevant percentage of classes contributes to coupling and size evolution; a small percentage of classes contributes to cohesion evolution; and there is no relation among the evolutions of the software internal dimensions. The results also indicate that our method can accurately predict how a software system will evolve in both short-term and long-term predictions.
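The core statistical step, fitting a candidate growth function to a metric's time series and judging it by residual error, can be sketched with ordinary least squares. This is a simplified stand-in for the article's time-series machinery: a near-zero residual under the linear model is the kind of evidence behind a claim such as "coupling evolves linearly".

```python
def fit_linear(ys):
    """Ordinary least squares fit y = a*t + b over t = 0..n-1,
    returning (slope, intercept, sse). Comparing the SSE of this fit
    against other candidate growth functions (e.g., sub-linear or
    super-linear) indicates which one best explains a metric's
    evolution across releases."""
    n = len(ys)
    xs = range(n)
    mx = sum(xs) / n
    my = sum(ys) / n
    a = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
         / sum((x - mx) ** 2 for x in xs))
    b = my - a * mx
    sse = sum((y - (a * x + b)) ** 2 for x, y in zip(xs, ys))
    return a, b, sse
```

Extrapolating the fitted function beyond the last observed release is the simplest form of the short-term and long-term prediction the abstract mentions.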
软件进化是调整、维护和更新软件系统的过程。这一过程集中了软件成本的最主要部分。许多著作对软件进化进行了研究,并发现了相关的见解,如雷曼定律。然而,从内部维度来看,在软件系统如何进化方面还存在差距。例如,文献指出了系统的增长方式,如线性增长、亚线性增长、超线性增长或遵循帕累托分布。然而,这种现象如何发生的明确模式尚未确立。这项工作旨在定义一种分析和预测软件进化的新方法。我们的策略基于时间序列分析、线性回归技术和趋势测试。在本研究中,我们应用所提出的模型,从耦合、继承层次、内聚和类大小四个维度研究了面向对象软件系统的内部结构是如何演变的。应用所提出的方法,我们确定了能更好地解释所分析维度如何演变的函数。此外,我们还研究了维度指标在系统演化过程中的表现与系统中影响这些维度演化的类集之间的关系。我们从 46 个基于 Java 的开源项目中挖掘并分析了数据。我们使用了与本研究分析的维度相关的八个软件度量指标。本研究的主要结果揭示了十种软件进化特性,其中包括:耦合、内聚和继承呈线性进化;相关比例的类会促进耦合和大小的进化;小比例的类会促进内聚的进化;软件内部维度的进化之间没有关系。结果还表明,我们的方法可以准确预测软件系统在短期和长期预测中的演变情况。
{"title":"Evolution of internal dimensions in object-oriented software–A time series based approach","authors":"Bruno L. Sousa, Mariza A. S. Bigonha, Kecia A. M. Ferreira, Glaura C. Franco","doi":"10.1002/spe.3310","DOIUrl":"https://doi.org/10.1002/spe.3310","url":null,"abstract":"Software evolution is the process of adapting, maintaining, and updating a software system. This process concentrates the most significant part of the software costs. Many works have studied software evolution and found relevant insights, such as Lehman's laws. However, there is a gap in how software systems evolve from an internal dimensions point of view. For instance, the literature has indicated how systems grow, for example, linearly, sub-linearly, super-linearly, or following the Pareto distribution. However, a well-defined pattern of how this phenomenon occurs has not been established. This work aims to define a novel method to analyze and predict software evolution. We based our strategy on time series analysis, linear regression techniques, and trend tests. In this study, we applied the proposed model to investigate how the internal structure of object-oriented software systems evolves in terms of four dimensions: coupling, inheritance hierarchy, cohesion, and class size. Applying the proposed method, we identify the functions that better explain how the analyzed dimensions evolve. Besides, we investigate how the relationship between dimension metrics behave over the systems' evolution and the set of classes existing in the systems that affect the evolution of these dimensions. We mined and analyzed data from 46 Java-based open-source projects. We used eight software metrics regarding the dimensions analyzed in this study. 
The main results of this study reveal ten software evolution properties, among them: coupling, cohesion, and inheritance evolve linearly; a relevant percentage of classes contributes to coupling and size evolution; a small percentage of classes contributes to cohesion evolution; there is no relation between the software internal dimensions' evolution. The results also indicate that our method can accurately predict how the software system will evolve in short-term and long-term predictions.","PeriodicalId":21899,"journal":{"name":"Software: Practice and Experience","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-01-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139516846","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
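The method's two core ingredients, a least-squares trend fit and a trend test, can be sketched in a few lines of Python. This is a toy illustration under our own assumptions, not the authors' tooling; the series below is a hypothetical per-release coupling metric.

```python
def ols_fit(ys):
    """Ordinary least squares fit of y = a + b*t for t = 0, 1, ..., n-1."""
    n = len(ys)
    t_mean = (n - 1) / 2
    y_mean = sum(ys) / n
    slope = sum((t - t_mean) * (y - y_mean) for t, y in enumerate(ys)) \
        / sum((t - t_mean) ** 2 for t in range(n))
    return y_mean - slope * t_mean, slope  # (intercept, slope)

def mann_kendall_s(ys):
    """Mann-Kendall S statistic: positive values indicate an upward trend."""
    sign = lambda x: (x > 0) - (x < 0)
    return sum(sign(ys[j] - ys[i])
               for i in range(len(ys)) for j in range(i + 1, len(ys)))

coupling = [10, 12, 13, 15, 18, 19, 22]  # hypothetical mean coupling per release
intercept, slope = ols_fit(coupling)
print(round(slope, 2), mann_kendall_s(coupling))  # 1.96 21
```

A fitted slope together with a significant trend statistic is what lets one extrapolate a metric to future releases; the paper additionally compares candidate growth functions (linear, sub-linear, super-linear) before selecting the one that best explains each dimension.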
Citations: 0
On-demand JSON: A better way to parse documents?
Pub Date : 2024-01-18 DOI: 10.1002/spe.3313
John Keiser, Daniel Lemire
JSON is a popular standard for data interchange on the Internet, and ingesting JSON documents can be a performance bottleneck. A popular parsing strategy is to convert the input text into a tree-based data structure—sometimes called a Document Object Model (DOM). We designed and implemented a novel JSON parsing interface—called On-Demand—that appears to the programmer like a conventional DOM-based approach. However, the underlying implementation is a pointer iterating through the content, materializing the results (objects, arrays, strings, numbers) only lazily. On recent commodity processors, an implementation of our approach provides superior performance in multiple benchmarks. To ensure reproducibility, our work is freely available as open-source software. Several systems use On-Demand, for example Apache Doris, the Node.js JavaScript runtime, Milvus, and Velox.
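The lazy-materialization idea can be imitated (in Python rather than the paper's C++, and greatly simplified) with a generator that decodes one top-level key-value pair at a time, so a consumer that finds its key early never pays to parse the rest of the document:

```python
import json

def iter_object_items(text):
    """Lazily yield (key, value) pairs from a top-level JSON object.

    Each value is decoded only when the consumer advances the iterator,
    a (much simplified) analogue of On-Demand's lazy materialization.
    Toy code: assumes a well-formed JSON object.
    """
    dec = json.JSONDecoder()
    i = text.index("{") + 1
    while True:
        while text[i] in " \t\r\n,":   # skip separators before the next key
            i += 1
        if text[i] == "}":             # end of the object
            return
        key, i = dec.raw_decode(text, i)      # decode just the key string
        i = text.index(":", i) + 1
        while text[i] in " \t\r\n":
            i += 1
        value, i = dec.raw_decode(text, i)    # decode just this value
        yield key, value

doc = '{"a": 1, "b": [1, 2, 3], "c": {"nested": true}}'
for key, value in iter_object_items(doc):
    if key == "b":
        print(value)  # [1, 2, 3]
        break         # "c" is never decoded
```

Nothing after the `break` is ever decoded; the authors' C++ implementation applies the same principle at a much lower level, with a pointer sweeping the raw bytes.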
Citations: 0
Experiences and challenges from developing cyber-physical systems in industry-academia collaboration
Pub Date : 2024-01-17 DOI: 10.1002/spe.3312
Johan Cederbladh, Romina Eramo, Vittoriano Muttillo, Per Erik Strandberg
Cyber-physical systems (CPSs) are growing in development complexity. Several emerging technologies, such as model-based engineering, DevOps, and artificial intelligence, are expected to alleviate this complexity by introducing more advanced capabilities. The AIDOaRt research project investigates how these technologies can assist in developing complex CPSs across various industrial use cases. In this paper, we discuss the experiences of industry and academia collaborating, through the research project, to improve the development of complex CPSs. In particular, the paper presents the results of two working groups that examined the challenges of developing complex CPSs from an industrial and an academic perspective in light of the aforementioned technologies. We identify five challenge areas and discuss them from the perspectives of industry and academia: data, modeling, requirements engineering, continuous software and system engineering, and intelligence and automation. Furthermore, we highlight practical collaboration experience from the project via two explicit use cases and connect them to the challenge areas. Finally, we discuss some lessons learned through the collaborations, which might foster future collaborative efforts.
Citations: 0