
Journal of Parallel and Distributed Computing: Latest Articles

Enabling semi-supervised learning in intrusion detection systems
IF 3.4 | Zone 3, Computer Science | Q1 COMPUTER SCIENCE, THEORY & METHODS | Pub Date: 2024-11-12 | DOI: 10.1016/j.jpdc.2024.105010
Panagis Sarantos, John Violos, Aris Leivadeas
Intrusion detection systems (IDS) are cybersecurity alerting tools that analyze network traffic to identify suspicious activity and known threats. State-of-the-art IDS rely on supervised machine learning models trained on a historical labeled dataset to categorize network flows. Next-generation networks, however, are heterogeneous and dynamic: heterogeneity can make each network environment significantly different from the others, and dynamicity means that new threats are constantly emerging. These two factors raise the research question of whether a supervised machine-learning-based IDS can work efficiently in a network environment different from the one that generated its labeled training data. In this paper, we first answer this research question and then propose a semi-supervised learning approach that uses unlabeled data to generalize sufficiently to a different network environment, taking into account that unlabeled data are much easier and cheaper to collect than labeled data. As a proof of concept, we ran experiments with two publicly available labeled datasets, CIC-IDS2017 and CIC-IDS2018, and one unlabeled dataset, PS-Azure2023, which we constructed for this work and have also made publicly available. The results confirm our assumption and the applicability of the semi-supervised learning paradigm to the design of IDS.
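As a rough illustration of the semi-supervised idea in this abstract (not the authors' method, which the abstract does not detail), the sketch below applies generic self-training with pseudo-labels: a nearest-centroid classifier fitted on labeled flows from one environment is refitted after it confidently labels flows from another environment. The one-dimensional feature, the `margin` threshold, the toy values, and all function names are assumptions made for the example.

```python
def centroids(points, labels):
    """Mean feature value per class."""
    sums, counts = {}, {}
    for x, y in zip(points, labels):
        sums[y] = sums.get(y, 0.0) + x
        counts[y] = counts.get(y, 0) + 1
    return {y: sums[y] / counts[y] for y in sums}


def predict(cents, x):
    """Return (nearest class, confidence margin). Needs at least two classes.

    The margin is the gap between the distances to the second-nearest
    and the nearest centroid; a larger gap means a more confident label.
    """
    ranked = sorted(cents, key=lambda y: abs(x - cents[y]))
    best, second = ranked[0], ranked[1]
    return best, abs(x - cents[second]) - abs(x - cents[best])


def self_train(labeled_x, labeled_y, unlabeled_x, margin=1.0, rounds=5):
    """Generic self-training: adopt confident pseudo-labels, then refit."""
    xs, ys = list(labeled_x), list(labeled_y)
    pool = list(unlabeled_x)
    for _ in range(rounds):
        cents = centroids(xs, ys)  # fixed for the whole round
        undecided = []
        for x in pool:
            label, gap = predict(cents, x)
            if gap >= margin:
                xs.append(x)
                ys.append(label)
            else:
                undecided.append(x)
        if len(undecided) == len(pool):  # nothing new was pseudo-labeled
            break
        pool = undecided
    return centroids(xs, ys)


# Toy data: labeled flows from a source network, unlabeled flows from a
# target network whose feature values are shifted upwards.
source_x = [0.0, 1.0, 4.0, 5.0]
source_y = ["benign", "benign", "attack", "attack"]
target_x = [1.5, 2.0, 6.0, 7.0]
adapted = self_train(source_x, source_y, target_x)
print(adapted)
```

In this toy run the class centroids move towards the target environment without any target labels being supplied, which is the effect a semi-supervised IDS would aim for; a real system would use many features and a stronger base model.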
Journal of Parallel and Distributed Computing, Volume 196, Article 105010
Citations: 0
Fault-tolerance in biswapped multiprocessor interconnection networks
IF 3.4 | Zone 3, Computer Science | Q1 COMPUTER SCIENCE, THEORY & METHODS | Pub Date: 2024-11-05 | DOI: 10.1016/j.jpdc.2024.105009
Basem Assiri, Muhammad Faisal Nadeem, Waqar Ali, Ali Ahmad
Interconnection networks play a vital role in connecting many sets of processor memories, known as processing vertices. Recently, multiprocessor interconnection networks have received much attention because of their cost-effectiveness and wide application in parallel multiprocessor systems that connect processors and memory modules. A locating set in a computer network is a selection of nodes whose positions determine the positions of all other nodes in the network. The locating number is the minimum size of a locating set needed to identify all network vertices. If any single node in a locating set fails, the set may no longer be able to identify all the nodes in the network; if the remaining nodes can still locate all other network nodes, the set is termed a fault-tolerant locating set. Fault tolerance is essential in multiprocessor networks, in which every processor is subject to outright failure, so that the system remains at full capacity when one or more components fail. In this study, we determine the fault-tolerant locating number of biswapped networks by considering different classes of networks as base clusters.
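The definitions in this abstract can be checked mechanically on small graphs. The sketch below assumes the usual resolving-set reading, in which a node's "position" is its vector of hop distances to the chosen nodes; the abstract does not spell this out, so treat it as an assumption. The function names and the 6-node ring are illustrative and are not biswapped networks.

```python
from collections import deque


def bfs_distances(adj, source):
    """Hop distance from source to every vertex of a connected graph."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def is_locating_set(adj, landmarks):
    """True if all vertices have pairwise distinct distance vectors."""
    dists = [bfs_distances(adj, w) for w in landmarks]
    codes = {tuple(d[v] for d in dists) for v in adj}
    return len(codes) == len(adj)


def is_fault_tolerant_locating_set(adj, landmarks):
    """True if the set still locates every vertex after any one landmark fails."""
    return all(
        is_locating_set(adj, [w for w in landmarks if w != failed])
        for failed in landmarks
    )


def cycle(n):
    """Adjacency lists of an n-node ring."""
    return {i: [(i - 1) % n, (i + 1) % n] for i in range(n)}


ring = cycle(6)
print(is_locating_set(ring, [0, 1]))                    # True
print(is_locating_set(ring, [0, 3]))                    # False: antipodal pair
print(is_fault_tolerant_locating_set(ring, [0, 1]))     # False
print(is_fault_tolerant_locating_set(ring, [0, 1, 2]))  # True
```

On the ring, two adjacent nodes suffice to locate every vertex, but losing either one breaks the property; adding a third node restores it, which is the gap between the locating number and its fault-tolerant counterpart.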
Journal of Parallel and Distributed Computing, Volume 196, Article 105009
Citations: 0
Front Matter 1 - Full Title Page (regular issues)/Special Issue Title page (special issues)
IF 3.4 | Zone 3, Computer Science | Q1 COMPUTER SCIENCE, THEORY & METHODS | Pub Date: 2024-11-02 | DOI: 10.1016/S0743-7315(24)00167-9
Journal of Parallel and Distributed Computing, Volume 195, Article 105003
Citations: 0
Design and experimental evaluation of algorithms for optimizing the throughput of dispersed computing
IF 3.4 | Zone 3, Computer Science | Q1 COMPUTER SCIENCE, THEORY & METHODS | Pub Date: 2024-10-29 | DOI: 10.1016/j.jpdc.2024.104999
Xiangchen Zhao, Diyi Hu, Bhaskar Krishnamachari
We introduce three optimized scheduling algorithms for dispersed computing and present JupiterTP, a real-world system built on k8s and the prior Jupiter system, enabling end-to-end computation on distributed clusters. Distinguishing itself from traditional throughput optimization approaches that focus on theory and simulations, our work is the first implementation of such an end-to-end system capable of handling arbitrary DAGs across diverse computing networks, including public clouds, IoT systems, and edge networks. Beyond mere scheduling, JupiterTP integrates profilers, execution, and orchestration engines, offering unified interfaces for additional scheduling algorithm integrations. The system's performance is tested on real clusters and real applications, compared to prior work that relied on simulations alone. We make JupiterTP available to the community as open-source software at https://github.com/ANRGUSC/JupiterTP.
Journal of Parallel and Distributed Computing, Volume 196, Article 104999
Citations: 0
Hands-on parallel & distributed computing with Raspberry Pi devices and clusters
IF 3.4 | Zone 3, Computer Science | Q1 COMPUTER SCIENCE, THEORY & METHODS | Pub Date: 2024-10-28 | DOI: 10.1016/j.jpdc.2024.104996
Elizabeth Shoop, Suzanne J. Matthews, Richard Brown, Joel C. Adams
Parallel and distributed computing (PDC) concepts are now required topics for accredited undergraduate computer science programs. However, introducing PDC into the CS curriculum is challenging for several reasons, including an instructor's lack of PDC knowledge and difficulties in accessing PDC hardware. This paper addresses both of these challenges by presenting free, interactive, web-based PDC teaching modules using inexpensive Raspberry Pi single board computers (SBCs). Our materials include a free disk image that makes it possible for instructors to build Raspberry Pi clusters in minutes and use our software in a variety of curricular contexts. Our multi-year assessment of these materials on students and faculty members indicates that: (i) our materials increased students' confidence regarding important PDC concepts and motivated them to study PDC further; and (ii) our materials increased faculty members' confidence and preparedness in teaching key PDC concepts at their own institutions.
Journal of Parallel and Distributed Computing, Volume 196, Article 104996
Citations: 0
Semi-static conditions in low-latency C++ for high frequency trading: Better than branch prediction hints
IF 3.4 | Zone 3, Computer Science | Q1 COMPUTER SCIENCE, THEORY & METHODS | Pub Date: 2024-10-24 | DOI: 10.1016/j.jpdc.2024.105000
Paul Alexander Bilokon, Maximilian Lucuta, Erez Shermer
Conditional branches pose a challenge for code optimisation, particularly in low latency settings. We present a novel language construct, referred to as a semi-static condition, which enables programmers to dynamically modify the direction of a branch at run-time by modifying the assembly code within the underlying executable. Subsequently, we explore scenarios where the use of semi-static conditions outperforms traditional conditional branching, highlighting their potential applications in real-time machine learning and high-frequency trading (HFT). Throughout the development process, key considerations of performance, portability, syntax, and security were taken into account.
Journal of Parallel and Distributed Computing, Volume 196, Article 105000
Citations: 0
Locating a black hole in a dynamic ring
IF 3.4 | Zone 3, Computer Science | Q1 COMPUTER SCIENCE, THEORY & METHODS | Pub Date: 2024-10-18 | DOI: 10.1016/j.jpdc.2024.104998
Giuseppe Antonio Di Luna, Paola Flocchini, Giuseppe Prencipe, Nicola Santoro
In networked environments supporting mobile agents, a pressing problem is the presence of network sites harmful to the agents. In this paper we consider the danger posed by a node that destroys any incoming agent without leaving any trace. Such a dangerous node is known in the literature as a black hole (Bh). The problem of a team of system agents determining its location, known as black hole search (Bhs), has been extensively studied in the literature under a variety of assumptions, in both synchronous and asynchronous settings. The main complexity parameter of Bhs is the number of system agents (called size) needed to solve the problem; other parameters are the number of moves (called cost) performed by the agents, and the time until termination.
In the existing literature, with only a couple of exceptions, all results rest on the common assumption that the network is static, i.e. its topology does not change over time. We instead consider Bhs when the network is dynamic: the link structure of the graph changes over time. While time-varying graphs have been the focus of intense research in the last two decades, very little is known about the problem of locating the Bh in such networks.
In this paper, we help fill this research gap by studying Bhs in dynamic ring networks, focusing on the 1-interval connectivity adversarial dynamics. The feasibility and complexity of the problem depend on many factors, specifically on the size n of the ring, whether or not n is known, and the type of inter-agent communication (whiteboards, tokens, face-to-face, visual). We provide a complete feasibility characterization and present size-optimal algorithms. Furthermore, we establish lower bounds on the cost and time of size-optimal solutions and show that our algorithms achieve those bounds.
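The 1-interval connectivity dynamics mentioned in this abstract are commonly taken to mean that the network must stay connected in every round, so on a ring the adversary can remove at most one edge per round. The sketch below checks a removal schedule against that reading and shows how a missing edge stalls a walking agent. It is not one of the paper's algorithms, and all helper names and the example schedule are hypothetical.

```python
def ring_edges(n):
    """Edges of an n-node ring (n >= 3), each stored as frozenset({i, i+1 mod n})."""
    return [frozenset((i, (i + 1) % n)) for i in range(n)]


def is_connected(n, edges):
    """Depth-first search from node 0 over the given undirected edges."""
    adj = {i: set() for i in range(n)}
    for edge in edges:
        u, v = tuple(edge)
        adj[u].add(v)
        adj[v].add(u)
    seen, stack = {0}, [0]
    while stack:
        u = stack.pop()
        for v in adj[u] - seen:
            seen.add(v)
            stack.append(v)
    return len(seen) == n


def is_one_interval_connected(n, missing_per_round):
    """True if the ring minus each round's missing edges is connected."""
    full = ring_edges(n)
    for missing in missing_per_round:
        removed = {frozenset(e) for e in missing}
        if not is_connected(n, [e for e in full if e not in removed]):
            return False
    return True


def walk_clockwise(n, start, missing_per_round):
    """An agent tries one clockwise step per round and waits if its edge is missing."""
    pos = start
    for missing in missing_per_round:
        removed = {frozenset(e) for e in missing}
        if frozenset((pos, (pos + 1) % n)) not in removed:
            pos = (pos + 1) % n
    return pos


schedule = [[(0, 1)], [], [(3, 4)]]  # edges removed in rounds 1, 2, 3
print(is_one_interval_connected(5, schedule))            # True
print(is_one_interval_connected(5, [[(0, 1), (2, 3)]]))  # False: ring is split
print(walk_clockwise(5, 0, schedule))                    # 2
```

The agent reaches node 2 rather than node 3 after three rounds because the adversary removed its outgoing edge in the first round; this kind of blocking is what makes exploration of a dynamic ring harder than in the static case.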
Journal of Parallel and Distributed Computing, Volume 196, Article 104998
Citations: 0
OASR-WFBP: An overlapping aware start-up sharing gradient merging strategy for efficient communication in distributed deep learning
IF 3.4 | Zone 3, Computer Science | Q1 COMPUTER SCIENCE, THEORY & METHODS | Pub Date: 2024-10-17 | DOI: 10.1016/j.jpdc.2024.104997
Yingjie Song, Zhuo Tang, Yaohua Wang, Xiong Xiao, Zhizhong Liu, Jing Xia, Kenli Li
Wait-Free-Back-Propagation (WFBP) is a practical method for distributed deep learning, but it suffers from high communication overhead. This overhead can be reduced by overlapping gradient communication with computation and by sharing the startup time among multiple gradient communication phases. However, existing optimizations share the startup time greedily and fail to exploit, in a coordinated way, the opportunities for overlapping computation and communication. We propose an overlapping-aware startup-sharing Wait-Free-Back-Propagation (OASR-WFBP). An analytic model is designed to guide the sharing procedure. Evaluations show that OASR-WFBP achieves a 5%-16% improvement in iteration time over the state-of-the-art WFBP algorithm.
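The trade-off described in this abstract can be made concrete with a generic latency-bandwidth cost model, where sending a message of size s costs alpha + beta * s. Merging gradients pays the startup term alpha once per group, but a merged message cannot start until its last gradient is ready. This is a minimal sketch under that assumed model, not the paper's analytic model; the function names, the single serial link, and the numbers are all illustrative.

```python
from itertools import product


def iteration_time(compute, sizes, groups, alpha, beta):
    """Finish time of backward pass plus gradient communication.

    compute[i]: backward time of layer i (layers listed in backward order)
    sizes[i]:   gradient size of layer i
    groups:     consecutive layer indices merged into one message each
    """
    ready, now = [], 0.0
    for c in compute:
        now += c
        ready.append(now)  # time at which gradient i becomes available
    link_free = 0.0
    for group in groups:
        # A merged message waits for its last gradient and for the link.
        start = max(ready[group[-1]], link_free)
        link_free = start + alpha + beta * sum(sizes[i] for i in group)
    return max(now, link_free)


def consecutive_groupings(n):
    """Every way to cut layers 0..n-1 into consecutive groups."""
    for cuts in product([False, True], repeat=n - 1):
        groups, current = [], [0]
        for i, cut in enumerate(cuts, start=1):
            if cut:
                groups.append(current)
                current = [i]
            else:
                current.append(i)
        groups.append(current)
        yield groups


def best_grouping(compute, sizes, alpha, beta):
    """Brute-force search; only practical for a handful of layers."""
    return min(
        consecutive_groupings(len(compute)),
        key=lambda g: iteration_time(compute, sizes, g, alpha, beta),
    )


compute = [1.0, 1.0, 1.0, 1.0]
sizes = [1.0, 1.0, 1.0, 1.0]
alpha, beta = 1.0, 0.5
print(iteration_time(compute, sizes, [[0], [1], [2], [3]], alpha, beta))  # 7.0
print(iteration_time(compute, sizes, [[0, 1, 2, 3]], alpha, beta))        # 7.0
print(iteration_time(compute, sizes, [[0, 1], [2, 3]], alpha, beta))      # 6.0
```

In this toy setting neither extreme wins: sending every gradient separately pays the startup cost four times, merging everything forfeits all overlap, and an intermediate grouping is faster than both.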
Journal of Parallel and Distributed Computing, Volume 196, Article 104997
Citations: 0
High-speed turbulent flows towards the exascale: STREAmS-2 porting and performance
IF 3.4 | Zone 3, Computer Science | Q1 COMPUTER SCIENCE, THEORY & METHODS | Pub Date: 2024-10-15 | DOI: 10.1016/j.jpdc.2024.104993
Srikanth Sathyanarayana, Matteo Bernardini, Davide Modesti, Sergio Pirozzoli, Francesco Salvadore
Exascale High Performance Computing (HPC) represents a tremendous opportunity to push the boundaries of Computational Fluid Dynamics (CFD), but despite the consolidated trend towards the use of Graphics Processing Units (GPUs), programmability is still an issue. STREAmS-2 (Bernardini et al. Comput. Phys. Commun. 285 (2023) 108644) is a compressible solver for canonical wall-bounded turbulent flows capable of harvesting the potential of NVIDIA GPUs. Here we extend the already available CUDA Fortran backend with a novel HIP backend targeting AMD GPU architectures. The main implementation strategies are discussed along with a novel Python tool that can generate the HIP and CPU code versions allowing developers to focus their attention only on the CUDA Fortran backend. Single GPU performance is analyzed focusing on NVIDIA A100 and AMD MI250x cards which are currently at the core of several HPC clusters. The gap between peak GPU performance and STREAmS-2 performance is found to be generally smaller for NVIDIA cards. Roofline analysis allows tracing this behavior to unexpectedly different computational intensities of the same kernel using the two cards. Additional single-GPU comparisons are performed to assess the impact of grid size, number of parallelized loops, thread masking and thread divergence. Parallel performance is measured on the two largest EuroHPC pre-exascale systems, LUMI (AMD GPUs) and Leonardo (NVIDIA GPUs). Strong scalability reveals more than 80% efficiency up to 16 nodes for Leonardo and up to 32 for LUMI. Weak scalability shows an impressive efficiency of over 95% up to the maximum number of nodes tested (256 for LUMI and 512 for Leonardo). This analysis shows that STREAmS-2 is the perfect candidate to fully exploit the power of current pre-exascale HPC systems in Europe, allowing users to simulate flows with over a trillion mesh points, thus reducing the gap between the Reynolds numbers achievable in high-fidelity simulations and those of real engineering applications.
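The roofline and scalability figures quoted in this abstract follow from standard definitions, sketched below. The formulas are the textbook ones; the function names and every number in the example are hypothetical and are not measurements from the paper or specifications of any real GPU.

```python
def roofline(peak_flops, mem_bandwidth, intensity):
    """Attainable FLOP/s: min(peak, bandwidth * arithmetic intensity).

    intensity is in FLOP per byte moved, so a kernel with a lower
    intensity sits further below the compute peak on the same hardware.
    """
    return min(peak_flops, mem_bandwidth * intensity)


def strong_scaling_efficiency(t_ref, n_ref, t_n, n):
    """Fixed total problem size: ideal run time shrinks as n_ref / n."""
    return (t_ref * n_ref) / (t_n * n)


def weak_scaling_efficiency(t_ref, t_n):
    """Fixed problem size per node: ideal run time stays constant."""
    return t_ref / t_n


# Hypothetical device: 10 units of peak compute, 2 units of memory bandwidth.
print(roofline(10.0, 2.0, 1.0))   # 2.0  (memory bound)
print(roofline(10.0, 2.0, 8.0))   # 10.0 (compute bound)

# Hypothetical timings: 100 s on 1 node, 7.8125 s on 16 nodes.
print(strong_scaling_efficiency(100.0, 1, 7.8125, 16))  # 0.8
```

Read this way, the same kernel reaching a different arithmetic intensity on two cards changes which branch of the `min` applies, which is one way a gap to peak performance can differ between devices.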
Journal of Parallel and Distributed Computing, Volume 196, Article 104993
Citations: 0
A zero-knowledge proof federated learning on DLT for healthcare data
IF 3.4 CAS Zone 3, Computer Science Q1 COMPUTER SCIENCE, THEORY & METHODS Pub Date: 2024-10-11 DOI: 10.1016/j.jpdc.2024.104992
Lorenzo Petrosino, Luigi Masi, Federico D'Antoni, Mario Merone, Luca Vollero
With the increasingly widespread adoption of Healthcare 4.0 practices, new challenges have arisen for the use of collected sensitive data. On the one hand, these data have immense potential to unlock valuable insights for personalized medicine, early disease detection, and predictive analysis through Artificial Intelligence. On the other hand, protecting patient privacy is of paramount importance for maintaining trust and upholding ethical practices within the healthcare system. Classical centralized learning approaches do not fit well with the privacy and security requirements imposed by law and by the sensitivity of the data being processed, which is why decentralized learning approaches are gaining ground. Among these, Federated Learning (FL) stands out as a viable alternative, providing greater security with performance comparable to classical centralized learning. However, various attacks still target the local parameters or gradients updated by the participants. We therefore present an architecture that combines Zero-Knowledge Proofs, FL, and blockchain, and that also implements the decentralized identifier standard. This architecture supports the execution, management, supervision, and updating of the FL process, guaranteeing the resilience of the system and the reliability and traceability of exchanged data. To test the performance, robustness, and implementation costs of the proposed architecture, we develop a case study on predicting blood glucose levels in people with type 1 diabetes. The results of our analysis show an improved balance between performance, privacy, and security, with high levels of verifiability, indicating that the proposed architecture is suitable for most of the FL processes needed in the healthcare field.
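The abstract does not say which aggregation rule the FL process uses. As a minimal sketch, assuming standard federated averaging (FedAvg), the server combines client parameter vectors weighted by local dataset size. The function name and the client data below are hypothetical, and the zero-knowledge proof, blockchain, and decentralized identifier layers are not modeled.

```python
def federated_average(client_updates):
    """Weighted average of client parameter vectors (FedAvg-style).

    client_updates: list of (weights, n_samples) pairs, where weights is a
    list of floats and n_samples is the size of that client's local dataset.
    """
    total = sum(n for _, n in client_updates)
    dim = len(client_updates[0][0])
    return [
        sum(w[i] * n for w, n in client_updates) / total
        for i in range(dim)
    ]


# Hypothetical clients: only parameters are shared, never raw patient data.
updates = [([1.0, 2.0], 10), ([3.0, 4.0], 30)]
global_weights = federated_average(updates)
print(global_weights)
```

Because only parameters leave each client, the attacks mentioned in the abstract target exactly these shared updates, which is what the verification layer is meant to protect.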
Citations: 0
Journal
Journal of Parallel and Distributed Computing