
Latest articles from the Journal of Parallel and Distributed Computing

AFS-GNN: Adaptive and fast scheduling system for distributed GNN training
IF 4 | CAS Tier 3 (Computer Science) | Q1 COMPUTER SCIENCE, THEORY & METHODS | Pub Date: 2026-05-01 | Epub Date: 2026-01-08 | DOI: 10.1016/j.jpdc.2026.105225
Yuting Gao, Yongqiang Gao, Yongmei Liu
Graph Neural Networks (GNNs) have become core models for learning from relational data in domains such as transportation, social networks, and recommender systems. However, distributed GNN training on large graphs suffers from severe GPU workload imbalance and high communication cost caused by dynamic mini-batch sampling and large structural differences among nodes. To address these challenges, we propose AFS-GNN, a scheduling-aware adaptive framework that achieves fine-grained workload balancing in distributed GNN training. AFS-GNN continuously monitors per-GPU mini-batch execution time through lightweight runtime agents and employs Kalman filtering to suppress transient fluctuations and detect persistent imbalance trends. Upon imbalance detection, it constructs a Hierarchical Dependency Graph (HDG) that explicitly captures multi-hop aggregation dependencies and node-level computational costs. Guided by a heuristic load estimator, AFS-GNN applies cost-aware spectral bipartitioning via the Fiedler vector to select structurally coherent migration blocks that minimize inter-GPU communication while maintaining computational consistency. Selected blocks are migrated asynchronously across devices using intra-node or inter-process communication, ensuring non-blocking execution. Extensive experiments on the large-scale benchmarks ogbn-products and ogbn-papers100M demonstrate that AFS-GNN achieves up to 21.7% acceleration over Euler, 15% over DistDGL, and 13.7% over FlexGraph, while maintaining stable convergence and scalability across diverse batch sizes and partition configurations.
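The abstract's use of Kalman filtering to separate timing jitter from a persistent load shift can be illustrated with a minimal scalar Kalman filter. Everything here is an illustrative sketch, not the paper's implementation; the noise constants and sample data are invented:

```python
# Minimal scalar Kalman filter smoothing noisy per-step batch times.
# Illustrative only: constants and sample data are not from the paper.

def kalman_smooth(measurements, q=1e-4, r=1e-2):
    """Filter a scalar random-walk state.

    q: process-noise variance (how fast the true load may drift)
    r: measurement-noise variance (jitter in the timing samples)
    """
    x, p = measurements[0], 1.0  # state estimate and its variance
    out = []
    for z in measurements:
        p += q                   # predict: variance grows by process noise
        k = p / (p + r)          # Kalman gain
        x += k * (z - x)         # correct with the innovation
        p *= (1 - k)             # posterior variance
        out.append(x)
    return out

# A jump from ~0.10 s to ~0.25 s models a persistent imbalance, not jitter.
noisy = [0.10, 0.12, 0.09, 0.11, 0.25, 0.24, 0.26, 0.25]
smooth = kalman_smooth(noisy)
```

The filtered sequence reacts to the sustained jump while damping single-sample spikes, which is the property the runtime agents need before triggering a (costly) migration.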
Citations: 0
OptimES: Optimizing federated learning using remote embeddings for graph neural networks
IF 4 | CAS Tier 3 (Computer Science) | Q1 COMPUTER SCIENCE, THEORY & METHODS | Pub Date: 2026-05-01 | Epub Date: 2026-01-22 | DOI: 10.1016/j.jpdc.2026.105227
Pranjal Naman, Yogesh Simmhan
Graph Neural Networks (GNNs) have experienced rapid advancements in recent years due to their ability to learn meaningful representations from graph data structures. However, in most real-world settings, such as financial transaction networks and healthcare networks, this data is held by different data owners and cannot be aggregated due to privacy concerns. Federated Learning (FL) has emerged as a viable machine learning approach for training a shared model that iteratively aggregates local models trained on decentralized data, addressing privacy concerns while leveraging parallelism. State-of-the-art methods enhance the privacy-respecting convergence accuracy of federated GNN training by sharing remote embeddings of boundary vertices through a server (EmbC). However, they are limited by diminished performance due to large communication costs. In this article, we propose OptimES, an optimized federated GNN training framework that employs remote neighbourhood pruning, overlapping the push of embeddings to the server with local training, and dynamic pulling of embeddings to reduce network costs and training time. We perform a rigorous evaluation of these strategies on four common graph datasets with up to 111M vertices and 1.6B edges. We see that a modest drop in per-round accuracy due to the preemptive push of embeddings is outstripped by the reduction in per-round training time for large and dense graphs like Reddit and Products, converging up to ≈3.5× faster than EmbC and giving up to ≈16% better accuracy than default federated GNN learning. While accuracy improvements over default federated GNNs are modest for sparser graphs like Arxiv and Papers, they achieve the target accuracy about ≈11× faster than EmbC.
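The "overlapping the push of embeddings with local training" idea can be sketched with a background thread: the transfer of (one-round-stale) boundary embeddings hides behind the local compute step instead of serializing with it. The server, training step, and data below are mocks invented for illustration, not the OptimES API:

```python
# Sketch of overlapping the embedding push with local training.
# Illustrative mock: server, embeddings, and "training" are stand-ins.
import threading
import queue
import time

server_inbox = queue.Queue()  # stands in for the remote embedding server

def push_embeddings(round_id, embeddings):
    time.sleep(0.01)                     # stand-in for network transfer
    server_inbox.put((round_id, embeddings))

def train_round(round_id, boundary_embeddings):
    # Launch the push before local compute, so the transfer overlaps
    # with the training step instead of blocking it.
    pusher = threading.Thread(target=push_embeddings,
                              args=(round_id, boundary_embeddings))
    pusher.start()
    local_loss = sum(x * x for x in boundary_embeddings)  # mock local training
    pusher.join()                         # round barrier
    return local_loss

loss = train_round(0, [0.5, -1.0, 2.0])
```

The trade-off the abstract quantifies follows directly: the server sees embeddings that are one round stale (a small per-round accuracy cost) in exchange for hiding the transfer latency entirely.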
Citations: 0
On complexity of substructure connectivity and restricted connectivity of graphs
IF 4 | CAS Tier 3 (Computer Science) | Q1 COMPUTER SCIENCE, THEORY & METHODS | Pub Date: 2026-05-01 | Epub Date: 2026-02-05 | DOI: 10.1016/j.jpdc.2026.105237
Huazhong Lü , Tingzeng Wu
The connectivity of a graph is an important parameter for evaluating its reliability. The k-restricted connectivity (resp. R^h-restricted connectivity) of a graph G is the minimum cardinality of a set S of vertices of G, if one exists, whose deletion disconnects G and leaves each component of G − S with more than k vertices (resp. with δ(G − S) ≥ h). In contrast, the structure (substructure) connectivity of G is defined as the minimum number of vertex-disjoint subgraphs whose deletion disconnects G. As generalizations of the concept of connectivity, structure (substructure) connectivity, restricted connectivity, and R^h-restricted connectivity have been extensively studied from the combinatorial point of view. Very little is known about the computational complexity of these variants, except for the recently established NP-completeness of k-restricted edge-connectivity. In this paper, we prove that the problems of determining structure, substructure, restricted, and R^h-restricted connectivity are all NP-complete.
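The definition of k-restricted connectivity can be made concrete with a brute-force check on a tiny graph. This is illustrative only; since the paper shows the general problem is NP-complete, exponential enumeration like this is unavoidable in the worst case:

```python
# Brute-force k-restricted connectivity on a small graph: the smallest
# vertex set S whose removal disconnects G while every remaining
# component keeps more than k vertices. Exponential; tiny graphs only.
from itertools import combinations

def components(vertices, edges):
    """Connected components of the graph (vertices, edges)."""
    seen, comps = set(), []
    for v in vertices:
        if v in seen:
            continue
        stack, comp = [v], set()
        while stack:
            u = stack.pop()
            if u in comp:
                continue
            comp.add(u)
            stack.extend(w for a, b in edges if u in (a, b)
                         for w in (a, b) if w not in comp)
        seen |= comp
        comps.append(comp)
    return comps

def k_restricted_connectivity(vertices, edges, k):
    for size in range(1, len(vertices)):
        for S in combinations(vertices, size):
            rest = set(vertices) - set(S)
            sub = [(a, b) for a, b in edges if a in rest and b in rest]
            comps = components(rest, sub)
            if len(comps) > 1 and all(len(c) > k for c in comps):
                return size
    return None  # no qualifying cut exists

# A 6-cycle: removing two opposite vertices leaves two components of size 2.
cycle = [(i, (i + 1) % 6) for i in range(6)]
kappa1 = k_restricted_connectivity(range(6), cycle, 1)
```

On the 6-cycle the 1-restricted connectivity is 2: no single vertex disconnects it, but deleting two opposite vertices leaves two size-2 paths, both larger than k = 1.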
Citations: 0
HyBMSearch: A fast multi-Level search algorithm delivering order-of-Magnitude speedups on multi-Billion datasets
IF 4 | CAS Tier 3 (Computer Science) | Q1 COMPUTER SCIENCE, THEORY & METHODS | Pub Date: 2026-05-01 | Epub Date: 2026-01-11 | DOI: 10.1016/j.jpdc.2026.105226
Shashank Raj , Kalyanmoy Deb
We present HyBMSearch (Hybrid Bayesian Multi-Level Search), a Python-based algorithm that redefines how we handle extremely large, sorted datasets. By combining classic methods (binary and interpolation search) with a multi-level chunking approach, this technique achieves significant speedups on arrays ranging from 100 million to 10 billion (tested) elements. At the core of our approach is the integration of a hybrid, custom genetic algorithm with Bayesian optimization, enabling automatic parameter tuning. This eliminates the guesswork of manual tuning while maintaining solid performance across a variety of scenarios. Although NumPy's searchsorted is highly optimized C code, HyBMSearch (written in Python) still delivers dramatic speed gains in multi-threaded tests. It processes 10 million lookups on a 100-million-element dataset in just 0.0244 seconds (versus 23.67 seconds for searchsorted), handles 100 million lookups on a 1-billion-element array in 0.393 seconds (versus 184.89 seconds), performs 500 million lookups on 5 billion elements in 59.00 seconds (versus 979.73 seconds), and resolves 1 billion lookups on 10 billion elements in 119.68 seconds (versus 2071.84 seconds). These results set a new milestone for high-performance search methods in parallel and distributed settings, demonstrating the capability of our proposed approach to optimize the search process.
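The classic building blocks named in the abstract combine naturally: one interpolation probe localizes the key when values are roughly uniform, with binary search as the robust fallback. This is a generic textbook hybrid, not the paper's tuned multi-level algorithm:

```python
# Generic interpolation-then-binary hybrid search on a sorted list.
# Illustrative sketch; not HyBMSearch's multi-level chunked algorithm.
import bisect

def hybrid_search(arr, key):
    """Return the index of key in sorted arr, or -1 if absent."""
    if not arr or key < arr[0] or key > arr[-1]:
        return -1
    lo, hi = 0, len(arr) - 1
    # Interpolation probe: guess the index assuming roughly uniform values.
    if arr[hi] != arr[lo]:
        pos = lo + (key - arr[lo]) * (hi - lo) // (arr[hi] - arr[lo])
        if arr[pos] == key:
            return pos
        if arr[pos] < key:    # narrow the range around the probe
            lo = pos + 1
        else:
            hi = pos - 1
    # Fall back to binary search on the narrowed range.
    i = bisect.bisect_left(arr, key, lo, hi + 1)
    return i if i <= hi and arr[i] == key else -1

data = list(range(0, 1000, 5))  # sorted, uniformly spaced values
idx = hybrid_search(data, 250)
```

On uniformly distributed keys the probe often lands exactly (here 250 sits at index 50 on the first guess), which is why interpolation search averages O(log log n) probes on such data; the binary fallback caps the worst case.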
Citations: 0
VLSI design and its hardware implementation for optimal image dehazing with adaptive bilateral filtering
IF 4 | CAS Tier 3 (Computer Science) | Q1 COMPUTER SCIENCE, THEORY & METHODS | Pub Date: 2026-04-01 | Epub Date: 2025-10-24 | DOI: 10.1016/j.jpdc.2025.105186
A. Arul Edwin Raj , Nabihah Binti Ahmad , Jeffin Gracewell , Renugadevi R , C.T. Kalaivani
Fog and smog significantly hinder image processing by reducing visual output quality and disrupting the functionality of systems reliant on visual data. Existing dehazing methods face several challenges, including computational complexity, sensitivity to parameter settings, and limited optimization for diverse conditions. To overcome these limitations, this paper introduces Selective Bilateral Filtering and Color Attenuation Analysis (SBBFC), a new methodology for real-time image dehazing. SBBFC avoids the problems of prior methods by dynamically controlling window sizes and using color attenuation analysis, sustaining reliable performance as haze levels change and guaranteeing accurate color rendition in the dehazed image. The hardware-optimized design targets FPGA or ASIC technologies, offering high throughput, real-time response, better image quality, and considerably better detail reproduction. In the ASIC implementation, the proposed architecture delivers 350 MPixels/s at a cost of 15k gates and 5 mW of power consumption, with an area efficiency of 0.8 mm²/k. In the FPGA implementation, it offers 100 MPixels/s at a clock frequency of 100 MHz. Given these figures, the proposed architecture is well suited to delivering real-time dehazing with high throughput at low power.
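The principle behind bilateral filtering, which the paper adapts for hardware, is that each weight falls off with both spatial distance and intensity difference, so noise is smoothed while edges survive. A minimal 1-D software version (parameters and data invented for illustration; the paper's design is a hardware pipeline, not this code):

```python
# Minimal 1-D bilateral filter: weights combine a spatial Gaussian and a
# range (intensity-difference) Gaussian. Illustrative parameters only.
import math

def bilateral_1d(signal, radius=2, sigma_s=1.0, sigma_r=0.1):
    out = []
    for i, center in enumerate(signal):
        num = den = 0.0
        for j in range(max(0, i - radius), min(len(signal), i + radius + 1)):
            w = (math.exp(-((i - j) ** 2) / (2 * sigma_s ** 2)) *      # spatial
                 math.exp(-((signal[j] - center) ** 2) / (2 * sigma_r ** 2)))  # range
            num += w * signal[j]
            den += w
        out.append(num / den)
    return out

# A noisy step edge: smoothing should not blur the 0 -> 1 transition away.
step = [0.02, 0.0, 0.03, 0.01, 1.0, 0.98, 1.02, 0.99]
filtered = bilateral_1d(step)
```

Across the step, neighbors on the far side of the edge get near-zero range weight, so the transition stays sharp; a plain Gaussian blur would average it toward 0.5.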
Citations: 0
Mobility-aware server placement and power allocation for randomly walking mobile users
IF 4 | CAS Tier 3 (Computer Science) | Q1 COMPUTER SCIENCE, THEORY & METHODS | Pub Date: 2026-04-01 | Epub Date: 2026-01-06 | DOI: 10.1016/j.jpdc.2025.105216
Keqin Li
We systematically, quantitatively, and mathematically address the problems of optimal mobility-aware server placement and optimal mobility-aware power allocation in mobile edge computing environments with randomly walking mobile users. The new contributions of the paper are as follows. We establish a single-server M/G/1 queueing system for mobile user equipment and a multiserver M/G/k queueing system for mobile edge clouds. We consider both the synchronous mobility model and the asynchronous mobility model, described by discrete-time and continuous-time Markov chains respectively. We discuss two task offloading strategies for user equipment in the same service area, i.e., the equal-response-time method and the equal-load-fraction method. We formally and rigorously define the optimal mobility-aware server placement problem and the optimal mobility-aware power allocation problem, and develop optimization algorithms to solve both. We present numerical results for optimal mobility-aware server placement and optimal mobility-aware power allocation under two mobility models, two task offloading strategies, and two power consumption models. The significance of the paper lies in the fact that such an analytical and algorithmic treatment of mobility-aware server placement and power allocation for randomly walking mobile users has rarely appeared in the existing literature.
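The paper models each mobile device as an M/G/1 queue. A standard result for such queues, the Pollaczek-Khinchine formula, gives the mean response time from the arrival rate and the first two moments of service time; the sketch below applies it with illustrative numbers, not the paper's model parameters:

```python
# Mean response time of an M/G/1 queue via the Pollaczek-Khinchine formula:
#   T = E[S] + lambda * E[S^2] / (2 * (1 - rho)),  rho = lambda * E[S].
# Numbers below are illustrative, not from the paper.

def mg1_mean_response_time(lam, es, es2):
    """lam: Poisson arrival rate; es: E[S]; es2: E[S^2] of service time."""
    rho = lam * es                            # server utilization
    if rho >= 1.0:
        raise ValueError("unstable queue: utilization must be < 1")
    wait = lam * es2 / (2.0 * (1.0 - rho))    # mean waiting time in queue
    return es + wait                          # response = service + wait

# Exponential service with mean 0.1 s (so E[S^2] = 2 * 0.1^2), 5 tasks/s.
t = mg1_mean_response_time(5.0, 0.1, 2 * 0.1 ** 2)
```

As a sanity check, exponential service reduces M/G/1 to M/M/1, where T = 1/(mu - lambda) = 1/(10 - 5) = 0.2 s, matching the formula's output here.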
Citations: 0
Distributed quadratic interpolation estimation for large-scale quantile regression
IF 4 | CAS Tier 3 (Computer Science) | Q1 COMPUTER SCIENCE, THEORY & METHODS | Pub Date: 2026-04-01 | Epub Date: 2025-12-20 | DOI: 10.1016/j.jpdc.2025.105214
Ziqian Qin , Yue Chao , Xuejun Ma
A number of statistical learning approaches for large-scale quantile regression (QR) have been developed rapidly to address the optimization issues arising from massive data computations. However, the principal idea behind most distributed QR estimation procedures for handling the nondifferentiable quantile loss is to approximate the check function using kernel-based smoothing with a bandwidth. In this article, we develop a new communication-efficient distributed QR estimation procedure, the Distributed Quadratic Interpolation estimation strategy for QR (DQIQR), to tackle the limited-memory constraint of a single machine. Specifically, we fit a quadratic function in a small neighborhood around the origin, which transforms the nondifferentiable check function into a convex and smooth quadratic loss function without resorting to kernel-based methods. The minimizer, named the DQIQR estimator, is obtained through an approximate multi-round reweighted least squares aggregation procedure under the divide-and-conquer (DC) framework. Theoretically, we establish the asymptotic normality of the DQIQR estimator and show that it achieves the same efficiency as the QR estimator computed on the entire dataset. Furthermore, a regularized version of DQIQR (DRQIQR) for distributed variable selection is also investigated. Finally, synthetic and real datasets are used to evaluate the effectiveness of the proposed approaches.
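The quadratic-in-a-neighborhood idea can be sketched concretely. One standard construction replaces the check loss ρ_τ(u) = u(τ − 1{u<0}) on |u| ≤ h by the unique quadratic matching its value and slope at ±h, namely u²/(4h) + (τ − 1/2)u + h/4. This is a common smoothing device, not necessarily the paper's exact formula:

```python
# Quadratic smoothing of the quantile check loss near the origin.
# Standard construction (value/slope match at +/- h); illustrative only,
# not necessarily DQIQR's exact interpolant.

def check_loss(u, tau):
    """Nondifferentiable quantile check loss rho_tau(u)."""
    return u * (tau - (1.0 if u < 0 else 0.0))

def smoothed_check_loss(u, tau, h):
    """Convex, differentiable everywhere; equals check_loss for |u| > h."""
    if abs(u) <= h:
        return u * u / (4 * h) + (tau - 0.5) * u + h / 4
    return check_loss(u, tau)

tau, h = 0.7, 0.5
inside = smoothed_check_loss(0.0, tau, h)   # quadratic region: h/4
boundary = smoothed_check_loss(h, tau, h)   # matches check_loss at u = h
```

Because the smoothed loss is quadratic near zero, each local minimization step becomes a (re)weighted least squares problem, which is what makes the multi-round aggregation under divide-and-conquer cheap to communicate.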
Citations: 0
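The core smoothing step described in the abstract above — replacing the kink of the quantile check function with a quadratic on a small neighborhood of the origin — can be sketched as follows. The particular interpolant below (matching the check function in value and slope at ±h) is one common construction and is an assumption here; the paper's exact quadratic may differ.

```python
import numpy as np

def check_loss(u, tau):
    # Standard quantile check function: rho_tau(u) = u * (tau - 1{u < 0}).
    return u * (tau - (np.asarray(u) < 0))

def quadratic_check_loss(u, tau, h):
    # Quadratic interpolation of the check function on [-h, h].
    # The quadratic u^2/(4h) + (tau - 1/2)u + h/4 agrees with rho_tau
    # in both value and slope at u = +h and u = -h, so the combined
    # loss is convex and continuously differentiable — no kernel needed.
    u = np.asarray(u, dtype=float)
    inner = u ** 2 / (4 * h) + (tau - 0.5) * u + h / 4
    return np.where(np.abs(u) <= h, inner, check_loss(u, tau))
```

Outside [-h, h] the smoothed loss coincides with the check function, so for a small h the minimizer stays close to the quantile estimate while standard reweighted least squares machinery becomes applicable.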
Front Matter 1 - Full Title Page (regular issues)/Special Issue Title page (special issues)
IF 4 CAS Zone 3 (Computer Science) Q1 COMPUTER SCIENCE, THEORY & METHODS Pub Date : 2026-04-01 Epub Date: 2026-02-04 DOI: 10.1016/S0743-7315(26)00008-0
{"title":"Front Matter 1 - Full Title Page (regular issues)/Special Issue Title page (special issues)","authors":"","doi":"10.1016/S0743-7315(26)00008-0","DOIUrl":"10.1016/S0743-7315(26)00008-0","url":null,"abstract":"","PeriodicalId":54775,"journal":{"name":"Journal of Parallel and Distributed Computing","volume":"210 ","pages":"Article 105230"},"PeriodicalIF":4.0,"publicationDate":"2026-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146189152","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Optimal schedule for periodic jobs with discretely controllable processing times on two machines
IF 4 CAS Zone 3 (Computer Science) Q1 COMPUTER SCIENCE, THEORY & METHODS Pub Date : 2026-04-01 Epub Date: 2025-12-10 DOI: 10.1016/j.jpdc.2025.105204
Zizhao Wang , Wei Bao , Ruoyu Wu , Dong Yuan , Albert Y. Zomaya
In many real-world situations, the processing time of computational jobs can be shortened by lowering the processing quality. This is referred to as discretely controllable processing time, where the original processing time can be shortened to a number of levels with lower processing qualities. In this paper, we study the scheduling problem of periodic jobs with discretely controllable processing times on two machines. The problem is NP-hard, as directly solving it through dynamic programming leads to exponential computational complexity. This is because we need to memorise a set of processed jobs to avoid reprocessing. In order to address this issue, we prove the Ordered Scheduling Structure (OSS) Property and the Consecutive Decision Making (CDM) Property. The OSS Property allows us to search for an optimal solution in which jobs on the same machine are orderly started. The CDM Property allows us to memorise only two jobs to completely avoid the job reprocessing. These two properties greatly decrease the searching space, and the resultant dynamic programming solution to find an optimal solution is with pseudo-polynomial computational complexity.
{"title":"Optimal schedule for periodic jobs with discretely controllable processing times on two machines","authors":"Zizhao Wang ,&nbsp;Wei Bao ,&nbsp;Ruoyu Wu ,&nbsp;Dong Yuan ,&nbsp;Albert Y. Zomaya","doi":"10.1016/j.jpdc.2025.105204","DOIUrl":"10.1016/j.jpdc.2025.105204","url":null,"abstract":"<div><div>In many real-world situations, the processing time of computational jobs can be shortened by lowering the processing quality. This is referred to as discretely controllable processing time, where the original processing time can be shortened to a number of levels with lower processing qualities. In this paper, we study the scheduling problem of periodic jobs with discretely controllable processing times on two machines. The problem is NP-hard, as directly solving it through dynamic programming leads to exponential computational complexity. This is because we need to memorise a set of processed jobs to avoid reprocessing. In order to address this issue, we prove the Ordered Scheduling Structure (OSS) Property and the Consecutive Decision Making (CDM) Property. The OSS Property allows us to search for an optimal solution in which jobs on the same machine are orderly started. The CDM Property allows us to memorise only two jobs to completely avoid the job reprocessing. 
These two properties greatly decrease the searching space, and the resultant dynamic programming solution to find an optimal solution is with pseudo-polynomial computational complexity.</div></div>","PeriodicalId":54775,"journal":{"name":"Journal of Parallel and Distributed Computing","volume":"210 ","pages":"Article 105204"},"PeriodicalIF":4.0,"publicationDate":"2026-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145842467","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
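The notion of discretely controllable processing times — each job offering a menu of (processing time, quality) levels, where shorter times come at lower quality — can be illustrated with a brute-force baseline for tiny instances. This is only an illustration of the problem input, not the paper's pseudo-polynomial dynamic program; the tie-breaking objective used here (minimize makespan first, then maximize total quality) is an assumption.

```python
from itertools import product

def best_two_machine_schedule(jobs):
    """Brute force over all level choices and machine assignments.

    `jobs` is a list where jobs[i] is a list of (time, quality) options
    for job i. Returns (makespan, total quality, chosen levels, machine
    assignment). Exponential in the number of jobs — the paper's OSS/CDM
    properties are precisely what avoids this blow-up at scale."""
    n = len(jobs)
    best = None
    for choice in product(*[range(len(j)) for j in jobs]):
        times = [jobs[i][choice[i]][0] for i in range(n)]
        quals = [jobs[i][choice[i]][1] for i in range(n)]
        for assign in product((0, 1), repeat=n):
            load = [0, 0]
            for t, m in zip(times, assign):
                load[m] += t
            # Lexicographic objective: makespan first, then quality.
            key = (max(load), -sum(quals))
            if best is None or key < best[0]:
                best = (key, choice, assign)
    (makespan, neg_q), choice, assign = best
    return makespan, -neg_q, choice, assign
```

For example, a job with options [(4, 2), (2, 1)] can run at full quality in 4 time units or at reduced quality in 2; shortening it may lower the makespan at the cost of quality.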
Optimistic execution in byzantine broadcast protocols that tolerate malicious majority
IF 4 CAS Zone 3 (Computer Science) Q1 COMPUTER SCIENCE, THEORY & METHODS Pub Date : 2026-03-01 Epub Date: 2025-11-29 DOI: 10.1016/j.jpdc.2025.105203
Ruomu Hou, Haifeng Yu
We consider the classic byzantine broadcast problem in distributed computing, in the context of a system with n node and at most fmax byzantine failures, under the standard synchronous timing model. Let f be the actual number of byzantine failures in a given execution, where ffmax. Our goal in this work is to optimize the performance of byzantine broadcast protocols in the common case where f is relative small. To this end, we propose a novel framework, called FlintBB, for adding an optimistic track into existing byzantine broadcast protocols. Using this framework, we show that we can achieve an exponential improvement in several existing byzantine broadcast protocols when f is relatively small. At the same time, our approach does not sacrifice the performance when f is not small.
{"title":"Optimistic execution in byzantine broadcast protocols that tolerate malicious majority","authors":"Ruomu Hou,&nbsp;Haifeng Yu","doi":"10.1016/j.jpdc.2025.105203","DOIUrl":"10.1016/j.jpdc.2025.105203","url":null,"abstract":"<div><div>We consider the classic byzantine broadcast problem in distributed computing, in the context of a system with <em>n</em> node and at most <span><math><msub><mi>f</mi><mrow><mi>m</mi><mi>a</mi><mi>x</mi></mrow></msub></math></span> byzantine failures, under the standard synchronous timing model. Let <em>f</em> be the actual number of byzantine failures in a given execution, where <span><math><mrow><mi>f</mi><mo>≤</mo><msub><mi>f</mi><mrow><mi>m</mi><mi>a</mi><mi>x</mi></mrow></msub></mrow></math></span>. Our goal in this work is to optimize the performance of byzantine broadcast protocols in the common case where <em>f</em> is relative small. To this end, we propose a novel framework, called <span>FlintBB</span>, for adding an <em>optimistic track</em> into existing byzantine broadcast protocols. Using this framework, we show that we can achieve an <em>exponential improvement</em> in several existing byzantine broadcast protocols when <em>f</em> is relatively small. At the same time, our approach does not sacrifice the performance when <em>f</em> is not small.</div></div>","PeriodicalId":54775,"journal":{"name":"Journal of Parallel and Distributed Computing","volume":"209 ","pages":"Article 105203"},"PeriodicalIF":4.0,"publicationDate":"2026-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145685025","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
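The optimistic-track idea — run a cheap path tuned for executions with few actual faults, and fall back to the fully fault-tolerant protocol when the fast outcome does not validate — follows a generic pattern that can be sketched as below. This is a sketch of the pattern only; FlintBB's actual validation and fallback logic are not reproduced here, and the function names are illustrative.

```python
def broadcast_with_optimistic_track(fast_path, slow_path, validate):
    """Generic optimistic-track wrapper (hypothetical sketch).

    Tries the cheap fast path first; if its result fails validation,
    falls back to the robust slow path. Safety therefore never depends
    on the optimistic assumption holding — only performance does."""
    result = fast_path()
    if validate(result):
        return result, "fast"
    return slow_path(), "slow"
```

When f is small the fast path validates and dominates the cost; when f approaches f_max, the wrapper degenerates to the underlying protocol plus one cheap failed attempt, which is how the "no sacrifice when f is not small" property is usually argued.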
Journal
Journal of Parallel and Distributed Computing
Copyright © 2023 Book学术 All rights reserved.