Front Matter 1 - Full Title Page (regular issues)/Special Issue Title page (special issues)
Pub Date: 2025-11-17 | DOI: 10.1016/S0743-7315(25)00161-3
Journal of Parallel and Distributed Computing, Volume 207, Article 105194
Energy and performance efficient NAND flash translation layer architecture for low-latency edge applications
Ranjeeth Sekhar CB, Diksha Shekhawat, Jugal Gandhi, M. Santosh, Jai Gopal Pandey
Pub Date: 2025-11-17 | DOI: 10.1016/j.jpdc.2025.105200
This paper presents the architecture, design, and hardware implementation of a flash translation layer (FTL) for NAND flash memory devices. The proposed FTL incorporates a hybrid logical-to-physical mapping scheme, wear leveling, garbage collection, bad block management, and error-correcting codes (ECC). The architecture is designed to optimize NAND flash memory management, enhancing overall performance and reliability. The hybrid logical-to-physical mapping and wear-leveling schemes efficiently manage data placement, mitigating inherent challenges of NAND flash memory such as limited write endurance and page-based programming constraints. To validate the proposed hardware FTL, experimental evaluations demonstrate its efficiency in terms of latency, dynamic power, and throughput. The hardware implementation targets the Xilinx Zynq UltraScale+ ZCU102 platform, which contains the xczu9eg-2ffvb1156-2-e FPGA device. The proposed design achieves comparable resource utilization and improved datapath delay, throughput, and dynamic power, with practical implications for edge computing applications.
{"title":"Energy and performance efficient NAND flash translation layer architecture for low-latency edge applications","authors":"Ranjeeth Sekhar CB , Diksha Shekhawat , Jugal Gandhi , M. Santosh , Jai Gopal Pandey","doi":"10.1016/j.jpdc.2025.105200","DOIUrl":"10.1016/j.jpdc.2025.105200","url":null,"abstract":"<div><div>This paper presents a hardware architecture, design, and implementation of a flash translation layer (FTL) for NAND flash memory devices. The proposed FTL system incorporates a hybrid logical-to-physical mapping scheme, wear leveling, garbage collection, bad block management, and error correcting codes (ECC). This architecture is designed to optimize the NAND flash memory management process, enhancing the overall performance and reliability. In this, hybrid logical-to-physical mapping and wear leveling schemes efficiently manage data placement, mitigating the inherent challenges of NAND flash memory, such as limited write endurance and page-based programming constraints. To validate the proposed hardware FTL, experimental evaluations have been performed, demonstrating its efficiency in terms of latency, dynamic power, and throughput. Hardware implementation is carried out on the Xilinx Zynq UltraScale+ ZCU102 platform containing the xczu9eg-2ffvb1156-2-e FPGA device. The proposed work has comparable resource utilization and improved datapath delay, throughput, and dynamic power, with practical implications for edge computing applications.</div></div>","PeriodicalId":54775,"journal":{"name":"Journal of Parallel and Distributed Computing","volume":"209 ","pages":"Article 105200"},"PeriodicalIF":4.0,"publicationDate":"2025-11-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145618400","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A new switch buffer architecture for dragonfly networks
Alejandro Cano, Cristóbal Camarero, Carmen Martínez, Ramón Beivide
Pub Date: 2025-11-15 | DOI: 10.1016/j.jpdc.2025.105199
Dragonfly networks offer a viable solution for large-scale supercomputers and datacenters. However, developing efficient routing mechanisms for these networks presents significant challenges. Current solutions often lead to unstable network behavior due to congestion and fairness issues, exacerbating performance variability and the tail-latency problem. An analysis of the topology and its standard deadlock-avoidance mechanisms reveals that the access of servers to global network links varies with their location in the network, resulting in throughput unfairness. To address this issue, this paper introduces a novel switch buffer architecture that reduces head-of-line blocking and enhances fairness, significantly improving overall network performance. At a cost comparable to existing solutions, the proposed buffer architecture delivers superior performance. Simulations with realistic synthetic traffic scenarios further confirm these findings, showing performance improvements between 10% and 47% over conventional solutions in medium-sized Dragonflies.
{"title":"A new switch buffer architecture for dragonfly networks","authors":"Alejandro Cano , Cristóbal Camarero , Carmen Martínez , Ramón Beivide","doi":"10.1016/j.jpdc.2025.105199","DOIUrl":"10.1016/j.jpdc.2025.105199","url":null,"abstract":"<div><div>Dragonfly networks offer a viable solution for large-scale supercomputers and datacenters. However, developing efficient routing mechanisms for these networks presents significant challenges. Current solutions often lead to unstable network behavior due to congestion and fairness issues, exacerbating performance variability and the tail-latency problem. An analysis of the topology and its standard deadlock avoidance mechanisms reveals that server access to global network links varies based on their location in the network, resulting in throughput unfairness. To address this issue, this paper introduces a novel switch buffer architecture which reduces head-of-line blocking and enhances fairness, to significantly improve overall network performance. Despite offering comparable cost to existing solutions, the proposed buffer architecture proves superior performance. Real-world synthetic simulations scenarios further confirm these findings, showing performance improvements between 10 % and 47 % against conventional solutions in medium sized Dragonflies.</div></div>","PeriodicalId":54775,"journal":{"name":"Journal of Parallel and Distributed Computing","volume":"209 ","pages":"Article 105199"},"PeriodicalIF":4.0,"publicationDate":"2025-11-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145618397","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
BMSES: Blockchain and mobile edge computing-based secure and energy-efficient system for healthcare data management
Md Nurul Hasan, Suyel Namasudra
Pub Date: 2025-11-13 | DOI: 10.1016/j.jpdc.2025.105198
Blockchain technology is rapidly being adopted across various sectors, including healthcare, finance, and agriculture, due to key features like decentralization, immutability, and consensus mechanisms. These features ensure security, privacy, transparency, and accountability. Mobile Edge Computing (MEC), in turn, extends cloud computing capabilities to mobile devices through a distributed system. However, existing studies that combine blockchain with MEC often overlook the impact of data offloading on system performance. This paper proposes a Blockchain and MEC-based Secure and Energy-efficient System (BMSES) for sharing Internet of Medical Things (IoMT) data securely among patients and doctors while optimizing energy consumption through task offloading to MEC servers. The Non-Orthogonal Multiple Access (NOMA) protocol is utilized for efficient channel sharing among multiple users, offering low cost, reduced latency, and low power consumption. The proposed scheme optimizes energy consumption by efficiently managing task delegation and resource allocation in MEC. Additionally, smart contracts automate blockchain operations, enhancing efficiency and security. The scheme is evaluated in terms of energy consumption, transmission rate, latency, and offloading delay, and the experimental results show its effectiveness compared to state-of-the-art offloading approaches in improving the energy efficiency, transmission rate, offloading delay, and latency of the system.
{"title":"BMSES: Blockchain and mobile edge computing-based secure and energy-efficient system for healthcare data management","authors":"Md Nurul Hasan, Suyel Namasudra","doi":"10.1016/j.jpdc.2025.105198","DOIUrl":"10.1016/j.jpdc.2025.105198","url":null,"abstract":"<div><div>Blockchain technology is rapidly being adopted across various sectors, including healthcare, finance, and agriculture, due to its key features like decentralization, immutability, and consensus mechanisms. These features ensure security, privacy, transparency, and accountability. On the other hand, Mobile Edge Computing (MEC) extends cloud computing capabilities to mobile devices through a distributed system. However, existing studies, which combine blockchain with MEC, often overlook the impact of data offloading on system performance. This paper proposes a Blockchain and MEC-based Secure and Energy-efficient System (BMSES) for sharing Internet of Medical Things (IoMT) data securely among patients and doctors, while optimizing energy consumption through task offloading to MEC servers. Here, the Non-Orthogonal Multiple Access (NOMA) protocol is utilized for efficient channel sharing among multiple users, which offers low cost, reduced latency, and low power consumption. The proposed scheme optimizes energy consumption by efficiently managing task delegation and resource allocation in MEC. Additionally, smart contracts automate blockchain operations, enhancing efficiency and security. The proposed scheme is evaluated in terms of energy consumption, transmission rate, latency, and offloading delay. The results of the experiments show the effectiveness of the proposed scheme compared to state-of-the-art offloading approaches in terms of improving energy efficiency, transmission rate, offloading delay, and latency of the system.</div></div>","PeriodicalId":54775,"journal":{"name":"Journal of Parallel and Distributed Computing","volume":"208 ","pages":"Article 105198"},"PeriodicalIF":4.0,"publicationDate":"2025-11-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145571798","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Reload: Deep reinforcement learning-based workload distribution for collaborative edges
Yu Liang, Jidong Ge, Jie Wu, Sheng Zhang, Shiwu Wen, Bin Luo
Pub Date: 2025-11-05 | DOI: 10.1016/j.jpdc.2025.105191
Edge computing is a promising technology that aims to enable timely computation at the network edge, and major service providers began deploying geographically distributed edge servers several years ago. A major challenge in such deployments is task scheduling: assigning the various tasks submitted by mobile users to distributed edges so as to optimize a given metric. Task scheduling in geographically distributed edges is not easy; we observe three major challenges in designing an efficient solution for dynamic edge environments: heterogeneous tasks, dynamic edge networks, and heterogeneous edge servers. In this paper, we pursue a black-box solution for task scheduling in the collaborative edge environment that does not rely on detailed analytical performance modeling. We propose Reload, an intelligent deep reinforcement learning-based task scheduler. Reload learns a policy purely from known information, without foreseeing the future, and represents that policy as a neural network mapping "raw" observations to scheduling actions. During training, Reload starts out knowing nothing and gradually learns to make better scheduling decisions through reinforcement, in the form of reward signals for past decisions. Reload leverages Advantage Actor-Critic (A2C) to train the policy network. We evaluate Reload using extensive simulations.
{"title":"Reload: Deep reinforcement learning-based workload distribution for collaborative edges","authors":"Yu Liang , Jidong Ge , Jie Wu , Sheng Zhang , Shiwu Wen , Bin Luo","doi":"10.1016/j.jpdc.2025.105191","DOIUrl":"10.1016/j.jpdc.2025.105191","url":null,"abstract":"<div><div>Edge computing is one of the promising technologies that aim to enable timely computation at the network edge. Major service providers started to deploy geographically-distributed edge servers several years ago. A major challenge in geographically distributed edges is task scheduling, i.e., how to assign various tasks submitted by mobile users to distributed edges so as to optimize some metric. However, it is not easy to perform task scheduling in geographically-distributed edges. We observed that there are three major challenges in designing an efficient task scheduling solution in dynamic edge environments: heterogeneous tasks, dynamic edge networks, and heterogeneous edge servers. In this paper, we pursue a black-box solution for task scheduling in the collaborative edge environment while not relying on detailed analytical performance modeling. We propose Reload, an intelligent deep reinforcement learning-based task scheduler. Reload learns a policy purely based on the known information, without foreseeing the future. Reload depicts its policy as a neural network that maps “raw” observations to scheduling actions. During training, Reload starts out knowing nothing and gradually learns to make better scheduling decisions through reinforcement, in the form of reward signals for past decisions. Reload leverages Advantage Actor Critic to train the policy network. We evaluate Reload using extensive simulations.</div></div>","PeriodicalId":54775,"journal":{"name":"Journal of Parallel and Distributed Computing","volume":"209 ","pages":"Article 105191"},"PeriodicalIF":4.0,"publicationDate":"2025-11-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145571881","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
On the privacy preservation and secure communication in fog-based mobile crowdsensing
Sunday Oyinlola Ogundoyin, Ismaila Adeniyi Kamil, Isaac Adewale Ojedokun, John Oluwaseun Babalola, Vincent Omollo Nyangaresi
Pub Date: 2025-11-02 | DOI: 10.1016/j.jpdc.2025.105184
The rapid growth of mobile crowdsensing (MCS) presents significant opportunities for large-scale data collection through the collaborative use of smart devices. However, the increasing volume of sensitive data generated by MCS services poses critical privacy and security challenges, particularly for latency-sensitive applications. While some solutions exist, many still lack the scalability, robustness, efficiency, and privacy preservation required by MCS services. This paper proposes a privacy-preserving and secure fog-based MCS (FB-MCS) scheme built on a four-tier fog computing architecture that incorporates dynamic lower-tier fog (LTF) and static upper-tier fog (UTF) systems to enable efficient data aggregation, privacy protection, and secure communication. The scheme utilizes secret sharing and homomorphic MAC techniques for efficient and verifiable aggregation over multi-dimensional data, allowing a data requester to verify the accuracy and integrity of the aggregated results. It employs lightweight elliptic curve cryptography (ECC) to ensure secure authentication without overburdening resource-constrained devices. An adaptive fog node selection strategy, based on trust and mobility, is proposed for reliable real-time task allocation. Extensive security analysis demonstrates that the scheme guarantees privacy preservation, integrity of aggregation results, strong anonymity, unlinkability, traceability, and resistance to well-known attacks, and also achieves data confidentiality and unforgeability in the random oracle model under Type I and Type II adversaries, assuming the Computational Diffie-Hellman Problem (CDHP) and Discrete Logarithm Problem (DLP) are intractable. Moreover, performance assessments indicate that the proposed scheme surpasses previous advanced solutions, achieving a 48% to 280% improvement in efficiency.
{"title":"On the privacy preservation and secure communication in fog-based mobile crowdsensing","authors":"Sunday Oyinlola Ogundoyin , Ismaila Adeniyi Kamil , Isaac Adewale Ojedokun , John Oluwaseun Babalola , Vincent Omollo Nyangaresi","doi":"10.1016/j.jpdc.2025.105184","DOIUrl":"10.1016/j.jpdc.2025.105184","url":null,"abstract":"<div><div>The rapid growth of mobile crowdsensing (MCS) presents significant opportunities for large-scale data collection through the collaborative use of smart devices. However, the increasing volume of sensitive data generated by MCS services poses critical challenges in terms of privacy and security, particularly latency-sensitive applications. While some solutions exist, many still lack the scalability, robustness, efficiency, and privacy preservation required by MCS services. This paper proposes a privacy-preserving and secure fog-based MCS (FB-MCS) scheme, comprising a four-tier fog computing architecture that incorporates dynamic lower-tier fog (LTF) and static upper-tier fog (UTF) systems to enable efficient data aggregation, privacy protection, and secure communication. The proposed scheme utilizes secret sharing and homomorphic MAC techniques for efficient and verifiable data aggregation over multi-dimensional data, allowing a data requester to verify the accuracy and integrity of the aggregated results. The scheme employs a lightweight elliptic curve cryptography (ECC) to ensure secure authentication without overburdening resource-constrained devices. An adaptive fog node selection strategy, based on trust and mobility, is proposed for reliable real-time task allocation. Extensive security analysis demonstrates that the scheme not only guarantees privacy preservation, integrity of aggregation results, strong anonymity, un-linkability, traceability, and resistance to well-known attacks but also achieves data confidentiality and unforgeability in the random oracle model under Type I and Type II adversaries, assuming the Computational Diffie-Hellman Problem (CDHP) and Discrete Logarithm Problem (DLP) are intractable. Moreover, performance assessments indicate that the proposed scheme surpasses previous advanced solutions, achieving a 48 % – 280 % improvement in efficiency.</div></div>","PeriodicalId":54775,"journal":{"name":"Journal of Parallel and Distributed Computing","volume":"208 ","pages":"Article 105184"},"PeriodicalIF":4.0,"publicationDate":"2025-11-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145520966","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Diverse attack detection in IoT using hybrid deep convolutional with capsule auto encoder for intrusion detection model
M. Dharmalingam, Kamalraj Subramaniam, Ashwin M, N. Nandhagopal
Pub Date: 2025-10-28 | DOI: 10.1016/j.jpdc.2025.105190
The Internet of Things (IoT) is a rapidly expanding technology that is effectively employed in many different applications. IoT networks face daily communication issues because of the enormous number of connected nodes, and IoT platforms use cloud services as a backend to process data and manage remote control. A successful intrusion detection system (IDS) that can track computing resources and report suspicious or unusual activity is essential for managing the growing complexity of cyberattacks. As IoT technology becomes widely used, the security of IoT networks may increasingly become a major issue, and the large number and diversity of IoT devices make protecting IoT systems with conventional IDS difficult. The proposed approach mitigates the problems of existing studies and achieves better results in detecting IoT attacks. Initially, min-max normalization is used to pre-process the inputs, speeding up training and improving the efficiency of the proposed model. From the normalized data, a new Adaptive Eagle Cat Optimization (AECO) with unique hunting and search functions is used to select features. Finally, based on the selected features, a Hybrid Deep Convolutional with Capsule Auto Encoder (Hybrid_DCAE) is proposed to classify various intruders, and an Enhanced Gannet Optimization Algorithm (EGOA) is used to fine-tune its parameters for improved system performance. The results show that the proposed model achieved 96.8% accuracy on the BoT-IoT dataset, 97.21% on CICIDS-2017, 97.55% on UNSW-NB15, and 97.4% on the DS2OS dataset.
{"title":"Diverse attack detection in IoT using hybrid deep convolutional with capsule auto encoder for intrusion detection model","authors":"M. Dharmalingam , Kamalraj Subramaniam , Ashwin M , N. Nandhagopal","doi":"10.1016/j.jpdc.2025.105190","DOIUrl":"10.1016/j.jpdc.2025.105190","url":null,"abstract":"<div><div>One rapidly expanding technology that is effectively employed in many different applications is the Internet of Things (IoT) network. There are daily communication issues in an Internet of Things network because of the enormous number of connecting nodes. A cloud service is used as a backend by the IoT platform to process data and manage remote control. A successful intrusion detection system (IDS) that can track computer sources and generate data on suspicious or unusual activity is essential for managing the growing complexity of cyberattacks. The security of the IoT network may increasingly become a major issue as IoT technology becomes widely used. Due to the large number and diversity of IoT devices, protecting IoT systems with conventional IDS is difficult. This proposed approach can mitigate the problems of existing studies and achieve better results in the process of detection IoT attacks. Initially, min-max normalization is used to pre-process the inputs to speed up training and improve the efficiency of the proposed model. From the normalized data, a new Adaptive Eagle Cat Optimization (AECO) with unique hunting and search functions is used to select features. Finally, based on the selected features, a Hybrid Deep Convolutional with Capsule Auto Encoder (Hybrid_DCAE) is proposed to classify various intruders, and an Enhanced Gannet Optimization Algorithm (EGOA) is used to fine-tune the parameters for improved system performance. The results analysis shows that the proposed model achieved 96.8 % accuracy in the BoT-IoT dataset, 97.21 % in CICIDS-2017, 97.55 % in UNSW-NB15, and 97.4 % in the DS2OS dataset.</div></div>","PeriodicalId":54775,"journal":{"name":"Journal of Parallel and Distributed Computing","volume":"208 ","pages":"Article 105190"},"PeriodicalIF":4.0,"publicationDate":"2025-10-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145520965","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Multi-modal model partition strategy for end-edge collaborative inference
Dongkun Huo, Yingting Zhou, Yixue Hao, Long Hu, Yijun Mo, Min Chen, Iztok Humar
Pub Date: 2025-10-28 | DOI: 10.1016/j.jpdc.2025.105189
Advances in artificial intelligence (AI) have significantly boosted the application of intelligent models, and deploying deep neural network models for on-device inference is increasingly common. However, resource-constrained devices struggle to handle the huge computational load of neural networks, so partitioning models for collaborative computation between the edge cloud and terminals accelerates real-time inference at the edge. Existing research overlooks resource allocation and collaborative decision-making for dynamic edge networks, and high-dimensional features cause transmission delay. To address this, we propose a feature-sensitive compression algorithm that applies differentiated compression based on feature importance to reduce communication load while maintaining inference accuracy. We then design a reinforcement learning approach for resource allocation and an online model partition algorithm based on contextual bandits, leveraging the compressed features for adaptive decisions in dynamic environments. Finally, we conduct extensive experiments on different types of networks; the results show that our approach reduces inference delay by up to 65.4% and saves up to 77.6% of energy consumption.
{"title":"Multi-modal model partition strategy for end-edge collaborative inference","authors":"Dongkun Huo , Yingting Zhou , Yixue Hao , Long Hu , Yijun Mo , Min Chen , Iztok Humar","doi":"10.1016/j.jpdc.2025.105189","DOIUrl":"10.1016/j.jpdc.2025.105189","url":null,"abstract":"<div><div>Advances in artificial intelligence(AI) have significantly boosted the application of intelligent models, and deploying deep neural network models for device inference is increasingly common. However, it is difficult for resource-constrained devices to handle the huge computational load of neural networks. So partitioning models to co-compute at the edge cloud and terminals accelerates real-time inference at the edge. Existing research overlooks resource allocation and collaborative decision-making for dynamic edge networks, and high-dimensional features cause transmission delay. To address this, we propose a feature-sensitive compression algorithm that implements differentiated compression based on feature importance to reduce communication load while maintaining inference accuracy. Then, we design a reinforcement learning approach for resource allocation and an online model partition algorithm using contextual bandits, leveraging compressed features for adaptive decisions in dynamic environments. Finally, we conduct a large number of experiments on different types of networks, and the results show that our approach can reduce the inference delay by up to 65.4 % and save up to 77.6 % of energy consumption.</div></div>","PeriodicalId":54775,"journal":{"name":"Journal of Parallel and Distributed Computing","volume":"208 ","pages":"Article 105189"},"PeriodicalIF":4.0,"publicationDate":"2025-10-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145435477","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
On the development of high-performance, multi-GPU applications on heterogeneous systems leveraging SYCL
Francisco J. Andújar, Rocío Carratalá-Sáez, Yuri Torres, Arturo Gonzalez-Escribano, Diego R. Llanos
Pub Date: 2025-10-24 | DOI: 10.1016/j.jpdc.2025.105188
Computational platforms for high-performance scientific applications are increasingly heterogeneous, incorporating multiple GPU accelerators. However, differences in GPU vendors, architectures, and programming models challenge performance portability and ease of development. SYCL provides a unified programming approach, enabling applications to target NVIDIA and AMD GPUs simultaneously while offering higher-level abstractions for data and task management. This paper evaluates SYCL’s performance and development effort using the Finite Time Lyapunov Exponent (FTLE) calculation as a case study. We compare SYCL’s AdaptiveCpp (Ahead-Of-Time and Just-In-Time) and Intel oneAPI compilers, along with different data management strategies (Unified Shared Memory and buffers), against equivalent CUDA and HIP implementations. Our analysis considers single and multi-GPU execution, including heterogeneous setups with GPUs from different vendors. Results show that, while SYCL introduces additional development effort compared to native CUDA and HIP implementations, it enables multi-vendor portability with minimal performance overhead when using specific design options. Based on our findings, we provide development guidelines to help programmers decide when to use SYCL versus vendor-specific alternatives.
{"title":"On the development of high-performance, multi-GPU applications on heterogeneous systems leveraging SYCL","authors":"Francisco J. Andújar , Rocío Carratalá-Sáez , Yuri Torres , Arturo Gonzalez-Escribano , Diego R. Llanos","doi":"10.1016/j.jpdc.2025.105188","DOIUrl":"10.1016/j.jpdc.2025.105188","url":null,"abstract":"<div><div>Computational platforms for high-performance scientific applications are increasingly heterogeneous, incorporating multiple GPU accelerators. However, differences in GPU vendors, architectures, and programming models challenge performance portability and ease of development. SYCL provides a unified programming approach, enabling applications to target NVIDIA and AMD GPUs simultaneously while offering higher-level abstractions for data and task management. This paper evaluates SYCL’s performance and development effort using the Finite Time Lyapunov Exponent (FTLE) calculation as a case study. We compare SYCL’s AdaptiveCpp (Ahead-Of-Time and Just-In-Time) and Intel oneAPI compilers, along with different data management strategies (Unified Shared Memory and buffers), against equivalent CUDA and HIP implementations. Our analysis considers single and multi-GPU execution, including heterogeneous setups with GPUs from different vendors. Results show that, while SYCL introduces additional development effort compared to native CUDA and HIP implementations, it enables multi-vendor portability with minimal performance overhead when using specific design options. Based on our findings, we provide development guidelines to help programmers decide when to use SYCL versus vendor-specific alternatives.</div></div>","PeriodicalId":54775,"journal":{"name":"Journal of Parallel and Distributed Computing","volume":"207 ","pages":"Article 105188"},"PeriodicalIF":4.0,"publicationDate":"2025-10-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145467079","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
VLSI design and its hardware implementation for optimal image dehazing with adaptive bilateral filtering
A. Arul Edwin Raj, Nabihah Binti Ahmad, Jeffin Gracewell, Renugadevi R, C.T. Kalaivani
Pub Date: 2025-10-24 | DOI: 10.1016/j.jpdc.2025.105186
Fog and smog significantly hinder image processing by reducing visual output quality and disrupting systems that rely on visual data. Existing dehazing methods face several challenges, including computational complexity, sensitivity to parameter settings, and limited optimization for diverse conditions. To overcome these limitations, this paper introduces Selective Bilateral Filtering and Color Attenuation Analysis (SBBFC), a new methodology for real-time image dehazing. SBBFC avoids the problems of prior methods by dynamically controlling window sizes and using color attenuation analysis, sustaining reliable performance as haze levels change and guaranteeing accurate color rendition in the dehazed image. The hardware-optimized design targets FPGA and ASIC technologies, delivering high throughput, real-time response, better image quality, and considerably better detail reproduction. The ASIC implementation of the proposed architecture provides 350 MPixels/s at a cost of 15k gates and 5 mW of power consumption, with an area efficiency of 0.8 mm²/k. The FPGA implementation offers 100 MPixels/s at a clock frequency of 100 MHz. Given these specifications, the proposed architecture is well suited to delivering real-time dehazing with high throughput and low power.
{"title":"VLSI design and its hardware implementation for optimal image dehazing with adaptive bilateral filtering","authors":"A. Arul Edwin Raj , Nabihah Binti Ahmad , Jeffin Gracewell , Renugadevi R , C.T. Kalaivani","doi":"10.1016/j.jpdc.2025.105186","DOIUrl":"10.1016/j.jpdc.2025.105186","url":null,"abstract":"<div><div>Fog and smog significantly hinder image processing by reducing visual output quality and disrupting the functionality of systems reliant on visual data. Existing dehazing methods face several challenges, including computational complexity, sensitivity to parameter settings and limited optimization for diverse conditions. To overcome these limitations, this paper introduces the Selective Bilateral Filtering and Color Attenuation Analysis (SBBFC), a new methodology for real-time image dehazing. While offering this benefit, SBBFC eliminates problems that prior methods have by dynamically controlling window sizes and using color attenuation analysis to sustain reliable performance in response to changes in the level of haze and to guarantee accurate color rendition in the dehazed image. The hardware-optimized way uses FPGA or ASIC type of technologies with high throughput and real-time response, better image quality and considerably better detail reproduction. When it comes to ASIC implementation, the concepts of the proposed architecture provide 350 MPixels/s at the cost of 15k gates and 5 mW of power consumption with an area efficiency of 0. 8 mm²/k. In hardware mode targeting FPGA design, it offers 100 MPixels/s performance at a clock frequency of 100 MHz. In light of the above specifications, it is evident that the proposed architecture would be fitting in delivering dehazing in real-time, with high throughput and at low power.</div></div>","PeriodicalId":54775,"journal":{"name":"Journal of Parallel and Distributed Computing","volume":"210 ","pages":"Article 105186"},"PeriodicalIF":4.0,"publicationDate":"2025-10-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145885770","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}