
Latest publications from Future Generation Computer Systems: The International Journal of eScience

Keyed watermarks: A fine-grained watermark generation for Apache Flink
IF 6.2 | CAS Tier 2 (Computer Science) | Q1 (COMPUTER SCIENCE, THEORY & METHODS) | Pub Date: 2025-03-11 | DOI: 10.1016/j.future.2025.107796
Tawfik Yasser , Tamer Arafa , Mohamed ElHelw , Ahmed Awad
Big Data stream processing engines, exemplified by tools like Apache Flink, employ windowing techniques to manage unbounded streams of events. Aggregating relevant data within windows is important for event-time windowing due to its impact on result accuracy. A pivotal role in this process is attributed to watermarks, unique timestamps signifying event progression in time. Nonetheless, the existing watermark generation method within Apache Flink, operating at the input stream level, exhibits a bias towards faster sub-streams, causing the omission of events from slower counterparts. Our analysis determined that Apache Flink’s standard watermark generation approach results in an approximate 33% data loss when 50% of median-proximate keys experience delays. Furthermore, this loss exceeds 37% in cases where 50% of randomly selected keys encounter delays. In this paper, we introduce a pioneering approach termed keyed watermarks to address data loss concerns and enhance data processing precision to a minimum of 99% in most scenarios. Our strategy facilitates distinct progress monitoring by creating individualized watermarks for each sub-stream (key). Within our investigation, we delineate the essential architectural and API modifications requisite for integrating keyed watermarks, while also highlighting our experience in navigating the expansion of Apache Flink’s extensive codebase. Moreover, we conduct a comparative evaluation between the efficacy of our approach and the conventional watermark generation technique concerning the accuracy of event-time tracking, the latency of watermark processing, and the growth of Flink’s maintained state.
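The core idea, tracking event-time progress per key rather than per input stream, can be sketched as follows. The class and method names are hypothetical and illustrate only the mechanism, not Flink's actual watermark-generator API:

```python
from collections import defaultdict

class KeyedWatermarkTracker:
    """Toy per-key watermark tracker (hypothetical names). Each key
    (sub-stream) advances its own event-time watermark instead of
    sharing one stream-level watermark biased toward the fastest key."""

    def __init__(self, max_out_of_orderness):
        self.lag = max_out_of_orderness          # allowed event-time lag per key
        self.max_ts = defaultdict(lambda: None)  # highest event time seen per key

    def observe(self, key, event_ts):
        """Record an event's timestamp for its key (sub-stream)."""
        if self.max_ts[key] is None or event_ts > self.max_ts[key]:
            self.max_ts[key] = event_ts

    def watermark(self, key):
        """Per-key watermark: trails that key's max timestamp by the lag."""
        if self.max_ts[key] is None:
            return None
        return self.max_ts[key] - self.lag

    def is_late(self, key, event_ts):
        """An event is late only with respect to its own key's watermark."""
        wm = self.watermark(key)
        return wm is not None and event_ts <= wm
```

Under a single stream-level watermark, a fast sub-stream drags the shared watermark forward, so the slow sub-stream's events arrive "late" and are dropped from their windows; per-key tracking keeps each lateness decision local to its own sub-stream.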
Citations: 0
A self-organized MoE framework for distributed federated learning
IF 6.2 | CAS Tier 2 (Computer Science) | Q1 (COMPUTER SCIENCE, THEORY & METHODS) | Pub Date: 2025-03-11 | DOI: 10.1016/j.future.2025.107798
Jungjae Lee, Wooseong Kim
Federated Learning (FL) has solved the problem of data silos by enabling multiple participants to cooperatively train a global model while ensuring data privacy; however, it remains a challenge to establish a Distributed Federated Learning (DFL) framework, which naturally suffers from the heterogeneity of devices and datasets. Rather than conventional FL algorithms that combine client models into a single global model, a Mixture of Experts (MoE) based FL is an effective alternative that can admit individual features of each client dataset by partitioning the entire latent space. In this study, we introduce the Self-Organized MoE Framework (SOMFed), which enhances the DFL lifecycle under asynchronous updates and statistically challenging datasets. In contrast to most previous studies, nodes are assumed to lack label information for their class data; we therefore propose the Model Assessment and Selection (MASS) algorithm for the SOMFed framework, leveraging self-supervised learning. MASS evaluates and chooses suitable experts for each node's own unlabeled dataset by differentiating the performance of the representation layers among experts using Bayesian optimization and Conditional Loss Adjustment (CLA). SOMFed exhibits superior performance in extensive experiments with different non-IID distributions and stragglers compared to FedAVG, FedAsync, SCAFFOLD, FedAT, and Adaptive Expert Models (AEM). In particular, it demonstrates robustness against pathological non-IID distributions on CIFAR10, achieving an accuracy of 79.42%.
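As a rough illustration of the selection problem MASS solves, the sketch below scores candidate experts on a client's unlabeled data and keeps the best ones. The reconstruction-error score is a stand-in assumption; the actual algorithm compares representation layers using Bayesian optimization and Conditional Loss Adjustment:

```python
import numpy as np

def select_experts(experts, unlabeled_x, k=2):
    """Hypothetical sketch of MASS-style expert selection: rank candidate
    experts by a self-supervised score on the client's unlabeled data and
    keep the top-k. Each expert here is modeled as a callable that
    reconstructs its input (a simplifying assumption, not the paper's
    design)."""
    scores = []
    for name, model in experts.items():
        recon = model(unlabeled_x)                     # expert's output on unlabeled data
        err = float(np.mean((recon - unlabeled_x) ** 2))  # self-supervised proxy score
        scores.append((err, name))
    scores.sort()                                      # lower error = better fit
    return [name for _, name in scores[:k]]
```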
Citations: 0
Fast and Privacy-Preserving Spatial Keyword Authorization Query with access control
IF 6.2 | CAS Tier 2 (Computer Science) | Q1 (COMPUTER SCIENCE, THEORY & METHODS) | Pub Date: 2025-03-10 | DOI: 10.1016/j.future.2025.107774
Bohai Wen, Shengzhou Hu, Xinquan Ma, Huofeng Jia, Longjian Huang
In light of the accelerated advancement of GPS and the explosive growth of data, outsourcing spatial data to cloud servers has become a common practice for location-based service providers to alleviate computational and storage burdens. However, existing spatial keyword query schemes with fine-grained access control often rely on additional encryption techniques, such as homomorphic encryption and RSA, for spatial range queries, resulting in significant computational overhead. Furthermore, most schemes enforce access policies on all index tree nodes, which compromises efficiency and practicality. To address these challenges, we propose the Fast and Privacy-Preserving Spatial Keyword Authorization Query (FPAQ) scheme. FPAQ leverages Geohash and a Quadtree to construct an index tree, achieving sub-linear search complexity and efficient spatial keyword queries, and introduces a novel secret-key-based authorization mechanism that embeds authorization information in non-leaf nodes to minimize computational overhead, while access policies are enforced only on leaf nodes. Additionally, attribute-based encryption is employed to support fine-grained access control in multi-user scenarios. Formal security analysis confirms that FPAQ safeguards data confidentiality and query privacy. Experimental results on the Yelp dataset validate the scheme’s superior efficiency and scalability compared to existing methods.
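Geohash, one of the two index ingredients, encodes a coordinate into a base-32 string whose shared prefixes imply spatial proximity, which is what makes prefix-based filtering on an index tree possible. A standard encoder (the public Geohash algorithm, not the paper's code) looks like this:

```python
_BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"  # Geohash alphabet (no a, i, l, o)

def geohash_encode(lat, lon, precision=6):
    """Standard Geohash encoding: alternately bisect the longitude and
    latitude ranges, emitting one bit per step; every 5 bits become one
    base-32 character. Longer shared prefixes mean closer points."""
    lat_lo, lat_hi = -90.0, 90.0
    lon_lo, lon_hi = -180.0, 180.0
    bits, even, ch = 0, True, 0
    out = []
    while len(out) < precision:
        if even:                                  # longitude bit
            mid = (lon_lo + lon_hi) / 2
            if lon >= mid:
                ch, lon_lo = (ch << 1) | 1, mid
            else:
                ch, lon_hi = ch << 1, mid
        else:                                     # latitude bit
            mid = (lat_lo + lat_hi) / 2
            if lat >= mid:
                ch, lat_lo = (ch << 1) | 1, mid
            else:
                ch, lat_hi = ch << 1, mid
        even = not even
        bits += 1
        if bits == 5:                             # flush 5 bits as one character
            out.append(_BASE32[ch])
            bits, ch = 0, 0
    return "".join(out)
```

The prefix property is what an index tree exploits: all cells below a node share that node's Geohash prefix, so a range query can prune whole subtrees.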
Citations: 0
Performance and efficiency: A multi-generational benchmark of modern processors on bandwidth-bound HPC applications
IF 6.2 | CAS Tier 2 (Computer Science) | Q1 (COMPUTER SCIENCE, THEORY & METHODS) | Pub Date: 2025-03-06 | DOI: 10.1016/j.future.2025.107793
Balázs Drávai, István Z. Reguly
The last two years have seen the launch of a multitude of new x86 processors in reaction to market demand. Intel has launched four families of Xeon processors with novel architectural features: first the Sapphire Rapids generation, which featured a version with on-package HBM, then the Emerald Rapids generation, and most recently a differentiated pair, the performance-oriented Granite Rapids and the efficiency-oriented Sierra Forest families. In this work, we evaluate the performance and energy efficiency of CPUs from different generations and variants of Intel and AMD processors, with a particular focus on bandwidth-bound high performance computing (HPC) applications. We contrast runtime and energy consumption figures and track trends across generations. We furthermore study how enabling locality-improving optimizations increases cache reuse and overall performance while reducing energy use.
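As context for what "bandwidth-bound" means in practice, a toy STREAM-style triad kernel estimates effective memory bandwidth. This is an illustrative stand-in, not the paper's benchmark suite, and NumPy adds overhead that a tuned C/OpenMP triad would not have:

```python
import time
import numpy as np

def stream_triad(n=10_000_000, repeats=5):
    """Toy STREAM-style triad (a = b + s*c), the classic bandwidth-bound
    kernel: almost no arithmetic per byte moved, so runtime is dominated
    by memory traffic. Returns effective bandwidth in GB/s (best of
    several repeats, to reduce timing noise)."""
    b = np.random.rand(n)
    c = np.random.rand(n)
    s = 3.0
    best = float("inf")
    for _ in range(repeats):
        t0 = time.perf_counter()
        a = b + s * c                  # 2 array reads + 1 array write
        best = min(best, time.perf_counter() - t0)
    bytes_moved = 3 * n * 8            # three streams of 8-byte doubles
    return bytes_moved / best / 1e9
```

Kernels like this saturate the memory subsystem long before the cores, which is why HBM variants and locality-improving optimizations matter for this class of application.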
Citations: 0
zCeph: Design and implementation of a ZNS-friendly distributed file system
IF 6.2 | CAS Tier 2 (Computer Science) | Q1 (COMPUTER SCIENCE, THEORY & METHODS) | Pub Date: 2025-03-05 | DOI: 10.1016/j.future.2025.107763
Jin Yong Ha , Yongseok Son
This article presents zCeph, a ZNS-friendly distributed file system designed to efficiently utilize zoned namespace (ZNS) SSDs. Specifically, we first propose MZAllocator which enables multiple zones to be utilized simultaneously to maximize the performance of ZNS SSDs. Second, we adopt an append command to eliminate the need for synchronization in write ordering within distributed storage systems to improve scalability. Third, we present zBlueFS, a ZNS-aware user-level file system based on BlueFS to update the metadata on the ZNS SSD without a conventional SSD. Finally, we propose a delta write technique, DeltaWriter, which writes only a modified part of the metadata (i.e., onode) to reduce read–modify–write overhead whenever the metadata are updated. We implement zCeph with four techniques based on Ceph, an open-source distributed file system. Further, we evaluate zCeph on a pair of 48-core machines with ZNS SSDs using micro and macro benchmarks, and the results reveal that zCeph improves performance by up to 4.2× and 8.8× compared with Ceph, respectively.
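The DeltaWriter idea can be illustrated with a small sketch: instead of rewriting the whole metadata record (onode) on every update, persist only the fields that changed. The dictionary representation here is a deliberate simplification; the real onode is a binary on-disk structure:

```python
def delta_write(old, new):
    """Sketch of the DeltaWriter idea: compute only the changed fields of
    a metadata record, turning a full read-modify-write of the record
    into a small, append-friendly delta. The dict-based record is a toy
    stand-in for the real binary onode."""
    return {k: v for k, v in new.items() if old.get(k) != v}

def apply_delta(base, delta):
    """Reconstruct the current record by replaying a delta onto a base."""
    merged = dict(base)
    merged.update(delta)
    return merged
```

Writing only the delta shrinks each metadata update, which matters on ZNS SSDs where zones are written sequentially via append.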
Citations: 0
Federated adaptive pruning with differential privacy
IF 6.2 | CAS Tier 2 (Computer Science) | Q1 (COMPUTER SCIENCE, THEORY & METHODS) | Pub Date: 2025-03-05 | DOI: 10.1016/j.future.2025.107783
Zhousheng Wang , Jiahe Shen , Hua Dai , Jian Xu , Geng Yang , Hao Zhou
Federated Learning (FL), as an emerging distributed machine learning technique, reduces the computational burden on the central server through decentralization while ensuring data privacy. It typically requires client sampling and local training in each iteration, followed by aggregation of the model on a central server. Although this distributed learning approach has positive implications for the preservation of privacy, it also increases the computational load of local clients. Lightweight, efficient schemes are therefore indispensable for reducing communication and computational costs in FL. In addition, because uploaded models are exposed to model-stealing attacks, the level of privacy protection must be improved further. In this paper, we propose Federated Adaptive Pruning (FAP), a lightweight method that integrates FL with adaptive pruning by adjusting explicit regularization. We keep the model unchanged and instead dynamically prune the data of large datasets during training to reduce computational costs and enhance privacy protection. In each round, selected clients train on their local data and prune a portion of it before uploading the model for server-side aggregation; the remaining data are reserved for subsequent computations. With this approach, selected clients can quickly refine their data at the beginning of training. We further combine FAP with differential privacy to strengthen data privacy. Through comprehensive experiments, we demonstrate the performance of FAP on different datasets with basic models such as CNNs and MLPs. Numerous experimental results show that our method significantly prunes the datasets, reducing computational overhead with minimal loss of accuracy. Compared to previous methods, we obtain the lowest training error and further improve client-side data privacy.
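Two of the ingredients, loss-based data pruning on the client and the Gaussian mechanism on the uploaded update, can be sketched as follows. The pruning criterion (keep the highest-loss examples) and all parameters are illustrative assumptions, not the paper's exact FAP rule:

```python
import numpy as np

def prune_client_data(x, y, model_loss, keep_frac=0.7):
    """Illustrative per-round data pruning: keep only the fraction of
    local examples the current model finds hardest (highest loss) and
    drop the rest, shrinking later rounds' compute. The keep-the-hardest
    criterion is an assumption for this sketch."""
    losses = model_loss(x, y)                 # per-example losses
    k = max(1, int(len(x) * keep_frac))
    keep = np.argsort(losses)[-k:]            # indices of the k highest-loss examples
    return x[keep], y[keep]

def dp_noise(update, clip=1.0, sigma=0.5, rng=None):
    """Standard Gaussian mechanism as used with FL updates: clip the
    client update to bound sensitivity, then add Gaussian noise scaled
    by the clipping norm."""
    rng = rng or np.random.default_rng(0)
    norm = np.linalg.norm(update)
    clipped = update * min(1.0, clip / max(norm, 1e-12))
    return clipped + rng.normal(0.0, sigma * clip, size=update.shape)
```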
Citations: 0
Cross-platform and polyhedral programming for Nussinov RNA folding
IF 6.2 | CAS Tier 2 (Computer Science) | Q1 (COMPUTER SCIENCE, THEORY & METHODS) | Pub Date: 2025-03-03 | DOI: 10.1016/j.future.2025.107786
Mateusz Gruzewski, Marek Palkowski
This article addresses taking tiled, parallel OpenMP code, produced automatically for CPUs by source-to-source polyhedral compilers, and targeting NVIDIA GPUs with CUDA. In previous publications, we demonstrated that large language models (LLMs) can translate code, generate kernels, and correctly manage memory transfers between the host and the device without manual effort. Unfortunately, when the target architecture is not taken into account in detail, the performance of code designed for CPUs leaves much to be desired when running on GPUs. The architectural differences between the two platforms, such as core counts, cache hierarchies, and the dimensionality of computations, require careful attention to performance portability. In this article, we address the Nussinov algorithm, a popular benchmark in bioinformatics, to achieve higher performance on the NVIDIA platform than the codes automatically generated by LLMs. Nussinov’s loop nests form a non-trivial kernel from the non-serial polyadic dynamic programming (NPDP) benchmark with non-uniform loops. We utilize a polyhedral code framework to tile the code, then manually modify the innermost loop nest, which contains the majority of the computations, using two-dimensional thread blocks. To accelerate the computations, shared memory within blocks is utilized. The resulting codes were tested on two modern NVIDIA devices for various RNA sequence lengths and compared with parallel and tiled CPU codes and with previously LLM-generated Nussinov GPU codes. The correctness of these codes and their scalability were analyzed, and comparisons to related approaches and future work are outlined.
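For reference, the Nussinov recurrence that the tiled GPU kernels compute maximizes base pairs over four cases (i unpaired, j unpaired, i paired with j, and a bifurcation). A plain, untiled Python version of the classic dynamic program:

```python
def nussinov(seq, min_loop=0):
    """Plain Nussinov DP: N[i][j] is the maximum number of base pairs in
    seq[i..j]. This is the untiled reference recurrence; the paper's
    contribution is tiling and parallelizing these loop nests for GPUs."""
    pairs = {("A", "U"), ("U", "A"), ("G", "C"),
             ("C", "G"), ("G", "U"), ("U", "G")}   # Watson-Crick + wobble
    n = len(seq)
    if n == 0:
        return 0
    N = [[0] * n for _ in range(n)]
    for span in range(1, n):                        # grow subsequence length
        for i in range(n - span):
            j = i + span
            best = max(N[i + 1][j], N[i][j - 1])    # i or j left unpaired
            if j - i > min_loop and (seq[i], seq[j]) in pairs:
                inner = N[i + 1][j - 1] if i + 1 <= j - 1 else 0
                best = max(best, inner + 1)         # i pairs with j
            for k in range(i + 1, j):               # bifurcation: split at k
                best = max(best, N[i][k] + N[k + 1][j])
            N[i][j] = best
    return N[0][n - 1]
```

The inner bifurcation loop is what makes the dependence pattern non-uniform (NPDP): cell (i, j) reads a whole row and column segment, which is exactly why naive tilings fail and a polyhedral framework is used.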
Citations: 0
Complex network knowledge-based field programmable gate arrays routing congestion prediction
IF 6.2 | CAS Tier 2 (Computer Science) | Q1 (COMPUTER SCIENCE, THEORY & METHODS) | Pub Date: 2025-03-01 | DOI: 10.1016/j.future.2025.107776
Tingyuan Nie, Pengfei Liu, Kun Zhao, Zhenhao Wang
Routing congestion occurs due to the unprecedented complexity of FPGA (field programmable gate array) designs. Accurately predicting the congestion of today’s FPGAs at an early stage helps to reduce the burden of later design and optimization. This paper proposes an innovative complex network knowledge-based approach to predict FPGA routing congestion during the placement stage. The complex network features and circuit features highly correlated with routing congestion are mapped into RGB (red-green-blue) images according to the pre-perceived importance of the features to feed to the proposed model. A patched EDM (elucidating the design space of a diffusion-based generative model) with a patch transformation is introduced to focus on the most significant features.
Experimental results show the remarkable achievements of the approach, with an average SSIM (structural similarity) of 85.01 %, PSNR (peak signal-to-noise ratio) of 27.85 dB, NRMS (normalized root mean square) of 12.91 %, and PIX (pixel accuracy) of 18.73 %, outperforming recent state-of-the-art models such as pix2pix, pix2pixHD, FCN (fully convolutional networks), and Lay-Net on the key SSIM metric by 4.87 %, 2.83 %, 5.77 %, and 18.56 %, respectively. The ablation validation highlights the efficiency of complex network features in routing congestion prediction. The outcome enables the identification of potential routing congestion in early design stages, facilitating the solution of subsequent, more tractable routing problems.
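The input encoding described above, stacking normalized feature maps into the RGB channels of one image ordered by importance, can be sketched as follows (the channel assignment and normalization are illustrative):

```python
import numpy as np

def features_to_rgb(f_red, f_green, f_blue):
    """Sketch of the paper's input encoding: three congestion-related
    feature maps, ordered by pre-perceived importance, are min-max
    normalized to [0, 255] and stacked as the R, G, B channels of a
    single image fed to the generative model."""
    def norm(f):
        f = np.asarray(f, dtype=float)
        rng = f.max() - f.min()
        return np.zeros_like(f) if rng == 0 else (f - f.min()) / rng * 255.0
    return np.stack([norm(f_red), norm(f_green), norm(f_blue)],
                    axis=-1).astype(np.uint8)
```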
引用次数: 0
MMGCSyn: Explainable synergistic drug combination prediction based on multimodal fusion
IF 6.2 Zone 2, Computer Science Q1 COMPUTER SCIENCE, THEORY & METHODS Pub Date : 2025-03-01 DOI: 10.1016/j.future.2025.107784
Yongqing Zhang , Hao Yuan , Yuhang Liu , Shuwen Xiong , Zhigan Zhou , Yugui Xu , Xinyu Mao , Meiqin Gong
Synergistic drug combinations are an effective solution for treating complex diseases. The main challenge is to improve model performance on the unknown drug combination prediction task. Because some drugs in the dataset are wholly excluded from training, it is difficult for a model to extract effective features for these drugs, which hurts accuracy and generalization. Unlike previous methods, we propose an interpretable synergistic drug combination prediction model, MMGCSyn, based on multimodal feature fusion. The pipeline works as follows: given any (drug, drug, cell line) triple, a graph attention network extracts drug molecular graph features, a deformable convolutional network extracts drug Morgan fingerprint features, and a spatial feature reconstruction module suppresses Morgan fingerprint feature redundancy. A multi-layer MLP extracts cell line features. Feature fusion and prediction are then performed through a Transformer. We compared five existing methods on three drug combination datasets. The results show that MMGCSyn achieves the best results and can effectively capture the chemical substructures of drug molecules.
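The abstract's scoring of a (drug, drug, cell line) triple by fusing per-modality features can be sketched in miniature. This is a hedged illustration: MMGCSyn fuses its modality features with a Transformer, whereas here a one-hidden-layer MLP stands in for the prediction head, and all names, weights, and dimensions are assumptions for the example:

```python
import numpy as np

def fuse_and_score(drug_a, drug_b, cell, W1, b1, w2, b2):
    """Late-fusion synergy scoring for a (drug, drug, cell line) triple.

    Minimal numpy sketch of the fusion-then-predict step: concatenate
    the three modality embeddings, pass them through a ReLU hidden
    layer, and squash the logit to a synergy probability.
    """
    x = np.concatenate([drug_a, drug_b, cell])   # fused feature vector
    h = np.maximum(0.0, W1 @ x + b1)             # ReLU hidden layer
    logit = w2 @ h + b2                          # synergy logit
    return 1.0 / (1.0 + np.exp(-logit))          # probability of synergy
```

In the real model, `drug_a` and `drug_b` would come from the graph attention and fingerprint branches, and `cell` from the cell line MLP.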
ARGO: Overcoming hardware dependence in distributed learning
IF 6.2 Zone 2, Computer Science Q1 COMPUTER SCIENCE, THEORY & METHODS Pub Date : 2025-02-28 DOI: 10.1016/j.future.2025.107778
Karim Boubouh , Amine Boussetta , Rachid Guerraoui , Alexandre Maurer
Mobile devices offer a valuable resource for distributed learning alongside traditional computers, encouraging energy efficiency and privacy through local computations. However, the hardware limitations of these devices make it impossible to use classical SGD for industry-grade machine learning models (with a very large number of parameters). Moreover, such devices are intermittently available and susceptible to failures. To address these challenges, we introduce ARGO, an algorithm that combines adaptive workload schemes with Byzantine resilience mechanisms, as well as dynamic device participation. Our theoretical analysis demonstrates linear convergence for strongly convex losses and sub-linear convergence for non-convex losses, without assuming specific dataset partitioning (allowing for potential data heterogeneity). Our formal analysis highlights the interplay between convergence properties, hardware capabilities, Byzantine impact, and standard factors such as mini-batch size and learning rate. Through extensive evaluations, we show that ARGO outperforms standard SGD in terms of convergence speed and accuracy, and, most importantly, thrives when classical SGD is not possible due to hardware limitations.
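The abstract's combination of distributed SGD with Byzantine resilience can be illustrated with one robust aggregation step. This is a sketch under assumptions: ARGO's actual resilience mechanism and adaptive workload scheme are more involved, and the coordinate-wise median used here is simply one standard Byzantine-robust aggregator chosen for illustration:

```python
import numpy as np

def byzantine_resilient_step(params, worker_grads, lr):
    """One SGD step using a coordinate-wise-median aggregate.

    The median resists a minority of arbitrarily corrupted (Byzantine)
    gradients, unlike the plain mean, which a single outlier can
    drag arbitrarily far.
    """
    G = np.stack(worker_grads)    # shape: (n_workers, dim)
    agg = np.median(G, axis=0)    # robust per-coordinate aggregate
    return params - lr * agg
```

With two honest workers reporting gradient (1, 2) and one Byzantine worker reporting (1000, -500), the median step moves the parameters as if only the honest gradient had been applied.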