
Latest publications from Future Generation Computer Systems-The International Journal of Escience

Let’s trace it: Fine-grained serverless benchmarking for synchronous and asynchronous applications
IF 6.2, CAS Tier 2 (Computer Science), Q1 COMPUTER SCIENCE, THEORY & METHODS. Pub Date: 2025-12-28. DOI: 10.1016/j.future.2025.108336
Joel Scheuner, Simon Eismann, Sacheendra Talluri, Erwin van Eyk, Cristina Abad, Philipp Leitner, Alexandru Iosup
Making serverless computing widely applicable requires detailed understanding of performance. Although benchmarking approaches exist, their insights are coarse-grained and typically insufficient for (root cause) analysis of realistic serverless applications, which often consist of asynchronously coordinated functions and services. Addressing this gap, we design and implement ServiTrace, an approach for fine-grained distributed trace analysis and an application-level benchmarking suite for diverse serverless-application architectures. ServiTrace (i) analyzes distributed serverless traces using a novel algorithm and heuristics for extracting a detailed latency breakdown, (ii) leverages a suite of serverless applications representative of production usage, including synchronous and asynchronous serverless applications with external service integrations, and (iii) automates comprehensive, end-to-end experiments to capture application-level performance. Using our ServiTrace reference implementation, we conduct a large-scale empirical performance study in the market-leading AWS environment, collecting over 7.5 million execution traces. We make four main observations enabled by our latency breakdown analysis of median latency, cold starts, and tail latency for different application types and invocation patterns. For example, the median end-to-end latency of serverless applications is often dominated not by function computation but by external service calls, orchestration, and trigger-based coordination; all of which could be hidden without ServiTrace-like benchmarking. We release empirical data under FAIR principles and ServiTrace as a tested, extensible, open-source tool at https://github.com/ServiTrace/ReplicationPackage.
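To make the latency-breakdown idea concrete, here is a minimal Python sketch that attributes end-to-end trace time to per-span categories. The span schema, category names, and the non-overlapping-span assumption are illustrative only and do not reflect ServiTrace's actual data model or algorithm.

```python
# Illustrative sketch only: attribute end-to-end latency to span categories.
# Assumes non-overlapping spans; real asynchronous traces overlap, which is
# exactly what the paper's algorithm and heuristics have to untangle.
from collections import defaultdict

def latency_breakdown(spans, trace_start, trace_end):
    """Sum span durations per category; leftover time is counted as overhead."""
    totals = defaultdict(float)
    for s in spans:
        totals[s["category"]] += s["end"] - s["start"]
    end_to_end = trace_end - trace_start
    totals["overhead"] = max(0.0, end_to_end - sum(totals.values()))
    return end_to_end, dict(totals)

# Toy trace in which function computation is small relative to external calls,
# orchestration, and trigger-based coordination.
spans = [
    {"category": "function_compute", "start": 0.00, "end": 0.03},
    {"category": "external_service", "start": 0.03, "end": 0.18},
    {"category": "orchestration",    "start": 0.18, "end": 0.25},
    {"category": "trigger",          "start": 0.25, "end": 0.31},
]
e2e, parts = latency_breakdown(spans, 0.0, 0.33)
print(e2e, parts)
```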
Citations: 0
Interference modeling and scheduling for compute-intensive batch applications
IF 6.2, CAS Tier 2 (Computer Science), Q1 COMPUTER SCIENCE, THEORY & METHODS. Pub Date: 2025-12-27. DOI: 10.1016/j.future.2025.108355
Chennian Xiong, Weiwei Lin, Huikang Huang, Jianpeng Lin, Keqin Li
Cloud computing and virtualization technologies have significantly improved resource utilization in data centers. However, performance interference caused by resource contention remains a major challenge, particularly for compute-intensive batch applications, which are vital for large-scale data processing and task scheduling. Addressing performance interference in the modeling and scheduling of such applications still requires improvement. Existing interference models often rely on stereotypical metrics and average values, ignoring the impact of temporal fluctuations, while conventional scheduling algorithms overlook interference dynamics, leading to suboptimal scheduling results. To overcome these limitations, this article investigates the key factors influencing the performance of compute-intensive workloads and introduces a novel performance interference model that incorporates temporal fluctuations. Furthermore, we propose a historical-data-driven scheduling method that accounts for both temporal dynamics and batch application interference characteristics. Experimental results demonstrate that the proposed performance interference model achieves higher accuracy and robustness against overfitting compared to existing models that neglect temporal variations. Additionally, our interference-aware scheduling algorithm significantly outperforms traditional methods in throughput, scheduling efficiency, and server load balancing, providing an effective solution to mitigate performance interference in cloud environments.
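As a toy illustration of why temporal fluctuations matter beyond averages, the sketch below estimates a slowdown from per-window resource profiles; the linear contention model and the sensitivity coefficient are assumptions made for this example, not the article's interference model.

```python
# Hedged sketch: a toy slowdown estimate using per-window resource profiles
# instead of a single average. The linear contention model and the
# sensitivity coefficient are illustrative assumptions.
import numpy as np

def predicted_slowdown(profile_a, profile_b, sensitivity=0.8):
    """profile_a, profile_b: per-time-window CPU demand of two co-located jobs, in [0, 1]."""
    a, b = np.asarray(profile_a), np.asarray(profile_b)
    # Contention arises only in windows where combined demand exceeds capacity.
    contention = np.clip(a + b - 1.0, 0.0, None)
    return 1.0 + sensitivity * contention.mean()

steady   = np.full(8, 0.5)                              # average demand 0.5, flat
bursty   = np.array([1.0, 0, 1.0, 0, 1.0, 0, 1.0, 0])   # same average, but bursty
neighbor = np.full(8, 0.6)
print(predicted_slowdown(steady, neighbor))   # mild slowdown: combined demand peaks at 1.1
print(predicted_slowdown(bursty, neighbor))   # larger slowdown: peaks of 1.6 in half the windows
```

Two jobs with the same average demand thus produce very different interference estimates once their temporal profiles are taken into account.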
Citations: 0
ProtoFedGAN: A novel federated learning framework for training generative adversarial networks via dynamic dual-prototype alignment
IF 6.2, CAS Tier 2 (Computer Science), Q1 COMPUTER SCIENCE, THEORY & METHODS. Pub Date: 2025-12-27. DOI: 10.1016/j.future.2025.108353
Zhigang Wang, Yuzi Li, Qinghua Zhang, Junfeng Zhao
Generative Adversarial Networks (GANs) have demonstrated significant potential in data-generation tasks. However, traditional centralized training requires the sharing of raw data, which poses risks of sensitive information leakage. Federated learning offers a solution, leading to the development of Federated GANs. This approach mitigates the risk to some extent by enabling distributed training without exchanging raw data. Nevertheless, existing Federated GAN frameworks face challenges in real-world scenarios characterized by heterogeneous client data and heterogeneous client models, including degraded generation performance, mode collapse, and potential privacy leaks. To address these challenges, this paper proposes ProtoFedGAN, a Federated Generative Adversarial Network based on Dynamic Dual-Prototype Alignment. Specifically, ProtoFedGAN introduces a prototype learning-based federated knowledge-sharing paradigm, which abstracts local client features into lightweight class prototypes and dynamically aggregates them on the server. This approach facilitates knowledge sharing among heterogeneous client models, enhances privacy protection through feature abstraction, and reduces communication overhead. Furthermore, a latent space alignment mechanism is proposed to enforce consistency between client generators’ latent spaces and the global distribution, coupled with a dynamic prototype aggregator that mitigates feature shifts induced by non-independent and identically distributed (Non-IID) data through similarity-weighted parameter adjustment. Finally, a dual-prototype-driven generation enhancement strategy is proposed, where the Main Prototype ensures global distribution stability by anchoring consensus features across clients, while the subprototypes promote multi-modal feature expression, thereby jointly optimizing both realism and diversity in generated data. Experimental results across four benchmark datasets (MNIST, Fashion-MNIST, CIFAR-10, and CIFAR-100) demonstrate that ProtoFedGAN consistently achieves the lowest FID, KL, and MMD, and the highest IS under both IID and Non-IID settings, outperforming recent federated GANs such as CAP-GAN, IFL-GAN, PRIVATE FL-GAN, and PerFED-GAN, particularly in heterogeneous environments.
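A hedged sketch of the prototype-sharing idea follows: clients reduce local features to per-class mean vectors, and the server aggregates them with similarity-derived weights to damp Non-IID feature shift. All function names and the exponential weighting are hypothetical and do not describe ProtoFedGAN's actual aggregation rule.

```python
# Minimal sketch under stated assumptions: per-class mean feature vectors as
# prototypes, aggregated with similarity-derived weights. Names and the
# exponential weighting are hypothetical, not ProtoFedGAN's actual API.
import numpy as np

def client_prototypes(features, labels, num_classes):
    """Return one mean feature vector (prototype) per class observed locally."""
    return {c: features[labels == c].mean(axis=0)
            for c in range(num_classes) if np.any(labels == c)}

def aggregate_prototypes(client_protos, num_classes):
    """Similarity-weighted aggregation of per-class prototypes across clients."""
    global_protos = {}
    for c in range(num_classes):
        candidates = [p[c] for p in client_protos if c in p]
        if not candidates:
            continue
        stack = np.stack(candidates)
        centroid = stack.mean(axis=0)
        # Clients whose prototype lies closer to the consensus get larger weight,
        # damping feature shift caused by Non-IID local data.
        sims = np.array([np.dot(v, centroid) /
                         (np.linalg.norm(v) * np.linalg.norm(centroid) + 1e-12)
                         for v in stack])
        weights = np.exp(sims) / np.exp(sims).sum()
        global_protos[c] = (weights[:, None] * stack).sum(axis=0)
    return global_protos

rng = np.random.default_rng(0)
protos = [client_prototypes(rng.normal(size=(20, 4)), rng.integers(0, 3, 20), 3)
          for _ in range(2)]
print(aggregate_prototypes(protos, 3)[0])
```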
Citations: 0
HP2C-DT: High-Precision High-Performance Computer-enabled Digital Twin
IF 6.2, CAS Tier 2 (Computer Science), Q1 COMPUTER SCIENCE, THEORY & METHODS. Pub Date: 2025-12-26. DOI: 10.1016/j.future.2025.108333
E. Iraola, M. García-Lorenzo, F. Lordan-Gomis, F. Rossi, E. Prieto-Araujo, R.M. Badia
Digital twins are transforming the way we monitor, analyze, and control physical systems, but designing architectures that balance real-time responsiveness with heavy computational demands remains a challenge. Cloud-based solutions often struggle with latency and resource constraints, while edge-based approaches lack the processing power for complex simulations and data-driven optimizations.
To address this problem, we propose the High-Precision High-Performance Computer-enabled Digital Twin (HP2C-DT) reference architecture, which integrates High-Performance Computing (HPC) into the computing continuum. Unlike traditional setups that use HPC only for offline simulations, HP2C-DT makes it an active part of digital twin workflows, dynamically assigning tasks to edge, cloud, or HPC resources based on urgency and computational needs.
Furthermore, to bridge the gap between theory and practice, we introduce the HP2C-DT framework, a working implementation that uses COMPSs for seamless workload distribution across diverse infrastructures. We test it in a power grid use case, showing how it reduces communication bandwidth by an order of magnitude through edge-side data aggregation, improves response times by up to 2x via dynamic offloading, and maintains near-ideal strong scaling for compute-intensive workflows across a practical range of resources. These results demonstrate how an HPC-driven approach can push digital twins beyond their current limitations, making them smarter, faster, and more capable of handling real-world complexity.
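The urgency- and demand-based dispatch idea can be illustrated in a few lines; the thresholds, task fields, and tier names below are assumptions for the example, not the HP2C-DT/COMPSs scheduling policy.

```python
# Illustrative sketch of the dispatch idea only: route digital-twin tasks to
# edge, cloud, or HPC depending on urgency and computational demand. The
# thresholds and task fields are assumptions made for this example.
from dataclasses import dataclass

@dataclass
class Task:
    name: str
    deadline_s: float   # how soon a result is needed
    core_hours: float   # estimated computational demand

def dispatch(task: Task) -> str:
    if task.deadline_s < 1.0:     # urgent control loop -> stay at the edge
        return "edge"
    if task.core_hours > 100.0:   # heavy simulation or optimization -> HPC
        return "hpc"
    return "cloud"                # everything else -> elastic cloud

for t in [Task("sensor-aggregation", 0.1, 0.01),
          Task("state-estimation", 5.0, 2.0),
          Task("contingency-simulation", 600.0, 5000.0)]:
    print(t.name, "->", dispatch(t))
```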
Citations: 0
GPU acceleration of hybrid FETI solver for problems of transient nonlinear dynamics
IF 6.2, CAS Tier 2 (Computer Science), Q1 COMPUTER SCIENCE, THEORY & METHODS. Pub Date: 2025-12-25. DOI: 10.1016/j.future.2025.108341
Jakub Homola, Ondřej Meca, Lubomír Říha, Tomáš Brzobohatý
FETI methods, which build on the Finite Element Method, are utilized for large-scale engineering simulations. They use domain decomposition techniques to divide a large domain into many smaller subdomains, which can be processed in parallel.
Current trends in HPC focus on GPU-accelerated clusters. To utilize them efficiently, FETI solvers should be able to use these accelerators. Recent developments have demonstrated that the fundamental component of the FETI methods, the dual operator, can be successfully offloaded to the GPU.
In this paper, we focus on GPU acceleration of the Hybrid FETI variant. It reduces the size of the projector by using a two-level decomposition, thus allowing for a significantly higher number of compute nodes to be efficiently utilized. In turn, it allows us to split the problem into a larger number of smaller subdomains, which improves single-process performance.
We demonstrate the performance on a real-world problem of transient nonlinear dynamics that requires reassembling of the dual operator, preconditioner, and projector during each call of the solver. On the MareNostrum 5 supercomputer, using Nvidia H100 GPUs, we achieved a speedup of 2.9 for the whole Hybrid FETI solver compared to a CPU-only run.
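A back-of-the-envelope calculation illustrates why a two-level decomposition shrinks the coarse problem; the subdomain count, cluster size, and six rigid-body modes per subdomain (typical for 3D elasticity) are illustrative assumptions, not figures from the article.

```python
# Toy arithmetic sketch: grouping subdomains into clusters reduces the size of
# the global coarse problem (and hence the projector). All numbers below are
# illustrative assumptions, not measurements from the article.
subdomains = 20_000
modes_per_subdomain = 6      # rigid-body modes per subdomain in 3D elasticity
cluster_size = 50            # subdomains grouped per cluster in the hybrid variant

flat_coarse = subdomains * modes_per_subdomain       # one-level coarse problem size
clusters = subdomains // cluster_size
hybrid_coarse = clusters * modes_per_subdomain       # global coarse size after clustering

print(f"One-level coarse problem: {flat_coarse} unknowns")    # 120000
print(f"Two-level (global) size:  {hybrid_coarse} unknowns")  # 2400
```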
Citations: 0
Log-Tree: Building log-enhanced B+-tree for hybrid DRAM/PM main memories
IF 6.2, CAS Tier 2 (Computer Science), Q1 COMPUTER SCIENCE, THEORY & METHODS. Pub Date: 2025-12-25. DOI: 10.1016/j.future.2025.108332
Zhengzhu Yao, Chaoshu Yang, Runyu Zhang, Hai Yang, Yu Peng
B+-trees are widely used in storage systems and have been optimized in recent studies to match the characteristics of Persistent Memories (PMs). However, existing DRAM/PM hybrid B+-trees still incur write performance penalties, low space utilization, and slow recovery, caused by two critical design limitations: (1) massive random writes can lead to severe write performance degradation due to the asymmetric sequential/random write performance of PM; (2) trade-offs among write performance, PM space utilization, and recovery. In this paper, we propose a log-structured B+-tree for hybrid DRAM/PM main memory, called Log-Tree, to solve these problems. First, Log-Tree incorporates a block-grained shadow layer of leaf nodes in PM and designs lightweight metadata for each block. Then, Log-Tree persists newly inserted entries into the corresponding blocks sequentially to reduce cacheline flushes. Finally, Log-Tree employs a dynamic data migration strategy among all blocks to further improve the space utilization of PM. We conducted comprehensive evaluations on the Intel Optane DCPMM platform. Compared with μTree/FPTree/CCL-BTree/FAST&FAIR/SSB-Tree, Log-Tree achieves the highest PM space utilization while providing 4.81/1.23×, 2.99/1.36×, 1.46/0.99×, 4.03/1.59×, and 4.23/1.99× write/read throughput on average, respectively.
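A conceptual sketch of the shadow-layer idea: newly inserted entries are appended sequentially into a block-grained log instead of being written randomly into leaf nodes. The class layout and the persist() placeholder are assumptions for illustration, not Log-Tree's actual PM layout or flush protocol.

```python
# Conceptual sketch only: append newly inserted entries sequentially into a
# block-grained log (the "shadow layer") rather than updating leaf nodes in
# place, so PM sees sequential instead of random writes. The layout and the
# persist() placeholder are assumptions, not Log-Tree's actual design.
BLOCK_CAPACITY = 64

def persist(entry):
    """Placeholder for a PM cacheline flush (e.g. clwb followed by a fence)."""
    pass

class LogBlock:
    def __init__(self, leaf_id):
        self.leaf_id = leaf_id   # the B+-tree leaf this block shadows
        self.entries = []        # entries appended strictly in order
        self.count = 0           # lightweight per-block metadata

    def append(self, key, value):
        if self.count >= BLOCK_CAPACITY:
            return False         # caller merges into the leaf or migrates the block
        self.entries.append((key, value))
        self.count += 1
        persist(self.entries[-1])  # one small sequential flush per insert
        return True

blk = LogBlock(leaf_id=7)
blk.append(42, "value-42")
print(blk.count, blk.entries)
```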
Citations: 0
A Portable Compiler-Runtime Approach for Scalability Prediction
IF 6.2, CAS Tier 2 (Computer Science), Q1 COMPUTER SCIENCE, THEORY & METHODS. Pub Date: 2025-12-25. DOI: 10.1016/j.future.2025.108337
Nicolai Stawinoga, Sohan Lal, Biagio Cosenza, Philip Salzmann, Peter Thoman, Thomas Fahringer
Highly scalable parallel applications can efficiently solve expensive computational problems when run on a large number of compute nodes. However, selecting the optimal number of nodes for a compute job of a given size is non-trivial, and allocating too few or too many nodes may not yield the expected performance. Knowing the scaling behavior of an application in advance enables us, for example, to make optimal use of the available hardware resources. We introduce a novel, portable approach to predict the scalability of parallel applications written in modern high-level programming models. We propose a predictive compiler-runtime framework based on Celerity, a task-based distributed runtime system that enables executing SYCL codes on clusters. The framework targets a broad range of computing systems, from CPU to GPU clusters, and proposes a model that combines machine learning, communication modeling and DAG heuristics. Experimental results on two large-scale clusters, JUWELS and Marconi-100, show accurate scalability prediction of unseen single and multi-task applications.
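To ground the idea of scalability prediction, the sketch below fits a simple analytical runtime model to a handful of measured runs and extrapolates to larger node counts. The model form T(n) = a/n + b + c·log2(n) and the sample timings are assumptions; the paper's framework instead combines machine learning, communication modeling, and DAG heuristics.

```python
# Hedged sketch: fit T(n) = a/n + b + c*log2(n) (parallel compute + serial part
# + communication) to a few runs and extrapolate. Timings are made up.
import numpy as np

nodes    = np.array([1, 2, 4, 8])
runtimes = np.array([100.0, 52.0, 28.5, 17.0])   # seconds, illustrative measurements

# Least-squares fit of the three model coefficients.
A = np.column_stack([1.0 / nodes, np.ones_like(nodes, dtype=float), np.log2(nodes)])
(a, b, c), *_ = np.linalg.lstsq(A, runtimes, rcond=None)

for n in [16, 32, 64]:
    t = a / n + b + c * np.log2(n)
    print(f"{n:3d} nodes -> predicted {t:6.1f} s, speedup {runtimes[0] / t:5.1f}x")
```

Even this toy model makes the diminishing returns of adding nodes visible, which is the kind of insight needed to pick a sensible allocation before submitting a job.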
Citations: 0
Efficient and scalable branch-and-bound algorithm for exact qubit allocation
IF 6.2, CAS Tier 2 (Computer Science), Q1 COMPUTER SCIENCE, THEORY & METHODS. Pub Date: 2025-12-24. DOI: 10.1016/j.future.2025.108342
Jean-Philippe Valois, Guillaume Helbecque, Nouredine Melab
Qubit allocation is a central step in adapting abstract quantum circuits to noisy intermediate-scale quantum devices, yet exact approaches for solving it face severe scalability limitations. In this work, we revisit the formulation of qubit allocation as a permutation-based quadratic assignment problem and develop a branch-and-bound algorithm for its exact resolution. We first establish a refined sequential implementation that achieves significantly faster runtimes than previous exact approaches on most problem instances, thereby setting a new state-of-the-art for this formulation. Building on this foundation, we extend the approach to a performance-aware parallel implementation that exploits both intra-node and inter-node parallelism on High-Performance Computing (HPC) infrastructures. Our experimental evaluation demonstrates near-linear strong scaling at the intra-node level and substantial scalability in distributed settings across nodes. Leveraging these capabilities, we provide reference optimal solutions for challenging benchmark circuits of up to 26 qubits—significantly larger than previously reported instances. These results show that large-scale parallelization can effectively extend the reach of exact methods for qubit allocation, thereby advancing the integration of combinatorial optimization and HPC techniques in quantum computing.
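A minimal branch-and-bound over permutations illustrates the quadratic-assignment view of qubit allocation. The tiny interaction and distance matrices and the trivial fixed-pair lower bound are for illustration only; the article's algorithm relies on much stronger bounds and parallel engineering to reach instances of 26 qubits.

```python
# Minimal branch-and-bound sketch for qubit allocation as a QAP: assign logical
# qubits to physical qubits to minimize interaction-weighted distance. The tiny
# instance and the trivial lower bound (cost of already-fixed pairs) are
# illustrative assumptions.
import itertools

interactions = [[0, 3, 1],     # interactions[i][j]: two-qubit gates between logical i and j
                [3, 0, 2],
                [1, 2, 0]]
distance = [[0, 1, 2, 1],      # distance[p][q]: routing distance between physical p and q
            [1, 0, 1, 2],
            [2, 1, 0, 1],
            [1, 2, 1, 0]]

best_cost, best_map = float("inf"), None

def branch(mapping):           # mapping[i] = physical qubit assigned to logical i
    global best_cost, best_map
    i = len(mapping)
    cost = sum(interactions[a][b] * distance[mapping[a]][mapping[b]]
               for a, b in itertools.combinations(range(i), 2))
    if cost >= best_cost:      # prune: cost of fixed pairs is a valid lower bound
        return
    if i == len(interactions):
        best_cost, best_map = cost, mapping[:]
        return
    for p in range(len(distance)):
        if p not in mapping:
            branch(mapping + [p])

branch([])
print(best_cost, best_map)
```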
Citations: 0
A privacy protection mechanism in distributed reinforcement learning using zero-knowledge proof
IF 6.2, CAS Tier 2 (Computer Science), Q1 COMPUTER SCIENCE, THEORY & METHODS. Pub Date: 2025-12-23. DOI: 10.1016/j.future.2025.108320
Changjin Zhao, Xiang Feng, Huiqun Yu
In the field of distributed agent communication, privacy protection has always been a core concern. With ongoing advances in privacy-preserving technologies, integrating these techniques into distributed reinforcement learning has become a prevailing trend. However, the key challenge lies in safeguarding privacy while ensuring that model learning efficiency remains unaffected. To tackle this concern, a privacy-preserving framework named Zero-Knowledge proof for Distributed Reinforcement Learning (ZKDRL) is proposed. This framework equips each agent with strict differential privacy and integrates a privacy-aware receiver at the Learner end to mitigate the impact of noise on model aggregation. Additionally, zero-knowledge proof techniques are incorporated to ensure communication security and integrity within the distributed system, thereby verifying information authenticity without revealing any additional details. Implementation of ZKDRL on the open-source Surreal framework shows that, compared to baseline methods, the approach enhances data privacy by at least 21.9% while increasing the model’s average cumulative reward by 9.5%. Consequently, the model’s performance loss remains confined to an acceptable range, which confirms the framework’s practical applicability in distributed reinforcement learning.
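The sketch below illustrates only the differential-privacy side of such a pipeline: an agent clips and perturbs its update before sharing and attaches an integrity tag. A hash commitment is not a zero-knowledge proof; it merely marks where ZKDRL's verification step would sit. The clip norm and noise scale are illustrative assumptions.

```python
# Hedged sketch of the differential-privacy side only: clip the update to bound
# sensitivity, add Gaussian noise, and attach a hash tag as a stand-in for the
# verification step (a hash is NOT a zero-knowledge proof). Parameters are
# illustrative assumptions.
import hashlib
import numpy as np

def privatize(update, clip_norm=1.0, noise_multiplier=1.1, rng=np.random.default_rng(0)):
    norm = np.linalg.norm(update)
    clipped = update * min(1.0, clip_norm / (norm + 1e-12))   # bound sensitivity
    noisy = clipped + rng.normal(0.0, noise_multiplier * clip_norm, size=update.shape)
    commitment = hashlib.sha256(noisy.tobytes()).hexdigest()  # integrity tag only
    return noisy, commitment

update = np.array([0.7, -1.3, 0.2])
noisy, tag = privatize(update)
print(noisy, tag[:16])
```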
Citations: 0
Model context protocol-based agentic ReAct large language model for adaptive traffic signals: Luxembourg case study
IF 6.2, CAS Tier 2 (Computer Science), Q1 COMPUTER SCIENCE, THEORY & METHODS. Pub Date: 2025-12-23. DOI: 10.1016/j.future.2025.108339
Tarek Othmani, Sadok Ben Yahia, Antonio Lalaguna
Due to the growing pressures of urban populations, including mobility requirements, this paper addresses traffic congestion in urban environments by employing a Model Context Protocol-Based Agentic ReAct Large Language Model for Adaptive Traffic Signals (MARLATS) framework based on adaptive traffic management, Reinforcement Learning (RL), and Large Language Models (LLMs). The framework assessed energy consumption, emissions, traffic performance, and economic performance. Various vehicle types and practical trip scenarios were incorporated within the MARLATS framework for Luxembourg City to support traffic control in urban areas. The study findings revealed an 89% cut in average travel time, a 96% drop in average waiting time, a 74% gain in average speed, and a remarkable 50% reduction in fuel consumption and emissions (CO, CO2, NOx, PM, NMVOC), while increasing noise pollution by 6.9%; MARLATS also halved operating costs, from 14.14 €/h to 7.05 €/h. Compared with leading RL/DRL/LLM studies, MARLATS outperforms by 34% to 73%. These results position MARLATS as a turnkey, rapid-payback pathway to net-zero, congestion-free cities. Despite the good results, MARLATS suffers from some limitations that need to be considered in future projects, such as reducing noise emissions, mixing vehicle fleets like battery electric and plug-in hybrid vehicles, quantifying V2X infrastructure costs, and providing cybersecurity analysis for efficient and safer data transfer.
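For contrast with the LLM-driven controller, a longest-queue-first baseline can be written in a few lines; the phase names and queue counts are invented for illustration and do not correspond to the article's MARLATS controller or its Luxembourg scenario.

```python
# Purely illustrative baseline: a longest-queue-first signal controller, shown
# only to ground the adaptive-control problem that MARLATS tackles with an
# MCP-based agentic ReAct LLM and RL. Phase names and counts are made up.
def choose_phase(queues):
    """queues: mapping of signal phase -> vehicles waiting on that phase."""
    return max(queues, key=queues.get)

snapshot = {"north-south": 14, "east-west": 23, "left-turns": 6}
print(choose_phase(snapshot))   # 'east-west'
```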
Citations: 0