
Performance Evaluation: Latest Publications

Job assignment in machine learning inference systems with accuracy constraints
IF 1.0 | CAS Tier 4 (Computer Science) | Q4 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE | Pub Date: 2025-03-01 | Epub Date: 2024-12-12 | DOI: 10.1016/j.peva.2024.102463
Tuhinangshu Choudhury , Gauri Joshi , Weina Wang
Modern machine learning inference systems often host multiple models that can perform the same task with different levels of accuracy and latency. For example, a large model can be more accurate but slow, whereas a smaller, less accurate model can serve inference queries faster. Amidst the rapid advancements in Large Language Models (LLMs), it is paramount for such systems to strike the best trade-off between latency and accuracy. In this paper, we consider the problem of designing job assignment policies for a multi-server queueing system where servers have heterogeneous rates and accuracies, and our goal is to minimize the expected inference latency while meeting an average accuracy target. To the best of our knowledge, such constrained queueing systems have been sparsely studied in the prior literature. We first identify a lower bound on the minimum achievable latency under any policy that achieves the target accuracy a* using a linear programming (LP) formulation. Building on the LP solution, we introduce a Randomized Join-the-Idle-Queue (R-JIQ) policy, which consistently meets the accuracy target and asymptotically (as the system size increases) achieves the optimal latency T_LP-LB(λ). However, the R-JIQ policy relies on knowledge of the arrival rate λ to solve the LP. To address this limitation, we propose the Prioritize Ordered Pairs (POP) policy, which incorporates the concept of ordered pairs of servers into waterfilling to iteratively solve the LP. This allows the POP policy to function without relying on the arrival rate. Experiments suggest that POP performs robustly across different system sizes and load scenarios, achieving near-optimal performance.
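The latency-accuracy routing problem in this abstract can be made concrete with a toy example. The sketch below is illustrative only: it uses assumed rates and accuracies and a plain M/M/1 delay model rather than the paper's LP, and grid-searches the probabilistic split of queries between an accurate-but-slow server and a fast-but-less-accurate one, subject to an average accuracy target.

```python
# Hypothetical two-server illustration (not the paper's exact LP formulation):
# route each query to server i with probability p_i; each server is an M/M/1 queue.
lam = 0.8              # total arrival rate (assumed)
mu = [1.0, 2.0]        # service rates: accurate-but-slow, fast-but-less-accurate
acc = [0.95, 0.80]     # per-server accuracies (assumed)
a_star = 0.90          # average accuracy target

best = None
for k in range(1001):
    p1 = k / 1000.0                      # fraction of traffic sent to server 1
    p2 = 1.0 - p1
    if p1 * acc[0] + p2 * acc[1] < a_star:
        continue                         # accuracy constraint violated
    if lam * p1 >= mu[0] or lam * p2 >= mu[1]:
        continue                         # a server would be unstable
    # mean sojourn time of an M/M/1 queue with arrival rate lam * p_i
    t = p1 / (mu[0] - lam * p1) + p2 / (mu[1] - lam * p2)
    if best is None or t < best[0]:
        best = (t, p1)

print(f"min mean latency {best[0]:.3f} at p1 = {best[1]:.3f}")
```

With these assumed numbers the accuracy constraint forces roughly two thirds of the traffic onto the slow, accurate server, and the latency minimum sits on the constraint boundary, mirroring how the LP lower bound trades latency for accuracy.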
Citations: 0
Analysis of a queue-length-dependent vacation queue with bulk service, N-policy, set-up time and cost optimization
IF 1.0 | CAS Tier 4 (Computer Science) | Q4 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE | Pub Date: 2025-03-01 | Epub Date: 2024-11-20 | DOI: 10.1016/j.peva.2024.102459
P. Karan, S. Pradhan
Owing to the extensive applications of bulk-service vacation queues in manufacturing industries, inventory systems, wireless sensor networks for reducing energy consumption, etc., in this article we analyze the steady-state behavior of an infinite-buffer, group-arrival, bulk-service queue with a vacation scenario, set-up time, and an N-threshold policy. Customers arrive according to a compound Poisson process, and the server initiates service with a minimum of 'a' customers and can serve a maximum of 'b' customers at a time. We adopt batch-size-dependent service times as well as queue-length-dependent vacation durations, which significantly improve the system's performance. The N-threshold policy is proposed to awaken the server from a vacation/dormant state: the service station starts the set-up procedure after a pre-decided number 'N' of customers has accumulated. Using the supplementary variable technique, we first derive the set of steady-state system equations. We then obtain the bivariate probability generating functions (pgfs) of the queue content and the size of the departing batch, and of the queue content and the type of vacation taken by the server at the vacation completion epoch, as well as the single pgf of the queue content at the end of the set-up time. We extract the joint distributions from these generating functions using the roots method and derive a simple algebraic relation between the probabilities at departure and arbitrary epochs. We also provide assorted numerical results to validate our proposed methodology and theoretical results. The impact of the system parameters on the performance measures is presented through tables and graphs. Finally, a cost optimization function is provided for the benefit of system designers.
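As a rough illustration of the system class analyzed above (and not of the paper's pgf-based solution), the following toy discrete-event simulation sketches a bulk-service queue with an N-policy and exponential set-up time; all parameters are assumed, and the batch is formed when set-up begins, a simplification.

```python
import random

# Toy discrete-event simulation of an N-policy bulk-service queue (batches of
# a..b customers, exponential set-up before reactivation). A sketch with
# assumed parameters, not the paper's analytical model.
random.seed(1)
lam, mu, setup_rate = 1.0, 0.5, 2.0   # group arrival, service, set-up rates (assumed)
a, b, N = 2, 5, 4                     # min/max batch size, wake-up threshold (assumed)

def start_service(t, q):
    """Take up to b waiting customers into service; return (new q, departure time)."""
    batch = min(q, b)
    return q - batch, t + random.expovariate(mu)

t, q = 0.0, 0
next_arr = random.expovariate(lam)
next_dep = float("inf")               # inf <=> server dormant
area, T_END = 0.0, 10_000.0
while True:
    t_next = min(next_arr, next_dep, T_END)
    area += q * (t_next - t)          # integrate the number waiting over time
    t = t_next
    if t >= T_END:
        break
    if t == next_arr:                 # group (bulk) arrival of 1-3 customers
        q += random.randint(1, 3)
        next_arr = t + random.expovariate(lam)
        if next_dep == float("inf") and q >= N:
            # N-policy: dormant server sets up, then serves a first batch
            q, next_dep = start_service(t + random.expovariate(setup_rate), q)
    else:                             # departure of the in-service batch
        if q >= a:
            q, next_dep = start_service(t, q)
        else:
            next_dep = float("inf")   # fewer than a waiting: go dormant

print(f"time-average number waiting ≈ {area / T_END:.2f}")
```

Varying N, a, b, or the set-up rate in such a simulation gives a quick sanity check on the trade-offs that the cost optimization in the paper formalizes.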
Citations: 0
Editorial: Special issue on Performance Analysis and Evaluation of Systems for Artificial Intelligence
IF 1.0 | CAS Tier 4 (Computer Science) | Q4 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE | Pub Date: 2025-03-01 | Epub Date: 2024-12-13 | DOI: 10.1016/j.peva.2024.102465
Anshul Gandhi , Bo Jiang , Shaolei Ren
Citations: 0
Preface: Special issue on ITC 2023
IF 1.0 | CAS Tier 4 (Computer Science) | Q4 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE | Pub Date: 2025-03-01 | Epub Date: 2024-12-04 | DOI: 10.1016/j.peva.2024.102462
Sara Alouf , Oliver Hohlfeld , Zhiyuan Jiang
Citations: 0
Foreword - Special Issue - MASCOTS 2023
IF 1.0 | CAS Tier 4 (Computer Science) | Q4 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE | Pub Date: 2025-03-01 | Epub Date: 2025-01-03 | DOI: 10.1016/j.peva.2025.102467
Maria Carla Calzarossa , Anshul Gandhi
Citations: 0
Formal error bounds for the state space reduction of Markov chains
IF 1.0 | CAS Tier 4 (Computer Science) | Q4 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE | Pub Date: 2025-03-01 | Epub Date: 2024-12-18 | DOI: 10.1016/j.peva.2024.102464
Fabian Michel, Markus Siegle
We study the approximation of a Markov chain on a reduced state space, for both discrete- and continuous-time Markov chains. In this context, we extend the existing theory of formal error bounds for the approximated transient distributions. In the discrete-time setting, we bound the stepwise increment of the error, and in the continuous-time setting, we bound the rate at which the error grows. In addition, the same error bounds can be applied to bound how far an approximated stationary distribution is from stationarity. As a special case, we consider aggregated (or lumped) Markov chains, where the state space reduction is achieved by partitioning the state space into macro states. Subsequently, we compare the error bounds with relevant concepts from the literature, such as exact and ordinary lumpability, as well as deflatability and aggregatability; these concepts impose stricter conditions than are necessary for the aggregation error to be zero. We also present possible algorithms for finding suitable aggregations for which the formal error bounds are low, and we analyze initial experiments with these algorithms on a range of different models.
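A minimal numerical sketch of the aggregation setting helps fix ideas. The 3-state chain, partition, and disaggregation weights below are assumptions for illustration, not the paper's bounds; the code tracks how far the transient distribution of the lumped chain drifts from the exact one.

```python
# Toy illustration of aggregation error for a 3-state DTMC: lump states {0, 1}
# into one macro state, keep state 2, and measure how far the lifted aggregated
# distribution drifts from the exact transient distribution (assumed example).
P = [[0.5, 0.3, 0.2],
     [0.4, 0.4, 0.2],
     [0.1, 0.1, 0.8]]
partition = [[0, 1], [2]]
w = [[0.5, 0.5], [1.0]]          # assumed weights used to disaggregate macro states

def step(dist, M):
    """One step of the chain: left-multiply the distribution by the matrix."""
    return [sum(dist[i] * M[i][j] for i in range(len(dist))) for j in range(len(M))]

# Aggregated matrix: P_hat[I][J] = sum_i w_I(i) * sum_{j in J} P[i][j]
P_hat = [[sum(w[I][k] * sum(P[i][j] for j in partition[J])
              for k, i in enumerate(partition[I]))
          for J in range(len(partition))] for I in range(len(partition))]

pi = [1.0, 0.0, 0.0]             # exact distribution, start in state 0
pi_hat = [1.0, 0.0]              # aggregated distribution, start in macro state 0
errs = []
for n in range(5):
    pi = step(pi, P)
    pi_hat = step(pi_hat, P_hat)
    lifted = [pi_hat[0] * 0.5, pi_hat[0] * 0.5, pi_hat[1]]   # disaggregate with w
    err = 0.5 * sum(abs(x - y) for x, y in zip(pi, lifted))  # total-variation distance
    errs.append(err)
    print(f"step {n + 1}: TV error = {err:.4f}")
```

Because this chain is not exactly lumpable for the chosen partition, the per-step error is nonzero, which is exactly the quantity the paper's formal bounds control.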
Citations: 0
Enabling scalable and adaptive machine learning training via serverless computing on public cloud
IF 1.0 | CAS Tier 4 (Computer Science) | Q4 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE | Pub Date: 2025-03-01 | Epub Date: 2024-11-06 | DOI: 10.1016/j.peva.2024.102451
Ahsan Ali , Xiaolong Ma , Syed Zawad , Paarijaat Aditya , Istemi Ekin Akkus , Ruichuan Chen , Lei Yang , Feng Yan
In today’s production machine learning (ML) systems, models are continuously trained, improved, and deployed. ML design and training are becoming a continuous workflow of various tasks with dynamic resource demands. Serverless computing is an emerging cloud paradigm that provides transparent resource management and scaling for users and has the potential to revolutionize the routine of ML design and training. However, hosting modern ML workflows on existing serverless platforms poses non-trivial challenges due to their intrinsic design limitations, such as their stateless nature, limited communication support across function instances, and limited function execution duration. These limitations result in a lack of an overarching view and adaptation mechanism for training dynamics, and an amplification of existing problems in ML workflows.
To address the above challenges, we propose SMLT, an automated, scalable, and adaptive serverless framework on public cloud that enables efficient and user-centric ML design and training. SMLT employs an automated and adaptive scheduling mechanism to dynamically optimize deployment and resource scaling for ML tasks during training. SMLT further enables user-centric ML workflow execution by supporting user-specified training deadlines and budget limits. In addition, by providing an end-to-end design, SMLT solves intrinsic problems of public cloud serverless platforms, such as communication overhead, limited function execution duration, and the need for repeated initialization, and it also provides explicit fault tolerance for ML training. SMLT is open-sourced and compatible with all major ML frameworks. Our experimental evaluation with large, sophisticated modern ML models demonstrates that SMLT outperforms state-of-the-art VM-based systems and existing public cloud serverless ML training frameworks in both training speed (up to 8×) and monetary cost (up to 3×).
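To illustrate the kind of deadline- and budget-aware scaling decision described above, here is a hypothetical sketch. The speedup model, prices, and function name are assumptions for illustration only, not SMLT's actual scheduling algorithm.

```python
# Hypothetical sketch of deadline/budget-aware scaling (not SMLT's algorithm):
# pick the cheapest worker count whose estimated training time meets the
# deadline, assuming near-linear speedup with a per-worker communication cost.
def pick_workers(total_work, deadline, budget, price_per_sec, comm_overhead,
                 max_workers=256):
    best = None
    for n in range(1, max_workers + 1):
        t = total_work / n + comm_overhead * n      # simple speedup model (assumed)
        cost = t * n * price_per_sec                # pay for n workers for t seconds
        if t <= deadline and cost <= budget:
            if best is None or cost < best[1]:
                best = (n, cost, t)
    return best   # (workers, cost, time) or None if infeasible

print(pick_workers(total_work=3600, deadline=300, budget=50,
                   price_per_sec=0.01, comm_overhead=0.5))
```

Under this toy model the communication term makes total cost grow with the worker count, so the cheapest feasible choice is the smallest one that still meets the deadline.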
Citations: 0
FedCust: Offloading hyperparameter customization for federated learning
IF 1.0 | CAS Tier 4 (Computer Science) | Q4 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE | Pub Date: 2025-03-01 | Epub Date: 2024-11-16 | DOI: 10.1016/j.peva.2024.102450
Syed Zawad , Xiaolong Ma , Jun Yi , Cheng Li , Minjia Zhang , Lei Yang , Feng Yan , Yuxiong He
Federated Learning (FL) is a new machine learning paradigm that enables training models collaboratively across clients without sharing private data. In FL, data is non-uniformly distributed among clients (i.e., data heterogeneity) and, due to privacy constraints, cannot be redistributed or monitored as in conventional machine learning. Such data heterogeneity and privacy requirements bring new challenges for hyperparameter optimization: the training dynamics vary across clients even within the same training round, and they are difficult to measure due to privacy. The state of the art in hyperparameter customization can greatly improve FL model accuracy but also incurs significant computing overhead and power consumption on client devices and slows down the training process. To address this prohibitively expensive cost, we explore the possibility of offloading hyperparameter customization to servers. We propose FedCust, a framework that offloads the expensive hyperparameter customization cost from client devices to the central server without violating privacy constraints. Our key discovery is that it is not necessary to customize hyperparameters for every client; clients with similar data heterogeneity can use the same hyperparameters and achieve good training performance. We propose heterogeneity measurement metrics for clustering clients into groups such that clients within the same group share hyperparameters. FedCust uses proxy data from the initial model design to emulate different heterogeneity groups and performs hyperparameter customization on the server side without accessing client data or information. To make hyperparameter customization scalable, FedCust further employs a Bayesian-strengthened tuner to significantly accelerate the customization speed. Extensive evaluation demonstrates that FedCust achieves up to 7/2/4/4/6% better accuracy than the widely adopted one-size-fits-all approach on the popular FL benchmarks FEMNIST, Shakespeare, Cifar100, Cifar10, and Fashion-MNIST, respectively, while being scalable and reducing computation, memory, and energy consumption on client devices, without compromising privacy constraints.
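The grouping idea can be illustrated with a stand-in heterogeneity metric. The label-distribution distance, threshold, and greedy grouping below are hypothetical choices for illustration, not the paper's actual metrics or clustering method.

```python
# Hypothetical sketch of the grouping idea: cluster clients by the distance
# between their label distributions (a stand-in heterogeneity metric).
def label_dist(labels, num_classes):
    """Empirical class distribution of a client's label set."""
    counts = [0] * num_classes
    for y in labels:
        counts[y] += 1
    return [c / len(labels) for c in counts]

def tv(p, q):
    """Total-variation distance between two distributions."""
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))

clients = {                        # toy label sets for 4 clients, 3 classes (assumed)
    "c1": [0, 0, 0, 1], "c2": [0, 0, 1, 1],
    "c3": [2, 2, 2, 1], "c4": [2, 2, 1, 1],
}
dists = {k: label_dist(v, 3) for k, v in clients.items()}

# Greedy grouping: a client joins the first group whose representative is close.
groups, threshold = [], 0.3
for name, d in dists.items():
    for g in groups:
        if tv(dists[g[0]], d) <= threshold:
            g.append(name)
            break
    else:
        groups.append([name])
print(groups)   # clients in the same group would share tuned hyperparameters
```

Here clients c1/c2 (skewed toward class 0) land in one group and c3/c4 (skewed toward class 2) in another, so only two hyperparameter searches would be needed instead of four.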
Citations: 0
Enabling grant-free multiple access through Successive Interference Cancellation
IF 1.0 | CAS Tier 4 (Computer Science) | Q4 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE | Pub Date: 2025-03-01 | Epub Date: 2024-12-01 | DOI: 10.1016/j.peva.2024.102460
Asmad Bin Abdul Razzaque, Andrea Baiocchi
The Internet of Things (IoT) is stirring up a surge of interest in effective methods for sharing communication channels among nodes that transmit sporadic, short messages. These messages are often related to control systems that collect sensor data to drive process actuation, as in industry, autonomous vehicles, and environmental control. The traditional approaches that dominate wireless and cellular communications prove most effective when dealing with a limited number of concurrently active nodes sending relatively large volumes of data. We address a different scenario, where numerous nodes generate and transmit short messages according to non-periodic schedules. In such cases, random multiple access becomes the typical approach for sharing the communication channel. We propose a general modeling framework that enables the investigation of the impact of Successive Interference Cancellation (SIC) on two of the main random access paradigms, namely Slotted ALOHA (SA) and Carrier-Sense Multiple Access (CSMA). The key varying parameter is the target Signal-to-Interference-plus-Noise Ratio (SINR) at the receiver, directly tied to the spectral efficiency of the adopted coding and modulation scheme. Two different regimes are highlighted that bring the system to work at relative maxima of the sum-rate. We further investigate the impact of different transmission power settings and imperfect interference cancellation. Leveraging the insight gained in the saturated-node scenario, an adaptive algorithm is defined for the dynamic case, where the number of backlogged nodes varies over time. The numerical results provide evidence of a significant potential for grant-free multiple access, calling for practical algorithms to translate this promise into feasible realizations.
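A slot-level toy simulation conveys the SIC decoding loop described above: in each slot the receiver repeatedly decodes the strongest remaining signal whenever its SINR clears the threshold, then cancels it. The traffic, fading, and SINR parameters below are assumed, and this is not the paper's analytical model.

```python
import random

# Toy sketch of Slotted ALOHA with Successive Interference Cancellation:
# active users' received powers are drawn from an exponential (Rayleigh-like
# fading) law; the receiver peels off signals strongest-first (assumed setup).
random.seed(7)
n_users, p_tx, theta, noise = 20, 0.1, 1.0, 0.01   # assumed parameters
SLOTS = 20_000

decoded_total = 0
for _ in range(SLOTS):
    # each user transmits with probability p_tx; draw powers for active users
    powers = [random.expovariate(1.0) for _ in range(n_users)
              if random.random() < p_tx]
    powers.sort(reverse=True)
    while powers:
        strongest = powers[0]
        interference = sum(powers[1:])
        if strongest / (interference + noise) >= theta:
            decoded_total += 1
            powers.pop(0)          # perfect cancellation of the decoded signal
        else:
            break                  # no further signal is decodable this slot

print(f"throughput ≈ {decoded_total / SLOTS:.3f} packets/slot")
```

Raising theta (a higher-rate coding scheme) or modeling residual power after cancellation makes decoding stop earlier in each slot, which is the imperfect-cancellation effect the abstract mentions.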
Performance Evaluation, Volume 167, Article 102460.
Citations: 0
Lure: A simulator for networks of batteryless intermittent nodes 诱惑无电池间歇节点网络模拟器
IF 1 4区 计算机科学 Q4 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE Pub Date : 2024-11-01 Epub Date: 2024-08-23 DOI: 10.1016/j.peva.2024.102440
Mathew L. Wymore, Rohit Sahu, Thomas Ruminski, Vishal Deep, Morgan Ambourn, Gregory Ling, Vishak Narayanan, William Asiedu, Daji Qiao, Henry Duwe

The emerging paradigm of batteryless intermittent sensor networks (BISNs) presents new challenges for researchers of low-power wireless systems and protocols. The nature of these challenges exacerbates the difficulty of evaluating networks of physical sensor nodes, making simulation an even more important component in evaluating performance metrics, such as communication throughput and delay, for BISN designs. To our knowledge, existing simulators and analytical models do not meet the unique needs of BISN research; therefore, we have created a new open-source BISN simulator named Lure. Lure is designed from the ground up for simulation of batteryless intermittent systems and networks. Written in Python, Lure is powerful, flexible, highly configurable, and supports rapid prototyping of new protocols, systems, and applications, with a low learning curve. In this paper, we present Lure and validate it with experimental data to show that Lure can accurately reflect the reality of BISNs. We then demonstrate the process of applying Lure to research questions in select case studies.
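The charge–wake–transmit cycle that makes BISN behavior hard to evaluate can be illustrated with a toy discrete-event model in plain Python. This is not Lure's API — the class, its parameters, and the energy model below are all invented for illustration.

```python
import heapq

class IntermittentNode:
    """Toy batteryless node: a capacitor charges from a harvester and the
    node wakes to transmit only when stored energy crosses a threshold."""
    def __init__(self, harvest_rate, cap_size, on_threshold, tx_cost):
        self.harvest_rate = harvest_rate    # energy units per second
        self.cap_size = cap_size            # capacitor capacity
        self.on_threshold = on_threshold    # energy needed to wake
        self.tx_cost = tx_cost              # energy spent per transmission
        self.energy = 0.0

    def time_to_wake(self):
        deficit = max(0.0, self.on_threshold - self.energy)
        return deficit / self.harvest_rate

    def wake_and_transmit(self):
        self.energy = min(self.cap_size, self.on_threshold)
        self.energy -= self.tx_cost  # transmit, then go dark again

def simulate(nodes, horizon):
    """Event-driven loop: count packets each node emits within `horizon` s."""
    events = [(n.time_to_wake(), i) for i, n in enumerate(nodes)]
    heapq.heapify(events)
    counts = [0] * len(nodes)
    while events:
        t, i = heapq.heappop(events)
        if t > horizon:
            break  # heap is time-ordered, so all later events also exceed horizon
        nodes[i].wake_and_transmit()
        counts[i] += 1
        heapq.heappush(events, (t + nodes[i].time_to_wake(), i))
    return counts
```

Even this sketch shows why BISN evaluation differs from conventional sensor-network simulation: packet timing is driven by energy arrival, not by a protocol schedule, so throughput and delay depend on harvesting parameters.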

Performance Evaluation, Volume 166, Article 102440. Open-access PDF: https://www.sciencedirect.com/science/article/pii/S0166531624000452/pdfft?md5=1c6343234e3ac7dad5efd12075fa6bfd&pid=1-s2.0-S0166531624000452-main.pdf
Citations: 0