
Performance Evaluation: Latest Articles

Optimizing resource allocation for geographically-distributed inference by large language models
IF 0.8 · CAS Region 4, Computer Science · Q4 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE · Pub Date: 2025-11-01 · DOI: 10.1016/j.peva.2025.102527
Tingyang Sun , Ting He , Bo Ji , Parimal Parag
Large language models (LLMs) have demonstrated extraordinary performance in many artificial intelligence (AI) tasks but are expensive to use, even after training, due to their requirement of high-end GPUs. Recently, a distributed system called PETALS was developed to lower the barrier for deploying LLMs by splitting the model blocks across multiple servers with low-end GPUs distributed over the Internet, which is much faster than swapping the model parameters between GPU memory and cheaper but slower local storage media. However, the performance of such a distributed system critically depends on the resource allocation, and how to allocate resources optimally remains unknown. In this work, we present the first systematic study of the resource allocation problem in distributed LLM inference, with a focus on two important decisions: block placement and request routing. Our main results include: (i) experimentally validated performance models that can predict the inference performance under given block placement and request routing decisions, (ii) a formulation of the offline optimization of block placement and request routing as a mixed integer linear programming (MILP) problem, together with an NP-hardness proof and a polynomial-complexity algorithm with guaranteed performance, and (iii) an adaptation of the offline algorithm to the online setting with the same performance guarantee under bounded load. Through both experiments and experimentally validated simulations, we verify that the proposed solution can substantially reduce the inference time compared to the state-of-the-art solution in diverse settings with geographically distributed servers. As a byproduct, we have also developed a lightweight CPU-only simulator capable of predicting the performance of distributed LLM inference on GPU servers, which can evaluate large deployments and facilitate future research for researchers with limited GPU access.
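The two decisions studied in the paper, block placement and request routing, can be illustrated with a toy greedy sketch (hypothetical names and logic, not the paper's MILP formulation or its approximation algorithm):

```python
# Toy sketch: greedily place model blocks on the server with the most
# remaining GPU memory, then route a request along the chain of servers
# hosting consecutive blocks. (Illustration only; the paper's MILP also
# accounts for compute speed and inter-server network latency.)
def place_blocks(num_blocks, server_memory, block_size=1):
    """Assign each block index to a server id, greedily balancing memory."""
    remaining = dict(server_memory)  # server id -> free memory units
    placement = {}
    for b in range(num_blocks):
        candidates = [s for s, m in remaining.items() if m >= block_size]
        if not candidates:
            raise ValueError("not enough total memory for all blocks")
        s = max(candidates, key=lambda s: remaining[s])  # most free memory
        placement[b] = s
        remaining[s] -= block_size
    return placement

def route_request(placement, num_blocks):
    """A request visits the servers hosting blocks 0..num_blocks-1 in order."""
    return [placement[b] for b in range(num_blocks)]

placement = place_blocks(4, {"A": 2, "B": 2})
route = route_request(placement, 4)
```

A real allocation must also balance request load across replicas of the same block, which is exactly where the routing decision interacts with placement.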
Citations: 0
Designing asymptotically optimal policies for continuous-time weakly coupled MDPs
IF 0.8 · CAS Region 4, Computer Science · Q4 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE · Pub Date: 2025-11-01 · DOI: 10.1016/j.peva.2025.102528
Matthieu Perbal, Balakrishna Prabhu, Ina Maria Verloop
We study the continuous-time Weakly Coupled Markov Decision Process (WCMDP), a class of decision problems involving multiple interacting Markov processes (or “arms”) subject to shared resource constraints. We present a general framework for policy design using a combination of an underlying Markov process and a sequence of mappings. Our main theoretical result establishes sufficient conditions on the Markov process and mapping defining the policy, such that it is asymptotically optimal as the number of arms grows.
We construct both deterministic and randomized policies based on a solution to a linear program (LP). These policies initially assign actions to arms, either proportionally (deterministic) or randomly, based on conditional measures derived from the LP. As this initial allocation may violate feasibility constraints, we introduce a mapping that ensures the resource constraints are satisfied. Finally, we numerically evaluate and compare the performance of our proposed deterministic and randomized policies under different choices of mappings.
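The two-stage construction described above can be caricatured in a few lines (a loose sketch under assumed names; the paper's conditional measures and repair mappings are more general than this):

```python
# Sketch: assign actions to arms proportionally to LP "occupation
# measures", then apply a repair mapping that de-activates arms until a
# budget on active arms holds. (Hypothetical; action 1 = active.)
def proportional_assignment(arms_in_state, x):
    """arms_in_state: state -> number of arms currently in that state.
    x: state -> {action: probability of that action under the LP}."""
    assignment = {}
    for state, n in arms_in_state.items():
        counts = {a: int(n * p) for a, p in x[state].items()}
        # leftover arms (from rounding) take the passive action 0
        counts[0] = counts.get(0, 0) + (n - sum(counts.values()))
        assignment[state] = counts
    return assignment

def enforce_budget(assignment, budget):
    """Repair mapping: switch arms from action 1 to 0 until feasible."""
    active = sum(c.get(1, 0) for c in assignment.values())
    for state in assignment:
        while active > budget and assignment[state].get(1, 0) > 0:
            assignment[state][1] -= 1
            assignment[state][0] = assignment[state].get(0, 0) + 1
            active -= 1
    return assignment

arms = {"s0": 4, "s1": 2}
x = {"s0": {1: 0.5, 0: 0.5}, "s1": {1: 1.0}}
final = enforce_budget(proportional_assignment(arms, x), budget=3)
```

The paper's asymptotic-optimality result gives conditions under which such repairs become negligible as the number of arms grows.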
Citations: 0
The last, the least, and the urgent: Fluid modeling and performance equivalence for scheduling policies in partial service queues with abandonment
IF 0.8 · CAS Region 4, Computer Science · Q4 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE · Pub Date: 2025-11-01 · DOI: 10.1016/j.peva.2025.102517
Andres Ferragut, Diego Goldsztajn, Fernando Paganini
In several queueing systems, arriving tasks or customers have both service and timing requirements, the latter expressed as a deadline by which the task must be served. Such systems with customer abandonment have a long and rich history in queueing theory, with many applications in task scheduling for computer systems, operations research, and related areas. A common feature of these works is that customers renege from the system only while waiting in the queue, not during service. However, in several applications customers may also leave during service, and the partial work performed by the system during their stay is still useful.
In this paper we analyze these partial service queues with abandonment in a many-server setting, characterizing the equilibrium performance of several policies in terms of the amount of service attained by tasks. For this purpose, we develop fluid models with two-dimensional independent variables, corresponding to service and sojourn times, which take the form of partial differential equations expressed in weak form. These fluid models allow us to consider general and possibly correlated service and timing requirements, as well as a wide range of service disciplines. In particular, we focus on Earliest-Deadline-First, Least-Attained-Service and Last-Come-First-Served, and establish that all three policies have the same equilibrium performance, even though the latter two do not need any information about deadlines. This striking property means that designers may avoid the difficult job of estimating deadlines without incurring a performance penalty. The fluid model conclusions are validated by extensive numerical experiments.
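As a concrete toy instance of partial service with abandonment, the following discrete-time sketch serves tasks in Earliest-Deadline-First order and records the service each task has attained when its deadline expires (an illustration only; the paper works with many-server fluid PDE models, not this single-server toy):

```python
# One unit-speed server, integer time slots; a task leaves at its
# deadline and keeps whatever partial service it accumulated.
def attained_service_edf(tasks, horizon):
    """tasks: list of dicts with integer 'arrival', 'deadline', 'size'.
    Returns the service each task has attained when it leaves."""
    attained = [0] * len(tasks)
    for t in range(horizon):
        # tasks present: arrived, before deadline, not yet fully served
        present = [i for i, task in enumerate(tasks)
                   if task["arrival"] <= t < task["deadline"]
                   and attained[i] < task["size"]]
        if present:
            # Earliest-Deadline-First: serve the most urgent task
            i = min(present, key=lambda i: tasks[i]["deadline"])
            attained[i] += 1
    return attained

tasks = [{"arrival": 0, "deadline": 3, "size": 5},
         {"arrival": 0, "deadline": 10, "size": 4}]
attained = attained_service_edf(tasks, horizon=10)  # -> [3, 4]
```

The paper's equivalence result says that, in the fluid limit, swapping EDF here for Least-Attained-Service or Last-Come-First-Served leaves the equilibrium attained-service profile unchanged.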
Citations: 0
ASIP tandem queues with Lévy input and consumption
IF 0.8 · CAS Region 4, Computer Science · Q4 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE · Pub Date: 2025-11-01 · DOI: 10.1016/j.peva.2025.102513
Onno Boxma , Offer Kella , Jacques Resing
We consider an ASIP (asymmetric inclusion process) tandem queue, in which the first queue receives a fluid input according to a nondecreasing Lévy process. Each queue has a gate that opens after independent, exponentially distributed periods for an infinitesimal amount of time, allowing the queue content to move to the next queue. In addition, again at independent exponentially distributed instants, a fixed fraction of a queue content is removed from the system.
For this model, restricting ourselves to steady state, we obtain the following results. (i) We derive the buffer content distribution of the first queue. (ii) For the 2-queue model, we obtain relatively simple explicit expressions for the Laplace transform of the joint buffer content in several special cases. (iii) Asymptotic results are obtained for the 2-queue model when the above-mentioned buffer content removal process approaches a shot-noise process. (iv) For the general n-queue case, we show how all moments of the buffer contents at all queues can be obtained. (v) For the general n-queue case, we sketch an approximation method that allows one in principle to derive tractable expressions for the Laplace transform of the buffer content at each queue, with exact mean buffer contents at all queues.
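The ASIP dynamics (gated transfer of a queue's entire content, plus fractional removal) can be mimicked with a hand-scripted event sequence (a deterministic sketch; in the model, gate openings and removals occur at independent exponential instants and the input is a nondecreasing Lévy process):

```python
# Deterministic toy version of the ASIP tandem dynamics.
def asip_step(queues, event):
    kind = event[0]
    if kind == "input":        # fluid input into the first queue
        queues[0] += event[1]
    elif kind == "gate":       # gate k opens: all content moves to queue k+1
        k = event[1]
        if k + 1 < len(queues):
            queues[k + 1] += queues[k]
        queues[k] = 0.0        # content leaves the system from the last queue
    elif kind == "consume":    # a fraction p of queue k's content is removed
        k, p = event[1], event[2]
        queues[k] -= p * queues[k]
    return queues

q = [0.0, 0.0]
for ev in [("input", 3.0), ("gate", 0), ("consume", 1, 0.5), ("input", 1.0)]:
    q = asip_step(q, ev)
# q is now [1.0, 1.5]
```

Replacing the scripted events with exponential clocks per gate and per removal stream recovers the stochastic model whose steady state the paper analyzes.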
Citations: 0
Switching constrained OCO with predictions and feedback delays
IF 0.8 · CAS Region 4, Computer Science · Q4 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE · Pub Date: 2025-11-01 · DOI: 10.1016/j.peva.2025.102524
Weici Pan, Zhenhua Liu
We examine Online Convex Optimization (OCO) problems with feedback delay and a strict limit on decision switching, constraints that arise in applications such as smart grids and learning systems. Existing algorithms developed for traditional OCO struggle in this setting, often violating switching constraints or incurring high regret, as evidenced by simulations. In this paper, we establish a new algorithm, Follow-the-Maximally-Coupled-Latest-Leader (FMCLL), achieving a near-optimal regret of O(T/S) for such problems with delayed feedback and a bound of O(T/S − τ) for problems with predictions of τ rounds, even though the player is only allowed to move at most S times in expectation across T rounds. FMCLL meets these performance bounds in scenarios with delays and predictions by using maximal coupling sampling to inform algorithm design for switching-constrained problems. To better apply our framework to practical applications, we also extend the algorithm and results to the bandit feedback setting. Simulations demonstrate FMCLL’s superiority over traditional Gradient Descent and Follow-the-Leader algorithms, excelling under adversarial or stochastic losses and reducing constraint violations.
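For intuition on the switching budget, here is a deliberately naive baseline (not FMCLL, and without delays or predictions): a follow-the-leader player over a finite decision set that stops switching once its budget is spent:

```python
# Naive switch-limited follow-the-leader over a finite decision set.
def switch_limited_ftl(losses, decisions, budget):
    """losses: per-round dicts {decision: loss}. Plays the
    cumulative-loss leader, but refuses to switch decisions once the
    switch budget is exhausted. Returns the played decisions."""
    cumulative = {d: 0.0 for d in decisions}
    current, switches, played = decisions[0], 0, []
    for round_losses in losses:
        played.append(current)           # commit before seeing the loss
        for d, l in round_losses.items():
            cumulative[d] += l
        leader = min(decisions, key=lambda d: cumulative[d])
        if leader != current and switches < budget:
            current, switches = leader, switches + 1
    return played

losses = [{"a": 1.0, "b": 0.0}, {"a": 1.0, "b": 0.0}, {"a": 0.0, "b": 1.0}]
played = switch_limited_ftl(losses, ["a", "b"], budget=1)
```

Against an adversary this greedy rule wastes its budget early, which is the failure mode FMCLL's coupled sampling is designed to avoid.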
Citations: 0
γ-CounterBoost: Optimizing response time tails using job type information only
IF 0.8 · CAS Region 4, Computer Science · Q4 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE · Pub Date: 2025-11-01 · DOI: 10.1016/j.peva.2025.102514
Nils Charlet, Benny Van Houdt
In a recent paper the γ-Boost scheduling policy was shown to minimize the tail of the response time distribution in a light-tailed M/G/1 queue. This policy schedules jobs using a boosted arrival time, defined as the arrival time of a job minus its boost, where the boost of a job depends on its exact job size. The γ-Boost policy can also be used when only partial job size information is available, such as the type of an incoming job. In such a case the boost b_i of a job depends solely on its type i, and γ-Boost was shown to optimize the tail among all boost policies, where a boost policy is fully determined by the b_i values. In the partial information setting γ-Boost relies on two types of information: job types and arrival times.
This paper focuses on the problem of minimizing the tail in a light-tailed M/G/1 queue in the partial job size information setting when the scheduler only makes use of the job types and does not exploit arrival times. Prior work showed that in the case of 2 job types the so-called Nudge-M policy minimizes the tail in a large class of scheduling policies. In this paper we introduce the γ-CounterBoost policy in the partial information setting with d ≥ 2 job types and prove that it minimizes the tail in an even broader class of scheduling policies called Contextual CounterBoost policies. The γ-CounterBoost policy reduces to the Nudge-M policy in the case of d = 2 job types.
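The boosted-arrival-time rule is easy to state in code (a sketch with made-up boost values; the paper derives the b_i that are tail-optimal):

```python
import heapq

# Boost policy sketch: with per-type boosts b[i], a job of type i that
# arrives at time t is served in increasing order of t - b[i].
def boosted_order(jobs, boosts):
    """jobs: list of (arrival_time, job_type). Returns job indices in
    the order a boost policy would serve them if all were present."""
    keyed = [(arrival - boosts[jtype], idx)
             for idx, (arrival, jtype) in enumerate(jobs)]
    heapq.heapify(keyed)  # min-heap on boosted arrival time
    return [heapq.heappop(keyed)[1] for _ in range(len(keyed))]

jobs = [(0.0, "small"), (1.0, "large"), (2.0, "small")]
order = boosted_order(jobs, {"small": 0.0, "large": 2.0})  # -> [1, 0, 2]
```

With all boosts equal this is plain FCFS; giving one type a larger boost lets it overtake earlier arrivals of other types, which is the lever γ-Boost and γ-CounterBoost tune.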
Citations: 0
Near-optimal PCM wear leveling under adversarial attacks
IF 0.8 · CAS Region 4, Computer Science · Q4 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE · Pub Date: 2025-11-01 · DOI: 10.1016/j.peva.2025.102522
Tomer Lange, Joseph (Seffi) Naor, Gala Yadgar
Phase-change memory (PCM) is a promising memory technology known for its speed, high density, and durability. However, each PCM cell can endure only a limited number of erase and subsequent write operations before failing, and the failure of a single cell can limit the lifespan of the entire device. This vulnerability makes PCM particularly susceptible to adversarial attacks that induce excessive writes to accelerate device failure. To counter this, wear-leveling techniques aim to distribute write operations evenly across PCM cells.
In this paper, we study the online PCM utilization problem, which seeks to maximize the number of write requests served before any cell reaches the erase limit. While extensively studied in the systems and architecture communities, this problem remains largely unexplored from a theoretical perspective. We bridge this gap by presenting a novel algorithm that leverages hardware feedback to optimize PCM utilization. We prove that our algorithm achieves near-optimal worst-case guarantees and outperforms state-of-the-art practical solutions both theoretically and empirically, providing an efficient approach to prolonging PCM lifespan.
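A classical wear-leveling baseline helps make the utilization objective concrete (a hypothetical sketch, not the paper's algorithm, which additionally exploits hardware feedback and carries worst-case guarantees): always write to the least-worn cell.

```python
# Least-worn-cell wear leveling: each write request erases one cell;
# the device fails once every cell has reached the erase limit.
def writes_served(num_cells, erase_limit, write_requests):
    wear = [0] * num_cells
    served = 0
    for _ in range(write_requests):
        cell = min(range(num_cells), key=lambda c: wear[c])
        if wear[cell] >= erase_limit:
            break  # even the least-worn cell is exhausted: device failed
        wear[cell] += 1
        served += 1
    return served
```

In this idealized setting the baseline serves exactly num_cells × erase_limit writes regardless of the request pattern; the hard part the paper addresses is doing nearly as well when logical addresses must be remapped online under adversarial writes.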
Citations: 0
LLMEmu: A lightweight performance emulator for high-fidelity distributed LLM training
IF 0.8 · CAS Region 4, Computer Science · Q4 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE · Pub Date: 2025-11-01 · DOI: 10.1016/j.peva.2025.102526
Siyuan Yang , Enda Yu , Pingjing Lu , Dezun Dong
The prohibitive cost of training trillion-parameter large language models (LLMs) necessitates low-cost emulation tools for distributed system optimization. In modern large-scale clusters, communication often becomes the primary bottleneck to scalability. However, existing emulators, such as vTrain and ASTRA-Sim, overlook dynamic network factors that significantly impact performance at scale, resulting in limited emulation accuracy. This work offers an efficient and reliable tool for training system optimization and parallel strategy exploration, considerably lowering the barrier to large-scale AI research. We present LLMEmu, a distributed training emulator that combines real kernel profiling and actual communication execution. First, computation is profiled through real CUDA kernel traces on GPU nodes to construct an operator-level latency lookup table, enabling GPU-like execution on CPU clusters. Second, inter-node communication is executed using communication library primitives (e.g., AllReduce, Send/Recv), triggered by communication anchors embedded in the execution graph, and implemented using a pluggable communication backend. LLMEmu can seamlessly model hybrid parallelism strategies and supports multiple collective algorithms. Its lightweight design incorporates gradient bucketing with latency reuse to minimize overhead while maintaining extensibility to various network interconnects. The effectiveness of LLMEmu is validated through its performance results, demonstrating an average prediction error of only 2.17% on 24-GPU clusters, which outperforms vTrain by 21.09%, and confirming its scalability in modeling training cost distributions across 128-node CPU emulations under varying network conditions.
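The operator-level latency lookup table can be sketched as follows (hypothetical operator names and latencies; LLMEmu additionally executes real communication primitives rather than adding a fixed communication term):

```python
# Sketch: predict one training step's time by summing profiled
# per-operator latencies over the step's operator trace.
def predict_step_time(op_trace, latency_table, comm_time=0.0):
    """op_trace: list of operator names executed in one step.
    latency_table: operator name -> profiled latency in ms."""
    compute = sum(latency_table[op] for op in op_trace)
    return compute + comm_time  # ignores compute/communication overlap

table = {"matmul": 1.5, "softmax": 0.2, "layernorm": 0.1}  # made-up values
step = ["layernorm", "matmul", "softmax", "matmul"]
t = predict_step_time(step, table, comm_time=0.8)
```

Populating the table from real CUDA kernel traces is what lets such an emulator replay GPU-like timing on CPU-only clusters.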
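The operator-level latency lookup table described above can be illustrated with a minimal sketch: profiled GPU kernel latencies are keyed by operator and input shape, and a CPU-side walk of the execution graph accumulates them instead of running real kernels. The class and function names below are assumptions for illustration, not LLMEmu's actual implementation.

```python
# Hypothetical sketch of LLMEmu's operator-level latency lookup table.
# All names, shapes, and latency values here are made up for illustration.

class LatencyTable:
    """Maps (operator, input shape) keys to profiled GPU kernel latencies."""

    def __init__(self):
        self._table = {}

    def record(self, op_name, shape, latency_s):
        # In LLMEmu these entries would come from real CUDA kernel traces
        # collected on GPU nodes; here they are hard-coded.
        self._table[(op_name, tuple(shape))] = latency_s

    def lookup(self, op_name, shape):
        return self._table[(op_name, tuple(shape))]


def emulate(graph, table):
    """Walk an execution graph on a CPU node, accumulating each operator's
    profiled latency instead of executing the real GPU kernel."""
    total = 0.0
    for op_name, shape in graph:
        total += table.lookup(op_name, shape)
    return total


# Usage: profile once (made-up numbers), then predict a step time on CPU.
table = LatencyTable()
table.record("matmul", (4096, 4096), 1.8e-3)
table.record("layernorm", (4096,), 2.0e-4)
graph = [("matmul", (4096, 4096)), ("layernorm", (4096,)),
         ("matmul", (4096, 4096))]
print(emulate(graph, table))  # predicted step time in seconds
```

In the real system this per-operator replay would be interleaved with actual communication calls at the embedded communication anchors; the sketch covers only the computation side.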
TiFSN: A wavelet-EC-TCN model for quadrotor UAV trajectory prediction based on time–frequency–spatial feature fusion
IF 0.8 4区 计算机科学 Q4 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE Pub Date : 2025-11-01 DOI: 10.1016/j.peva.2025.102515
Huan Zhao , Yong Kou , Yuxin Xue , Shuang Wang , Zhaojun Gu
High-precision flight trajectory prediction (FTP) is a core technology for the autonomous flight of quadrotor unmanned aerial vehicles (UAVs) in environments with limited navigation signals. Most existing methods focus on the features of a single domain and ignore cross-domain feature correlations, making it challenging to maintain high accuracy in FTP. To address this, a prediction model based on time–frequency–spatial feature fusion, named TiFSN, is proposed. First, the wavelet transform extends the velocity signal into joint time–frequency features. A fusion mechanism between these time–frequency features and the attitude angles then yields a multi-domain feature set with time–frequency–spatial perception. Finally, an extended channels-based temporal convolutional network (EC-TCN) achieves high-precision FTP by enlarging the feature receptive field. Experiments on real flight datasets show that the model significantly improves the evaluation metrics over baseline methods, and generalization tests on various complex FTP tasks using the onboard CPU confirm TiFSN's strong performance. An ablation study further reveals the influence of wavelet decomposition depth and the expanded-channel strategy on performance.
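The wavelet step above — extending a 1-D velocity signal into time–frequency features — can be sketched with a single-level Haar discrete wavelet transform applied recursively. The paper does not specify the wavelet family or depth used, so the Haar wavelet and two-level decomposition below are assumptions for illustration only.

```python
# Minimal sketch of wavelet-based time-frequency features for a velocity
# signal, in the spirit of TiFSN's first step. Haar wavelet is an assumption.
import math


def haar_dwt(signal):
    """One level of the Haar discrete wavelet transform.
    Returns (approximation, detail): approximation carries the
    low-frequency trend, detail the high-frequency content."""
    assert len(signal) % 2 == 0, "pad the signal to even length first"
    s = math.sqrt(2)
    approx = [(a + b) / s for a, b in zip(signal[::2], signal[1::2])]
    detail = [(a - b) / s for a, b in zip(signal[::2], signal[1::2])]
    return approx, detail


def time_freq_features(velocity, levels=2):
    """Stack detail coefficients from each level plus the final
    approximation, giving a multi-resolution time-frequency feature set."""
    features, current = [], list(velocity)
    for _ in range(levels):
        current, detail = haar_dwt(current)
        features.append(detail)
    features.append(current)
    return features


# Usage: a toy velocity trace of length 8 decomposed over two levels.
vx = [0.0, 0.1, 0.4, 0.2, 0.3, 0.7, 0.6, 0.5]
feats = time_freq_features(vx, levels=2)
print([len(f) for f in feats])  # [4, 2, 2]
```

In TiFSN these coefficients would then be fused with the attitude angles to form the multi-domain feature set fed to the EC-TCN; that fusion step is not sketched here.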
swPredictor: A data-driven performance model for distributed data parallelism training on large-scale HPC clusters
IF 0.8 4区 计算机科学 Q4 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE Pub Date : 2025-11-01 DOI: 10.1016/j.peva.2025.102530
Xianyu Zhu , Ruohan Wu , Junshi Chen , Hong An
Given the complexity of heterogeneous architectures and multi-node collaboration, large-scale HPC (high-performance computing) clusters pose challenges for resource utilization and performance optimization during distributed data parallelism (DDP) training. Performance modeling aims to identify application bottlenecks and guide algorithm design, but existing performance models rarely consider the impact of system architecture on communication performance or provide a systematic analysis of distributed training. To address these issues, this paper proposes swPredictor, a data-driven performance model designed to accurately predict the performance of DDP training. First, an original performance dataset is built from the various communication patterns observed at runtime to avoid systematic errors. Next, a novel multi-branch module, FNO-Inception, combines a Fourier neural operator (FNO) layer with an Inception structure to exploit features across frequencies simultaneously. Finally, the FNO-Inception module is used to construct FI-Net, a regression model that fits complex nonlinear relationships. Experimental results show that FI-Net accurately predicts DDP training performance on the Sunway OceanLight supercomputer with an overall MAPE of 0.93%, outperforming the other baseline models.
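The abstracts above report prediction quality as MAPE (mean absolute percentage error), e.g. swPredictor's overall 0.93%. For reference, a minimal sketch of the metric (the papers' own evaluation code is not shown; the sample values below are made up):

```python
# Mean absolute percentage error, the accuracy metric quoted by swPredictor.

def mape(actual, predicted):
    """MAPE in percent. Assumes equal-length, non-empty inputs and that
    no actual value is zero (division by the actual value)."""
    assert len(actual) == len(predicted) and actual
    return 100.0 * sum(abs(a - p) / abs(a)
                       for a, p in zip(actual, predicted)) / len(actual)


# Usage: measured vs. predicted per-step training times (made-up numbers).
print(mape([10.0, 20.0, 40.0], [10.1, 19.8, 40.0]))
```

A lower MAPE means the model's predicted training times track the measured ones more closely, averaged uniformly over samples.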