Pub Date: 2025-12-30 | DOI: 10.1016/j.peva.2025.102538
Chuan Xu , Caelin Kaplan , Angelo Rodio , Tareq Si Salem , Giovanni Neglia
In today’s increasingly diverse computing landscape, end devices like sensors and smartphones are progressively equipped with AI models tailored to their local memory and computational constraints. Local inference reduces communication costs and latency; however, these smaller models typically underperform compared to more sophisticated models deployed on edge servers or in the cloud. Collaborative Inference Systems (CISs) address this performance trade-off by enabling smaller devices to offload part of their inference tasks to more capable devices. These systems often deploy hierarchical models that share numerous parameters, exemplified by deep neural networks that utilize strategies like early exits or ordered dropout. In such instances, Federated Learning (FL) may be employed to jointly train the models within a CIS. Yet, traditional training methods have overlooked the operational dynamics of CISs during inference, particularly the potential high heterogeneity in serving rates across the devices within a given CIS. To address this gap, we propose a novel FL approach that explicitly accounts for variations in serving rates within CISs. Our framework not only offers rigorous theoretical guarantees but also surpasses state-of-the-art training algorithms for CISs, especially in scenarios where end devices handle higher inference request rates and where data availability is uneven across devices.
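The serving-rate-aware training the abstract describes can be illustrated as a federated average in which each device's model update is weighted by its share of the system's inference serving rate. This is a minimal sketch under assumed weighting (proportional to serving rate); the paper's exact objective and algorithm may differ.

```python
import numpy as np

def aggregate(params_per_device, serving_rates):
    """Federated averaging where each device's parameters are weighted
    by its share of the total inference serving rate (hypothetical
    weighting, for illustration only)."""
    rates = np.asarray(serving_rates, dtype=float)
    weights = rates / rates.sum()
    stacked = np.stack(params_per_device)          # (n_devices, n_params)
    return (weights[:, None] * stacked).sum(axis=0)

# Device 1 serves 3x the requests of device 0, so it dominates the average.
local = [np.array([0.0, 0.0]), np.array([4.0, 8.0])]
global_params = aggregate(local, serving_rates=[1.0, 3.0])
print(global_params)  # [3. 6.]
```

With equal serving rates this reduces to plain FedAvg; skewed rates pull the shared parameters toward the devices that answer more inference requests.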
Title: Federated Learning for Collaborative Inference Systems: The case of early exit networks (Performance Evaluation, vol. 171, Article 102538)
Pub Date: 2025-12-29 | DOI: 10.1016/j.peva.2025.102540
Veena Goswami
We consider the offloading of tasks in edge–cloud computing systems using a renewal input modified batch service queue. Tasks are processed using a modified batch service policy with a minimum batch size of L and a maximum batch size of K in an edge–cloud computing system. Bulk services combine several tasks from many Internet of Things devices and offload them to the edge or cloud for concurrent execution. The modified batch service rule allows tasks to be offloaded in variable batch sizes: smaller batches when network conditions are favorable, and bigger batches when the network is congested, to reduce transmission overhead. In addition, if the server has commenced processing and the batch holds fewer than K tasks, we let arriving tasks join it. Furthermore, the batches’ processing rates are presumed to depend on the batch size. We derive analytic results for the marginal and joint probability distributions of the number of tasks in the queue/system and with the server. We show the influence of light-tailed and heavy-tailed inter-arrival time distributions on the system model with numerical examples. Dynamic service rates adjust processing speeds at edge or cloud servers based on workload, network latency, and available resources. This reduces latency, balances computational load, and improves system adaptability to changing conditions.
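The modified batch-service rule above can be sketched in a few lines: service starts only once at least L tasks wait, at most K are taken, late arrivals may top up an in-service batch, and the service rate depends on the batch size. The constants and the sublinear rate form below are illustrative assumptions, not the paper's parameters.

```python
L_MIN, K_MAX = 3, 8   # stand-ins for the paper's L and K

def take_batch(queue_len):
    """Batch size the server takes, or 0 if it must keep waiting."""
    if queue_len < L_MIN:
        return 0                      # fewer than L tasks: server idles
    return min(queue_len, K_MAX)      # serve up to K tasks at once

def admit_late(in_service, waiting):
    """Late arrivals may join an in-service batch until it reaches K."""
    return in_service + min(K_MAX - in_service, waiting)

def service_rate(batch_size, base_rate=1.0):
    """Batch-size-dependent service rate (hypothetical sublinear form)."""
    return base_rate * batch_size ** 0.5

print(take_batch(2), take_batch(5), take_batch(20))   # 0 5 8
print(admit_late(5, 10))                              # batch tops up to 8
```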
Title: Leveraging task offloading in edge-cloud computing systems using GI/M(L→K)/1 queueing model with dynamic service rates (Performance Evaluation, vol. 171, Article 102540)
Pub Date: 2025-12-27 | DOI: 10.1016/j.peva.2025.102539
Qianlin Liang , Haoliang Wang , Prashant Shenoy
As deep learning has been widely adopted across application domains, a diverse range of GPUs is used to accelerate DNN inference workloads and ensure Quality of Service (QoS). Robust prediction of inference latency on GPUs within cloud environments improves efficiency and maintains QoS in resource-management solutions such as consolidation and autoscaling. However, latency prediction is challenging due to the vast heterogeneity in both DNN architectures and GPU capacities.
In this work, we present Lilou, an efficient and accurate latency-prediction system for a wide range of DNN inference tasks across diverse GPU resource allocations. Lilou employs two techniques. (i) Lilou represents DNNs as directed acyclic graphs (DAGs) and utilizes a novel graph neural network (GNN) model for edge classification to detect the fusion of operators into kernels. (ii) Lilou identifies the GPU features that significantly impact inference latency and learns a predictor to estimate the latency and type of the kernels detected in the preceding step. To evaluate Lilou, we conduct comprehensive experiments across a variety of commercial GPUs commonly used in public cloud environments, employing a wide range of popular DNN architectures, including both convolutional neural networks and transformers. Our experimental results show that Lilou is robust to a wide range of DNN architectures and GPU resource allocations. Our novel learning-based method surpasses the state-of-the-art rule-based approach in fusion prediction with an accuracy of 98.26%, laying a solid foundation for end-to-end latency prediction that achieves a MAPE of 8.68%, also outperforming existing benchmarks.
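The two-stage pipeline can be sketched as: (1) decide which adjacent operators fuse into a single kernel, then (2) sum per-kernel latency estimates. The fusion table and latency numbers below are placeholder stand-ins for Lilou's learned GNN classifier and regressor.

```python
# Assumed fusible operator pairs (illustrative; Lilou learns this).
FUSIBLE = {("conv", "relu"), ("matmul", "add")}

def fuse_ops(ops):
    """Greedily merge adjacent fusible operators into kernels."""
    kernels, i = [], 0
    while i < len(ops):
        if i + 1 < len(ops) and (ops[i], ops[i + 1]) in FUSIBLE:
            kernels.append(ops[i] + "+" + ops[i + 1])
            i += 2
        else:
            kernels.append(ops[i])
            i += 1
    return kernels

def predict_latency(kernels, per_kernel_ms):
    """End-to-end latency = sum of predicted per-kernel latencies."""
    return sum(per_kernel_ms[k] for k in kernels)

ops = ["conv", "relu", "matmul", "add", "softmax"]
kernels = fuse_ops(ops)        # ['conv+relu', 'matmul+add', 'softmax']
ms = {"conv+relu": 1.2, "matmul+add": 0.8, "softmax": 0.1}
print(round(predict_latency(kernels, ms), 3))  # 2.1
```

Mispredicting fusion double-counts kernel launches, which is why the fusion stage comes before the latency regressor.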
Title: Lilou: Resource-aware model-driven latency prediction for GPU-accelerated model serving (Performance Evaluation, vol. 171, Article 102539)
Pub Date: 2025-11-20 | DOI: 10.1016/j.peva.2025.102516
Yong Kou , Jinlong He , Xia Yuan , Dening Luo , Yanci Zhang
3D Gaussian Splatting (3DGS) has recently demonstrated outstanding performance in 3D reconstruction and real-time rendering. However, its scalability to large scenes remains limited by single-GPU memory constraints. We propose ScaleGS, a scalable distributed training framework for large-scale 3DGS with lightweight edge-aware communication. (1) We present a spatial median-guided binary partitioning algorithm that divides the point cloud into balanced, non-overlapping, and spatially contiguous cuboid regions for efficient multi-GPU management. To ensure global view consistency, each GPU independently grows and updates only its local Gaussians, while cross-GPU Gaussians are accessed only for rendering and loss computation. (2) We design a lightweight edge communication strategy to significantly reduce cross-GPU communication overhead. A greedy GPU-Tile remapping algorithm leverages the spatial concentration of Gaussians to confine cross-GPU communication to edge regions, effectively decoupling communication complexity from GPU count, with per-GPU complexity remaining O(1). An optimized all-to-all communication scheme is also introduced to eliminate redundant transmissions. (3) Our framework introduces an adaptive edge-refined load balancing mechanism that periodically monitors GPU workloads and selectively migrates Gaussians between neighboring GPUs to maintain balance and spatial continuity with negligible cost. Evaluations on large-scale 4K scenes show that ScaleGS consistently outperforms state-of-the-art methods, achieving up to 20% faster training and approximately 20% model size reduction on 8 Tesla P40 GPUs without compromising reconstruction quality. Project page: https://aicodeclub.github.io/ScaleGS.
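The median-guided binary partitioning in step (1) can be sketched as a recursive split of the point cloud at the median of its widest axis until one balanced, axis-aligned region exists per GPU. This sketch assumes the region count is a power of two; the paper's algorithm may handle the general case differently.

```python
import numpy as np

def partition(points, n_regions):
    """Recursively median-split points along the widest axis into
    n_regions balanced, spatially contiguous groups (n_regions must
    be a power of 2 in this sketch)."""
    if n_regions == 1:
        return [points]
    axis = np.argmax(points.max(axis=0) - points.min(axis=0))
    order = np.argsort(points[:, axis])          # sort along widest axis
    half = len(points) // 2                      # median split point
    left, right = points[order[:half]], points[order[half:]]
    return partition(left, n_regions // 2) + partition(right, n_regions // 2)

pts = np.random.rand(1024, 3)
regions = partition(pts, 8)
print([len(r) for r in regions])   # eight balanced regions of 128 points
```

Splitting at the median (rather than the spatial midpoint) is what guarantees equal point counts, and hence a balanced per-GPU workload, regardless of how skewed the scene geometry is.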
Title: ScaleGS: Scalable distributed framework for large-scale 3D Gaussian splatting with edge communication (Performance Evaluation, vol. 171, Article 102516)
Pub Date: 2025-11-17 | DOI: 10.1016/j.peva.2025.102525
Zhongrui Chen , Adityo Anggraito , Diletta Olliaro , Andrea Marin , Marco Ajmone Marsan , Benjamin Berg , Isaac Grosof
Modern data center workloads are composed of multiserver jobs, computational jobs that require multiple servers in order to run. A data center can run many multiserver jobs in parallel, as long as it has sufficient resources to meet their individual demands. Multiserver jobs are generally stateful, meaning that job preemptions incur significant overhead from saving and reloading the state associated with running jobs. Hence, most systems try to avoid these costly job preemptions altogether. Given these constraints, a scheduling policy must determine what set of jobs to run in parallel at each moment in time to minimize the mean response time across a stream of arriving jobs. Unfortunately, simple non-preemptive policies such as First-Come First-Served (FCFS) may leave many servers idle, resulting in high mean response times or even system instability. Our goal is to design and analyze non-preemptive scheduling policies for multiserver jobs that maintain high system utilization to achieve low mean response time.
One well-known non-preemptive scheduling policy, Most Servers First (MSF), prioritizes jobs with higher server needs and is known for achieving high resource utilization. However, MSF causes extreme variability in job waiting times, and can perform significantly worse than FCFS in practice. To address this issue, we propose and analyze a class of scheduling policies called Most Servers First with Quickswap (MSFQ) that performs well in a wide variety of cases. MSFQ reduces the variability of job waiting times by periodically granting priority to other jobs in the system. We provide both stability results and an analysis of mean response time under MSFQ to prove that our policy dramatically outperforms MSF in the case where jobs either request one server or all the servers. In more complex cases, we evaluate MSFQ in simulation. We show that, with some additional optimization, variants of the MSFQ policy can greatly outperform MSF and FCFS on real-world multiserver job workloads.
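The contrast between MSF and a quickswap-style variant can be sketched as a packing rule: MSF admits jobs in order of decreasing server need, while quickswap periodically lets the longest-waiting job go first so it cannot starve. The exact swap schedule below is illustrative, not the paper's MSFQ definition.

```python
def pack(jobs, free_servers, head_first=False):
    """Non-preemptively admit jobs until servers run out.
    jobs: list of (arrival_order, server_need) tuples."""
    order = sorted(jobs, key=lambda j: -j[1])      # MSF: biggest need first
    if head_first and jobs:
        head = min(jobs, key=lambda j: j[0])       # quickswap: head-of-line first
        order = [head] + [j for j in order if j != head]
    admitted = []
    for job in order:
        if job[1] <= free_servers:                 # admit only if it fits
            admitted.append(job)
            free_servers -= job[1]
    return admitted

jobs = [(0, 1), (1, 4), (2, 2)]
print(pack(jobs, 4))                    # [(1, 4)]  MSF fills all 4 servers
print(pack(jobs, 4, head_first=True))   # [(0, 1), (2, 2)]  HoL job served
```

MSF keeps utilization high by packing the biggest job, but the head-of-line small job waits behind it; periodically granting the head priority bounds that waiting-time variability.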
Title: Improving nonpreemptive multiserver job scheduling with quickswap (Performance Evaluation, vol. 171, Article 102525)
Pub Date: 2025-11-13 | DOI: 10.1016/j.peva.2025.102529
Simon Scherrer , Adrian Perrig , Stefan Schmid
To understand the fairness properties of the BBR congestion-control algorithm (CCA), previous research has analyzed BBR behavior with a variety of models. However, previous model-based work suffers from a trade-off between accuracy and interpretability: While dynamic fluid models generate highly accurate predictions through simulation, the causes of their predictions cannot be easily understood. In contrast, steady-state models predict CCA behavior in a manner that is intuitively understandable, but often less accurate. This trade-off is especially consequential when analyzing the competition between BBR and traditional loss-based CCAs, as this competition often suffers from instability, i.e., sending-rate oscillation. Steady-state models cannot predict this instability at all, and fluid-model simulation cannot yield analytical results regarding preconditions and severity of the oscillation.
To overcome this trade-off, we extend the recent dynamic fluid model of BBR by means of control theory. Based on this control-theoretic analysis, we derive quantitative conditions for BBR/CUBIC oscillation, identify network settings that are susceptible to instability, and find that these conditions are frequently satisfied by practical networks. Our analysis illuminates the fairness implications of BBR/CUBIC oscillation, namely by deriving and experimentally validating fairness bounds that reflect the extreme rate distributions during oscillation. In summary, our analysis shows that BBR/CUBIC oscillation is frequent and harms BBR fairness, but can be remedied by means of our control-theoretic framework.
Title: A control-theoretic perspective on BBR/CUBIC congestion-control competition (Performance Evaluation, vol. 171, Article 102529)
Pub Date: 2025-11-12 | DOI: 10.1016/j.peva.2025.102521
Pranay Agarwal , D. Manjunath
The ubiquity of smartphones has fueled content consumption worldwide, leading to an ever-increasing demand for a better Internet experience. This has necessitated an upgrade of the capacity of the access network. The Internet service providers (ISPs) have been demanding that the content providers (CPs) share the cost of upgrading access network infrastructure. A public investment in the infrastructure of a neutral ISP will boost the profit of the CPs, and hence, seems a rational strategy. A CP can also make a private investment in its infrastructure and boost its profits. In this paper, we study the trade-off between public and private investments by a CP when the decision is made under different types of interaction between them. Specifically, we consider four interaction models between CPs—centralized allocation, cooperative game, non-cooperative game, and a bargaining game—and determine the public and private investment for each model. Via numerical results, we evaluate the impact of different incentive structures on the utility of the CPs. We see that the bargaining game can result in higher public investment than the non-cooperative and centralized models. However, this benefit gets reduced if the CPs are incentivized to invest in private infrastructure.
Title: Content and access networks synergies: Tradeoffs in public and private investments by content providers (Performance Evaluation, vol. 171, Article 102521)
We study a recommendation system where sellers compete for visibility by strategically offering commissions to a platform that optimally curates a ranked menu of items and their respective prices for each customer. Customers interact sequentially with the menu following a cascade click model, and their purchase decisions are influenced by price sensitivity and positions of various items in the menu. We model the seller-platform interaction as a Stackelberg game with sellers as leaders and consider two different games depending on whether the prices are set by the platform or prefixed by the sellers.
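The cascade interaction described above can be sketched as a top-down scan of the ranked menu: the customer buys item i with a price-dependent probability, or continues to the next item with the exploration probability γ. The exponential price-sensitivity form below is an assumption for illustration, not the paper's model.

```python
import math

def purchase_probabilities(prices, gamma, alpha=1.0):
    """Purchase probability of each item in a ranked menu under a
    cascade click model (price sensitivity exp(-alpha*p) is assumed)."""
    reach = 1.0                      # probability of examining position i
    probs = []
    for p in prices:
        buy = math.exp(-alpha * p)   # buy given the item is examined
        probs.append(reach * buy)
        reach *= (1 - buy) * gamma   # continue only if no purchase
    return probs

probs = purchase_probabilities([1.0, 0.5, 2.0], gamma=0.4)
# Lower-ranked items are reached rarely when gamma is small, which is
# why sellers compete (via commissions) for the top menu positions.
print(probs)
```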
It is complicated to find the optimal policy of the platform in complete generality; hence, we solve the problem in an important asymptotic regime. In fact, both games coincide in this regime, obtained by decreasing the customer exploration rate γ to zero (in this regime, the customers explore fewer items). Through simulations, we illustrate that the limit game approximates the original game(s) well even for exploration probabilities as high as 0.4 (the differences are around 2.54%). Further, the second game (where the sellers prefix the prices) coincides with the approximate game for all values of γ.
The core contribution of this paper lies in characterizing the equilibrium structure of the limit game. We show that when sellers are of different strengths, the standard Nash equilibrium does not exist due to discontinuities in utilities. We instead establish the existence of a novel equilibrium solution, namely the ‘μ-connected equilibrium cycle’ (μ-EC), which captures oscillatory strategic responses at the equilibrium. Unlike the (pure) Nash equilibrium, which defines a fixed point of mutual best responses, this is a set-valued solution concept of connected components. This novel equilibrium concept identifies a Cartesian product set of connected action profiles in the continuous action space that satisfies four important properties: stability against external deviations, no external chains, instability against internal deviations, and minimality. We extend the recently introduced solution concept of the equilibrium cycle to include stability against measure-zero violations and avoid some topological difficulties, yielding the μ-EC.
{"title":"Strategic pricing and ranking in recommendation systems with seller competition","authors":"Tushar Shankar Walunj , Veeraruna Kavitha , Jayakrishnan Nair , Priyank Agarwal","doi":"10.1016/j.peva.2025.102518","DOIUrl":"10.1016/j.peva.2025.102518","url":null,"abstract":"<div><div>We study a recommendation system where sellers compete for visibility by strategically offering commissions to a platform that optimally curates a ranked menu of items and their respective prices for each customer. Customers interact sequentially with the menu following a cascade click model, and their purchase decisions are influenced by price sensitivity and positions of various items in the menu. We model the seller-platform interaction as a Stackelberg game with sellers as leaders and consider two different games depending on whether the prices are set by the platform or prefixed by the sellers.</div><div>It is complicated to find the optimal policy of the platform in complete generality; hence, we solve the problem in an important asymptotic regime. In fact, both the games coincide in this regime, obtained by decreasing the customer exploration rates <span><math><mi>γ</mi></math></span> to zero (in this regime, the customers explore fewer items). Through simulations, we illustrate that the limit game well approximates the original game(s) even for exploration probabilities as high as 0.4 (the differences are around 2.54%). Further, the second game (where the sellers prefix the prices) coincides with the approximate game for all values of <span><math><mi>γ</mi></math></span>.</div><div>The core contribution of this paper lies in characterizing the equilibrium structure of the limit game. We show that when sellers are of different strengths, the standard Nash equilibrium does not exist due to discontinuities in utilities. 
We instead establish the existence of a novel equilibrium solution, namely ‘<span><math><mi>μ</mi></math></span>-connected equilibrium cycle’ (<span><math><mi>μ</mi></math></span>-EC), which captures oscillatory strategic responses at the equilibrium. Unlike the (pure) Nash equilibrium, which defines a fixed point of mutual best responses, this is a set-valued solution concept of connected components. This novel equilibrium concept identifies a Cartesian product set of connected action profiles in the continuous action space that satisfies four important properties: stability against external deviations, no external chains, instability against internal deviations, and minimality. We extend a recently introduced solution concept <em>equilibrium cycle</em> to include stability against measure-zero violations and avoid some topological difficulties to propose <span><math><mi>μ</mi></math></span>-EC.</div></div>","PeriodicalId":19964,"journal":{"name":"Performance Evaluation","volume":"170 ","pages":"Article 102518"},"PeriodicalIF":0.8,"publicationDate":"2025-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145516997","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2025-11-01DOI: 10.1016/j.peva.2025.102519
Anush Anand, Pranav Agrawal, Tejas Bodas
Dynamic pricing is the practice of adjusting the selling price of a product to maximize a firm’s revenue by responding to market demand. The literature typically distinguishes between two settings: infinite inventory, where the firm has unlimited stock and time to sell, and finite inventory, where both inventory and selling horizon are limited. In both cases, the central challenge lies in the fact that the demand function — how sales respond to price — is unknown and must be learned from data. Traditional approaches often assume a specific parametric form for the demand function, enabling the use of reinforcement learning (RL) to identify near-optimal pricing strategies. However, such assumptions may not hold in real-world scenarios, limiting the applicability of these methods.
In this work, we propose a Gaussian Process (GP) based nonparametric approach to dynamic pricing that avoids restrictive modeling assumptions. We treat the demand function as a black-box function of the price and develop pricing algorithms based on Bayesian Optimization (BO)—a sample-efficient method for optimizing unknown functions. We present BO-based algorithms tailored for both infinite and finite inventory settings and provide regret guarantees for both regimes, thereby quantifying the learning efficiency of our methods. Through extensive experiments, we demonstrate that our BO-based methods outperform several state-of-the-art RL algorithms in terms of revenue, while requiring fewer assumptions and offering greater robustness. This highlights Bayesian Optimization as a powerful and practical tool for dynamic pricing in complex, uncertain environments.
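The GP-based BO loop described above can be sketched in a few lines: fit a GP posterior to the prices tried so far, pick the next price by an upper-confidence-bound (UCB) acquisition, and repeat. This is a minimal hand-rolled sketch under assumed choices (RBF kernel, standardized observations, GP-UCB acquisition); the paper’s actual algorithms and hyperparameters may differ.

```python
import numpy as np

def rbf(a, b, ls=2.0):
    """Squared-exponential kernel between two 1-D arrays of prices."""
    a, b = np.asarray(a, float), np.asarray(b, float)
    return np.exp(-0.5 * (a[:, None] - b[None, :]) ** 2 / ls**2)

def bo_pricing(revenue, price_grid, n_rounds=30, noise=1e-4, kappa=2.0, seed=0):
    """GP-UCB sketch: sequentially pick prices, observe revenue, refit the GP."""
    rng = np.random.default_rng(seed)
    X = [float(rng.choice(price_grid))]             # start from a random price
    y = [revenue(X[0])]
    for _ in range(n_rounds - 1):
        Xa, ya = np.array(X), np.array(y)
        yc = (ya - ya.mean()) / (ya.std() + 1e-9)   # standardize observations
        K = rbf(Xa, Xa) + noise * np.eye(len(Xa))
        alpha = np.linalg.solve(K, yc)
        ks = rbf(price_grid, Xa)                    # cross-covariances: grid vs. data
        mu = ks @ alpha                             # posterior mean (standardized units)
        v = np.linalg.solve(K, ks.T)
        var = 1.0 - np.einsum('ij,ji->i', ks, v)    # posterior variance on the grid
        ucb = mu + kappa * np.sqrt(np.maximum(var, 0.0))  # optimism under uncertainty
        x_next = float(price_grid[int(np.argmax(ucb))])
        X.append(x_next)
        y.append(revenue(x_next))
    return X[int(np.argmax(y))]                     # best price actually observed
```

The demand function is treated purely as a black box: the loop only ever evaluates `revenue(price)`, which is exactly the nonparametric property the abstract emphasizes.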
{"title":"Bayesian optimization for dynamic pricing and learning","authors":"Anush Anand, Pranav Agrawal, Tejas Bodas","doi":"10.1016/j.peva.2025.102519","DOIUrl":"10.1016/j.peva.2025.102519","url":null,"abstract":"<div><div>Dynamic pricing is the practice of adjusting the selling price of a product to maximize a firm’s revenue by responding to market demand. The literature typically distinguishes between two settings: infinite inventory, where the firm has unlimited stock and time to sell, and finite inventory, where both inventory and selling horizon are limited. In both cases, the central challenge lies in the fact that the demand function — how sales respond to price — is unknown and must be learned from data. Traditional approaches often assume a specific parametric form for the demand function, enabling the use of reinforcement learning (RL) to identify near-optimal pricing strategies. However, such assumptions may not hold in real-world scenarios, limiting the applicability of these methods.</div><div>In this work, we propose a Gaussian Process (GP) based nonparametric approach to dynamic pricing that avoids restrictive modeling assumptions. We treat the demand function as a black-box function of the price and develop pricing algorithms based on Bayesian Optimization (BO)—a sample-efficient method for optimizing unknown functions. We present BO-based algorithms tailored for both infinite and finite inventory settings and provide regret guarantees for both regimes, thereby quantifying the learning efficiency of our methods. Through extensive experiments, we demonstrate that our BO-based methods outperform several state-of-the-art RL algorithms in terms of revenue, while requiring fewer assumptions and offering greater robustness. 
This highlights Bayesian Optimization as a powerful and practical tool for dynamic pricing in complex, uncertain environments.</div></div>","PeriodicalId":19964,"journal":{"name":"Performance Evaluation","volume":"170 ","pages":"Article 102519"},"PeriodicalIF":0.8,"publicationDate":"2025-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145568452","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2025-11-01DOI: 10.1016/j.peva.2025.102523
Wesley Geelen , Maria Vlasiou , Yaron Yeger
The Asymmetric Inclusion Process (ASIP) models unidirectional transport with particle clustering, yet remains analytically intractable for systems beyond small sizes. To address this, we develop two approximation methods: the replica mean-field (RMF) limit, providing a first-order approximation, and the power series algorithm (PSA), a numerical scheme based on traffic intensity expansions. We evaluate these approximations against Monte Carlo simulations for general systems and prior exact results for homogeneous ASIP systems. Both methods yield accurate estimates, with PSA closely matching simulations for both homogeneous and heterogeneous systems, while RMF performs well for early sites but loses some accuracy downstream or as load increases. These approximations offer practical and computationally efficient alternatives to simulation, enabling detailed performance analysis of ASIP tandem queues where exact solutions are unavailable.
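The Monte Carlo baseline mentioned above can be sketched as a simple event-driven simulation of the ASIP dynamics: Poisson arrivals to the first site, and an exponential gate at each site that, when it opens, moves the site’s entire batch to the next site (or out of the system from the last site). The function name and parameters are illustrative; by flow balance (gate k flushes at rate μ_k, removing E[X_k] particles on average, and throughput is λ), the steady-state mean occupancy of each site should be close to λ/μ_k.

```python
import random

def simulate_asip(lam, mus, t_max=20_000.0, seed=1):
    """Event-driven Monte Carlo sketch of an ASIP tandem queue.

    lam : Poisson arrival rate to site 1
    mus : exponential gate-opening rate of each site
    Returns the time-averaged occupancy of each site.
    """
    rng = random.Random(seed)
    n = len(mus)
    x = [0] * n                        # current batch size at each site
    avg = [0.0] * n
    t = 0.0
    total = lam + sum(mus)             # all clocks are exponential; superpose them
    while t < t_max:
        dt = rng.expovariate(total)
        for k in range(n):             # accumulate occupancy held during dt
            avg[k] += x[k] * dt
        t += dt
        u = rng.random() * total       # pick which clock fired
        if u < lam:
            x[0] += 1                  # external arrival to site 1
        else:
            k, acc = 0, lam + mus[0]
            while u >= acc:            # find which gate opened
                k += 1
                acc += mus[k]
            if k + 1 < n:
                x[k + 1] += x[k]       # whole batch hops to the next site
            x[k] = 0                   # (last site: batch leaves the system)
    return [a / t for a in avg]
```

For a homogeneous system, the simulated occupancies can be checked against the λ/μ flow-balance mean, mirroring the validation against exact homogeneous results described in the abstract.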
{"title":"Comparing approximations in the ASIP tandem queue","authors":"Wesley Geelen , Maria Vlasiou , Yaron Yeger","doi":"10.1016/j.peva.2025.102523","DOIUrl":"10.1016/j.peva.2025.102523","url":null,"abstract":"<div><div>The Asymmetric Inclusion Process (ASIP) models unidirectional transport with particle clustering, yet remains analytically intractable for systems beyond small sizes. To address this, we develop two approximation methods: the replica mean-field (RMF) limit, providing a first-order approximation, and the power series algorithm (PSA), a numerical scheme based on traffic intensity expansions. We evaluate these approximations against Monte Carlo simulations for general systems and prior exact results for homogeneous ASIP systems. Both methods yield accurate estimates, with PSA closely matching simulations for both homogeneous and heterogeneous systems, while RMF performing well for early sites but being slightly impacted downstream or as load increases. These approximations offer practical and computationally efficient alternatives to simulation, enabling detailed performance analysis of ASIP tandem queues where exact solutions are unavailable.</div></div>","PeriodicalId":19964,"journal":{"name":"Performance Evaluation","volume":"170 ","pages":"Article 102523"},"PeriodicalIF":0.8,"publicationDate":"2025-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145466620","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}