In this paper, the question of interest is estimating the true demand of a product at a given store location and time period in the retail environment, based on a single noisy and potentially censored observation. To address this question, we introduce a non-parametric framework for making inference from multiple time series. Somewhat surprisingly, we establish that an algorithm introduced for the purpose of "matrix completion" can be used to solve the relevant inference problem. Specifically, using the Universal Singular Value Thresholding (USVT) algorithm [2], we show that our estimator is consistent: the average mean squared error of the estimated average demand with respect to the true average demand goes to 0 as the number of store locations and time intervals increases to infinity. We establish naturally appealing properties of the resulting estimator both analytically and through a sequence of instructive simulations. Using a real retail dataset (Walmart), we argue for the practical relevance of our approach.
{"title":"Censored Demand Estimation in Retail","authors":"M. Amjad, D. Shah","doi":"10.1145/3219617.3219624","DOIUrl":"https://doi.org/10.1145/3219617.3219624","url":null,"abstract":"In this paper, the question of interest is estimating true demand of a product at a given store location and time period in the retail environment based on a single noisy and potentially censored observation. To address this question, we introduce a non-parametric framework to make inference from multiple time series. Somewhat surprisingly, we establish that the algorithm introduced for the purpose of \"matrix completion\" can be used to solve the relevant inference problem. Specifically, using the Universal Singular Value Thresholding (USVT) algorithm [2], we show that our estimator is consistent: the average mean squared error of the estimated average demand with respect to the true average demand goes to 0 as the number of store locations and time intervals increase to ınfty. We establish naturally appealing properties of the resulting estimator both analytically as well as through a sequence of instructive simulations. Using a real dataset in retail (Walmart), we argue for the practical relevance of our approach.","PeriodicalId":210440,"journal":{"name":"Abstracts of the 2018 ACM International Conference on Measurement and Modeling of Computer Systems","volume":"29 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-12-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124612327","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
The QoS buffer management problem, which has significant and diverse applications, e.g., in online cloud resource allocation, is a classic online admission control problem in the presence of resource constraints. In its basic setting, packets with different values arrive in an online fashion at a switching node with a limited buffer size. The switch must then make an immediate decision to either admit or reject the incoming packet, based on the packet's value and the buffer availability. The objective is to maximize the cumulative profit of the admitted packets while respecting the buffer constraint. Even though the QoS buffer management problem was proposed more than a decade ago, no optimal online solution has appeared in the literature. This paper proposes an optimal randomized online algorithm for this problem.
{"title":"An Optimal Randomized Online Algorithm for QoS Buffer Management","authors":"L. Yang, W. Wong, M. Hajiesmaili","doi":"10.1145/3219617.3219629","DOIUrl":"https://doi.org/10.1145/3219617.3219629","url":null,"abstract":"The QoS buffer management problem, with significant and diverse computer applications, e.g., in online cloud resource allocation problems, is a classic online admission control problem in the presence of resource constraints. In its basic setting, packets with different values, arrive in online fashion to a switching node with limited buffer size. Then, the switch needs to make an immediate decision to either admit or reject the incoming packet based on the value of the packet and its buffer availability. The objective is to maximize the cumulative profit of the admitted packets, while respecting the buffer constraint. Even though the QoS buffer management problem was proposed more than a decade ago, no optimal online solution has been proposed in the literature. This paper proposes an optimal randomized online algorithm for this problem.","PeriodicalId":210440,"journal":{"name":"Abstracts of the 2018 ACM International Conference on Measurement and Modeling of Computer Systems","volume":"68 6","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-12-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"120995408","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Tomographic techniques can be used for the fast detection of link failures at low cost. Our paper studies the impact of the routing model on tomographic node placement costs. We present a taxonomy of path routing models and provide optimal and near-optimal algorithms to deploy a minimal number of asymmetric and symmetric tomography nodes for basic network topologies under different routing model classes. Intriguingly, we find that in many cases routing according to a more restrictive routing model gives better results: compared to a more general routing model, computing a good placement is algorithmically more tractable and does not entail high monitoring costs, a desirable trade-off in practice.
{"title":"Tomographic Node Placement Strategies and the Impact of the Routing Model","authors":"Y. Pignolet, S. Schmid, Gilles Trédan","doi":"10.1145/3219617.3219648","DOIUrl":"https://doi.org/10.1145/3219617.3219648","url":null,"abstract":"Tomographic techniques can be used for the fast detection of link failures at low cost. Our paper studies the impact of the routing model on tomographic node placement costs. We present a taxonomy of path routing models and provide optimal and near-optimal algorithms to deploy a minimal number of asymmetric and symmetric tomography nodes for basic network topologies under different routing model classes. Intriguingly, we find that in many cases routing according to a more restrictive routing model gives better results: compared to a more general routing model, computing a good placement is algorithmically more tractable and does not entail high monitoring costs, a desirable trade-off in practice.","PeriodicalId":210440,"journal":{"name":"Abstracts of the 2018 ACM International Conference on Measurement and Modeling of Computer Systems","volume":"7 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-12-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115132234","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Backhaul transport network design and optimization for cellular service providers involve a unique challenge, stemming from the fact that an end user's equipment (UE) is within the radio reach of multiple cellular towers: it is hard to evaluate the impact on a UE of the failure of its primary serving tower, because the UE may simply switch to receive service from other nearby towers. To overcome this challenge, one needs to quantify the cellular service redundancy among the towers riding on a given transport circuit and their nearby towers, which in turn requires a comprehensive understanding of the radio signal profile in the area of the impacted towers, the spatial distribution of UEs therein, and their expected workload (e.g., calls, data throughput). In this work, we develop a novel methodology for assessing the service impact of any hypothetical cellular tower outage scenario and implement it in an operational system named Tower Outage Impact Predictor (TOIP). Our evaluations, using both synthetic data and historical real tower outages in a large operational cellular network, show conclusively that TOIP gives an accurate assessment of various tower outage scenarios and can provide critical input data toward designing a reliable cellular backhaul transport network.
{"title":"Predictive Impact Analysis for Designing a Resilient Cellular Backhaul Network","authors":"Sen Yang, He Yan, Zihui Ge, Dongmei Wang, Jun Xu","doi":"10.1145/3219617.3219651","DOIUrl":"https://doi.org/10.1145/3219617.3219651","url":null,"abstract":"Backhaul transport network design and optimization for cellular service providers involve a unique challenge stemming from the fact that an end-user's equipment (UE) is within the radio reach of multiple cellular towers: It is hard to evaluate the impact of the failure of the UE's primary serving tower on the UE, because the UE may simply switch to get service from other nearby cellular towers. To overcome this challenge, one needs to quantify the cellular service redundancy among the cellular towers riding on that transport circuit and their nearby cellular towers, which in turn requires a comprehensive understanding of the radio signal profile in the area of the impacted towers, the spatial distribution of UEs therein, and their expected workload (e.g., calls, data throughput). In this work, we develop a novel methodology for assessing the service impact of any hypothetical cellular tower outage scenario, and implement it in an operational system named Tower Outage Impact Predictor (TOIP). Our evaluations, using both synthetic data and historical real tower outages in a large operational cellular network, show conclusively that TOIP gives an accurate assessment of various tower outage scenarios, and can provide critical input data towards designing a reliable cellular backhaul transport network.","PeriodicalId":210440,"journal":{"name":"Abstracts of the 2018 ACM International Conference on Measurement and Modeling of Computer Systems","volume":"254 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-12-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132826405","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
We consider the distributed statistical learning problem over decentralized systems that are prone to adversarial attacks. This setup arises in many practical applications, including Google's Federated Learning. Formally, we focus on a decentralized system that consists of a parameter server and m working machines; each working machine keeps N/m data samples, where N is the total number of samples. In each iteration, up to q of the m working machines suffer Byzantine faults: a faulty machine in a given iteration behaves arbitrarily badly against the system and has complete knowledge of the system. Additionally, the sets of faulty machines may differ across iterations. Our goal is to design robust algorithms such that the system can learn the underlying true parameter, which is of dimension d, despite the interruption of the Byzantine attacks. In this paper, based on the geometric median of means of the gradients, we propose a simple variant of the classical gradient descent method. We show that our method can tolerate q Byzantine failures with 2(1 + ε)q ≤ m for an arbitrarily small but fixed constant ε > 0. The parameter estimate converges in O(log N) rounds with an estimation error on the order of max{√(dq/N), √(d/N)}, which is larger than the minimax-optimal error rate √(d/N) of the centralized and failure-free setting by at most a factor of √q. The total computational complexity of our algorithm is O((Nd/m) log N) at each working machine and O(md + kd log³ N) at the central server, and the total communication cost is O(md log N). We further provide an application of our general results to the linear regression problem. A key challenge in the above problem is that Byzantine failures create arbitrary and unspecified dependency among the iterations and the aggregated gradients. To handle this issue in the analysis, we prove that the aggregated gradient, as a function of the model parameter, converges uniformly to the true gradient function.
{"title":"Distributed Statistical Machine Learning in Adversarial Settings: Byzantine Gradient Descent","authors":"Yudong Chen, Lili Su, Jiaming Xu","doi":"10.1145/3219617.3219655","DOIUrl":"https://doi.org/10.1145/3219617.3219655","url":null,"abstract":"We consider the distributed statistical learning problem over decentralized systems that are prone to adversarial attacks. This setup arises in many practical applications, including Google's Federated Learning. Formally, we focus on a decentralized system that consists of a parameter server and m working machines; each working machine keeps N/m data samples, where N is the total number of samples. In each iteration, up to q of the m working machines suffer Byzantine faults -- a faulty machine in the given iteration behaves arbitrarily badly against the system and has complete knowledge of the system. Additionally, the sets of faulty machines may be different across iterations. Our goal is to design robust algorithms such that the system can learn the underlying true parameter, which is of dimension d, despite the interruption of the Byzantine attacks. In this paper, based on the geometric median of means of the gradients, we propose a simple variant of the classical gradient descent method. We show that our method can tolerate q Byzantine failures up to 2(1+ε)q łe m for an arbitrarily small but fixed constant ε>0. The parameter estimate converges in O(łog N) rounds with an estimation error on the order of max √dq/N, ~√d/N , which is larger than the minimax-optimal error rate √d/N in the centralized and failure-free setting by at most a factor of √q . The total computational complexity of our algorithm is of O((Nd/m) log N) at each working machine and O(md + kd log 3 N) at the central server, and the total communication cost is of O(m d log N). We further provide an application of our general results to the linear regression problem. A key challenge arises in the above problem is that Byzantine failures create arbitrary and unspecified dependency among the iterations and the aggregated gradients. To handle this issue in the analysis, we prove that the aggregated gradient, as a function of model parameter, converges uniformly to the true gradient function.","PeriodicalId":210440,"journal":{"name":"Abstracts of the 2018 ACM International Conference on Measurement and Modeling of Computer Systems","volume":"104 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-12-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124069615","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
We introduce a new framework for the analysis of large-scale load balancing networks with general service time distributions, motivated by applications in server farms, distributed memory machines, cloud computing, and communication systems. For a parallel server network using the so-called SQ(d) load balancing routing policy, we use a novel representation of the state of the system and identify its fluid limit when the number of servers goes to infinity and the arrival rate per server tends to a constant. The fluid limit is characterized as the unique solution to a countable system of coupled partial differential equations (PDE), which serve to approximate transient quality-of-service parameters such as the expected virtual waiting time and the queue length distribution. In the special case when the service time distribution is exponential, our method recovers the well-known ordinary differential equation characterization of the fluid limit. Furthermore, we develop a numerical scheme to solve the PDE and demonstrate the efficacy of the PDE approximation by comparing it with Monte Carlo simulations. We also illustrate how the PDE can be used to gain insight into the performance of large networks in practical scenarios by analyzing relaxation times in a backlogged network. In particular, our numerical approximation of the PDE uncovers two interesting properties of relaxation times under the SQ(2) algorithm. First, when the service time distribution is Pareto with unit mean, the relaxation time decreases as the tail becomes heavier. This is a priori counterintuitive, given that for the Pareto distribution heavier tails have been shown to lead to worse tail behavior in equilibrium. Second, for unit-mean light-tailed service distributions such as the Weibull and lognormal, the relaxation time decreases as the variance increases. This is in contrast to the behavior observed under random routing, where the relaxation time increases with the variance.
{"title":"The PDE Method for the Analysis of Randomized Load Balancing Networks","authors":"Reza Aghajani, Xingjie Li, K. Ramanan","doi":"10.1145/3219617.3219672","DOIUrl":"https://doi.org/10.1145/3219617.3219672","url":null,"abstract":"We introduce a new framework for the analysis of large-scale load balancing networks with general service time distributions, motivated by applications in server farms, distributed memory machines, cloud computing and communication systems. For a parallel server network using the so-called $SQ(d)$ load balancing routing policy, we use a novel representation for the state of the system and identify its fluid limit, when the number of servers goes to infinity and the arrival rate per server tends to a constant. The fluid limit is characterized as the unique solution to a countable system of coupled partial differential equations (PDE), which serve to approximate transient Quality of Service parameters such as the expected virtual waiting time and queue length distribution. In the special case when the service time distribution is exponential, our method recovers the well-known ordinary differential equation characterization of the fluid limit. Furthermore, we develop a numerical scheme to solve the PDE, and demonstrate the efficacy of the PDE approximation by comparing it with Monte Carlo simulations. We also illustrate how the PDE can be used to gain insight into the performance of large networks in practical scenarios by analyzing relaxation times in a backlogged network. In particular, our numerical approximation of the PDE uncovers two interesting properties of relaxation times under the SQ(2) algorithm. Firstly, when the service time distribution is Pareto with unit mean, the relaxation time decreases as the tail becomes heavier. This is a priori counterintuitive given that for the Pareto distribution, heavier tails have been shown to lead to worse tail behavior in equilibrium. Secondly, for unit mean light-tailed service distributions such as the Weibull and lognormal, the relaxation time decreases as the variance increases. This is in contrast to the behavior observed under random routing, where the relaxation time increases with increase in variance.","PeriodicalId":210440,"journal":{"name":"Abstracts of the 2018 ACM International Conference on Measurement and Modeling of Computer Systems","volume":"508 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-12-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131406889","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Search engines answer users' queries by listing relevant items (e.g., documents, songs, products, web pages, ...). These engines rely on algorithms that learn to rank items so as to present an ordered list maximizing the probability that it contains a relevant item. The main challenge in the design of learning-to-rank algorithms stems from the fact that queries often have different meanings for different users. In the absence of any contextual information about the query, one often has to adhere to the diversity principle, i.e., to return a list covering the various possible topics or meanings of the query. To formalize this learning-to-rank problem, we propose a natural model where (i) items are categorized into topics, (ii) users find items relevant only if they match the topic of their query, and (iii) the engine is aware of neither the topic of an arriving query, nor the frequency at which queries related to various topics arrive, nor the topic-dependent click-through rates of the items. For this problem, we devise LDR (Learning Diverse Rankings), an algorithm that efficiently learns the optimal list based on users' feedback only. We show that after T queries, the regret of LDR scales as O((N - L) log(T)), where N is the total number of items and L is the length of the displayed list. This scaling cannot be improved, i.e., LDR is order-optimal.
{"title":"Online Learning of Optimally Diverse Rankings","authors":"Stefan Magureanu, A. Proutière, Marcus Isaksson, Boxun Zhang","doi":"10.1145/3219617.3219637","DOIUrl":"https://doi.org/10.1145/3219617.3219637","url":null,"abstract":"Search engines answer users' queries by listing relevant items (e.g. documents, songs, products, web pages, ...). These engines rely on algorithms that learn to rank items so as to present an ordered list maximizing the probability that it contains relevant item. The main challenge in the design of learning-to-rank algorithms stems from the fact that queries often have different meanings for different users. In absence of any contextual information about the query, one often has to adhere to the diversity principle, i.e., to return a list covering the various possible topics or meanings of the query. To formalize this learning-to-rank problem, we propose a natural model where (i) items are categorized into topics, (ii) users find items relevant only if they match the topic of their query, and (iii) the engine is not aware of the topic of an arriving query, nor of the frequency at which queries related to various topics arrive, nor of the topic-dependent click-through-rates of the items. For this problem, we devise LDR (Learning Diverse Rankings), an algorithm that efficiently learns the optimal list based on users' feedback only. We show that after T queries, the regret of LDR scales as O((N-L)log(T)) where N is the number of all items. This scaling cannot be improved, i.e., LDR is order optimal.","PeriodicalId":210440,"journal":{"name":"Abstracts of the 2018 ACM International Conference on Measurement and Modeling of Computer Systems","volume":"3 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-12-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124552235","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Motivated by applications in machine learning and statistics, we study distributed optimization problems over a network of processors, where the goal is to optimize a global objective composed of a sum of local functions. In these problems, due to the large scale of the data sets, the data and computation must be distributed over multiple processors, resulting in the need for distributed algorithms. In this paper, we consider a popular distributed gradient-based consensus algorithm, which requires only local computation and communication. An important problem in this area is to analyze the convergence rate of such algorithms in the presence of the communication delays that are inevitable in distributed systems. We prove the convergence of the gradient-based consensus algorithm in the presence of uniform, but possibly arbitrarily large, communication delays between the processors. Moreover, we obtain an upper bound on the rate of convergence of the algorithm as a function of the network size, the topology, and the inter-processor communication delays.
{"title":"On the Convergence Rate of Distributed Gradient Methods for Finite-Sum Optimization under Communication Delays","authors":"T. Doan, Carolyn L. Beck, R. Srikant","doi":"10.1145/3219617.3219654","DOIUrl":"https://doi.org/10.1145/3219617.3219654","url":null,"abstract":"Motivated by applications in machine learning and statistics, we study distributed optimization problems over a network of processors, where the goal is to optimize a global objective composed of a sum of local functions. In these problems, due to the large scale of the data sets, the data and computation must be distributed over multiple processors resulting in the need for distributed algorithms. In this paper, we consider a popular distributed gradient-based consensus algorithm, which only requires local computation and communication. An important problem in this area is to analyze the convergence rate of such algorithms in the presence of communication delays that are inevitable in distributed systems. We prove the convergence of the gradient-based consensus algorithm in the presence of uniform, but possibly arbitrarily large, communication delays between the processors. Moreover, we obtain an upper bound on the rate of convergence of the algorithm as a function of the network size, topology, and the inter-processor communication delays.","PeriodicalId":210440,"journal":{"name":"Abstracts of the 2018 ACM International Conference on Measurement and Modeling of Computer Systems","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-12-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128867493","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
T. Schardl, Tyler Denniston, Damon Doucet, Bradley C. Kuszmaul, I. Lee, C. Leiserson
The CSI framework provides comprehensive static instrumentation that a compiler can insert into a program-under-test so that dynamic-analysis tools (memory checkers, race detectors, cache simulators, performance profilers, code-coverage analyzers, etc.) can observe and investigate runtime behavior. Heretofore, tools based on compiler instrumentation would each separately modify the compiler to insert their own instrumentation. In contrast, CSI inserts a standard collection of instrumentation hooks into the program-under-test. Each CSI tool is implemented as a library that defines the relevant hooks, while the remaining hooks are "nulled out" and elided during either compile-time or link-time optimization, resulting in instrumented runtimes on par with custom instrumentation. CSI allows many compiler-based tools to be written as simple libraries without modifying the compiler, lowering the bar for the development of dynamic-analysis tools. We have defined a standard API for CSI and modified LLVM to insert CSI hooks into the compiler's internal representation (IR) of the program. The API organizes IR objects (such as functions, basic blocks, and memory accesses) into flat and compact ID spaces, which not only simplifies the building of tools but, surprisingly, enables faster maintenance of IR-object data than do traditional hash tables. CSI hooks contain a "property" parameter that allows tools to customize behavior based on static information without introducing overhead. CSI provides "forensic" tables that tools can use to associate IR objects with source-code locations and to relate IR objects to each other. To evaluate the efficacy of CSI, we implemented six demonstration CSI tools. One of our studies shows that compiling with CSI and linking with the "null" CSI tool produces a tool-instrumented executable that is as fast as the original uninstrumented code. Another study, using a CSI port of Google's ThreadSanitizer, shows that the CSI tool rivals the performance of Google's custom compiler-based implementation. All other demonstration CSI tools slow down the execution of the program-under-test by less than 70%.
{"title":"The CSI Framework for Compiler-Inserted Program Instrumentation","authors":"T. Schardl, Tyler Denniston, Damon Doucet, Bradley C. Kuszmaul, I. Lee, C. Leiserson","doi":"10.1145/3219617.3219657","DOIUrl":"https://doi.org/10.1145/3219617.3219657","url":null,"abstract":"The CSI framework provides comprehensive static instrumentation that a compiler can insert into a program-under-test so that dynamic-analysis tools - memory checkers, race detectors, cache simulators, performance profilers, code-coverage analyzers, etc. - can observe and investigate runtime behavior. Heretofore, tools based on compiler instrumentation would each separately modify the compiler to insert their own instrumentation. In contrast, CSI inserts a standard collection of instrumentation hooks into the program-under-test. Each CSI-tool is implemented as a library that defines relevant hooks, and the remaining hooks are \"nulled\" out and elided during either compile-time or link-time optimization, resulting in instrumented runtimes on par with custom instrumentation. CSI allows many compiler-based tools to be written as simple libraries without modifying the compiler, lowering the bar for the development of dynamic-analysis tools. We have defined a standard API for CSI and modified LLVM to insert CSI hooks into the compiler's internal representation (IR) of the program. The API organizes IR objects - such as functions, basic blocks, and memory accesses - into flat and compact ID spaces, which not only simplifies the building of tools, but surprisingly enables faster maintenance of IR-object data than do traditional hash tables. CSI hooks contain a \"property\" parameter that allows tools to customize behavior based on static information without introducing overhead. CSI provides \"forensic\" tables that tools can use to associate IR objects with source-code locations and to relate IR objects to each other. To evaluate the efficacy of CSI, we implemented six demonstration CSI-tools. One of our studies shows that compiling with CSI and linking with the \"null\" CSI-tool produces a tool-instrumented executable that is as fast as the original uninstrumented code. Another study, using a CSI port of Google's ThreadSanitizer, shows that the CSI-tool rivals the performance of Google's custom compiler-based implementation. All other demonstration CSI tools slow down the execution of the program-under-test by less than 70%.","PeriodicalId":210440,"journal":{"name":"Abstracts of the 2018 ACM International Conference on Measurement and Modeling of Computer Systems","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-12-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129285754","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
We establish a unified analytical framework for designing load balancing algorithms that can simultaneously achieve low latency, low complexity, and low communication overhead. We first propose a general class Π of load balancing policies and prove that they are both throughput optimal and heavy-traffic delay optimal. This class Π includes popular policies such as join-shortest-queue (JSQ) and power-of-d as special cases, but not the recently proposed join-idle-queue (JIQ) policy. In fact, we show that JIQ is not heavy-traffic delay optimal even for homogeneous servers. By exploiting the flexibility offered by the class Π, we design a new load balancing policy called join-below-threshold (JBT-d), in which arriving jobs are preferentially assigned to queues whose length is no greater than a threshold, and the threshold is updated infrequently. JBT-d has several benefits: (i) JBT-d belongs to the class Π and hence is throughput optimal and heavy-traffic delay optimal. (ii) JBT-d has zero dispatching delay, like JIQ and other pull-based policies, and low message overhead due to infrequent threshold updates. (iii) Extensive simulations show that JBT-d has good delay performance, comparable to the JSQ policy in various system settings.
{"title":"Designing Low-Complexity Heavy-Traffic Delay-Optimal Load Balancing Schemes: Theory to Algorithms","authors":"Xingyu Zhou, Fei Wu, Jian Tan, Yin Sun, N. Shroff","doi":"10.1145/3219617.3219670","DOIUrl":"https://doi.org/10.1145/3219617.3219670","url":null,"abstract":"We establish a unified analytical framework for designing load balancing algorithms that can simultaneously achieve low latency, low complexity, and low communication overhead. We first propose a general class ¶ of load balancing policies and prove that they are both throughput optimal and heavy-traffic delay optimal. This class ¶ includes popular policies such as join-shortest-queue (JSQ) and power-of- d as special cases, but not the recently proposed join-idle-queue (JIQ) policy. In fact, we show that JIQ is not heavy-traffic delay optimal even for homogeneous servers. By exploiting the flexibility offered by the class ¶, we design a new load balancing policy called join-below-threshold (JBT-d), in which the arrival jobs are preferentially assigned to queues that are no greater than a threshold, and the threshold is updated infrequently. JBT-d has several benefits: (i) JBT-d belongs to the class ¶i and hence is throughput optimal and heavy-traffic delay optimal. (ii) JBT-d has zero dispatching delay, like JIQ and other pull-based policies, and low message overhead due to infrequent threshold updates. (iii) Extensive simulations show that JBT-d has good delay performance, comparable to the JSQ policy in various system settings.","PeriodicalId":210440,"journal":{"name":"Abstracts of the 2018 ACM International Conference on Measurement and Modeling of Computer Systems","volume":"43 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-10-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133633637","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}