Measurement and Modeling of Computer Systems: latest papers, page 5

Scheduling using interactive oracles: connection between iterative optimization and low-complexity scheduling

Measurement and Modeling of Computer Systems

Pub Date : 2014-06-16 DOI: 10.1145/2591971.2592026

Jinwoo Shin, T. Suk

Since Tassiulas and Ephremides proposed the throughput-optimal maximum weight scheduling algorithm for constrained queueing networks in 1992, extensive research efforts have been made to resolve its high complexity in various directions. In this paper, we resolve this issue by developing a generic framework for designing throughput-optimal and low-complexity scheduling algorithms. Under the framework, an algorithm updates current schedules via an interaction with a given oracle system that generates a solution of a certain discrete optimization problem in a finite number of interactive queries. The complexity of the resulting algorithm is decided by the number of operations required for an oracle to process a single query, which is typically very small. Somewhat surprisingly, we prove that an algorithm using any such oracle is throughput-optimal for general constrained queueing network models that arise in the context of emerging large-scale communication networks. To the best of our knowledge, our result is the first that establishes a rigorous connection between iterative optimization methods and low-complexity scheduling algorithms, which we believe provides various future directions and new insights in both areas.
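To make the complexity issue concrete, the following is a minimal sketch of a maximum weight scheduling decision computed by brute force. It is not the oracle-based framework of the paper: the function name, the pairwise-conflict representation of the constraints, and the example values are all assumptions made for illustration. The schedule picked is the feasible 0/1 vector that maximizes the sum of queue lengths it serves, and enumerating all candidates is exponential in the number of queues.

```python
from itertools import combinations, product


def max_weight_schedule(queues, conflicts):
    """Return (schedule, weight) maximizing sum(queues[i] * schedule[i]).

    queues:    list of current queue lengths (hypothetical input format).
    conflicts: set of frozenset({i, j}) pairs that cannot be served together
               (an assumed way to encode the scheduling constraints).
    """
    n = len(queues)
    best, best_weight = (0,) * n, 0
    # Brute force over all 2^n candidate schedules: illustration only.
    for sched in product((0, 1), repeat=n):
        active = [i for i in range(n) if sched[i]]
        feasible = all(
            frozenset(pair) not in conflicts for pair in combinations(active, 2)
        )
        if not feasible:
            continue
        weight = sum(queues[i] for i in active)
        if weight > best_weight:
            best, best_weight = sched, weight
    return best, best_weight


# Three queues on a path: 0 conflicts with 1, and 1 conflicts with 2.
schedule, weight = max_weight_schedule(
    [3, 5, 4], {frozenset((0, 1)), frozenset((1, 2))}
)
print(schedule, weight)  # (1, 0, 1) 7
```

Serving queues 0 and 2 together (weight 7) beats serving queue 1 alone (weight 5), so the sketch returns the former.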



Citations: 2

Non-work-conserving effects in MapReduce: diffusion limit and criticality

Measurement and Modeling of Computer Systems

Pub Date : 2014-06-16 DOI: 10.1145/2591971.2592007

Jian Tan, Yandong Wang, Weikuan Yu, Li Zhang

Sequentially arriving jobs share a MapReduce cluster, each desiring a fair allocation of computing resources to serve its associated map and reduce tasks. The model of such a system consists of a processor sharing queue for the MapTasks and a multi-server queue for the ReduceTasks. These two queues are dependent through a constraint that the input data of each ReduceTask are fetched from the intermediate data generated by the MapTasks belonging to the same job. A more generalized form of the MapReduce queueing model can capture the essence of other distributed data processing systems that contain interdependent processor sharing queues and multi-server queues. Through theoretical modeling and extensive experiments, we show that this dependence, if not carefully dealt with, can cause non-work-conserving effects that negatively impact system performance and scalability. First, we characterize the heavy-traffic approximation. Depending on how tasks are scheduled, the number of jobs in the system can even exhibit jumps in diffusion limits, resulting in prolonged job execution times. This problem can be mitigated by carefully applying a tie-breaking rule for ReduceTasks, a theoretical finding with direct engineering implications. Second, we validate a criticality phenomenon experimentally. MapReduce systems experience an undesirable performance degradation when they have reached certain critical points, another finding that offers fundamental guidance on managing MapReduce systems.



Citations: 14

Pricing data center demand response

Measurement and Modeling of Computer Systems

Pub Date : 2014-06-16 DOI: 10.1145/2591971.2592004

Zhenhua Liu, Iris Liu, S. Low, A. Wierman

Demand response is crucial for the incorporation of renewable energy into the grid. In this paper, we focus on a particularly promising industry for demand response: data centers. We use simulations to show that, not only are data centers large loads, but they can provide as much (or possibly more) flexibility as large-scale storage if given the proper incentives. However, due to the market power most data centers maintain, it is difficult to design programs that are efficient for data center demand response. To that end, we propose that prediction-based pricing is an appealing market design, and show that it outperforms more traditional supply function bidding mechanisms in situations where market power is an issue. However, prediction-based pricing may be inefficient when predictions are inaccurate, and so we provide analytic, worst-case bounds on the impact of prediction error on the efficiency of prediction-based pricing. These bounds hold even when network constraints are considered, and highlight that prediction-based pricing is surprisingly robust to prediction error.


Citations: 172

Conquering big data with spark and BDAS

Measurement and Modeling of Computer Systems

Pub Date : 2014-06-16 DOI: 10.1145/2637364.2611389

I. Stoica

Today, big and small organizations alike collect huge amounts of data, and they do so with one goal in mind: extract "value" through sophisticated exploratory analysis, and use it as the basis to make decisions as varied as personalized treatment and ad targeting. Unfortunately, existing data analytics tools are slow in answering queries, as they typically need to sift through huge amounts of data stored on disk, and are even less suitable for complex computations, such as machine learning algorithms. These limitations leave the potential of extracting value from big data unfulfilled. To address this challenge, we are developing Berkeley Data Analytics Stack (BDAS), an open source data analytics stack that provides interactive response times for complex computations on massive data. To achieve this goal, BDAS supports efficient, large-scale in-memory data processing, and allows users and applications to trade between query accuracy, time, and cost. In this talk, I'll present the architecture, challenges, results, and our experience with developing BDAS, with a focus on Apache Spark, an in-memory cluster computing engine that provides support for a variety of workloads, including batch, streaming, and iterative computations. In a relatively short time, Spark has become the most active big data project in the open source community, and is already being used by over one hundred companies and research institutions.



Citations: 10

Noise can help: accurate and efficient per-flow latency measurement without packet probing and time stamping

Measurement and Modeling of Computer Systems

Pub Date : 2014-06-16 DOI: 10.1145/2591971.2591988

Muhammad Shahzad, A. Liu

With the growth in number and significance of emerging applications that require extremely low latencies, network operators face an increasing need to perform latency measurement on a per-flow basis for network monitoring and troubleshooting. In this paper, we propose COLATE, the first per-flow latency measurement scheme that requires neither probe packets nor time stamping. Given a set of observation points, COLATE records packet timing information at each point so that later, for any two points, it can accurately estimate the average and standard deviation of the latencies experienced by the packets of any flow in passing the two points. The key idea is that when recording packet timing information, COLATE purposely allows noise to be introduced for minimizing storage space, and when querying the latency of a target flow, COLATE uses statistical techniques to denoise and obtain an accurate latency estimate. COLATE is designed to be efficiently implementable on network middleboxes. In terms of processing overhead, COLATE performs only one hash and one memory update per packet. In terms of storage space, COLATE uses less than 0.1 bit per packet, which means that, on a backbone link with about half a million packets per second, using a 256GB drive, COLATE can accumulate time stamps of packets traversing the link for over 1.5 years. We evaluated COLATE using three real traffic traces that include a backbone traffic trace, an enterprise network traffic trace, and a data center traffic trace. Results show that COLATE always achieves the required reliability for any given confidence interval.
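The storage claim can be checked with back-of-the-envelope arithmetic. The sketch below is only a worked calculation under stated assumptions that are not in the abstract: a decimal 256 GB drive (256e9 bytes), exactly 500,000 packets per second, and a 365-day year. The function name is hypothetical.

```python
def retention_years(drive_bytes, bits_per_packet, packets_per_second):
    """Years of traffic whose timing records fit on the drive."""
    total_bits = drive_bytes * 8
    packets_storable = total_bits / bits_per_packet
    seconds = packets_storable / packets_per_second
    return seconds / (365 * 24 * 3600)


# At exactly 0.1 bit per packet the drive lasts roughly 1.3 years.
print(round(retention_years(256e9, 0.1, 500_000), 2))

# A per-packet cost somewhat below 0.1 bit (0.085 is an assumed value)
# pushes the retention period past 1.5 years.
print(round(retention_years(256e9, 0.085, 500_000), 2))
```

Under these assumptions, the quoted "over 1.5 years" corresponds to a per-packet cost below roughly 0.087 bit, which is consistent with the stated bound of "less than 0.1 bit per packet".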



Citations: 13

Pricing link by time

Measurement and Modeling of Computer Systems

Pub Date : 2014-06-16 DOI: 10.1145/2591971.2591974

Chengdi Lai, S. Low, Ka-Cheong Leung, V. Li

The combination of loss-based TCP and drop-tail routers often results in full buffers, creating large queueing delays. The challenge with parameter tuning and the drastic consequence of improper tuning have discouraged network administrators from enabling AQM even when routers support it. To address this problem, we propose a novel design principle for AQM, called the pricing-link-by-time (PLT) principle. PLT increases the link price as the backlog stays above a threshold β, and resets the price once the backlog goes below β. We prove that such a system exhibits cyclic behavior that is robust against changes in network environment and protocol parameters. While β approximately controls the level of backlog, the backlog dynamics are invariant for β across a wide range of values. Therefore, β can be chosen to reduce delay without undermining system performance. We validate these analytical results using packet-level simulation.
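The PLT rule described above can be sketched as a discrete-time update. This is an illustration under assumptions, not the authors' design: the linear growth rate, the reset value of zero, the treatment of a backlog exactly equal to β as "not above", and all function and parameter names are hypothetical.

```python
def plt_price_update(price, backlog, beta, rate=1.0, dt=1.0):
    """One PLT step: raise the price while backlog > beta, else reset it.

    rate and dt are assumed parameters; the abstract does not specify
    how fast the price grows or what value it resets to.
    """
    if backlog > beta:
        return price + rate * dt
    return 0.0


def price_trace(backlogs, beta, rate=1.0, dt=1.0):
    """Apply the update to a sequence of backlog samples."""
    price, trace = 0.0, []
    for backlog in backlogs:
        price = plt_price_update(price, backlog, beta, rate, dt)
        trace.append(price)
    return trace


print(price_trace([5, 12, 15, 11, 8, 13], beta=10))
# [0.0, 1.0, 2.0, 3.0, 0.0, 1.0]
```

The price depends on how long the backlog has stayed above β rather than on how far above it is, which is the sense in which the link is priced "by time".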


Citations: 6

What's your choice?: learning the mixed multi-nomial

Measurement and Modeling of Computer Systems

Pub Date : 2014-06-16 DOI: 10.1145/2591971.2592020

A. Ammar, Sewoong Oh, D. Shah, L. Voloch

Computing a ranking over choices using consumer data gathered from a heterogeneous population has become an indispensable module for any modern consumer information system, e.g. Yelp, Netflix, Amazon and app-stores like Google Play. In such applications, a ranking or recommendation algorithm needs to extract meaningful information from noisy data accurately and in a scalable manner. A principled approach to resolve this challenge requires a model that connects observations to recommendation decisions and a tractable inference algorithm utilizing this model. To that end, we abstract the preference data generated by consumers as noisy, partial realizations of their innate preferences, i.e. orderings or permutations over choices. Inspired by the seminal works of Samuelson (cf. axiom of revealed preferences) and that of McFadden (cf. discrete choice models for transportation), we model the population's innate preferences as a mixture of the so-called Multi-nomial Logit (MMNL) model. Under this model, the recommendation problem boils down to (a) learning the MMNL model from population data, (b) finding an MNL component within the mixture that closely represents the revealed preferences of the consumer at hand, and (c) recommending other choices to her/him that are ranked high according to the component thus found. In this work, we address the problem of learning the MMNL model from partial preferences. We identify fundamental limitations of any algorithm to learn such a model as well as provide conditions under which a simple, data-driven (non-parametric) algorithm learns the model effectively. The proposed algorithm has a pleasant similarity to the standard collaborative filtering for scalar (or star) ratings, but in the domain of permutations. This work advances the state of the art in the domain of learning distributions over permutations (cf. [2]) as well as in the context of learning mixture distributions (cf. [4]).
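For readers unfamiliar with the model, the sketch below shows how choice probabilities are computed under a multinomial logit (MNL) model and under a mixture of such models. It uses the standard MNL definition, in which an item is chosen from a set with probability proportional to its positive weight. It does not show the learning algorithm of the paper, and the function names and example weights are assumptions for illustration.

```python
def mnl_choice_prob(weights, item, choice_set):
    """P(item chosen from choice_set) under one MNL component.

    weights maps each item to a positive weight (hypothetical values).
    """
    return weights[item] / sum(weights[j] for j in choice_set)


def mixed_mnl_choice_prob(mixture, item, choice_set):
    """Choice probability under a mixture of MNL components.

    mixture is a list of (mixing_probability, weights) pairs whose
    mixing probabilities sum to one.
    """
    return sum(q * mnl_choice_prob(w, item, choice_set) for q, w in mixture)


# Two equally likely consumer types with different tastes for items a and b.
mixture = [
    (0.5, {"a": 3.0, "b": 1.0}),  # type 1 prefers a three to one
    (0.5, {"a": 1.0, "b": 1.0}),  # type 2 is indifferent
]
print(mixed_mnl_choice_prob(mixture, "a", ["a", "b"]))  # 0.625
```

Here type 1 picks a with probability 0.75 and type 2 with probability 0.5, so the population-level probability is 0.5 × 0.75 + 0.5 × 0.5 = 0.625.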



Citations: 12

ANATOMY: an analytical model of memory system performance

Measurement and Modeling of Computer Systems

Pub Date : 2014-06-16 DOI: 10.1145/2591971.2591995

N. Gulur, M. Mehendale, R. Manikantan, Ramaswamy Govindarajan

Memory system design is increasingly influencing modern multi-core architectures from both performance and power perspectives. However, predicting the performance of memory systems is complex, compounded by the myriad design choices and parameters along multiple dimensions, namely (i) technology, (ii) design and (iii) architectural choices. In this work, we construct an analytical model of the memory system to comprehend this diverse space and to study the impact of memory system parameters from latency and bandwidth perspectives. Our model, called ANATOMY, consists of two key components that are coupled with each other to model the memory system accurately. The first component is a queuing model of memory which models in detail various design choices and captures the impact of technological choices in memory systems. The second component is an analytical model to summarize key workload characteristics, namely row buffer hit rate (RBH), bank-level parallelism (BLP), and request spread (S), which are used as inputs to the queuing model to estimate memory performance. We validate the model across a wide variety of memory configurations on 4, 8 and 16 cores using a total of 44 workloads. ANATOMY is able to predict memory latency with an average error of 8.1%, 4.1% and 9.7% over 4, 8 and 16 core configurations, respectively. We demonstrate the extensibility and applicability of our model by exploring a variety of memory design choices such as the impact of clock speed, benefit of multiple memory controllers, the role of banks and channel width, and so on. We also demonstrate ANATOMY's ability to capture architectural elements such as scheduling mechanisms (using FR_FCFS and PAR_BS) and impact of DRAM refresh cycles. In all of these studies, ANATOMY provides insight into sources of memory performance bottlenecks and is able to quantitatively predict the benefit of redressing them.



Citations: 21

Measurement and Modeling of Computer Systems

Pub Date : 2014-06-16 DOI: 10.1145/2591971.2592012

Qi Wang, Liang Liu, Jinbei Zhang, Xinyu Wang, Xinbing Wang, Songwu Lu

We propose a correlated mobile k-hop clustered network model that captures correlated node movements and scalable clusters. We divide network states into three categories (cluster-sparse, cluster-dense, and cluster-inferior-dense) and derive the critical transmission range for the last two. Furthermore, we find that correlated mobility and cluster scalability are closely related, and that their impact on connectivity comes mainly through their influence on network state transitions.


Citations: 2

Stochastic bandits with side observations on networks

Measurement and Modeling of Computer Systems

Pub Date : 2014-06-16 DOI: 10.1145/2591971.2591989

Swapna Buccapatnam, A. Eryilmaz, N. Shroff

We study the stochastic multi-armed bandit (MAB) problem in the presence of side observations across actions. In our model, choosing an action provides additional side observations for a subset of the remaining actions. One example of this model occurs in the problem of targeting users in online social networks, where users respond to their friends' activity, thus providing information about each other's preferences. Our contributions are as follows: 1) We derive an asymptotic (with respect to time) lower bound (as a function of the network structure) on the regret (loss) of any uniformly good policy that achieves the maximum long-term average reward. 2) We propose two policies, a randomized policy and a policy based on the well-known upper confidence bound (UCB) policies, both of which explore each action at a rate that is a function of its network position. We show that these policies achieve the asymptotic lower bound on the regret up to a multiplicative factor independent of network structure. The upper bound guarantees on the regret of these policies are better than those of existing policies. Finally, we use numerical examples on a real-world social network to demonstrate the significant benefits obtained by our policies over other existing policies.
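
To make the side-observation model in this abstract concrete, the following is a minimal sketch of a generic UCB-style policy in which pulling one arm also reveals a reward sample for each of its neighbors. It is not either of the two policies proposed in the paper, whose exploration rates depend on network position. The function name `ucb_with_side_observations`, the Bernoulli rewards, the confidence radius `sqrt(2 ln t / n)`, and the example graph are all assumptions made for illustration.

```python
import math
import random


def ucb_with_side_observations(neighbors, means, horizon, seed=0):
    """Generic UCB index policy with side observations (illustrative only).

    neighbors: dict arm -> list of arms also observed when that arm is pulled
               (hypothetical graph, not taken from the paper)
    means:     dict arm -> Bernoulli mean reward (hidden from the policy,
               used only to simulate rewards and to measure regret)
    Returns (observation counts, pull counts, cumulative expected regret).
    """
    rng = random.Random(seed)
    arms = list(means)
    obs = {a: 0 for a in arms}      # samples seen per arm (direct + side)
    total = {a: 0.0 for a in arms}  # sum of observed rewards per arm
    pulls = {a: 0 for a in arms}    # times each arm was actually chosen
    best = max(means.values())
    regret = 0.0

    for t in range(1, horizon + 1):
        unobserved = [a for a in arms if obs[a] == 0]
        if unobserved:
            # Ensure every arm has at least one sample before using indices.
            arm = unobserved[0]
        else:
            # Index = empirical mean + confidence radius based on ALL samples,
            # so side observations shrink the radius without costing a pull.
            arm = max(
                arms,
                key=lambda a: total[a] / obs[a]
                + math.sqrt(2.0 * math.log(t) / obs[a]),
            )
        pulls[arm] += 1
        regret += best - means[arm]

        # The pulled arm and each of its neighbors yield one fresh sample.
        observed = [arm] + [n for n in neighbors.get(arm, []) if n != arm]
        for a in observed:
            reward = 1.0 if rng.random() < means[a] else 0.0
            obs[a] += 1
            total[a] += reward

    return obs, pulls, regret
```

In this sketch a densely connected graph gives every arm many free samples, so its confidence radius shrinks quickly, while with an empty graph the policy reduces to ordinary UCB in which an arm is observed only when it is pulled.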



Citations: 75