Proceedings of the ACM on Measurement and Analysis of Computing Systems: Latest Publications

Optimistic No-regret Algorithms for Discrete Caching
N. Mhaisen, Abhishek Sinha, G. Paschos, Georgios Iosifidis
We take a systematic look at the problem of storing whole files in a cache with limited capacity in the context of optimistic learning, where the caching policy has access to a prediction oracle (provided by, e.g., a Neural Network). The successive file requests are assumed to be generated by an adversary, and no assumption is made about the accuracy of the oracle. In this setting, we provide a universal lower bound for prediction-assisted online caching and proceed to design a suite of policies with a range of performance-complexity trade-offs. All proposed policies offer sublinear regret bounds commensurate with the accuracy of the oracle. Our results substantially improve upon all recently proposed online caching policies, which, being unable to exploit the oracle predictions, offer only O(√T) regret. In this pursuit, we design, to the best of our knowledge, the first comprehensive optimistic Follow-the-Perturbed-Leader policy, which generalizes beyond the caching problem. We also study the problem of caching files with different sizes and the bipartite network caching problem. Finally, we evaluate the efficacy of the proposed policies through extensive numerical experiments using real-world traces.
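As a rough illustration of the optimistic-learning setup (not the paper's discrete FTPL policy), the sketch below runs optimistic online gradient ascent for fractional caching: the policy plays a cache state nudged toward the oracle's predicted next request, then folds in the realized request. The function names, step size, and projection routine are our own illustrative assumptions.

```python
import numpy as np

def project_capped_simplex(y, capacity):
    # Euclidean projection onto {x in [0,1]^N : sum(x) = capacity},
    # found by bisecting on the water-filling shift parameter.
    lo, hi = y.min() - 1.0, y.max()
    for _ in range(60):
        mid = (lo + hi) / 2.0
        if np.clip(y - mid, 0.0, 1.0).sum() > capacity:
            lo = mid
        else:
            hi = mid
    return np.clip(y - (lo + hi) / 2.0, 0.0, 1.0)

def optimistic_caching(requests, predictions, n_files, capacity, eta=0.5):
    # Optimistic online gradient ascent in two-step form: play x_t using
    # the oracle's hint m_t, then fold the realized gradient g_t into the
    # secondary iterate z. Accurate hints shrink the regret; bad hints
    # degrade gracefully toward the plain online-gradient guarantee.
    z = np.full(n_files, capacity / n_files)  # fractional cache state
    hits = 0.0
    for req, pred in zip(requests, predictions):
        m = np.zeros(n_files); m[pred] = 1.0   # oracle's predicted gradient
        x = project_capped_simplex(z + eta * m, capacity)
        hits += x[req]                         # fractional hit on the true request
        g = np.zeros(n_files); g[req] = 1.0    # realized utility gradient
        z = project_capped_simplex(z + eta * g, capacity)
    return hits / len(requests)

reqs = list(range(10)) * 20  # adversary cycles over 10 hot files
print(optimistic_caching(reqs, reqs, n_files=100, capacity=5))  # high hit ratio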
{"title":"Optimistic No-regret Algorithms for Discrete Caching","authors":"N. Mhaisen, Abhishek Sinha, G. Paschos, Georgios Iosifidis","doi":"10.1145/3570608","DOIUrl":"https://doi.org/10.1145/3570608","url":null,"abstract":"We take a systematic look at the problem of storing whole files in a cache with limited capacity in the context of optimistic learning, where the caching policy has access to a prediction oracle (provided by, e.g., a Neural Network). The successive file requests are assumed to be generated by an adversary, and no assumption is made on the accuracy of the oracle. In this setting, we provide a universal lower bound for prediction-assisted online caching and proceed to design a suite of policies with a range of performance-complexity trade-offs. All proposed policies offer sublinear regret bounds commensurate with the accuracy of the oracle. Our results substantially improve upon all recently-proposed online caching policies, which, being unable to exploit the oracle predictions, offer only O(√T) regret. In this pursuit, we design, to the best of our knowledge, the first comprehensive optimistic Follow-the-Perturbed leader policy, which generalizes beyond the caching problem. We also study the problem of caching files with different sizes and the bipartite network caching problem. Finally, we evaluate the efficacy of the proposed policies through extensive numerical experiments using real-world traces.","PeriodicalId":426760,"journal":{"name":"Proceedings of the ACM on Measurement and Analysis of Computing Systems","volume":"18 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-08-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125227898","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 6
Enabling Long-term Fairness in Dynamic Resource Allocation
T. Si Salem, G. Iosifidis, G. Neglia
We study the fairness of the dynamic resource allocation problem under the α-fairness criterion. We recognize two different fairness objectives that naturally arise in this problem: the well-understood slot-fairness objective, which aims to ensure fairness at every timeslot, and the less explored horizon-fairness objective, which aims to ensure fairness across utilities accumulated over a time horizon. We argue that horizon-fairness comes at a lower price in terms of social welfare. We study horizon-fairness with regret as a performance metric and show that vanishing regret cannot be achieved in the presence of an unrestricted adversary. We propose restrictions on the adversary's capabilities corresponding to realistic scenarios, together with an online policy that indeed guarantees vanishing regret under these restrictions. We demonstrate the applicability of the proposed fairness framework to a representative resource management problem: a virtualized caching system in which different caches cooperate to serve content requests.
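To make the two objectives concrete, here is a tiny numeric sketch of the α-fair utility and of the gap between slot-fairness and horizon-fairness; the utilities below are toy numbers, not values from the paper.

```python
import numpy as np

def alpha_fair(u, alpha):
    # alpha-fair utility: log(u) for alpha = 1, else u^(1-alpha)/(1-alpha).
    u = np.asarray(u, dtype=float)
    return np.log(u) if alpha == 1.0 else u ** (1.0 - alpha) / (1.0 - alpha)

# utilities[t, i]: utility delivered to agent i during timeslot t
utilities = np.array([[1.0, 3.0],
                      [3.0, 1.0]])
alpha = 2.0

# slot-fairness: be fair within every timeslot, then aggregate over time
slot_obj = sum(alpha_fair(utilities[t], alpha).sum() for t in range(len(utilities)))
# horizon-fairness: be fair w.r.t. utilities accumulated over the horizon
horizon_obj = alpha_fair(utilities.mean(axis=0), alpha).sum()

print(slot_obj, horizon_obj)  # -2.67 vs -1.0: per-slot imbalance is forgiven
```

The example shows why horizon-fairness is cheaper in social welfare: the two agents alternate, which every slot penalizes, but their accumulated utilities are perfectly balanced.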
{"title":"Enabling Long-term Fairness in Dynamic Resource Allocation","authors":"T. Si Salem, G. Iosifidis, G. Neglia","doi":"10.1145/3570606","DOIUrl":"https://doi.org/10.1145/3570606","url":null,"abstract":"We study the fairness of dynamic resource allocation problem under the α-fairness criterion. We recognize two different fairness objectives that naturally arise in this problem: the well-understood slot-fairness objective that aims to ensure fairness at every timeslot, and the less explored horizon-fairness objective that aims to ensure fairness across utilities accumulated over a time horizon. We argue that horizon-fairness comes at a lower price in terms of social welfare. We study horizon-fairness with the regret as a performance metric and show that vanishing regret cannot be achieved in presence of an unrestricted adversary. We propose restrictions on the adversary's capabilities corresponding to realistic scenarios and an online policy that indeed guarantees vanishing regret under these restrictions. We demonstrate the applicability of the proposed fairness framework to a representative resource management problem considering a virtualized caching system where different caches cooperate to serve content requests.","PeriodicalId":426760,"journal":{"name":"Proceedings of the ACM on Measurement and Analysis of Computing Systems","volume":"150 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-08-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115164988","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 7
On the Stochastic and Asymptotic Improvement of First-Come First-Served and Nudge Scheduling
B. Van Houdt
Recently it was shown that, contrary to expectations, the First-Come-First-Served (FCFS) scheduling algorithm can be stochastically improved upon by a scheduling algorithm called Nudge for light-tailed job size distributions. Nudge partitions jobs into four types based on their size: small, medium, large, and huge. Nudge operates identically to FCFS, except that whenever an arriving small job finds a large job waiting at the back of the queue, Nudge swaps the small job with the large one, unless the large job was already involved in an earlier swap. In this paper, we show that FCFS can be stochastically improved upon under far weaker conditions. We consider a system with two job types and limited swapping between type-1 and type-2 jobs, where a type-1 job is not necessarily smaller than a type-2 job. More specifically, we introduce and study the Nudge-K scheduling algorithm, which allows a type-1 job to be swapped with up to K type-2 jobs waiting at the back of the queue, while a type-2 job can be involved in at most one swap. We present an explicit expression for the response time distribution under Nudge-K when both job types follow a phase-type distribution. Regarding the asymptotic tail improvement ratio (ATIR), we derive a simple expression for the ATIR, as well as for the K that maximizes it. We show that the ATIR is positive and that the optimal K tends to infinity in heavy traffic, as long as type-2 jobs are on average longer than type-1 jobs.
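The following is a minimal sketch of the Nudge-K insertion rule as we read it from the abstract; the job representation and queue structure are our own assumptions, not code from the paper.

```python
def nudge_k_insert(queue, job, K):
    """Insert an arriving job into the waiting line under Nudge-K.

    queue is a list of dicts like {'type': 1 or 2, 'swapped': False};
    index -1 is the back of the queue. An arriving type-1 job passes over
    up to K type-2 jobs waiting at the back, and each type-2 job can be
    involved in at most one swap.
    """
    pos = len(queue)
    if job['type'] == 1:
        passed = 0
        while (passed < K and pos > 0
               and queue[pos - 1]['type'] == 2
               and not queue[pos - 1]['swapped']):
            queue[pos - 1]['swapped'] = True  # this type-2 job is now "used up"
            pos -= 1
            passed += 1
    queue.insert(pos, job)

q = [{'type': 2, 'swapped': False}, {'type': 2, 'swapped': False}]
nudge_k_insert(q, {'type': 1, 'swapped': False}, K=2)
print([j['type'] for j in q])  # [1, 2, 2]: the arrival overtook both
```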
{"title":"On the Stochastic and Asymptotic Improvement of First-Come First-Served and Nudge Scheduling","authors":"B. Van Houdt","doi":"10.1145/3570610","DOIUrl":"https://doi.org/10.1145/3570610","url":null,"abstract":"Recently it was shown that, contrary to expectations, the First-Come-First-Served (FCFS) scheduling algorithm can be stochastically improved upon by a scheduling algorithm called Nudge for light-tailed job size distributions. Nudge partitions jobs into 4 types based on their size, say small, medium, large and huge jobs. Nudge operates identical to FCFS, except that whenever a small job arrives that finds a large job waiting at the back of the queue, Nudge swaps the small job with the large one unless the large job was already involved in an earlier swap. In this paper, we show that FCFS can be stochastically improved upon under far weaker conditions. We consider a system with 2 job types and limited swapping between type-1 and type-2 jobs, but where a type-1 job is not necessarily smaller than a type-2 job. More specifically, we introduce and study the Nudge-K scheduling algorithm which allows type-1 jobs to be swapped with up to K type-2 jobs waiting at the back of the queue, while type-2 jobs can be involved in at most one swap. We present an explicit expression for the response time distribution under Nudge-K when both job types follow a phase-type distribution. Regarding the asymptotic tail improvement ratio (ATIR), we derive a simple expression for the ATIR, as well as for the K that maximizes the ATIR. We show that the ATIR is positive and the optimal K tends to infinity in heavy traffic as long as the type-2 jobs are on average longer than the type-1 jobs.","PeriodicalId":426760,"journal":{"name":"Proceedings of the ACM on Measurement and Analysis of Computing Systems","volume":"69 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-06-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121022900","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 1
Overcoming the Long Horizon Barrier for Sample-Efficient Reinforcement Learning with Latent Low-Rank Structure
Tyler Sam, Yudong Chen, C. Yu
The practicality of reinforcement learning algorithms has been limited due to poor scaling with respect to the problem size: the sample complexity of learning an ε-optimal policy is Ω(|S||A|H/ε²) over worst-case instances of an MDP with state space S, action space A, and horizon H. We consider a class of MDPs for which the associated optimal Q* function is low rank, where the latent features are unknown. While one would hope to achieve sample complexity linear in |S| and |A| due to the low-rank structure, we show that, without imposing further assumptions beyond low rank of Q*, if one is constrained to estimate the Q function using only observations from a subset of entries, there is a worst-case instance in which one must incur a sample complexity exponential in the horizon H to learn a near-optimal policy. We subsequently show that, under stronger low-rank structural assumptions and given access to a generative model, Low Rank Monte Carlo Policy Iteration (LR-MCPI) and Low Rank Empirical Value Iteration (LR-EVI) achieve the desired sample complexity of Õ((|S|+|A|)·poly(d,H)/ε²) for a rank-d setting, which is minimax optimal with respect to the scaling of |S|, |A|, and ε. In contrast to the literature on linear and low-rank MDPs, we do not require a known feature mapping, our algorithm is computationally simple, and our results hold for long time horizons. Our results provide insights into the minimal low-rank structural assumptions required on the MDP with respect to the transition kernel versus the optimal action-value function.
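The sketch below illustrates only the structural primitive behind such methods (not LR-MCPI or LR-EVI themselves): if Q* has rank d, a noisy estimate can be denoised by truncated-SVD projection onto rank-d matrices. The sizes and noise level are illustrative assumptions.

```python
import numpy as np

def rank_d_estimate(Q_noisy, d):
    # Project a noisy |S| x |A| Q-value estimate onto the set of rank-d
    # matrices via truncated SVD, exploiting the low-rank assumption.
    U, s, Vt = np.linalg.svd(Q_noisy, full_matrices=False)
    return (U[:, :d] * s[:d]) @ Vt[:d]

rng = np.random.default_rng(0)
S, A, d = 50, 20, 3
Q_true = rng.normal(size=(S, d)) @ rng.normal(size=(d, A))  # rank-d ground truth
Q_noisy = Q_true + 0.1 * rng.normal(size=(S, A))            # noisy estimates
Q_hat = rank_d_estimate(Q_noisy, d)

# the rank-d projection removes most of the noise
print(np.linalg.norm(Q_hat - Q_true), np.linalg.norm(Q_noisy - Q_true))
```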
{"title":"Overcoming the Long Horizon Barrier for Sample-Efficient Reinforcement Learning with Latent Low-Rank Structure","authors":"Tyler Sam, Yudong Chen, C. Yu","doi":"10.1145/3589973","DOIUrl":"https://doi.org/10.1145/3589973","url":null,"abstract":"The practicality of reinforcement learning algorithms has been limited due to poor scaling with respect to the problem size, as the sample complexity of learning an ε-optimal policy is Ω(|S||A|H/ ε2) over worst case instances of an MDP with state space S, action space A, and horizon H. We consider a class of MDPs for which the associated optimal Q* function is low rank, where the latent features are unknown. While one would hope to achieve linear sample complexity in |S| and |A| due to the low rank structure, we show that without imposing further assumptions beyond low rank of Q*, if one is constrained to estimate the Q function using only observations from a subset of entries, there is a worst case instance in which one must incur a sample complexity exponential in the horizon H to learn a near optimal policy. We subsequently show that under stronger low rank structural assumptions, given access to a generative model, Low Rank Monte Carlo Policy Iteration (LR-MCPI) and Low Rank Empirical Value Iteration (LR-EVI) achieve the desired sample complexity of Õ((|S|+|A|)poly (d,H)/ε2) for a rank d setting, which is minimax optimal with respect to the scaling of |S|, |A|, and ε. In contrast to literature on linear and low-rank MDPs, we do not require a known feature mapping, our algorithm is computationally simple, and our results hold for long time horizons. Our results provide insights on the minimal low-rank structural assumptions required on the MDP with respect to the transition kernel versus the optimal action-value function.","PeriodicalId":426760,"journal":{"name":"Proceedings of the ACM on Measurement and Analysis of Computing Systems","volume":"64 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-06-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133071796","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 4
Monetizing Spare Bandwidth
Yunming Xiao, Matteo Varvello, A. Kuzmanovic
Residential Internet speeds have been rapidly increasing, reaching averages of ~100 Mbps in most developed countries. Several studies have shown that users have far more bandwidth than they need, using only about 20-30% on a regular day. Several systems exploit this trend by enabling users to monetize their spare bandwidth, e.g., by sharing their WiFi connection or by participating in distributed proxy or VPN (dVPN) services. Despite the proliferation of such systems, little is known about how such marketplaces operate, which key factors determine the price of spare bandwidth, and how such prices differ worldwide. In this work, we shed some light on this topic using dVPNs as a use case. We start by formalizing the problem of bandwidth monetization as an optimization between a buyer's cost and a seller's income. Next, we explore three popular dVPNs (Mysterium, Sentinel, and Tachyon) using both active and passive measurements. We find that dVPNs have a large and growing footprint, and offer performance comparable to their centralized counterparts. We identify Mysterium (in the US) as the most concrete realization of a bandwidth marketplace, for which we derive a value of spare Internet bandwidth ranging between 11 and 14 cents per GB. We also show that both buyers and sellers rely on ad-hoc "rules of thumb" when choosing their prices, which results in a sub-optimal marketplace. By applying our optimization, a seller's income can be tripled by setting a price lower than the default one, which attracts more buyers. These observations motivate us to create RING, a first concrete system that helps sellers automatically adjust their prices and traffic volumes across multiple marketplaces.
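As a hedged toy model of the seller-income side of this optimization: income is price times demand, and demand falls as the asking price rises, so undercutting an arbitrary default can raise total income. The demand curve and every number below are invented for illustration, not measurements from the paper.

```python
import numpy as np

# Toy seller-side view: income(p) = p * demand(p), with demand falling as
# the asking price rises. All numbers here are hypothetical.
prices = np.linspace(0.01, 0.30, 200)        # asking price in $/GB
demand = 400.0 * np.exp(-25.0 * prices)      # hypothetical GB sold per month
income = prices * demand

best_price = prices[np.argmax(income)]
default_price = 0.12                         # hypothetical default asking price
default_income = default_price * 400.0 * np.exp(-25.0 * default_price)

print(f"optimal: {best_price:.2f} $/GB -> {income.max():.2f} $/month")
print(f"default: {default_price:.2f} $/GB -> {default_income:.2f} $/month")
# the lower optimal price wins enough extra buyers to raise total income
```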
{"title":"Monetizing Spare Bandwidth","authors":"Yunming Xiao, Matteo Varvello, A. Kuzmanovic","doi":"10.1145/3530899","DOIUrl":"https://doi.org/10.1145/3530899","url":null,"abstract":"Residential Internet speeds have been rapidly increasing, reaching averages of ~100 Mbps in most developed countries. Several studies have shown that users have way more bandwidth than they need, only using about 20-30% on a regular day. Several systems exploit this trend by enabling users to monetize their spare bandwidth, e.g., by sharing their WiFi connection or by participating in distributed proxy or VPN (dVPN) services. Despite the proliferation of such systems, little is known on how such marketplaces operate, what are the key factors that determine the price of the spare bandwidth, and how such prices differ worldwide. In this work, we shed some light on this topic using dVPNs as a use-case. We start by formalizing the problem of bandwidth monetization as an optimization between a buyer's cost and seller's income. Next, we explore three popular dVPNs (Mysterium, Sentinel, and Tachyon) using both active and passive measurements. We find that dVPNs have a large and growing footprint, and offer comparable performance to their centralized counterpart. We identify Mysterium (in the US) as the most concrete realization of a bandwidth marketplace, for which we derive a value of spare Internet bandwidth ranging between 11 and 14 cents per GB. We also show that both buyers and sellers utilize ad-hoc \"rules-of-thumb\" when choosing their prices, which results in a sub-optimal marketplace. By applying our optimization, a seller's income can be tripled by setting a price lower than the default one which allows to attract more buyers. These observations motivate us to create RING, a first and concrete system which helps sellers to automatically adjust their prices and traffic volumes across multiple marketplaces.","PeriodicalId":426760,"journal":{"name":"Proceedings of the ACM on Measurement and Analysis of Computing Systems","volume":"19 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-06-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127636135","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
A Formalism of DNN Accelerator Flexibility
Sheng-Chun Kao, Hyoukjun Kwon, Michael Pellauer, A. Parashar, T. Krishna
The high efficiency of domain-specific hardware accelerators for machine learning (ML) has come from specialization, with the trade-off of less configurability/flexibility. There is growing interest in developing flexible ML accelerators to make them future-proof against the rapid evolution of Deep Neural Networks (DNNs). However, the notion of accelerator flexibility has always been used in an informal manner, restricting computer architects from conducting systematic apples-to-apples design-space exploration (DSE) across trillions of choices. In this work, we formally define accelerator flexibility and show how it can be integrated into DSE flows. Specifically, we capture DNN accelerator flexibility along four axes: tiling, ordering, parallelization, and array shape. We categorize existing accelerators into 16 classes based on the flexibility axes they support, and define a precise quantification of an accelerator's degree of flexibility along each axis. We leverage these to develop a novel flexibility-aware DSE framework that respects the differences in flexibility classes and degrees of flexibility support across accelerators, each of which forms a unique map-space for exploration. We demonstrate how this can be used to perform first-of-their-kind evaluations, including an isolation study to identify the individual impact of each flexibility axis. We demonstrate that adding flexibility features to a hypothetical DNN accelerator designed in 2014 improves runtime on future (i.e., present-day) DNNs by 11.8× geomean.
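A minimal sketch of the taxonomy: with four binary flexibility axes, an accelerator falls into one of 2^4 = 16 classes. The naming scheme below is our own, not the paper's.

```python
from itertools import product

AXES = ("tiling", "ordering", "parallelization", "array_shape")

def flexibility_class(flags):
    # Name one of the 2**4 = 16 accelerator classes by the subset of the
    # four flexibility axes that the design supports.
    supported = [axis for axis, flag in zip(AXES, flags) if flag]
    return "+".join(supported) if supported else "fully-fixed"

for flags in product((False, True), repeat=4):  # enumerate all 16 classes
    print(flexibility_class(flags))
```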
{"title":"A Formalism of DNN Accelerator Flexibility","authors":"Sheng-Chun Kao, Hyoukjun Kwon, Michael Pellauer, A. Parashar, T. Krishna","doi":"10.1145/3530907","DOIUrl":"https://doi.org/10.1145/3530907","url":null,"abstract":"The high efficiency of domain-specific hardware accelerators for machine learning (ML) has come fromspecialization, with the trade-off of less configurability/ flexibility. There is growing interest in developingflexible ML accelerators to make them future-proof to the rapid evolution of Deep Neural Networks (DNNs). However, the notion of accelerator flexibility has always been used in an informal manner, restricting computer architects from conducting systematic apples-to-apples design-space exploration (DSE) across trillions of choices. In this work, we formally define accelerator flexibility and show how it can be integrated for DSE. % flows. Specifically, we capture DNN accelerator flexibility across four axes: %the map-space of DNN accelerator along four flexibility axes: tiling, ordering, parallelization, and array shape. We categorize existing accelerators into 16 classes based on their axes of flexibility support, and define a precise quantification of the degree of flexibility of an accelerator across each axis. We leverage these to develop a novel flexibility-aware DSE framework. %It respects the difference of accelerator flexibility classes and degree of flexibility support in different accelerators, creating unique map-spaces. %and forms a unique map space for exploration. % We demonstrate how this can be used to perform first-of-their-kind evaluations, including an isolation study to identify the individual impact of the flexibility axes. We demonstrate that adding flexibility features to a hypothetical DNN accelerator designed in 2014 improves runtime on future (i.e., present-day) DNNs by 11.8x geomean.","PeriodicalId":426760,"journal":{"name":"Proceedings of the ACM on Measurement and Analysis of Computing Systems","volume":"53 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-06-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130733071","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 1
Dremel
Chenxingyu Zhao, Tapan Chugh, Jaehong Min, Ming Liu, A. Krishnamurthy
LSM-tree-based key-value stores like RocksDB are widely used to support many applications. However, configuring a RocksDB instance is challenging for the following reasons: 1) RocksDB has a massive parameter space to configure; 2) there are inherent trade-offs and dependencies between parameters; 3) the right configuration depends on the workload and hardware; and 4) evaluating configurations is time-consuming. Prior works struggle with handling the curse of dimensionality, capturing relationships between parameters, adapting configurations to workload and hardware, and evaluating quickly. In this work, we present a system, Dremel, to adaptively and quickly configure RocksDB with strategies based on the Multi-Armed Bandit model. To handle the massive parameter space, we propose using fused features, which encode domain-specific knowledge, as a compact and powerful representation of configurations. To adapt to the workload and hardware, we build an online bandit model to identify the best configuration. To evaluate quickly, we enable multi-fidelity evaluation and upper-confidence-bound sampling to speed up identifying the best configuration. Dremel not only achieves up to 2.61× higher IOPS and 57% lower latency than the default configuration, but also achieves up to 63% improvement over prior works across 18 different settings with the same or smaller time budget.
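For readers unfamiliar with the underlying model, here is a generic UCB1 bandit over a discrete configuration set. It is not Dremel's fused-feature or multi-fidelity machinery, and the configurations and reward values are hypothetical.

```python
import math
import random

def ucb1(configs, evaluate, rounds, c=2.0):
    # Generic UCB1: pull each arm once, then repeatedly pick the arm that
    # maximizes mean reward + exploration bonus c * sqrt(ln t / pulls).
    pulls = {cfg: 0 for cfg in configs}
    mean = {cfg: 0.0 for cfg in configs}
    for t in range(1, rounds + 1):
        if t <= len(configs):
            cfg = configs[t - 1]          # initialization round
        else:
            cfg = max(configs, key=lambda k: mean[k]
                      + c * math.sqrt(math.log(t) / pulls[k]))
        reward = evaluate(cfg)            # e.g., normalized measured IOPS
        pulls[cfg] += 1
        mean[cfg] += (reward - mean[cfg]) / pulls[cfg]  # running average
    return max(configs, key=lambda k: mean[k])

random.seed(0)
truth = {"64MB-buffer": 0.55, "256MB-buffer": 0.70, "1GB-buffer": 0.60}
best = ucb1(list(truth), lambda cfg: truth[cfg] + random.gauss(0.0, 0.05),
            rounds=300)
print(best)  # almost always "256MB-buffer"
```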
{"title":"Dremel","authors":"Chenxingyu Zhao, Tapan Chugh, Jaehong Min, Ming Liu, A. Krishnamurthy","doi":"10.1145/3530903","DOIUrl":"https://doi.org/10.1145/3530903","url":null,"abstract":"LSM-tree-based key-value stores like RocksDB are widely used to support many applications. However, configuring a RocksDB instance is challenging for the following reasons: 1) RocksDB has a massive parameter space to configure; 2) there are inherent trade-offs and dependencies between parameters; 3) right configurations are dependent on workload and hardware; and 4) evaluating configurations is time-consuming. Prior works struggle with handling the curse of dimensionality, capturing relationships between parameters, adapting configurations to workload and hardware, and evaluating quickly. In this work, we present a system, Dremel, to adaptively and quickly configure RocksDB with strategies based on the Multi-Armed Bandit model. To handle the massive parameter space, we propose using fused features, which encode domain-specific knowledge, to work as a compact and powerful representation for configurations. To adapt to the workload and hardware, we build an online bandit model to identify the best configuration. To evaluate quickly, we enable multi-fidelity evaluation and upper-confidence-bound sampling to speed up identifying the best configuration. Dremel not only achieves up to ×2.61 higher IOPS and 57% less latency than default configurations but also achieves up to 63% improvements over prior works on 18 different settings with the same or less time budget.","PeriodicalId":426760,"journal":{"name":"Proceedings of the ACM on Measurement and Analysis of Computing Systems","volume":"23 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-06-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133028589","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 2
Prediction of the Resource Consumption of Distributed Deep Learning Systems
Gyeongsik Yang, C. Shin, J. Lee, Yeonho Yoo, C. Yoo
Predicting the resource consumption of distributed training of deep learning models is of paramount importance, as it can inform users a priori how long their training will take and enable them to manage the cost of training. Yet, no such prediction is available to users, because resource consumption varies significantly with the "setting," such as the GPU type, and with the "workload," such as the deep learning model. Previous studies have aimed to derive or model such a prediction, but they fall short of accommodating the various combinations of settings and workloads together. This study presents Driple, which designs graph neural networks to predict the resource consumption of diverse workloads. Driple also designs transfer learning to extend the graph neural networks to adapt to differences in settings. The evaluation results show that Driple can effectively predict a wide range of workloads and settings. At the same time, Driple can reduce the time required to tailor the prediction to different settings by up to 7.3×.
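As a hedged sketch of the kind of model involved (not Driple's actual architecture), a single mean-aggregation message-passing layer followed by a graph-level readout can map an operator graph with per-node features to a scalar resource prediction. The graph, features, and weights below are toy assumptions.

```python
import numpy as np

def gnn_layer(H, A, W):
    # One mean-aggregation message-passing layer: each node averages the
    # features of itself and its neighbours (A includes self-loops), then
    # applies a shared linear map followed by a ReLU.
    deg = A.sum(axis=1, keepdims=True)
    return np.maximum(((A @ H) / deg) @ W, 0.0)

# toy "training graph": 4 operators in a chain, features = [GFLOPs, GB moved]
A = np.eye(4) + np.array([[0, 1, 0, 0],
                          [1, 0, 1, 0],
                          [0, 1, 0, 1],
                          [0, 0, 1, 0]])
H = np.array([[1.0, 0.5], [2.0, 1.0], [0.5, 2.0], [1.5, 0.2]])

rng = np.random.default_rng(0)
W1 = rng.normal(size=(2, 8))      # untrained weights, shown for shape only
w_out = rng.normal(size=8)

embedding = gnn_layer(H, A, W1).mean(axis=0)  # graph-level mean readout
print(embedding @ w_out)  # scalar resource prediction (e.g., step time)
```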
{"title":"Prediction of the Resource Consumption of Distributed Deep Learning Systems","authors":"Gyeongsik Yang, C. Shin, J. Lee, Yeonho Yoo, C. Yoo","doi":"10.1145/3530895","DOIUrl":"https://doi.org/10.1145/3530895","url":null,"abstract":"The prediction of the resource consumption for the distributed training of deep learning models is of paramount importance, as it can inform a priori users how long their training would take and also enable users to manage the cost of training. Yet, no such prediction is available for users because the resource consumption itself varies significantly according to \"settings\" such as GPU types and also by \"workloads\" like deep learning models. Previous studies have aimed to derive or model such a prediction, but they fall short of accommodating the various combinations of settings and workloads together. This study presents Driple that designs graph neural networks to predict the resource consumption of diverse workloads. Driple also designs transfer learning to extend the graph neural networks to adapt to differences in settings. The evaluation results show that Driple can effectively predict a wide range of workloads and settings. At the same time, Driple can efficiently reduce the time required to tailor the prediction for different settings by up to 7.3×.","PeriodicalId":426760,"journal":{"name":"Proceedings of the ACM on Measurement and Analysis of Computing Systems","volume":"94 3 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-06-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130357252","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 2
POMACS V6, N2, June 2022 Editorial
Niklas Carlsson, Edith Cohen, Philippe Robert
The Proceedings of the ACM on Measurement and Analysis of Computing Systems (POMACS) focuses on the measurement and performance evaluation of computer systems and operates in close collaboration with the ACM Special Interest Group SIGMETRICS. All papers in this issue of POMACS will be presented during the ACM SIGMETRICS/Performance 2022 conference. The issue contains papers selected by the editorial board via a rigorous review process that follows a hybrid conference and journal model, with reviews conducted by the 101 members of our POMACS editorial board. Each paper was either conditionally accepted (and shepherded), allowed a "one-shot" revision (to be resubmitted to one of the subsequent two deadlines), or rejected (with resubmission allowed after a year). For this issue, which represents the winter deadline, we accepted 17 papers out of 126 submissions (including 4 papers that had been given a "one-shot" revision opportunity). All submitted papers received at least 3 reviews, and we held an online TPC meeting. Based on the indicated primary track, roughly 31% of the submissions were in the Measurement & Applied Modeling track, 25% in the Systems track, 23% in the Theory track, and 21% in the Learning track. Many people contributed to the success of this issue of POMACS. First, we would like to thank the authors, who submitted their work to SIGMETRICS/POMACS. Second, we would like to thank the TPC members for their work: constructive feedback to authors in their reviews, and participation in online discussions and TPC meetings. We also thank several external reviewers who provided their expert opinion on specific submissions that required additional input. We are also grateful to the SIGMETRICS Board Chair, Giuliano Casale, and to past TPC Chairs. Finally, we are grateful to the Organization Committee and to the SIGMETRICS Board for their ongoing efforts and initiatives in creating an exciting program for ACM SIGMETRICS/Performance 2022.
{"title":"POMACS V6, N2, June 2022 Editorial","authors":"Niklas Carlsson, Edith Cohen, Philippe Robert","doi":"10.1145/3530890","DOIUrl":"https://doi.org/10.1145/3530890","url":null,"abstract":"The ACM Proceedings of the ACM on Measurement and Analysis of Computing Systems (POMACS) focuses on the measurement and performance evaluation of computer systems and operates in close collaboration with the ACM Special Interest Group SIGMETRICS. All papers in this issue of POMACS will be presented during the ACM SIGMETRICS/Performance 2022 conference. The issue contains papers selected by the editorial board via a rigorous review process that follows a hybrid conference and journal model, with reviews conducted by the 101 members of our POMACS editorial board. Each paper was either conditionally accepted (and shepherded), allowed a \"one-shot\" revision (to be resubmitted to one of the subsequent two deadlines), or rejected (with resubmission allowed after a year). For this issue, which represents the winter deadline, we accepted 17 papers out of 126 submissions (including 4 papers that had been given a \"one-shot\" revision opportunity). All submitted papers received at least 3 reviews and we held an online TPC meeting. Based on the indicated primary track, roughly 31% of the submissions were in the Measurement & Applied Modeling track, 25% were in the Systems track, 23% were in the Theory track, and 21% were in the Learning track. Many people contributed to the success of this issue of POMACS. First, we would like to thank the authors, who submitted their work to SIGMETRICS/POMACS. Second, we would like to thank the TPC members for their work: constructive feedback in their reviews to authors, participation to online discussions and also to TPC meetings. We also thank several external reviewers who provided their expert opinion on specific submissions that required additional input. We are also grateful to the SIGMETRICS Board Chair, Giuliano Casale, and to past TPC Chairs. Finally, we are grateful to the Organization Committee and to the SIGMETRICS Board for their ongoing efforts and initiatives for creating an exciting program for ACM SIGMETRICS/Performance 2022.","PeriodicalId":426760,"journal":{"name":"Proceedings of the ACM on Measurement and Analysis of Computing Systems","volume":"6 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-06-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128853649","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Toxicity in the Decentralized Web and the Potential for Model Sharing
Haris Bin Zia, Aravindh Raman, Ignacio Castro, Ishaku Hassan Anaobi, Emiliano De Cristofaro, Nishanth R. Sastry, Gareth Tyson
The "Decentralised Web" (DW) is an evolving concept, which encompasses technologies aimed at providing greater transparency and openness on the web. The DW relies on independent servers (aka instances) that mesh together in a peer-to-peer fashion to deliver a range of services (e.g. micro-blogs, image sharing, video streaming). However, toxic content moderation in this decentralised context is challenging. This is because there is no central entity that can define toxicity, nor a large central pool of data that can be used to build universal classifiers. It is therefore unsurprising that there have been several high-profile cases of the DW being misused to coordinate and disseminate harmful material. Using a dataset of 9.9M posts from 117K users on Pleroma (a popular DW microblogging service), we quantify the presence of toxic content. We find that toxic content is prevalent and spreads rapidly between instances. We show that automating per-instance content moderation is challenging due to the lack of sufficient training data available and the effort required in labelling. We therefore propose and evaluate ModPair, a model sharing system that effectively detects toxic content, gaining an average per-instance macro-F1 score 0.89.
{"title":"Toxicity in the Decentralized Web and the Potential for Model Sharing","authors":"Haris Bin Zia, Aravindh Raman, Ignacio Castro, Ishaku Hassan Anaobi, Emiliano De Cristofaro, Nishanth R. Sastry, Gareth Tyson","doi":"10.1145/3530901","DOIUrl":"https://doi.org/10.1145/3530901","url":null,"abstract":"The \"Decentralised Web\" (DW) is an evolving concept, which encompasses technologies aimed at providing greater transparency and openness on the web. The DW relies on independent servers (aka instances) that mesh together in a peer-to-peer fashion to deliver a range of services (e.g. micro-blogs, image sharing, video streaming). However, toxic content moderation in this decentralised context is challenging. This is because there is no central entity that can define toxicity, nor a large central pool of data that can be used to build universal classifiers. It is therefore unsurprising that there have been several high-profile cases of the DW being misused to coordinate and disseminate harmful material. Using a dataset of 9.9M posts from 117K users on Pleroma (a popular DW microblogging service), we quantify the presence of toxic content. We find that toxic content is prevalent and spreads rapidly between instances. We show that automating per-instance content moderation is challenging due to the lack of sufficient training data available and the effort required in labelling. We therefore propose and evaluate ModPair, a model sharing system that effectively detects toxic content, gaining an average per-instance macro-F1 score 0.89.","PeriodicalId":426760,"journal":{"name":"Proceedings of the ACM on Measurement and Analysis of Computing Systems","volume":"20 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-06-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121294744","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 3