
Proceedings of the 2017 ACM SIGMETRICS / International Conference on Measurement and Modeling of Computer Systems: Latest Articles

Understanding Reduced-Voltage Operation in Modern DRAM Devices: Experimental Characterization, Analysis, and Mechanisms
K. Chang, A. G. Yaglikçi, Saugata Ghose, Aditya Agrawal, Niladrish Chatterjee, Abhijith Kashyap, Donghyuk Lee, Mike O'Connor, Hasan Hassan, O. Mutlu
The energy consumption of DRAM is a critical concern in modern computing systems. Improvements in manufacturing process technology have allowed DRAM vendors to lower the DRAM supply voltage conservatively, which reduces some of the DRAM energy consumption. We would like to reduce the DRAM supply voltage more aggressively, to further reduce energy. Aggressive supply voltage reduction requires a thorough understanding of the effect voltage scaling has on DRAM access latency and DRAM reliability. In this paper, we take a comprehensive approach to understanding and exploiting the latency and reliability characteristics of modern DRAM when the supply voltage is lowered below the nominal voltage level specified by manufacturers.
DOI: https://doi.org/10.1145/3078505.3078590 | Published: 2017-06-05
Citations: 155
Session details: Session 5: Towards Efficient and Durable Storage
B. Urgaonkar
DOI: https://doi.org/10.1145/3248540 | Published: 2017-06-05
Citations: 0
Scheduling Coflows in Datacenter Networks: Improved Bound for Total Weighted Completion Time
Mehrnoosh Shafiee, Javad Ghaderi
Coflow is a recently proposed networking abstraction to capture communication patterns in data-parallel computing frameworks. We consider the problem of efficiently scheduling coflows with release dates in a shared datacenter network so as to minimize the total weighted completion time of coflows. Specifically, we propose a randomized algorithm with an approximation ratio of 3e ≈ 8.155, which improves on the best previously known ratio of 9 + 16√2/3 ≈ 16.542. For the special case when all coflows are released at time zero, we obtain a randomized algorithm with an approximation ratio of 2e ≈ 5.436, which improves on the best previously known ratio of 3 + 2√2 ≈ 5.828. Simulation results using a real traffic trace show improvement over the prior approaches.
DOI: https://doi.org/10.1145/3078505.3078548 | Published: 2017-06-05
Citations: 12
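The approximation ratios quoted in the coflow abstract above are closed-form constants, so they are easy to check numerically. The short sketch below evaluates them and reports how much each new bound tightens the earlier one. The dictionary keys and the `improvement` helper are illustrative names, not taken from the paper, and the code says nothing about the scheduling algorithm itself.

```python
import math

# Closed-form approximation ratios as stated in the abstract.
# Keys are hypothetical labels chosen for this sketch.
ratios = {
    "general_new": 3 * math.e,                   # with release dates, this paper
    "general_old": 9 + 16 * math.sqrt(2) / 3,    # with release dates, prior bound
    "zero_release_new": 2 * math.e,              # all coflows released at time 0
    "zero_release_old": 3 + 2 * math.sqrt(2),    # prior bound for that case
}


def improvement(old: float, new: float) -> float:
    """Relative reduction of the approximation ratio, as a fraction of the old one."""
    return (old - new) / old


if __name__ == "__main__":
    for name, value in ratios.items():
        print(f"{name}: {value:.4f}")
    print("general case tightened by",
          f"{improvement(ratios['general_old'], ratios['general_new']):.1%}")
    print("zero-release case tightened by",
          f"{improvement(ratios['zero_release_old'], ratios['zero_release_new']):.1%}")
```

The general-case bound roughly halves, while the zero-release bound improves by a few percent.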
Simplex Queues for Hot-Data Download
M. Aktaş, E. Najm, E. Soljanin
In distributed systems, reliable data storage is accomplished through redundancy, which has traditionally been achieved by simple replication of data across multiple nodes [6]. A special class of erasure codes, known as locally repairable codes (LRCs) [7], has started to replace replication in practice [8] as a more storage-efficient way to provide a desired level of reliability. It has recently been recognized that storage redundancy can also provide fast access to stored data (see, e.g., [5,9,10] and references therein). Most of these papers consider download scenarios for all jointly encoded pieces of data, and very few [11,12,14] are concerned with downloading only some, possibly hot, pieces of data that are jointly encoded with those of less interest. So far, only the low-traffic regime has been partially addressed. In this paper, we are concerned with hot-data download from systems implementing a special class of locally repairable codes, known as LRCs with availability [13,15]. We consider simplex codes, a particular subclass of LRCs with availability, because (1) they are in a certain sense optimal [2] and (2) they are minimally different from replication.
DOI: https://doi.org/10.1145/3078505.3078553 | Published: 2017-06-05
Citations: 14
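To make "LRCs with availability" in the abstract above concrete, here is a minimal sketch using the binary simplex code on three data symbols, which stores seven symbols in total. The abstract does not state code parameters, so the choice of three data symbols, the symbol labels, and the function names are all assumptions made for illustration. The point it shows is that a hot symbol `a` can be read directly or rebuilt from any of three disjoint pairs of other symbols, so several download requests for `a` can be served in parallel.

```python
def simplex_encode(a: int, b: int, c: int) -> dict:
    """Encode three data symbols into the 7 symbols of a binary simplex code.

    Every nonzero XOR combination of the data symbols is stored once.
    Symbols can be bits or whole integers; XOR acts bitwise either way.
    """
    return {
        "a": a, "b": b, "c": c,
        "a+b": a ^ b, "a+c": a ^ c, "b+c": b ^ c,
        "a+b+c": a ^ b ^ c,
    }


# Three disjoint repair groups for the hot symbol "a" (labels are illustrative).
REPAIR_GROUPS_FOR_A = [("b", "a+b"), ("c", "a+c"), ("b+c", "a+b+c")]


def recover_a(stored: dict, group: tuple) -> int:
    """Rebuild symbol a from one repair group by XOR-ing its two members."""
    first, second = group
    return stored[first] ^ stored[second]


if __name__ == "__main__":
    stored = simplex_encode(a=0b1011, b=0b0110, c=0b1100)
    for group in REPAIR_GROUPS_FOR_A:
        print(group, "->", bin(recover_a(stored, group)))
```

With plain replication, the same parallelism would need four full copies of `a`. Here the three extra access paths reuse symbols that also protect `b` and `c`.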
On Gradient-Based Optimization: Accelerated, Distributed, Asynchronous and Stochastic
Michael I. Jordan
Many new theoretical challenges have arisen in the area of gradient-based optimization for large-scale statistical data analysis, driven by the needs of applications and the opportunities provided by new hardware and software platforms. I discuss several recent results in this area, including: (1) a new framework for understanding Nesterov acceleration, obtained by taking a continuous-time, Lagrangian/Hamiltonian perspective, (2) a general theory of asynchronous optimization in multi-processor systems, (3) a computationally-efficient approach to stochastic variance reduction, (4) a primal-dual methodology for gradient-based optimization that targets communication bottlenecks in distributed systems, and (5) a discussion of how to avoid saddle-points in nonconvex optimization.
DOI: https://doi.org/10.1145/3143314.3078506 | Published: 2017-06-05
Citations: 3
Exploiting Data Longevity for Enhancing the Lifetime of Flash-based Storage Class Memory
Wonil Choi, M. Arjomand, Myoungsoo Jung, M. Kandemir
This paper proposes to exploit the capability of retention time relaxation in flash memories to improve the lifetime of an SLC-based SSD. The main idea is that, because a majority of I/O data in a typical workload do not need a retention time longer than a few days, we can have multiple partial program states in a cell and use every two states to store one bit of data at a time. Thus, we can store multiple bits in a cell (one bit at a time) without erasing it after each write, which translates directly into lifetime enhancement. The proposed scheme, called Dense-SLC (D-SLC) flash design, improves SSD lifetime by 5.1× to 8.6×.
DOI: https://doi.org/10.1145/3078505.3078527 | Published: 2017-06-05
Citations: 3
Session details: Session 3: Assessing Vulnerability of Large Networks
A. Wierman
DOI: https://doi.org/10.1145/3248537 | Published: 2017-06-05
Citations: 0
Session details: Session 8: Analyzing and Controlling Network Interaction
Nicolas Gast
DOI: https://doi.org/10.1145/3248545 | Published: 2017-06-05
Citations: 0
Hieroglyph: Locally-Sufficient Graph Processing via Compute-Sync-Merge
Xiaoen Ju, H. Jamjoom, K. Shin
Mainstream graph processing systems (such as Pregel [3] and PowerGraph [1]) follow the bulk synchronous parallel model. This design leads to the tight coupling of computation and communication, where no vertex can proceed to the next iteration of computation until all vertices have been processed in the current iteration and graph states have been synchronized across all hosts. This coupling of computation and communication incurs a significant performance penalty. Fully decoupling computation from communication requires (i) restricted access to only local state during computation and (ii) independence of inter-host communication from computation. We call the combination of both conditions local sufficiency. Local sufficiency is not efficiently supported by the state of the art. Synchronous systems, by design, do not support local sufficiency due to their intrinsic computation-communication coupling. Even systems that implement asynchronous execution only partially achieve local sufficiency. For example, PowerGraph's asynchronous mode satisfies local sufficiency by distributed scheduling.
DOI: https://doi.org/10.1145/3078505.3078589 | Published: 2017-06-05
Citations: 2
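The coupling that the Hieroglyph abstract above attributes to the bulk synchronous parallel (BSP) model can be seen in a toy, single-process sketch. The code below labels connected components by repeatedly taking the minimum label among neighbours. It is not Hieroglyph's compute-sync-merge design and not the Pregel or PowerGraph API; the function name and the single-process setup are assumptions made purely for illustration. Each pass of the loop is one superstep: every vertex computes from the labels published at the previous barrier, and no vertex sees a new label until the whole superstep has finished.

```python
def bsp_min_label(edges, num_vertices):
    """Label connected components with BSP-style supersteps.

    Returns (labels, supersteps). labels[v] is the smallest vertex id
    in v's connected component.
    """
    neighbors = {v: set() for v in range(num_vertices)}
    for u, v in edges:
        neighbors[u].add(v)
        neighbors[v].add(u)

    labels = list(range(num_vertices))
    supersteps = 0
    while True:
        # Compute phase: every vertex reads only labels from the last barrier.
        proposed = [
            min([labels[v]] + [labels[u] for u in neighbors[v]])
            for v in range(num_vertices)
        ]
        supersteps += 1
        # Barrier: new labels become visible to all vertices at once.
        if proposed == labels:
            return labels, supersteps
        labels = proposed


if __name__ == "__main__":
    labels, steps = bsp_min_label([(0, 1), (1, 2), (3, 4)], num_vertices=5)
    print(labels, steps)
```

In a distributed deployment, the barrier at the end of each pass is where hosts would wait for each other, and that waiting is the performance cost the abstract refers to.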
Hadoop on Named Data Networking: Experience and Results
Mathias Gibbens, C. Gniady, Lei Ye, Beichuan Zhang
In today's data centers, clusters of servers are arranged to perform various tasks in a massively distributed manner: handling web requests, processing scientific data, and running simulations of real-world problems. These clusters are very complex, and require a significant amount of planning and administration to ensure that they perform to their maximum potential. Planning and configuration can be a long and complicated process; once completed it is hard to completely re-architect an existing cluster. In addition to planning the physical hardware, the software must also be properly configured to run on a cluster. Information such as which server is in which rack and the total network bandwidth between rows of racks constrain the placement of jobs scheduled to run on a cluster. Some software may be able to use hints provided by a user about where to schedule jobs, while others may simply place them randomly and hope for the best. Every cluster has at least one bottleneck that constrains the overall performance to less than the optimal that may be achieved on paper. One common bottleneck is the speed of the network: communication between servers in a rack may be unable to saturate their network connections, but traffic flowing between racks or rows in a data center can easily overwhelm the interconnect switches. Various network topologies have been proposed to help mitigate this problem by providing multiple paths between points in the network, but they all suffer from the same fundamental problem: it is cost-prohibitive to build a network that can provide concurrent full network bandwidth between all servers. Researchers have been working on developing new network protocols that can make more efficient use of existing network hardware through a blurring of the line between network layer and applications. One of the most well-known examples of this is Named Data Networking (NDN), a data-centric network architecture that has been in development for several years. 
While NDN has received significant attention for the wide-area Internet, a detailed understanding of NDN's benefits and challenges in the data center environment has been lacking. The Named Data Networking architecture retrieves content by name rather than by connecting to specific hosts. It provides benefits such as highly efficient and resilient content distribution, which fit data-intensive distributed computing well. This paper presents and discusses our experience in modifying Apache Hadoop, a popular MapReduce framework, to operate on an NDN network. Through this first-of-its-kind implementation process, we demonstrate the feasibility of running an existing, large, and complex piece of distributed software commonly seen in data centers over NDN. We show advantages such as simplified network code and reduced network traffic, which are beneficial in a data center environment. There are also challenges faced by NDN that are being addressed by the community, which can be magnified under data center traffic. Through detailed evaluation, we find a 16% reduction in overall data transfer between Hadoop nodes when writing data with the default replication settings. Preliminary results also show promise for in-network caching of repeated reads in distributed applications. We show that, while overall performance is currently slower under NDN, there are challenges and opportunities for further NDN improvements.
DOI: https://doi.org/10.1145/3078505.3078508 | Published: 2017-06-05
Citations: 9