Scalable Block Reporting for HopsFS

2019 IEEE International Congress on Big Data (BigDataCongress) Pub Date : 2019-07-01 DOI:10.1109/BigDataCongress.2019.00035

Mahmoud Ismail, August Bonds, Salman Niazi, Seif Haridi, J. Dowling


Citations: 1

Abstract

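Distributed hierarchical file systems typically decouple the storage of the file system's metadata from the data (file system blocks) to enable the scalability of the file system. This decoupling, however, requires the introduction of a periodic synchronization protocol to ensure the consistency of the file system's metadata and its blocks. Apache HDFS and HopsFS implement a protocol, called block reporting, where each data server periodically sends ground truth information about all its file system blocks to the metadata servers, allowing the metadata to be synchronized with the actual state of the data blocks in the file system. The network and processing overhead of the existing block reporting protocol, however, increases with cluster size, ultimately limiting cluster scalability. In this paper, we introduce a new block reporting protocol for HopsFS that reduces the protocol bandwidth and processing overhead by up to three orders of magnitude, compared to HDFS/HopsFS' existing protocol. Our new protocol removes a major bottleneck that prevented HopsFS clusters from scaling to tens of thousands of servers.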
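
To make the baseline idea concrete, the following is a minimal sketch of how a metadata server might reconcile its view with a full block report from one data server. It is an illustration only: the class `MetadataServer`, the method `process_full_report`, and the use of plain integer block IDs are hypothetical names chosen here, not the HDFS or HopsFS API, and the sketch does not show the new protocol that the paper proposes.

```python
from dataclasses import dataclass, field


@dataclass
class MetadataServer:
    # Hypothetical structure: maps a data server id to the set of block ids
    # that the metadata believes this server stores.
    expected: dict[str, set[int]] = field(default_factory=dict)

    def process_full_report(
        self, server_id: str, reported: set[int]
    ) -> tuple[set[int], set[int]]:
        """Reconcile the metadata with a full report from one data server."""
        known = self.expected.get(server_id, set())
        missing = known - reported   # listed in metadata, absent on the server
        unknown = reported - known   # present on the server, absent in metadata
        # Treat the report as ground truth for what this server stores.
        self.expected[server_id] = set(reported)
        return missing, unknown


ms = MetadataServer({"dn1": {1, 2, 3}})
missing, unknown = ms.process_full_report("dn1", {2, 3, 4})
print(missing, unknown)  # prints: {1} {4}
```

Because every report lists all blocks on a server and every block must be compared against the metadata, the work in this sketch grows with the number of blocks and servers, which is the scaling pressure the abstract describes.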


