BFO: Batch-File Operations on Massive Files for Consistent Performance Improvement

2019 35th Symposium on Mass Storage Systems and Technologies (MSST). Pub Date: 2019-05-01. DOI: 10.1109/MSST.2019.00-17

Yang Yang, Q. Cao, Hong Jiang, Li Yang, Jie Yao, Yuanyuan Dong, Puyuan Yang

Citations: 4

Abstract

Existing local file systems are designed to support only the typical single-file access pattern, so they can perform poorly when accessing a batch of files, especially small files. The single-file pattern essentially serializes accesses to batched files one by one, resulting in a large number of non-sequential, random, and often dependent I/Os between file data and metadata at the storage end. We first experimentally analyze the root cause of this inefficiency in batch-file accesses. We then propose a novel batch-file access approach, referred to as BFO for its set of optimized Batch-File Operations. BFO introduces the BFOr and BFOw operations for the fundamental read and write processes respectively, each using a two-phase access that handles metadata and data jointly. BFO offers dedicated interfaces for batch-file accesses, with the additional processes integrated into existing file systems without modifying their structures and procedures. We implement a BFO prototype on ext4, one of the most popular file systems. Our evaluation with synthetic and real-world file sets shows that the batch-file read and write performance of BFO is consistently higher than that of the traditional approaches, regardless of access patterns, data layouts, and storage media. BFO improves read performance by up to 22.4× on HDD and 1.8× on SSD, and boosts write performance by up to 111.4× on HDD and 2.9× on SSD. BFO also demonstrates consistent performance advantages when applied to four representative applications: Linux cp, Tar, GridFTP, and Hadoop.
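
The two-phase idea in the abstract (metadata for the whole batch first, then data) can be illustrated with a rough user-space sketch. This is not the paper's BFOr implementation, which is integrated into ext4; the function name `batch_read` is hypothetical, and sorting by inode number is only an assumed stand-in for ordering reads by on-disk location.

```python
import os
from typing import Dict, List


def batch_read(paths: List[str]) -> Dict[str, bytes]:
    """Read a batch of files in two phases: metadata first, then data.

    Hypothetical user-space illustration; not the BFOr operation itself.
    """
    # Phase 1: gather metadata for the whole batch before touching any data,
    # instead of interleaving one stat and one read per file.
    metas = [(os.stat(p), p) for p in paths]

    # Assumption: the inode number is used as a crude proxy for on-disk
    # locality, so that phase 2 issues reads in a more sequential order.
    metas.sort(key=lambda m: m[0].st_ino)

    # Phase 2: read file data in the sorted order.
    contents: Dict[str, bytes] = {}
    for _, path in metas:
        with open(path, "rb") as f:
            contents[path] = f.read()
    return contents
```

A caller would pass the full list of paths at once, for example `batch_read(["a.txt", "b.txt"])`, and receive a mapping from each path to its bytes. Any benefit from this ordering depends on the file system and storage medium, and this sketch makes no claim to reproduce the speedups reported above.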
