A Distributed File System with Storage-Media Awareness
H. Herodotou
2015 IEEE/ACM 8th International Conference on Utility and Cloud Computing (UCC), published 2015-12-01
DOI: 10.1109/UCC.2015.67 (https://doi.org/10.1109/UCC.2015.67)
Abstract
Improvements in memory, storage devices, and network technologies are constantly exploited by distributed systems in order to meet the increasing data storage and I/O demands of modern large-scale data analytics. Some systems use memory and SSDs as a cache for local storage while others combine local with network-attached storage to increase performance. However, no work has ever looked at all layers together in a distributed setting. We present a novel design for a distributed file system that is aware of storage media (e.g., memory, SSDs, HDDs, NAS) with different capacities and performance characteristics. The storage media are explicitly exposed to users, allowing them to choose the distribution and placement of replicas in the cluster based on their own performance and fault tolerance requirements. Meanwhile, the system offers a variety of pluggable policies for automating data management with the dual goal of increased performance and better cluster utilization. These two features combined inspire new research opportunities for data-intensive processing systems.
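To make the idea of user-visible storage media and pluggable policies more concrete, here is a minimal sketch in Python. It is not the system's actual API, which the abstract does not describe. Every name in it (`Media`, `Volume`, `most_free_space`, `place_replicas`) is hypothetical, and the rule that replicas go on distinct nodes is an assumption made here for fault tolerance. The caller states how many replicas to keep on each medium, and a swappable policy function picks the concrete volume for each replica.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Dict, List


class Media(Enum):
    """Storage media named in the abstract (hypothetical enum)."""
    MEMORY = "memory"
    SSD = "ssd"
    HDD = "hdd"
    NAS = "nas"


@dataclass(frozen=True)
class Volume:
    """One storage volume on one cluster node (hypothetical model)."""
    node: str
    media: Media
    free_bytes: int


# A pluggable policy: given the eligible volumes, choose one.
PlacementPolicy = Callable[[List[Volume]], Volume]


def most_free_space(candidates: List[Volume]) -> Volume:
    """Example policy: prefer the volume with the most free space."""
    return max(candidates, key=lambda v: v.free_bytes)


def place_replicas(
    volumes: List[Volume],
    request: Dict[Media, int],
    policy: PlacementPolicy = most_free_space,
) -> List[Volume]:
    """Pick one volume per requested replica.

    `request` maps each medium to the number of replicas wanted on it.
    Assumption: every replica lands on a different node, so losing one
    node loses at most one replica.
    """
    chosen: List[Volume] = []
    used_nodes = set()
    for media, count in request.items():
        for _ in range(count):
            candidates = [
                v for v in volumes
                if v.media is media and v.node not in used_nodes
            ]
            if not candidates:
                raise ValueError(
                    f"not enough {media.value} volumes on distinct nodes"
                )
            pick = policy(candidates)
            chosen.append(pick)
            used_nodes.add(pick.node)
    return chosen
```

A caller who wants fast reads plus durability might request `{Media.MEMORY: 1, Media.SSD: 1, Media.HDD: 1}`. A caller who cares only about capacity might request `{Media.HDD: 3}`. Replacing `most_free_space` with another function changes the automated part of the decision without touching the request format.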