{"title":"面向云和大数据的高性能容错分布式存储系统对缓存的支持","authors":"L. Lundberg, Håkan Grahn, D. Ilie, C. Melander","doi":"10.1109/IPDPSW.2015.65","DOIUrl":null,"url":null,"abstract":"Due to the trends towards Big Data and Cloud Computing, one would like to provide large storage systems that are accessible by many servers. A shared storage can, however, become a performance bottleneck and a single-point of failure. Distributed storage systems provide a shared storage to the outside world, but internally they consist of a network of servers and disks, thus avoiding the performance bottleneck and single-point of failure problems. We introduce a cache in a distributed storage system. The cache system must be fault tolerant so that no data is lost in case of a hardware failure. This requirement excludes the use of the common write-invalidate cache consistency protocols. The cache is implemented and evaluated in two steps. The first step focuses on design decisions that improve the performance when only one server uses the same file. In the second step we extend the cache with features that focus on the case when more than one server access the same file. The cache improves the throughput significantly compared to having no cache. The two-step evaluation approach makes it possible to quantify how different design decisions affect the performance of different use cases.","PeriodicalId":340697,"journal":{"name":"2015 IEEE International Parallel and Distributed Processing Symposium Workshop","volume":"266 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2015-05-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"3","resultStr":"{\"title\":\"Cache Support in a High Performance Fault-Tolerant Distributed Storage System for Cloud and Big Data\",\"authors\":\"L. Lundberg, Håkan Grahn, D. Ilie, C. Melander\",\"doi\":\"10.1109/IPDPSW.2015.65\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Due to the trends towards Big Data and Cloud Computing, one would like to provide large storage systems that are accessible by many servers. A shared storage can, however, become a performance bottleneck and a single-point of failure. Distributed storage systems provide a shared storage to the outside world, but internally they consist of a network of servers and disks, thus avoiding the performance bottleneck and single-point of failure problems. We introduce a cache in a distributed storage system. The cache system must be fault tolerant so that no data is lost in case of a hardware failure. This requirement excludes the use of the common write-invalidate cache consistency protocols. The cache is implemented and evaluated in two steps. The first step focuses on design decisions that improve the performance when only one server uses the same file. In the second step we extend the cache with features that focus on the case when more than one server access the same file. The cache improves the throughput significantly compared to having no cache. The two-step evaluation approach makes it possible to quantify how different design decisions affect the performance of different use cases.\",\"PeriodicalId\":340697,\"journal\":{\"name\":\"2015 IEEE International Parallel and Distributed Processing Symposium Workshop\",\"volume\":\"266 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2015-05-25\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"3\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2015 IEEE International Parallel and Distributed Processing Symposium Workshop\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/IPDPSW.2015.65\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2015 IEEE International Parallel and Distributed Processing Symposium Workshop","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/IPDPSW.2015.65","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Cache Support in a High Performance Fault-Tolerant Distributed Storage System for Cloud and Big Data
Due to the trends towards Big Data and Cloud Computing, one would like to provide large storage systems that are accessible by many servers. A shared storage can, however, become a performance bottleneck and a single-point of failure. Distributed storage systems provide a shared storage to the outside world, but internally they consist of a network of servers and disks, thus avoiding the performance bottleneck and single-point of failure problems. We introduce a cache in a distributed storage system. The cache system must be fault tolerant so that no data is lost in case of a hardware failure. This requirement excludes the use of the common write-invalidate cache consistency protocols. The cache is implemented and evaluated in two steps. The first step focuses on design decisions that improve the performance when only one server uses the same file. In the second step we extend the cache with features that focus on the case when more than one server access the same file. The cache improves the throughput significantly compared to having no cache. The two-step evaluation approach makes it possible to quantify how different design decisions affect the performance of different use cases.