Qingyuan Gong, Jiaqi Wang, Dongsheng Wei, Jin Wang, Xin Wang
Published in: 2015 44th International Conference on Parallel Processing, 2015-09-01. DOI: 10.1109/ICPP.2015.48 (https://doi.org/10.1109/ICPP.2015.48)
Optimal Node Selection for Data Regeneration in Heterogeneous Distributed Storage Systems
Distributed storage systems introduce redundancy to protect data from node failures. After a storage node fails, the lost data should be regenerated at a replacement storage node as soon as possible to maintain the same level of redundancy. Minimizing this regeneration time is critical to the reliability of distributed storage systems. Existing work aims to reduce the regeneration time either by minimizing the regeneration traffic or by adjusting the regeneration traffic patterns, while the nodes participating in the regeneration are generally assumed to be given beforehand. However, real-world distributed storage systems usually exhibit heterogeneous link capacities, and the regeneration time depends heavily on which nodes participate. In this paper, we minimize the regeneration time by selecting the participating nodes in heterogeneous networks. We propose optimal node selection algorithms for two cases: 1) the newcomer is not given, and 2) neither the newcomer nor the providers are given. Analysis shows that the optimal regeneration time can be achieved in each case. We then consider how allowing a flexible number of data blocks from each provider affects the regeneration time, and apply this observation to enhance our schemes. Experimental results show that, compared with random node selection, our node selection schemes can significantly reduce the regeneration time, especially in practical networks with heterogeneous link capacities.
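To make the role of node selection concrete, the following is a minimal sketch under a deliberately simplified model, not the algorithm proposed in the paper. It assumes a star topology in which each of d providers sends the same amount of data (beta) directly to the newcomer, so the regeneration time is set by the slowest provider-to-newcomer link. The function names, the capacity table, and the numbers are all hypothetical and chosen only for illustration.

```python
# Simplified illustration only: star topology, equal data per provider.
# All names and values below are hypothetical, not taken from the paper.

def regeneration_time(capacity, newcomer, providers, beta):
    """Time of the slowest direct transfer from a provider to the newcomer."""
    return max(beta / capacity[p][newcomer] for p in providers)

def best_providers(capacity, newcomer, candidates, d):
    """Pick the d candidates with the highest link capacity to the newcomer."""
    ranked = sorted(candidates, key=lambda p: capacity[p][newcomer], reverse=True)
    return ranked[:d]

def select_nodes(capacity, survivors, spares, d, beta):
    """Try each spare as the newcomer and keep the choice with the smallest time."""
    best = None
    for newcomer in spares:
        providers = best_providers(capacity, newcomer, survivors, d)
        t = regeneration_time(capacity, newcomer, providers, beta)
        if best is None or t < best[0]:
            best = (t, newcomer, providers)
    return best

# capacity[p][n]: assumed link capacity (MB/s) from surviving node p to spare node n
capacity = {
    "A": {"X": 10, "Y": 50},
    "B": {"X": 20, "Y": 40},
    "C": {"X": 100, "Y": 5},
}

result = select_nodes(capacity, survivors=["A", "B", "C"], spares=["X", "Y"], d=2, beta=100)
print(result)  # (2.5, 'Y', ['A', 'B'])
```

In this toy instance, choosing X as the newcomer forces a transfer over a 20 MB/s link (5 s), whereas choosing Y with providers A and B finishes in 2.5 s, which illustrates why the choice of participating nodes matters when link capacities differ.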