Authors: Ya Fan, Yong Wang, Miao Ye
Published in: 2018 14th International Conference on Computational Intelligence and Security (CIS), 2018-11-01
DOI: 10.1109/CIS2018.2018.00116 (https://doi.org/10.1109/CIS2018.2018.00116)
An Improved Small File Storage Strategy in Ceph File System
Because the double-write and backup strategies make storing massive numbers of small files in Ceph FS (file system) inefficient, we design a framework, SFPS (Small File Process System), that adopts three techniques: k-means, Simhash, and the Analytic Hierarchy Process (AHP). The method reduces both the number of small files and the redundant data they contain by merging similar files after deduplication with adaptive block skipping. Because associated files have a high probability of being read next, we also design a pre-fetching mechanism and a metadata management module based on the Redis database to maintain a high read rate. The proposed scheme aims to achieve a better trade-off among disk space utilization, bandwidth utilization, file access time, disk I/O, and cluster performance in Ceph FS by eliminating duplicate copies of repeated data, merging similar small files, and introducing a cache module. Experimental results show that the method not only improves the utilization of bandwidth and storage space as well as the small-file read rate, but also significantly reduces the disk I/O generated by reading and writing files.
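The abstract names Simhash as one of the techniques used to find similar small files but gives no implementation details. As an illustration only, the following minimal sketch shows the generic Simhash idea: each file's text is reduced to a 64-bit fingerprint, and two files are treated as merge candidates when their fingerprints differ in only a few bits. The word-level tokenization, the use of MD5 as the per-token hash, the function names, and the threshold of 3 bits are all assumptions made for this example, not the configuration used in SFPS.

```python
import hashlib
import re


def simhash(text: str, bits: int = 64) -> int:
    """Return a Simhash fingerprint of `text` (word tokens, unit weights).

    Assumption: tokens are lowercase words and every token has weight 1.
    """
    weights = [0] * bits
    for token in re.findall(r"\w+", text.lower()):
        # Hash each token to `bits` bits (MD5 is an arbitrary choice here).
        digest = hashlib.md5(token.encode("utf-8")).digest()
        h = int.from_bytes(digest[: bits // 8], "big")
        # Each set bit votes +1 for its position, each clear bit votes -1.
        for i in range(bits):
            weights[i] += 1 if (h >> i) & 1 else -1
    # A fingerprint bit is 1 where the positive votes outnumber the negative.
    fingerprint = 0
    for i, w in enumerate(weights):
        if w > 0:
            fingerprint |= 1 << i
    return fingerprint


def hamming_distance(a: int, b: int) -> int:
    """Number of bit positions in which two fingerprints differ."""
    return bin(a ^ b).count("1")


def is_merge_candidate(text_a: str, text_b: str, threshold: int = 3) -> bool:
    """Hypothetical helper: treat two files as similar when their
    fingerprints differ in at most `threshold` bits (assumed value)."""
    return hamming_distance(simhash(text_a), simhash(text_b)) <= threshold
```

In a pipeline like the one the abstract describes, fingerprints of this kind could be computed once per file after deduplication and then compared or clustered to decide which small files to merge. How SFPS combines them with k-means and AHP is not specified in the abstract.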