An Improved Small File Storage Strategy in Ceph File System

Ya Fan, Yong Wang, Miao Ye
Published in: 2018 14th International Conference on Computational Intelligence and Security (CIS), 2018-11-01
DOI: 10.1109/CIS2018.2018.00116 (https://doi.org/10.1109/CIS2018.2018.00116)
Citations: 6

Abstract

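Because the double-write and backup strategies make storing massive numbers of small files in Ceph FS (file system) inefficient, we design a framework, SFPS (Small File Process System), that adopts three techniques: k-means, Simhash, and the Analytic Hierarchy Process (AHP). The method reduces both the number of small files and their redundant data by merging similar files after deduplication with adaptive block skipping. Because associated files are likely to be read next, we also design a pre-fetching mechanism and a metadata management module based on the Redis database to keep the read rate high. By eliminating duplicate copies of repeated data, merging similar small files, and introducing a cache module, the proposed scheme aims to achieve a better trade-off among disk-space utilization, bandwidth utilization, file access time, disk I/O, and cluster performance in Ceph FS. Experimental results show that the method not only improves the utilization of bandwidth and storage space and the small-file read rate, but also significantly reduces the disk I/O generated by reading and writing files.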
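
The abstract names Simhash as one of the techniques used to find similar small files before merging, but gives no implementation details. The following is a minimal Python sketch of the general Simhash idea only, not the authors' SFPS code. The whitespace tokenizer, the MD5-based 64-bit token hash, the greedy grouping, and the names `simhash`, `hamming`, `group_similar`, and `max_distance` are all assumptions made for illustration.

```python
import hashlib
from collections import Counter
from typing import Dict, List, Tuple

BITS = 64  # fingerprint width (assumed; not specified in the abstract)


def simhash(text: str) -> int:
    """Return a 64-bit Simhash fingerprint of whitespace-separated tokens."""
    weights = [0] * BITS
    for token, count in Counter(text.split()).items():
        # Stable 64-bit token hash: first 8 bytes of the MD5 digest.
        digest = hashlib.md5(token.encode("utf-8")).digest()
        h = int.from_bytes(digest[:8], "big")
        for i in range(BITS):
            # Each token votes +count or -count on every bit position.
            weights[i] += count if (h >> i) & 1 else -count
    fingerprint = 0
    for i, w in enumerate(weights):
        if w > 0:
            fingerprint |= 1 << i
    return fingerprint


def hamming(a: int, b: int) -> int:
    """Number of bit positions in which two fingerprints differ."""
    return bin(a ^ b).count("1")


def group_similar(files: Dict[str, str], max_distance: int = 3) -> List[List[str]]:
    """Greedily group files whose fingerprints lie within max_distance
    of a group's first member (hypothetical merge-candidate selection)."""
    groups: List[Tuple[int, List[str]]] = []
    for name, content in files.items():
        fp = simhash(content)
        for representative, members in groups:
            if hamming(representative, fp) <= max_distance:
                members.append(name)
                break
        else:
            groups.append((fp, [name]))
    return [members for _, members in groups]
```

Files that land in the same group would be candidates for merging into one larger object. How SFPS combines this with k-means, AHP, and adaptive block skipping is not described in the abstract and is not modeled here.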