An improved HDFS for small file
Author: Liu Changtong
Published in: 2016 18th International Conference on Advanced Communication Technology (ICACT)
DOI: 10.1109/ICACT.2016.7423437 (https://doi.org/10.1109/ICACT.2016.7423437)
Citations: 16

Abstract

Hadoop is an open-source distributed computing platform, and HDFS is the Hadoop Distributed File System. HDFS offers large data storage capacity, which makes it suitable for cloud storage systems. However, HDFS was originally designed for streaming access to large files, so it stores massive numbers of small files inefficiently. To address this problem, the HDFS file storage process is improved: each file is checked before it is uploaded to the HDFS cluster, and if it is a small file, it is merged and its index information is stored in an index file as key-value pairs. Simulation shows that the improved HDFS consumes less NameNode memory than both the original HDFS and Hadoop Archives (HAR files), which in turn can improve access efficiency.
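
The judge-then-merge flow described in the abstract can be illustrated with a minimal sketch. This is not the paper's implementation: the size threshold, the `MergedStore` class, and its method names are all hypothetical, and an in-memory byte buffer stands in for the merged file on HDFS. It only shows the general idea of keeping one merged file plus a key-value index (file name to offset and length) instead of one namespace entry per small file.

```python
from dataclasses import dataclass, field
from typing import Dict, Tuple

# Hypothetical cutoff: the abstract does not say what counts as "small".
SMALL_FILE_THRESHOLD = 1024 * 1024  # 1 MiB, chosen only for illustration


@dataclass
class MergedStore:
    """In-memory stand-in for one merged file plus its key-value index."""

    merged: bytearray = field(default_factory=bytearray)
    # Index entries are key-value pairs: file name -> (offset, length).
    index: Dict[str, Tuple[int, int]] = field(default_factory=dict)
    # Files at or above the threshold are kept as separate entries.
    large_files: Dict[str, bytes] = field(default_factory=dict)

    def upload(self, name: str, data: bytes) -> str:
        """Judge the file by size, then merge it or store it on its own."""
        if len(data) < SMALL_FILE_THRESHOLD:
            offset = len(self.merged)
            self.merged.extend(data)
            self.index[name] = (offset, len(data))
            return "merged"
        self.large_files[name] = data
        return "direct"

    def read(self, name: str) -> bytes:
        """Look the file up in the index first, then among the large files."""
        if name in self.index:
            offset, length = self.index[name]
            return bytes(self.merged[offset:offset + length])
        return self.large_files[name]


store = MergedStore()
store.upload("a.txt", b"hello")
store.upload("b.txt", b"world!")
store.upload("big.bin", bytes(SMALL_FILE_THRESHOLD))
```

In this sketch the two small files share a single merged buffer, so a file system tracking it would hold one entry for the merged file rather than one per small file; reading a small file costs an index lookup followed by a ranged read.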