An improved HDFS for small file
Liu Changtong
2016 18th International Conference on Advanced Communication Technology (ICACT)
DOI: 10.1109/ICACT.2016.7423437
Citations: 16
Abstract
Hadoop is an open-source distributed computing platform, and HDFS is the Hadoop Distributed File System. HDFS provides large data storage capacity, which makes it suitable for cloud storage systems. However, HDFS was originally designed for streaming access to large files, so its storage efficiency is low when it holds massive numbers of small files. To solve this problem, the HDFS file storage process is improved: each file is checked before it is uploaded to the HDFS cluster, and if it is a small file, it is merged with other small files and its index information is stored as key-value pairs in an index file. Simulation shows that the improved HDFS consumes less NameNode memory than both the original HDFS and Hadoop Archives (HAR files), and it can therefore improve access efficiency.
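The upload-time check, merge, and key-value index described in the abstract can be illustrated with a minimal in-memory sketch. The abstract does not state the size threshold, the merged-file layout, or the index format, so everything here is an assumption: the threshold (75% of a 128 MiB block), the plain byte concatenation used for merging, and the index mapping each small file's name to a (container, offset, length) entry. The names `plan_upload`, `IndexEntry`, `read_small`, `namenode_file_count`, `BLOCK_SIZE`, and `SMALL_FILE_THRESHOLD` are hypothetical, and Python dicts stand in for the HDFS cluster; no real HDFS API is called. It is not the author's implementation.

```python
from dataclasses import dataclass

# Hypothetical parameters: the abstract gives no threshold, so we assume a
# 128 MiB block and treat files under 75% of a block as "small".
BLOCK_SIZE = 128 * 1024 * 1024
SMALL_FILE_THRESHOLD = int(0.75 * BLOCK_SIZE)


@dataclass(frozen=True)
class IndexEntry:
    """Value stored in the (hypothetical) key-value index file."""
    merged_file: str  # name of the merged container file holding the data
    offset: int       # byte offset of the small file inside the container
    length: int       # size of the small file in bytes


def is_small(data: bytes, threshold: int = SMALL_FILE_THRESHOLD) -> bool:
    """The upload-time check: is this file below the small-file threshold?"""
    return len(data) < threshold


def plan_upload(files: dict[str, bytes],
                threshold: int = SMALL_FILE_THRESHOLD,
                container_capacity: int = BLOCK_SIZE):
    """Split files into direct uploads and merged containers plus an index.

    Large files are passed through unchanged. Small files are concatenated
    into containers of at most `container_capacity` bytes, and each small
    file's location is recorded as name -> IndexEntry (a key-value pair).
    """
    direct: dict[str, bytes] = {}
    containers: dict[str, bytearray] = {}
    index: dict[str, IndexEntry] = {}
    current = None
    for name, data in files.items():
        if not is_small(data, threshold):
            direct[name] = data
            continue
        # Start a new container when none exists or the current one is full.
        if current is None or len(containers[current]) + len(data) > container_capacity:
            current = f"merged-{len(containers):05d}"
            containers[current] = bytearray()
        buf = containers[current]
        index[name] = IndexEntry(current, len(buf), len(data))
        buf.extend(data)
    return direct, {k: bytes(v) for k, v in containers.items()}, index


def read_small(name: str, containers: dict[str, bytes],
               index: dict[str, IndexEntry]) -> bytes:
    """Look up a small file in the index and slice it out of its container."""
    entry = index[name]
    return containers[entry.merged_file][entry.offset:entry.offset + entry.length]


def namenode_file_count(direct, containers, index) -> int:
    """Files the NameNode must track: large files, containers, one index file."""
    return len(direct) + len(containers) + (1 if index else 0)


if __name__ == "__main__":
    files = {"a.txt": b"hello", "b.txt": b"world!", "c.txt": b"xyz",
             "big.bin": b"0123456789ABCDEF"}
    # Tiny sizes so the example is easy to follow.
    direct, containers, index = plan_upload(files, threshold=10, container_capacity=16)
    print(containers)                             # {'merged-00000': b'helloworld!xyz'}
    print(read_small("b.txt", containers, index))  # b'world!'
    print(namenode_file_count(direct, containers, index), "vs", len(files))  # 3 vs 4
```

In this sketch the NameNode would track one container and one index file in place of every small file, which is the intuition behind the reduced NameNode memory reported in the abstract; the actual savings depend on the threshold, container size, and index layout the paper uses.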