An Efficient Approach to Enhance the Scalability of the HDFS: Extended Hadoop Archive (EHAR)
Vijay Sharma, N. Barwar
2021 Emerging Trends in Industry 4.0 (ETI 4.0), published 2021-05-19
DOI: 10.1109/ETI4.051663.2021.9619367 (https://doi.org/10.1109/ETI4.051663.2021.9619367)
Abstract
The Hadoop framework is among the most popular platforms for data analytics applications. Its file system, HDFS, provides layered storage for frequently and infrequently accessed data. Data in HDFS can be archived with the Hadoop Archive (HAR) technique, but HAR archives are immutable: once an archive has been created, it cannot be modified. Appending even one new file therefore requires rewriting the whole archive. This paper introduces the Extended Hadoop Archive (EHAR), which addresses the scalability issue of HDFS and provides a mechanism for appending new files to an existing Hadoop archive. Experimental results show that the execution time of the proposed approach is between 39% and 53% lower than that of native HAR for different fixed-size files, and between 38% and 52% lower for different variable-size files.
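
The cost difference between rewriting an immutable archive and appending to an extensible one can be illustrated with a small in-memory model. The sketch below is a toy under stated assumptions, not the HAR on-disk format, not the Hadoop API, and not the EHAR design from the paper: `ToyArchive`, `build`, `add_by_rewrite`, and `add_by_append` are hypothetical names introduced only for illustration. The archive is modelled as one concatenated data blob plus an index that maps each file name to its offset and length.

```python
from dataclasses import dataclass, field


@dataclass
class ToyArchive:
    """Toy archive: one data blob plus an index (hypothetical model)."""
    data: bytearray = field(default_factory=bytearray)
    index: dict = field(default_factory=dict)  # name -> (offset, length)
    bytes_written: int = 0  # running total, used to compare costs

    def write(self, name: str, payload: bytes) -> None:
        # Record where the payload starts, then append it to the blob.
        self.index[name] = (len(self.data), len(payload))
        self.data.extend(payload)
        self.bytes_written += len(payload)

    def read(self, name: str) -> bytes:
        offset, length = self.index[name]
        return bytes(self.data[offset:offset + length])


def build(files: dict) -> ToyArchive:
    """Create a fresh archive from a name -> bytes mapping."""
    archive = ToyArchive()
    for name, payload in files.items():
        archive.write(name, payload)
    return archive


def add_by_rewrite(old: ToyArchive, new_files: dict) -> ToyArchive:
    """Immutable case: every existing file is read back and written again."""
    merged = {name: old.read(name) for name in old.index}
    merged.update(new_files)
    return build(merged)


def add_by_append(archive: ToyArchive, new_files: dict) -> int:
    """Appendable case: only the new payloads are written.

    Returns the number of bytes written by this call.
    """
    before = archive.bytes_written
    for name, payload in new_files.items():
        if name in archive.index:
            raise ValueError(f"{name} already exists in the archive")
        archive.write(name, payload)
    return archive.bytes_written - before


existing = {"a.txt": b"x" * 100, "b.txt": b"y" * 200}
new = {"c.txt": b"z" * 50}

rebuilt = add_by_rewrite(build(existing), new)
appended = build(existing)
appended_cost = add_by_append(appended, new)
```

In this toy model the rewrite path writes all 350 bytes again, while the append path writes only the 50 new bytes. That is the intuition behind avoiding a full rewrite; the measured savings reported in the abstract come from the authors' experiments, not from this model.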