Title: Performance analysis of extract, transform, load (ETL) in Apache Hadoop atop NAS storage using iSCSI
Authors: Adnan, A. A. Ilham, S. Usman
Venue: 2017 4th International Conference on Computer Applications and Information Processing Technology (CAIPT)
Publication date: 2017-08-01
DOI: 10.1109/CAIPT.2017.8320716 (https://doi.org/10.1109/CAIPT.2017.8320716)
Citations: 7
Abstract
Data analytics has become a key element of the business decision process over the last decade. Extract, transform, load (ETL) is the process of migrating data from a source to the required database, and it must store and process huge amounts of structured and unstructured data for complex business analysis. Standard ETL tools do not handle this efficiently, and improving the process can provide a better return on a company's investment. It is therefore interesting to look for an opportunity to build computing-storage devices from low-cost, low-power components to perform the ETL process. In this paper, we propose running Hadoop on Network Attached Storage (NAS) accessed through iSCSI over Ethernet to process ETL, and we investigate the benefits of running Hadoop over NAS storage compared with normal HDFS, using a benchmark of extract, transform, and load performance. The study used 1 NameNode, 4 DataNodes, NAS storage, and a dataset to examine the proposed architecture. The results showed that the proposed architecture is able to use low-cost components to deliver scalable performance and could become a storage solution in the Big Data space.
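
The abstract reports extract, transform, and load performance as separate measurements. The paper's own benchmark is not described here, so the following is only a minimal sketch of how per-phase timing could be structured. The function names, the toy in-memory dataset, and the row layout are all assumptions for illustration; a real run would read from and write to the storage backend under test (HDFS on local disks or on iSCSI-backed NAS).

```python
import time
from typing import Callable, Dict, List


def time_phase(fn: Callable, *args):
    """Run fn(*args) and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = fn(*args)
    return result, time.perf_counter() - start


# Hypothetical toy phases standing in for real storage I/O.
def extract(n_rows: int) -> List[str]:
    # Produce CSV-like rows: id, name, price.
    return [f"{i},item{i},{i * 1.5}" for i in range(n_rows)]


def transform(rows: List[str]) -> List[dict]:
    # Parse each row and normalize the name to upper case.
    out = []
    for row in rows:
        key, name, price = row.split(",")
        out.append({"id": int(key), "name": name.upper(), "price": float(price)})
    return out


def load(records: List[dict], target: list) -> int:
    # Append the records to the target store and report how many were written.
    target.extend(records)
    return len(records)


def run_etl_benchmark(n_rows: int) -> Dict[str, float]:
    """Time each ETL phase separately and return seconds per phase."""
    target: list = []
    rows, t_extract = time_phase(extract, n_rows)
    records, t_transform = time_phase(transform, rows)
    loaded, t_load = time_phase(load, records, target)
    assert loaded == n_rows
    return {"extract": t_extract, "transform": t_transform, "load": t_load}


if __name__ == "__main__":
    for phase, seconds in run_etl_benchmark(10_000).items():
        print(f"{phase}: {seconds:.4f} s")
```

Running the same harness once per storage configuration and comparing the three timings phase by phase is one simple way to frame the NAS-versus-HDFS comparison the abstract describes.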