Hindol Bhattacharya, Samiran Chattopadhyay, M. Chattopadhyay
2017 International Conference on Computer, Electrical & Communication Engineering (ICCECE). Published 2017-12-01. DOI: 10.1109/ICCECE.2017.8526204 (https://doi.org/10.1109/ICCECE.2017.8526204)
NS3 Based HDFS Data Placement Algorithm Evaluation Framework
Data exploration and utilization based on big data analytics holds immense prospects for the future of businesses. However, as the name suggests, processing such a huge amount of data is challenging. Hadoop, with its parallel processing solutions, assists in processing big data in reasonable time. The heart of Hadoop is its distributed file system (HDFS), and how data is placed in the file system dictates the speed of data processing. Hence, over the years, efficient data placement algorithms have been one of the key research areas in big data analytics. Evaluating such algorithms traditionally requires deploying HDFS on a hardware cluster and implementing the data placement algorithm on it. It is often difficult for researchers to acquire the required hardware and build a hardware cluster. Even when such clusters are available, scalability becomes an issue. Moreover, a cluster resembling a real-life data center is not available to many researchers. Simulation provides a low-cost alternative for evaluating big data placement algorithms on HDFS. One of the key goals of data placement algorithms is to minimize communication cost and latency, so a framework built on a network simulator fits the role well. NS3 is one of the most prominent network simulation tools available to researchers. However, full HDFS support for data placement research is not yet implemented. This work proposes to extend the NS3 simulation environment with HDFS support for eventual use in data placement algorithm evaluation.
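To make the communication-cost objective concrete, here is a minimal sketch of how a placement could be scored. It is not the framework proposed in this work and uses no NS3 API. The function names `tree_distance` and `placement_cost`, the topology paths, and the cost model (megabytes moved times hop count per pipeline stage) are all assumptions chosen for illustration. The hop count follows the tree-distance convention commonly described for Hadoop rack awareness, where two nodes on the same rack are 2 hops apart and nodes on different racks in one data center are 4 hops apart.

```python
from typing import Sequence


def tree_distance(a: str, b: str) -> int:
    """Hops between two nodes named by topology paths such as '/dc1/rack1/node1'.

    Each node's distance to the closest common ancestor is counted and summed.
    """
    pa = a.strip("/").split("/")
    pb = b.strip("/").split("/")
    common = 0
    for x, y in zip(pa, pb):
        if x != y:
            break
        common += 1
    return (len(pa) - common) + (len(pb) - common)


def placement_cost(writer: str, replicas: Sequence[str], block_mb: float) -> float:
    """Toy cost of writing one block through a replication pipeline.

    Assumption: the block travels writer -> replica 1 -> replica 2 -> ...,
    and each stage costs block size (MB) multiplied by the hop count.
    """
    cost = 0.0
    src = writer
    for dst in replicas:
        cost += block_mb * tree_distance(src, dst)
        src = dst
    return cost


if __name__ == "__main__":
    # Hypothetical layout: first replica on the writer's node,
    # the other two on separate nodes of one remote rack.
    writer = "/dc1/rack1/node1"
    replicas = ["/dc1/rack1/node1", "/dc1/rack2/node3", "/dc1/rack2/node4"]
    print(placement_cost(writer, replicas, block_mb=128))
```

A network simulator would replace the fixed hop count with measured transfer times over a modeled topology, which is the kind of evaluation the abstract argues for.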