{"title":"Bandwidth-aware data placement scheme for Hadoop","authors":"T. P. Shabeera, S. D. Madhu Kumar","doi":"10.1109/RAICS.2013.6745448","journal":{"name":"2013 IEEE Recent Advances in Intelligent Computational Systems (RAICS)","volume":"63 3 1","pages":"0"},"publicationDate":"2013-12-01","publicationTypes":"Journal Article","isOpenAccess":false,"citationCount":"15","platform":"Semanticscholar","ListUrlMain":"https://doi.org/10.1109/RAICS.2013.6745448"}
We are living in a data-rich era, and the volume of data is growing exponentially. Social networking applications and scientific experiments are among the major contributors to Big Data. The data can be structured, semi-structured, or unstructured. Big Data management solutions can be implemented in-house within the organization, or the data can be stored in the cloud. In either case, the placement of data is very important, because users generally expect data to be available whenever they request it. Many parameters affect data retrieval time in Hadoop; this paper focuses on the available bandwidth. To minimize data retrieval time, data should be placed on the DataNode that has the maximum available bandwidth. We propose a bandwidth-aware data placement scheme for Hadoop that periodically measures the bandwidth between clients and DataNodes and places data blocks on the DataNodes with the maximum end-to-end bandwidth.
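
The selection step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation and does not use any Hadoop API: the `BandwidthTable` class, its method names, the node names, the `replicas` parameter, and the bandwidth figures are all hypothetical, and the periodic probe is assumed to exist elsewhere and simply report its latest measurement.

```python
from dataclasses import dataclass, field


@dataclass
class BandwidthTable:
    """Latest measured client-to-DataNode bandwidth in Mbit/s (hypothetical structure)."""
    samples: dict = field(default_factory=dict)  # (client, datanode) -> Mbit/s

    def record(self, client: str, datanode: str, mbps: float) -> None:
        # Called by a periodic probe; each new measurement overwrites the previous one.
        self.samples[(client, datanode)] = mbps

    def choose_datanodes(self, client: str, replicas: int = 1) -> list:
        # Rank the DataNodes measured for this client by bandwidth, highest first.
        # Ties are broken by node name so the result is deterministic.
        measured = [(mbps, node) for (c, node), mbps in self.samples.items() if c == client]
        measured.sort(key=lambda pair: (-pair[0], pair[1]))
        return [node for _, node in measured[:replicas]]


table = BandwidthTable()
table.record("client-a", "dn1", 420.0)
table.record("client-a", "dn2", 910.0)
table.record("client-a", "dn3", 655.0)
table.record("client-a", "dn2", 300.0)  # a later probe shows dn2 has slowed down

best = table.choose_datanodes("client-a", replicas=2)
print(best)  # ['dn3', 'dn1']
```

Because each probe overwrites the previous sample, placement decisions follow the most recent measurement rather than a historical average. Whether the scheme in the paper smooths measurements over time is not stated in the abstract.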