Map Reduce by K-Nearest Neighbor Joins
Srikanth Bethu, B. Babu, S. G. Rao, R. Florence
2018 International Conference on Cyber-Enabled Distributed Computing and Knowledge Discovery (CyberC), 2018-10-01
DOI: 10.1109/CYBERC.2018.00050 (https://doi.org/10.1109/CYBERC.2018.00050)
Knowledge discovery and data mining play a major role in computationally intensive tasks across a wide range of applications. As data grow in volume and dimensionality, distributed computation is what allows these operations to complete in a reasonable time. MapReduce is well suited to distributed, large-scale data processing, and it admits several different solutions to the same problem, each with its own constraints and properties. In this paper, we present a comparative analysis of approaches for computing k-nearest neighbors (KNN) on MapReduce [1], covering both theoretical analysis and experimental evaluation. Load balancing, accuracy, and complexity are analyzed at each step: data preprocessing, data partitioning, and computation. The experimental results are obtained on a variety of datasets. Time and space complexity are analyzed on each dataset, revealing new advantages and shortcomings that are discussed for each algorithm. Finally, this paper can serve as reference material for handling KNN-based [2] problems with MapReduce in Big Data.
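To make the partitioning and computation steps concrete, the following is a minimal in-process sketch of one basic way a KNN join can be phrased as map and reduce rounds. It is an illustration under stated assumptions, not necessarily one of the algorithms the paper evaluates: the round-robin block partitioning, the Euclidean distance, and all function names (`split_blocks`, `map_phase`, `local_knn`, `knn_join`) are hypothetical choices, and no real MapReduce framework is involved.

```python
import heapq
import math
from collections import defaultdict
from itertools import product


def split_blocks(points, n_blocks):
    """Round-robin partition of (id, vector) records into n_blocks lists."""
    blocks = [[] for _ in range(n_blocks)]
    for idx, rec in enumerate(points):
        blocks[idx % n_blocks].append(rec)
    return blocks


def map_phase(r_points, s_points, n_blocks):
    """Emit one (key, (r_block, s_block)) pair per combination of blocks."""
    r_blocks = split_blocks(r_points, n_blocks)
    s_blocks = split_blocks(s_points, n_blocks)
    for i, j in product(range(n_blocks), repeat=2):
        yield (i, j), (r_blocks[i], s_blocks[j])


def local_knn(r_block, s_block, k):
    """First reduce step: nearest S records per R record within one block pair."""
    for r_id, r_vec in r_block:
        cands = [(math.dist(r_vec, s_vec), s_id) for s_id, s_vec in s_block]
        yield r_id, heapq.nsmallest(k, cands)


def knn_join(r_points, s_points, k, n_blocks=2):
    """Simulate both rounds and return {r_id: [(distance, s_id), ...]}."""
    partial = defaultdict(list)  # stands in for the shuffle keyed on r_id
    for _key, (r_block, s_block) in map_phase(r_points, s_points, n_blocks):
        for r_id, cands in local_knn(r_block, s_block, k):
            partial[r_id].extend(cands)
    # Second reduce step: merge the partial lists into the global top k.
    return {r_id: heapq.nsmallest(k, cands) for r_id, cands in partial.items()}


# Toy data, made up for illustration only.
R = [("r1", (0.0, 0.0)), ("r2", (5.0, 5.0))]
S = [("s1", (1.0, 0.0)), ("s2", (0.0, 2.0)), ("s3", (5.0, 6.0)), ("s4", (9.0, 9.0))]
result = knn_join(R, S, k=2)
```

Because every block of R meets every block of S, the merged result is exact, but each record is replicated `n_blocks` times in the shuffle. That replication cost is the kind of load-balancing and complexity trade-off that a comparison of KNN-on-MapReduce approaches has to weigh.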