{"title":"Hadoop上的索引优化","authors":"Neha Bagwari, O. Kumar","doi":"10.1109/CIACT.2017.7977360","DOIUrl":null,"url":null,"abstract":"Hadoop is an efficient open source framework to store and process the big data. Its component HDFS stores data in distributed manner preserving its consistency and availability while MapReduce is responsible for parallel processing. Hadoop fits best for fault tolerant storage and batch processing but searching is not optimized in Hadoop as it stores data in the form of blocks. It lacks in optimized index design leading to costly searching mechanism. To deal with this various indexing approaches have been proposed as an improvement in Hadoop architecture. In most of the approaches, MapReduce typically generates index at run time to process the data distributed across the cluster. This paper compares the existing indexing approaches and proposes a new index creation and storage technique for Hadoop eco system which will lead to better search results in Hadoop environment.","PeriodicalId":218079,"journal":{"name":"2017 3rd International Conference on Computational Intelligence & Communication Technology (CICT)","volume":"24 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2017-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"1","resultStr":"{\"title\":\"Indexing optimizations on Hadoop\",\"authors\":\"Neha Bagwari, O. Kumar\",\"doi\":\"10.1109/CIACT.2017.7977360\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Hadoop is an efficient open source framework to store and process the big data. Its component HDFS stores data in distributed manner preserving its consistency and availability while MapReduce is responsible for parallel processing. Hadoop fits best for fault tolerant storage and batch processing but searching is not optimized in Hadoop as it stores data in the form of blocks. It lacks in optimized index design leading to costly searching mechanism. To deal with this various indexing approaches have been proposed as an improvement in Hadoop architecture. In most of the approaches, MapReduce typically generates index at run time to process the data distributed across the cluster. This paper compares the existing indexing approaches and proposes a new index creation and storage technique for Hadoop eco system which will lead to better search results in Hadoop environment.\",\"PeriodicalId\":218079,\"journal\":{\"name\":\"2017 3rd International Conference on Computational Intelligence & Communication Technology (CICT)\",\"volume\":\"24 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2017-02-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"1\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2017 3rd International Conference on Computational Intelligence & Communication Technology (CICT)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/CIACT.2017.7977360\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2017 3rd International Conference on Computational Intelligence & Communication Technology (CICT)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/CIACT.2017.7977360","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Hadoop is an efficient open source framework to store and process the big data. Its component HDFS stores data in distributed manner preserving its consistency and availability while MapReduce is responsible for parallel processing. Hadoop fits best for fault tolerant storage and batch processing but searching is not optimized in Hadoop as it stores data in the form of blocks. It lacks in optimized index design leading to costly searching mechanism. To deal with this various indexing approaches have been proposed as an improvement in Hadoop architecture. In most of the approaches, MapReduce typically generates index at run time to process the data distributed across the cluster. This paper compares the existing indexing approaches and proposes a new index creation and storage technique for Hadoop eco system which will lead to better search results in Hadoop environment.