Indexing optimizations on Hadoop

2017 3rd International Conference on Computational Intelligence & Communication Technology (CICT)
Pub Date: 2017-02-01 | DOI: 10.1109/CIACT.2017.7977360

Neha Bagwari, O. Kumar

Citations: 1

Abstract

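Hadoop is an efficient open-source framework for storing and processing big data. Its storage component, HDFS, stores data in a distributed manner while preserving consistency and availability, and MapReduce is responsible for parallel processing. Hadoop is best suited to fault-tolerant storage and batch processing, but search is not optimized: Hadoop stores data as blocks and lacks an optimized index design, which makes searching costly. To address this, various indexing approaches have been proposed as improvements to the Hadoop architecture. In most of these approaches, MapReduce generates the index at run time in order to process data distributed across the cluster. This paper compares the existing indexing approaches and proposes a new index creation and storage technique for the Hadoop ecosystem, intended to yield better search results in a Hadoop environment.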
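The abstract does not describe the proposed technique in detail, so the following is only a minimal sketch of the general idea it motivates: why an index over blocks makes search cheaper than scanning every block. It assumes a toy model in which each HDFS block is a plain Python list of text records; it is not the authors' method and uses no Hadoop API. The names `map_phase`, `reduce_phase`, `build_index`, `full_scan`, and `indexed_search` are hypothetical. A MapReduce-style pass emits (term, block ID) pairs and groups them into a block-level inverted index, after which a lookup reads only the blocks listed for the term.

```python
from collections import defaultdict
from typing import Dict, Iterator, List, Set, Tuple

# Hypothetical toy model: each "block" stands in for an HDFS block and is
# simply a list of text records. No Hadoop API is used here.
Block = List[str]


def map_phase(block_id: int, block: Block) -> Iterator[Tuple[str, int]]:
    """Map step: emit a (term, block_id) pair for every term in the block."""
    for record in block:
        for term in record.lower().split():
            yield term, block_id


def reduce_phase(pairs) -> Dict[str, Set[int]]:
    """Reduce step: group block IDs by term into a block-level inverted index."""
    index: Dict[str, Set[int]] = defaultdict(set)
    for term, block_id in pairs:
        index[term].add(block_id)
    return dict(index)


def build_index(blocks: List[Block]) -> Dict[str, Set[int]]:
    """Run the map step over every block, then reduce all emitted pairs."""
    pairs = (pair for i, block in enumerate(blocks) for pair in map_phase(i, block))
    return reduce_phase(pairs)


def full_scan(blocks: List[Block], term: str) -> Tuple[List[str], int]:
    """No index: every block is read. Returns (matching records, blocks read)."""
    term = term.lower()
    matches = [r for block in blocks for r in block if term in r.lower().split()]
    return matches, len(blocks)


def indexed_search(blocks: List[Block], index: Dict[str, Set[int]],
                   term: str) -> Tuple[List[str], int]:
    """With the index: only the blocks listed for the term are read."""
    term = term.lower()
    block_ids = sorted(index.get(term, set()))
    matches = [r for i in block_ids for r in blocks[i] if term in r.lower().split()]
    return matches, len(block_ids)


if __name__ == "__main__":
    blocks = [
        ["hadoop stores data in blocks", "hdfs replicates blocks"],
        ["mapreduce processes data in parallel"],
        ["indexing speeds up search"],
    ]
    index = build_index(blocks)
    print(full_scan(blocks, "search"))              # (['indexing speeds up search'], 3)
    print(indexed_search(blocks, index, "search"))  # (['indexing speeds up search'], 1)
```

In this sketch the index is built once and then reused across queries, so each lookup touches one block instead of three. Keeping a stored index rather than regenerating it per job is one plausible reading of why the paper addresses both index creation and storage, though the abstract does not confirm the details.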
