Indexing optimizations on Hadoop
Neha Bagwari, O. Kumar
2017 3rd International Conference on Computational Intelligence & Communication Technology (CICT), published 2017-02-01
DOI: 10.1109/CIACT.2017.7977360 (https://doi.org/10.1109/CIACT.2017.7977360)
Citations: 1

Abstract

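Hadoop is an efficient open-source framework for storing and processing big data. Its HDFS component stores data in a distributed manner, preserving consistency and availability, while MapReduce is responsible for parallel processing. Hadoop is best suited to fault-tolerant storage and batch processing, but search is not optimized in Hadoop because it stores data in the form of blocks. It lacks an optimized index design, which makes searching costly. To address this, various indexing approaches have been proposed as improvements to the Hadoop architecture. In most of these approaches, MapReduce generates the index at run time to process the data distributed across the cluster. This paper compares the existing indexing approaches and proposes a new index creation and storage technique for the Hadoop ecosystem that yields better search results in the Hadoop environment.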
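The abstract's central contrast, scanning block-stored data versus consulting an index that MapReduce builds at run time, can be illustrated with a small single-process sketch. It is not the technique proposed in the paper, which the abstract does not specify. Assumptions: records are grouped into fixed-size in-memory "blocks" as a rough stand-in for HDFS blocks, and the index is a generic inverted index from term to `(block_id, offset)`. The helpers `split_into_blocks`, `map_phase`, `reduce_phase`, `build_index` and `indexed_lookup` are hypothetical names that only mimic the MapReduce model; no Hadoop APIs are used.

```python
# Minimal single-process sketch (not Hadoop code, not the paper's method):
# contrasts a full scan over "blocks" with an inverted index built in a
# map/reduce style. Block layout and function names are hypothetical.
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple

Location = Tuple[int, int]  # (block_id, offset of the record inside the block)


def split_into_blocks(records: List[str], block_size: int) -> List[List[str]]:
    """Group records into fixed-size blocks (a rough stand-in for HDFS blocks)."""
    return [records[i:i + block_size] for i in range(0, len(records), block_size)]


def full_scan(blocks: List[List[str]], key: str) -> Tuple[List[Location], int]:
    """Search without an index: every block is read, whether or not it matches."""
    hits: List[Location] = []
    for block_id, block in enumerate(blocks):
        for offset, record in enumerate(block):
            if key in record.split():
                hits.append((block_id, offset))
    return hits, len(blocks)  # blocks read = all of them


def map_phase(block_id: int, block: List[str]) -> Iterator[Tuple[str, Location]]:
    """Map step: emit (term, location) once per distinct term in each record."""
    for offset, record in enumerate(block):
        for term in dict.fromkeys(record.split()):  # de-duplicate, keep order
            yield term, (block_id, offset)


def reduce_phase(pairs: Iterable[Tuple[str, Location]]) -> Dict[str, List[Location]]:
    """Reduce step: group the emitted locations by term (an inverted index)."""
    index: Dict[str, List[Location]] = defaultdict(list)
    for term, location in pairs:
        index[term].append(location)
    return dict(index)


def build_index(blocks: List[List[str]]) -> Dict[str, List[Location]]:
    """Run the map step over every block, then reduce the combined output."""
    pairs = (
        pair
        for block_id, block in enumerate(blocks)
        for pair in map_phase(block_id, block)
    )
    return reduce_phase(pairs)


def indexed_lookup(
    index: Dict[str, List[Location]], blocks: List[List[str]], key: str
) -> Tuple[List[Location], int]:
    """Search with the index: only blocks listed for the key are read."""
    candidates = index.get(key, [])
    blocks_read = len({block_id for block_id, _ in candidates})
    # Re-check each candidate against the stored record it points to.
    hits = [(b, o) for b, o in candidates if key in blocks[b][o].split()]
    return hits, blocks_read


if __name__ == "__main__":
    records = [
        "hdfs stores blocks",
        "mapreduce processes blocks",
        "index speeds search",
        "hadoop batch jobs",
        "search without index",
        "fault tolerant storage",
    ]
    blocks = split_into_blocks(records, block_size=2)
    index = build_index(blocks)
    print(full_scan(blocks, "hdfs"))              # ([(0, 0)], 3)
    print(indexed_lookup(index, blocks, "hdfs"))  # ([(0, 0)], 1)
```

With `block_size=2`, looking up `"hdfs"` reads all three blocks in the full scan but only one block through the index, and both return the same location. A real system must also decide where the index itself is stored and how it stays in sync with new data, which is the index creation and storage question the paper takes up.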