A Hybrid Join Algorithm on Top of Map Reduce

2011 Seventh International Conference on Semantics, Knowledge and Grids | Pub Date: 2011-10-24 | DOI: 10.1109/SKG.2011.13

Weisong Hu, Lili Ma, Xiaowei Liu, Hongwei Qi, L. Zha, Huaming Liao, Yuezhuo Zhang


Citations: 2

Abstract




Hadoop has shown great power in processing vast amounts of data in parallel. Hive, the database on Hadoop, enables more experts to process relational data by providing a SQL-like interface. However, Hive does not provide an efficient approach for join, a common but expensive operator in relational databases. Given the importance of join, this paper proposes a novel hybrid algorithm, HJA, which automatically chooses the better-performing method among several: divide and memory copy merge, Partition Join (PJ), and the naïve Hive join. Experiments show that HJA achieves the best performance in most situations.
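The abstract does not say how HJA decides between its candidate methods, so the following is only a minimal sketch of the general idea of a hybrid join selector, not the paper's algorithm. Every name, the row-count thresholds (`memory_rows`, `partition_rows`), and the three stand-in strategies (an in-memory hash join, a hash-partitioned join, and a group-by-key reduce-side join) are illustrative assumptions. In particular, none of them is claimed to reproduce "divide and memory copy merge" or PJ as defined in the paper.

```python
from collections import defaultdict

# Hypothetical stand-ins for join strategies; rows are (key, value) pairs.

def reduce_side_join(left, right):
    """Baseline: group both inputs by key (as a shuffle would), then pair rows."""
    groups = defaultdict(lambda: ([], []))
    for k, v in left:
        groups[k][0].append(v)
    for k, v in right:
        groups[k][1].append(v)
    return [(k, lv, rv) for k, (ls, rs) in groups.items() for lv in ls for rv in rs]

def in_memory_hash_join(build, probe):
    """Build a hash table on one input and stream the other past it."""
    table = defaultdict(list)
    for k, v in build:
        table[k].append(v)
    return [(k, bv, pv) for k, pv in probe for bv in table.get(k, [])]

def partitioned_join(left, right, num_partitions=4):
    """Hash-partition both inputs, then join each pair of partitions independently."""
    left_parts, right_parts = defaultdict(list), defaultdict(list)
    for k, v in left:
        left_parts[hash(k) % num_partitions].append((k, v))
    for k, v in right:
        right_parts[hash(k) % num_partitions].append((k, v))
    out = []
    for p, rows in left_parts.items():
        out.extend(in_memory_hash_join(rows, right_parts.get(p, [])))
    return out

def hybrid_join(left, right, memory_rows=1000, partition_rows=100_000):
    """Pick a strategy from input sizes (assumed criteria) and run it.

    Returns (strategy_name, rows), where each row is (key, left_value, right_value).
    """
    small, large = sorted((len(left), len(right)))
    if small <= memory_rows:
        if len(left) <= len(right):
            return "in_memory", in_memory_hash_join(left, right)
        # Build on the smaller right input, then restore (left, right) order.
        swapped = in_memory_hash_join(right, left)
        return "in_memory", [(k, lv, rv) for k, rv, lv in swapped]
    if large <= partition_rows:
        return "partitioned", partitioned_join(left, right)
    return "reduce_side", reduce_side_join(left, right)

if __name__ == "__main__":
    orders = [(1, "a"), (2, "b"), (2, "c")]
    users = [(2, "x"), (3, "y")]
    name, rows = hybrid_join(orders, users)
    print(name, sorted(rows))
```

Whatever the selection rule, the point of a hybrid design is that all strategies return the same join result and differ only in cost, so the selector can be tuned or replaced without affecting correctness.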
