Efficient distributed algorithms for distance join queries in spark-based spatial analytics systems

Impact factor: 2.4 | Zone 4, Computer Science | JCR Q2, COMPUTER SCIENCE, THEORY & METHODS
International Journal of General Systems | Pub Date: 2023-02-19 | DOI: 10.1080/03081079.2023.2173750
Francisco García-García, A. Corral, L. Iribarne, M. Vassilakopoulos
Volume 52, Issue 1, pages 206-250
Citations: 2

Abstract

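Apache Sedona (formerly GeoSpark) is a new in-memory cluster computing system for processing large-scale spatial data. It extends the core of Apache Spark to support spatial datatypes, partitioning techniques, spatial indexes, and spatial operations (e.g. spatial range, nearest neighbor, and spatial join queries). However, it does not support distance-based join queries (DJQs), such as the k nearest neighbor join query (kNNJQ) or the k closest pairs query (kCPQ). In this paper, we therefore investigate how to design and implement efficient distributed DJQ algorithms in Apache Sedona, using the most appropriate spatial partitioning and other optimization techniques. The results of an extensive set of experiments on real-world datasets demonstrate that the proposed distributed kNNJQ and kCPQ algorithms are efficient, scalable, and robust in Apache Sedona. Finally, Sedona is compared with other similar cluster computing systems, showing the best performance for kCPQ and competitive results for kNNJQ.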
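To make the two query types concrete, the following Python sketch spells out their semantics under stated assumptions: 2-D points, Euclidean distance, and plain in-memory lists in place of Spark RDDs. It is a brute-force reference, not the distributed algorithms the paper proposes for Apache Sedona, and all function names (`knn_join`, `k_closest_pairs`, `k_closest_pairs_partitioned`) are hypothetical. A kNNJQ pairs every point of P with its k nearest points of Q, while a kCPQ returns only the k pairs of P × Q with the smallest distances overall. The last function is an illustrative local/global merge over an arbitrary, non-spatial split of P, showing the general two-phase pattern a distributed kCPQ can follow.

```python
import heapq
import math
from typing import List, Tuple

Point = Tuple[float, float]
Pair = Tuple[float, Point, Point]  # (distance, p, q)


def dist(p: Point, q: Point) -> float:
    """Euclidean distance between two 2-D points."""
    return math.hypot(p[0] - q[0], p[1] - q[1])


def knn_join(P: List[Point], Q: List[Point], k: int) -> List[Tuple[Point, List[Point]]]:
    """kNNJQ (brute force): pair every p in P with its k nearest points in Q."""
    return [(p, heapq.nsmallest(k, Q, key=lambda q: dist(p, q))) for p in P]


def k_closest_pairs(P: List[Point], Q: List[Point], k: int) -> List[Pair]:
    """kCPQ (brute force): the k pairs (p, q) in P x Q with the smallest distances."""
    return heapq.nsmallest(k, ((dist(p, q), p, q) for p in P for q in Q))


def k_closest_pairs_partitioned(P: List[Point], Q: List[Point], k: int,
                                num_parts: int) -> List[Pair]:
    """Hypothetical two-phase kCPQ: local top-k per partition of P, then a global merge.

    Every pair (p, q) belongs to exactly one partition of P, so the global k
    closest pairs are always contained in the union of the local top-k lists.
    """
    # Round-robin split of P; a real system would use a spatial partitioning instead.
    parts = [P[i::num_parts] for i in range(num_parts)]
    # Local phase (analogous to a per-partition "map"): top-k pairs of each partition.
    local = [k_closest_pairs(part, Q, k) for part in parts]
    # Global phase (analogous to a "reduce"): top-k over all local candidates.
    return heapq.nsmallest(k, (pair for lst in local for pair in lst))
```

For example, with `P = [(0.0, 0.0), (5.0, 5.0)]` and `Q = [(1.0, 0.0), (6.0, 5.0), (10.0, 10.0)]`, `knn_join(P, Q, 1)` matches each point of P with its single nearest point of Q, and `k_closest_pairs(P, Q, 2)` returns the two pairs at distance 1.0. The brute-force versions cost |P|·|Q| distance computations, which is the work that spatial partitioning and pruning are meant to reduce at scale.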
Source journal
International Journal of General Systems
Subject category: Engineering & Technology, Computer Science: Theory & Methods
CiteScore: 4.10
Self-citation rate: 20.00%
Articles published: 38
Review time: 6 months
Journal description: International Journal of General Systems is a periodical devoted primarily to the publication of original research contributions to system science, both basic and applied. However, relevant survey articles, invited book reviews, bibliographies, and letters to the editor are also published. The principal aim of the journal is to promote original systems ideas (concepts, principles, methods, theoretical or experimental results, etc.) that are broadly applicable to various kinds of systems. The term "general system" in the name of the journal is intended to indicate this aim: an orientation toward systems ideas that have general applicability. Typical subject areas covered by the journal include: uncertainty and randomness; fuzziness and imprecision; information; complexity; inductive and deductive reasoning about systems; learning; systems analysis and design; and theoretical as well as experimental knowledge regarding various categories of systems. Submitted research must be well presented and must clearly state its contribution and novelty. Manuscripts dealing with particular kinds of systems that lack general applicability across a broad range of systems should be sent to journals specializing in the respective topics.
Latest articles in this journal
Stress–strength reliability estimation of s-out-of-k multicomponent systems based on copula function for dependent strength elements under progressively censored sample
Reliability of a consecutive k-out-of-n: G system with protection blocks
Two-way concept-cognitive learning method: a perspective from progressive learning of fuzzy skills
Disturbance-observer-based adaptive neural event-triggered fault-tolerant control for uncertain nonlinear systems against sensor faults
Idempotent uninorms on bounded lattices with at most a single point incomparable with the neutral element: Part II