Parallel continuous skyline query over high-dimensional data stream windows

Distributed and Parallel Databases | IF 0.9 | Zone 4, Computer Science | JCR Q3, Computer Science, Information Systems | Pub Date: 2024-07-06 | DOI: 10.1007/s10619-024-07443-7

Walid Khames, Allel Hadjali, Mohand Lagha



Abstract

Real-time multi-criteria decision-making applications in fields such as high-speed algorithmic trading, emergency response, and disaster management have driven the development of new types of preference queries, of which the skyline query is one example. In multi-criteria decision-making, the skyline operator extracts the most significant tuples, or useful data points, from large multi-dimensional datasets. The result depends on the user's preferences and consists of all tuples whose attribute vector is not dominated by that of any other tuple; these tuples are commonly known as the skyline set. Recently, a growing number of studies have addressed skyline queries over data streams. Such queries extract the desired records from sliding windows and remove from the incoming data outdated records that do not meet user requirements. The datasets in these applications are extremely large and span a wide range of dimensions that varies over time. Consequently, the skyline query is computationally demanding, and the challenge is to deliver a response in real time. Enormous quantities of data must be transported and processed, and limits on data transmission bandwidth and latency pose new challenges for traditional skyline algorithms: transferring such volumes of data affects performance, power efficiency, and reliability. A change of computing paradigm is therefore imperative. Parallel skyline queries have attracted attention from both academia and industry. Research on skyline queries has focused on sequential algorithms and on parallel implementations for multicore processors, primarily because multicore processors are so widely used. Comprehensive studies that specifically address modern parallel processors, however, remain limited. Moreover, although numerous articles address the parallelization of regular skyline queries, little research is dedicated specifically to the parallel processing of continuous skyline queries. This study introduces PRSS, a continuous skyline technique for multicore processors designed specifically for sliding-window-based data streams. The efficacy of the proposed parallel implementation is demonstrated through experiments on both real-world and synthetic datasets covering various point distributions, arrival rates, and window widths. For datasets with high dimensionality and large cardinality, the experimental results show significant speedup.
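
The dominance test and the sliding-window maintenance described in the abstract can be sketched as follows. This is a minimal sequential illustration and not the PRSS algorithm itself: it assumes that smaller values are preferred in every dimension, that the window is count-based, and that the skyline is naively recomputed on each arrival. The names `dominates`, `skyline`, and `SlidingWindowSkyline` are hypothetical and do not come from the paper.

```python
from collections import deque
from typing import Deque, List, Sequence, Tuple

Point = Tuple[float, ...]


def dominates(p: Sequence[float], q: Sequence[float]) -> bool:
    """Return True if p dominates q (assumption: smaller is better).

    p dominates q when p is no worse than q in every dimension
    and strictly better in at least one.
    """
    no_worse = all(a <= b for a, b in zip(p, q))
    strictly_better = any(a < b for a, b in zip(p, q))
    return no_worse and strictly_better


def skyline(points: Sequence[Point]) -> List[Point]:
    """Naive O(n^2) skyline: keep every point that no other point dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


class SlidingWindowSkyline:
    """Count-based sliding window that recomputes the skyline per arrival."""

    def __init__(self, capacity: int) -> None:
        # A deque with maxlen evicts the oldest point automatically
        # once the window is full.
        self.window: Deque[Point] = deque(maxlen=capacity)

    def insert(self, point: Point) -> List[Point]:
        """Add a new point, expire the oldest if needed, return the skyline."""
        self.window.append(point)
        return skyline(list(self.window))
```

Feeding a stream point by point through `SlidingWindowSkyline(capacity).insert(...)` returns the current skyline after each arrival. Recomputing from scratch costs quadratic time in the window size for every arrival, which is the kind of cost that motivates incremental and parallel approaches for high-dimensional streams.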




Source journal

Distributed and Parallel Databases (Engineering and Technology, Computer Science: Theory and Methods)

CiteScore: 3.50

Self-citation rate: 0.00%

Review time: >12 weeks

About the journal: Distributed and Parallel Databases publishes papers in all the traditional as well as most emerging areas of database research, including: Availability and reliability; Benchmarking, performance evaluation, and tuning; Big Data storage and processing; Cloud computing and Database-as-a-Service; Crowdsourcing; Data curation, annotation, and provenance; Data integration, metadata management, and interoperability; Data models, semantics, and query languages; Data mining and knowledge discovery; Data privacy, security, and trust; Data provenance, workflows, and scientific data management; Data visualization and interactive data exploration; Data warehousing, OLAP, and analytics; Graph data management, RDF, and social networks; Information extraction and data cleaning; Middleware and workflow management; Modern hardware and in-memory database systems; Query processing and optimization; Semantic Web and open data; Social networks; Storage, indexing, and physical database design; Streams, sensor networks, and complex event processing; Strings, texts, and keyword search; Spatial, temporal, and spatio-temporal databases; Transaction processing; Uncertain, probabilistic, and approximate databases.