CSQUiD: an index and non-probability framework for constrained skyline query processing over uncertain data

IF 3.5 · Zone 4, Computer Science · Q2, COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE · PeerJ Computer Science · Pub Date: 2024-09-16 · DOI: 10.7717/peerj-cs.2225

Ma'aruf Mohammed Lawal, Hamidah Ibrahim, Nor Fazlida Mohd Sani, Razali Yaakob, Ali A. Alwan

Citations: 0

Abstract

Data uncertainty, the degree to which data are inaccurate, imprecise, untrusted, or undetermined, is inherent in many contemporary database applications. Numerous research efforts have been devoted to answering skyline queries over uncertain data efficiently. The literature discusses two methods for handling uncertainty in which objects have continuous range values. The first employs a probability-based approach, while the second assumes that each uncertain value is represented by its median. However, neither method appears suitable for modern high-dimensional uncertain databases: the first requires intensive probability calculations, while the second is impractical. This work therefore introduces an index-based, non-probability framework named Constrained Skyline Query processing on Uncertain Data (CSQUiD). The framework aims to reduce the computational time of processing constrained skyline queries over uncertain high-dimensional data. Given a collection of objects with uncertain data, CSQUiD constructs minimum bounding rectangles (MBRs) using the X-tree indexing structure. Instead of scanning the whole collection of objects, it analyzes only the objects within the dominant MBRs to determine the final skyline. In addition, CSQUiD uses a fuzzification approach to identify an exact value for each continuous range value of the objects in those dominant MBRs. The proposed framework is validated through extensive experiments on real and synthetic data sets.
The performance analysis varied the size of the constrained query. Under these conditions, CSQUiD outperformed the most recent methods (the CIS algorithm and the SkyQUD-T framework) with average improvements of 44.07% and 57.15%, respectively, in the number of pairwise comparisons. Its average improvements in CPU processing time over CIS and SkyQUD-T were 27.17% and 18.62%, respectively.
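The abstract describes the processing pipeline only at a high level: MBR filtering, pruning by dominant MBRs, then fuzzification of the surviving objects. The Python sketch below illustrates that general idea. It is not the authors' implementation and rests on several assumptions that are not taken from the paper:

- Each uncertain attribute is given as a hypothetical triangular fuzzy number `(lo, peak, hi)` and defuzzified by its centroid.
- Objects are grouped into MBRs by the caller rather than by an X-tree.
- Smaller values are preferred in every dimension.
- The constraint is an axis-aligned box `[c_lo, c_hi]`.

The names `UncertainObject`, `crisp_value`, and `constrained_skyline` are illustrative only.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple

# Hypothetical encoding: (lo, peak, hi) per dimension; a certain value v is (v, v, v).
Interval = Tuple[float, float, float]


@dataclass
class UncertainObject:
    oid: str
    dims: List[Interval]


def crisp_value(lo: float, peak: float, hi: float) -> float:
    """Centroid of the triangular fuzzy number (lo, peak, hi); an assumed defuzzifier."""
    return (lo + peak + hi) / 3.0


def fuzzify(obj: UncertainObject) -> Tuple[float, ...]:
    return tuple(crisp_value(*d) for d in obj.dims)


def dominates(p: Sequence[float], q: Sequence[float]) -> bool:
    """p dominates q (minimization): no worse in every dimension, strictly better in one."""
    return all(a <= b for a, b in zip(p, q)) and any(a < b for a, b in zip(p, q))


def mbr(group: List[UncertainObject]):
    """Lower corner = min of all lo bounds, upper corner = max of all hi bounds."""
    n = len(group[0].dims)
    lower = tuple(min(o.dims[i][0] for o in group) for i in range(n))
    upper = tuple(max(o.dims[i][2] for o in group) for i in range(n))
    return lower, upper


def intersects(lower, upper, c_lo, c_hi) -> bool:
    return all(l <= ch and u >= cl for l, u, cl, ch in zip(lower, upper, c_lo, c_hi))


def inside(p, c_lo, c_hi) -> bool:
    return all(cl <= x <= ch for x, cl, ch in zip(p, c_lo, c_hi))


def constrained_skyline(groups, c_lo, c_hi) -> List[str]:
    boxes = [(mbr(g), g) for g in groups if g]
    # Keep only MBRs that overlap the constraint region.
    boxes = [b for b in boxes if intersects(b[0][0], b[0][1], c_lo, c_hi)]
    # Visit MBRs whose best (lower) corner is smallest first.
    boxes.sort(key=lambda b: sum(b[0][0]))
    window: List[Tuple[str, Tuple[float, ...]]] = []  # current skyline candidates
    for (lower, _upper), group in boxes:
        # Skip the whole MBR if a skyline point dominates its best corner.
        if any(dominates(p, lower) for _, p in window):
            continue
        for obj in group:
            p = fuzzify(obj)  # defuzzify only objects in surviving MBRs
            if not inside(p, c_lo, c_hi):
                continue
            if any(dominates(q, p) for _, q in window):
                continue
            # Drop candidates that the new point dominates, then add it.
            window = [(oid, q) for oid, q in window if not dominates(p, q)]
            window.append((obj.oid, p))
    return sorted(oid for oid, _ in window)


if __name__ == "__main__":
    groups = [
        [UncertainObject("a", [(1, 2, 3), (4, 4, 4)]),   # crisp (2, 4)
         UncertainObject("b", [(2, 3, 4), (2, 3, 4)])],  # crisp (3, 3)
        [UncertainObject("c", [(5, 6, 7), (5, 6, 7)]),   # crisp (6, 6), dominated by a
         UncertainObject("d", [(0, 0, 0), (9, 9, 9)])],  # crisp (0, 9), outside constraint
        [UncertainObject("e", [(5, 6, 7), (5, 5, 5)])],  # MBR pruned: a dominates (5, 5)
    ]
    print(constrained_skyline(groups, c_lo=(1, 1), c_hi=(8, 8)))  # ['a', 'b']
```

The MBR-level pruning is safe under these assumptions. A centroid of `(lo, peak, hi)` with `lo <= peak <= hi` is never below `lo`, so every defuzzified point in an MBR is at least its lower corner. A point that dominates that corner therefore dominates every object inside it.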

Source journal

PeerJ Computer Science (Computer Science: General Computer Science)

CiteScore: 6.10

Self-citation rate: 5.30%

Articles published: 332

Review time: 10 weeks

About the journal: PeerJ Computer Science is a new open access journal covering all subject areas in computer science, backed by a prestigious advisory board and more than 300 academic editors.