SRank: Guiding schema selection in NoSQL document stores

IF 2.7 3区计算机科学 Q3 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Data & Knowledge Engineering Pub Date : 2024-09-14 DOI:10.1016/j.datak.2024.102360

Shelly Sachdeva , Neha Bansal , Hardik Bansal

{"title":"SRank: Guiding schema selection in NoSQL document stores","authors":"Shelly Sachdeva , Neha Bansal , Hardik Bansal","doi":"10.1016/j.datak.2024.102360","DOIUrl":null,"url":null,"abstract":"<div><div>The rise of big data has led to a greater need for applications to change their schema frequently. NoSQL databases provide flexibility in organizing data and offer multiple choices for structuring and storing similar information. While schema flexibility speeds up initial development, choosing schemas wisely is crucial, as they significantly impact performance, affecting data redundancy, navigation cost, data access cost, and maintainability. This paper emphasizes the importance of schema design in NoSQL document stores. It proposes a model to analyze and evaluate different schema alternatives and suggest the best schema out of various schema alternatives. The model is divided into four phases. The model inputs the Entity-Relationship (ER) model and workload queries. In the Transformation Phase, the schema alternatives are initially developed for each ER model, and subsequently, a schema graph is generated for each alternative. Concurrently, workload queries undergo conversion into query graphs. In the Schema Evaluation phase, the Schema Rank (SRank) is calculated for each schema alternative using query metrics derived from the query graphs and path coverage generated from the schema graphs. Finally, in the Output phase, the schema with the highest SRank is recommended as the most suitable choice for the application. The paper includes a case study of a Hotel Reservation System (HRS) to demonstrate the application of the proposed model. It comprehensively evaluates various schema alternatives based on query response time, storage efficiency, scalability, throughput, and latency. The paper validates the SRank computation for schema selection in NoSQL databases through an extensive experimental study. The alignment of SRank values with each schema's performance metrics underscores this ranking system's effectiveness. The SRank simplifies the schema selection process, assisting users in making informed decisions by reducing the time, cost, and effort of identifying the optimal schema for NoSQL document stores.</div></div>","PeriodicalId":55184,"journal":{"name":"Data & Knowledge Engineering","volume":"154 ","pages":"Article 102360"},"PeriodicalIF":2.7000,"publicationDate":"2024-09-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Data & Knowledge Engineering","FirstCategoryId":"94","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S0169023X24000843","RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q3","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}

引用次数: 0

Abstract

The rise of big data has led to a greater need for applications to change their schema frequently. NoSQL databases provide flexibility in organizing data and offer multiple choices for structuring and storing similar information. While schema flexibility speeds up initial development, choosing schemas wisely is crucial, as they significantly impact performance, affecting data redundancy, navigation cost, data access cost, and maintainability. This paper emphasizes the importance of schema design in NoSQL document stores. It proposes a model to analyze and evaluate different schema alternatives and suggest the best schema out of various schema alternatives. The model is divided into four phases. The model inputs the Entity-Relationship (ER) model and workload queries. In the Transformation Phase, the schema alternatives are initially developed for each ER model, and subsequently, a schema graph is generated for each alternative. Concurrently, workload queries undergo conversion into query graphs. In the Schema Evaluation phase, the Schema Rank (SRank) is calculated for each schema alternative using query metrics derived from the query graphs and path coverage generated from the schema graphs. Finally, in the Output phase, the schema with the highest SRank is recommended as the most suitable choice for the application. The paper includes a case study of a Hotel Reservation System (HRS) to demonstrate the application of the proposed model. It comprehensively evaluates various schema alternatives based on query response time, storage efficiency, scalability, throughput, and latency. The paper validates the SRank computation for schema selection in NoSQL databases through an extensive experimental study. The alignment of SRank values with each schema's performance metrics underscores this ranking system's effectiveness. The SRank simplifies the schema selection process, assisting users in making informed decisions by reducing the time, cost, and effort of identifying the optimal schema for NoSQL document stores.

查看原文

微信好友朋友圈 QQ好友复制链接

本刊更多论文

SRank：指导 NoSQL 文档存储中的模式选择

大数据的兴起导致应用程序更需要频繁更改其模式。NoSQL 数据库可以灵活地组织数据，并为结构化和存储类似信息提供多种选择。虽然模式灵活性加快了初始开发速度，但明智地选择模式至关重要，因为它们会显著影响性能，影响数据冗余、导航成本、数据访问成本和可维护性。本文强调了模式设计在 NoSQL 文档存储中的重要性。它提出了一个模型，用于分析和评估不同的模式备选方案，并从各种模式备选方案中推荐最佳模式。该模型分为四个阶段。该模型输入实体关系（ER）模型和工作量查询。在转换阶段，最初为每个 ER 模型开发模式备选方案，随后为每个备选方案生成模式图。与此同时，工作负载查询也被转换成查询图。在模式评估阶段，使用从查询图和从模式图生成的路径覆盖率得出的查询指标，为每个模式备选方案计算模式排名（SRank）。最后，在输出阶段，推荐 SRank 最高的模式作为最适合应用的选择。本文通过一个酒店预订系统（HRS）的案例研究来展示所提模型的应用。论文根据查询响应时间、存储效率、可扩展性、吞吐量和延迟全面评估了各种模式选择。论文通过广泛的实验研究验证了用于 NoSQL 数据库模式选择的 SRank 计算。SRank 值与每个模式的性能指标相一致，凸显了该排名系统的有效性。SRank 简化了模式选择过程，通过减少为 NoSQL 文档存储确定最佳模式所需的时间、成本和精力，帮助用户做出明智的决策。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

求助全文

约1分钟内获得全文去求助

来源期刊

Data & Knowledge Engineering 工程技术-计算机：人工智能

CiteScore

5.00

自引率

0.00%

发文量

审稿时长

6 months

期刊介绍： Data & Knowledge Engineering (DKE) stimulates the exchange of ideas and interaction between these two related fields of interest. DKE reaches a world-wide audience of researchers, designers, managers and users. The major aim of the journal is to identify, investigate and analyze the underlying principles in the design and effective use of these systems.