Self-adapting data migration in the context of schema evolution in NoSQL databases

IF 0.9 | Zone 4 (Computer Science) | Q3 COMPUTER SCIENCE, INFORMATION SYSTEMS | Distributed and Parallel Databases, Vol. 40, No. 1, pp. 5-25 | Pub Date: 2020-04-01 | DOI: 10.1109/ICDEW49219.2020.00013

Andrea Hillenbrand, U. Störl, Shamil Nabiyev, Meike Klettke


Citations: 13

Abstract

When NoSQL database systems are used in agile software development, the data model changes frequently, so data is routinely stored in different versions. Managing this versioned data creates overhead that can impede software development. Several data migration strategies exist that handle legacy data differently when it is accessed, each with its own advantages and disadvantages. We evaluate and compare these strategies against the requirements of the software application, using metrics such as migration cost and latency as well as precision and recall. Ideally, the strategy selected is the one whose characteristics fulfill the service-level agreements and match the migration scenario, which depends on the query workload and on the data model changes that drive the evolution of the database schema. In this paper, we present a methodology for self-adapting data migration that automatically adjusts migration strategies and their parameters to the migration scenario and the service-level agreements, thereby contributing to the self-management of database systems and supporting agile development.
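The selection step described in the abstract can be illustrated with a minimal sketch. It is not the authors' method: the strategy names ("eager", "lazy", "hybrid"), the two metrics used, the numeric estimates, and the rule "cheapest strategy that satisfies the SLA" are all assumptions made for illustration. The abstract does not specify how strategies are profiled or ranked.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class StrategyProfile:
    """Estimated behaviour of one migration strategy (hypothetical values)."""
    name: str
    migration_cost: float  # estimated cost of migrating data, arbitrary units
    latency_ms: float      # estimated access latency under the current workload


@dataclass(frozen=True)
class Sla:
    """Service-level agreement limits (hypothetical structure)."""
    max_latency_ms: float
    max_migration_cost: float


def select_strategy(
    profiles: Sequence[StrategyProfile], sla: Sla
) -> Optional[StrategyProfile]:
    """Return the cheapest strategy that meets the SLA, or None if none does.

    Assumption: ties on cost are broken by lower latency.
    """
    feasible = [
        p for p in profiles
        if p.latency_ms <= sla.max_latency_ms
        and p.migration_cost <= sla.max_migration_cost
    ]
    if not feasible:
        return None
    return min(feasible, key=lambda p: (p.migration_cost, p.latency_ms))


# Illustrative estimates only; real values would come from the workload
# and the schema changes of the migration scenario.
profiles = [
    StrategyProfile("eager", migration_cost=100.0, latency_ms=5.0),
    StrategyProfile("lazy", migration_cost=20.0, latency_ms=40.0),
    StrategyProfile("hybrid", migration_cost=50.0, latency_ms=15.0),
]

chosen = select_strategy(profiles, Sla(max_latency_ms=20.0, max_migration_cost=80.0))
print(chosen.name if chosen else "no strategy satisfies the SLA")
```

A self-adapting system in the sense of the abstract would presumably re-run such a selection whenever the workload, the schema changes, or the SLA change, and would also tune strategy parameters; the sketch covers only a single selection over fixed estimates.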

Source journal

Distributed and Parallel Databases (Engineering and Technology / Computer Science: Theory and Methods)

CiteScore

3.50

Self-citation rate

0.00%

Articles published

Review time

>12 weeks

Journal description: Distributed and Parallel Databases publishes papers in all the traditional as well as most emerging areas of database research, including: Availability and reliability; Benchmarking and performance evaluation, and tuning; Big Data Storage and Processing; Cloud Computing and Database-as-a-Service; Crowdsourcing; Data curation, annotation and provenance; Data integration, metadata management, and interoperability; Data models, semantics, query languages; Data mining and knowledge discovery; Data privacy, security, trust; Data provenance, workflows, Scientific Data Management; Data visualization and interactive data exploration; Data warehousing, OLAP, Analytics; Graph data management, RDF, social networks; Information Extraction and Data Cleaning; Middleware and Workflow Management; Modern Hardware and In-Memory Database Systems; Query Processing and Optimization; Semantic Web and open data; Social Networks; Storage, indexing, and physical database design; Streams, sensor networks, and complex event processing; Strings, Texts, and Keyword Search; Spatial, temporal, and spatio-temporal databases; Transaction processing; Uncertain, probabilistic, and approximate databases.