Defining Data Model Quality Metrics for Data Vault 2.0 Model Evaluation
Heli Helskyaho, Laura Ruotsalainen, Tomi Männistö
Inventions (Journal Article), published 2024-02-09
DOI: 10.3390/inventions9010021 (https://doi.org/10.3390/inventions9010021)
Abstract
Designing a database is a crucial step in providing businesses with high-quality data for decision making. The quality of a data model is key to the quality of the data it holds, yet evaluating that quality is a complex and time-consuming task. Suitable metrics for evaluating data model quality are an essential requirement for automating the data model design process. While some metrics are available for evaluating data warehouse data models, there is a distinct lack of metrics specifically designed to assess how well a data model conforms to the rules and best practices of Data Vault 2.0. A Data Vault 2.0 data model that fails to adhere to these rules and best practices is considered suboptimal in quality. In this paper, we introduce new metrics that can be used to evaluate the quality of a Data Vault 2.0 data model, either manually or automatically. The methodology involves defining a set of metrics based on the best practices of Data Vault 2.0 and evaluating five representative data models using both the metrics and manual assessments made by a human expert. Finally, a comparative analysis of the two evaluations was conducted to validate the consistency of the metrics with the expert's judgments.
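The abstract does not list the metrics themselves, so the following is only an illustrative sketch of what an automatically computable rule-conformance metric might look like. It assumes one commonly stated Data Vault 2.0 rule (a satellite has exactly one parent, which is a hub or a link) and reports the share of satellites that satisfy it. The `Table` class, the `satellite_parent_conformance` function, and the sample model are hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass, field


@dataclass
class Table:
    """Hypothetical, minimal description of one table in a data model."""
    name: str
    kind: str  # "hub", "link", or "satellite"
    parents: list[str] = field(default_factory=list)


def satellite_parent_conformance(tables):
    """Return the share of satellites with exactly one hub or link parent.

    Illustrative metric only; a model without satellites scores 1.0
    because there is nothing that can violate the rule.
    """
    kinds = {t.name: t.kind for t in tables}
    satellites = [t for t in tables if t.kind == "satellite"]
    if not satellites:
        return 1.0
    conforming = sum(
        1
        for s in satellites
        if len(s.parents) == 1 and kinds.get(s.parents[0]) in ("hub", "link")
    )
    return conforming / len(satellites)


# Hypothetical sample model: one satellite conforms, one has two parents.
model = [
    Table("hub_customer", "hub"),
    Table("hub_order", "hub"),
    Table("link_customer_order", "link", ["hub_customer", "hub_order"]),
    Table("sat_customer_details", "satellite", ["hub_customer"]),
    Table("sat_order_status", "satellite", ["hub_order", "hub_customer"]),
]

score = satellite_parent_conformance(model)
print(f"Satellite parent conformance: {score:.2f}")
```

A score of this kind could then be set against an expert's manual rating of the same model, which is the sort of comparison the abstract describes.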