{"title":"Embedded Functional Dependencies and Data-completeness Tailored Database Design","authors":"Ziheng Wei, Sebastian Link","doi":"10.1145/3450518","DOIUrl":null,"url":null,"abstract":"We establish a principled schema design framework for data with missing values. The framework is based on the new notion of an embedded functional dependency, which is independent of the interpretation of missing values, able to express completeness and integrity requirements on application data, and capable of capturing redundant data value occurrences that may cause problems with processing data that meets the requirements. We establish axiomatic, algorithmic, and logical foundations for reasoning about embedded functional dependencies. These foundations enable us to introduce generalizations of Boyce-Codd and Third normal forms that avoid processing difficulties of any application data, or minimize these difficulties across dependency-preserving decompositions, respectively. We show how to transform any given schema into application schemata that meet given completeness and integrity requirements, and the conditions of the generalized normal forms. Data over those application schemata are therefore fit for purpose by design. Extensive experiments with benchmark schemata and data illustrate the effectiveness of our framework for the acquisition of the constraints, the schema design process, and the performance of the schema designs in terms of updates and join queries.","PeriodicalId":50915,"journal":{"name":"ACM Transactions on Database Systems","volume":"32 1","pages":""},"PeriodicalIF":2.2000,"publicationDate":"2021-05-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"ACM Transactions on Database Systems","FirstCategoryId":"94","ListUrlMain":"https://doi.org/10.1145/3450518","RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q3","JCRName":"COMPUTER SCIENCE, INFORMATION SYSTEMS","Score":null,"Total":0}
引用次数: 0
Abstract
We establish a principled schema design framework for data with missing values. The framework is based on the new notion of an embedded functional dependency, which is independent of the interpretation of missing values, able to express completeness and integrity requirements on application data, and capable of capturing redundant data value occurrences that may cause problems with processing data that meets the requirements. We establish axiomatic, algorithmic, and logical foundations for reasoning about embedded functional dependencies. These foundations enable us to introduce generalizations of Boyce-Codd and Third normal forms that avoid processing difficulties of any application data, or minimize these difficulties across dependency-preserving decompositions, respectively. We show how to transform any given schema into application schemata that meet given completeness and integrity requirements, and the conditions of the generalized normal forms. Data over those application schemata are therefore fit for purpose by design. Extensive experiments with benchmark schemata and data illustrate the effectiveness of our framework for the acquisition of the constraints, the schema design process, and the performance of the schema designs in terms of updates and join queries.
期刊介绍:
Heavily used in both academic and corporate R&D settings, ACM Transactions on Database Systems (TODS) is a key publication for computer scientists working in data abstraction, data modeling, and designing data management systems. Topics include storage and retrieval, transaction management, distributed and federated databases, semantics of data, intelligent databases, and operations and algorithms relating to these areas. In this rapidly changing field, TODS provides insights into the thoughts of the best minds in database R&D.