{"title":"A Novel Automatic Relational Database Normalization Method","authors":"Emre Akadal, Mehmet Hakan Satman","doi":"10.18267/j.aip.193","DOIUrl":null,"url":null,"abstract":"The increase in data diversity and the fact that database design is a difficult process make it practically impossible to design a unique database schema for all datasets encountered. In this paper, we introduce a fully automatic genetic algorithm-based relational database normalization method for revealing the right database schema using a raw dataset and without the need for any prior knowledge. For measuring the performance of the algorithm, we perform a simulation study using 250 datasets produced using 50 well-known databases. A total of 2500 simulations are carried out, ten times for each of five denormalized variations of all database designs containing different synthetic contents. The results of the simulation study show that the proposed algorithm discovers exactly 72% of the unknown database schemas. The performance can be improved by fine-tuning the optimization parameters. The results of the simulation study also show that the devised algorithm can be used in many datasets to reveal structs of databases when only a raw dataset is available at hand.","PeriodicalId":36592,"journal":{"name":"Acta Informatica Pragensia","volume":null,"pages":null},"PeriodicalIF":0.8000,"publicationDate":"2022-09-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Acta Informatica Pragensia","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.18267/j.aip.193","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q4","JCRName":"COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS","Score":null,"Total":0}
引用次数: 0
Abstract
The increase in data diversity and the fact that database design is a difficult process make it practically impossible to design a unique database schema for all datasets encountered. In this paper, we introduce a fully automatic genetic algorithm-based relational database normalization method for revealing the right database schema using a raw dataset and without the need for any prior knowledge. For measuring the performance of the algorithm, we perform a simulation study using 250 datasets produced using 50 well-known databases. A total of 2500 simulations are carried out, ten times for each of five denormalized variations of all database designs containing different synthetic contents. The results of the simulation study show that the proposed algorithm discovers exactly 72% of the unknown database schemas. The performance can be improved by fine-tuning the optimization parameters. The results of the simulation study also show that the devised algorithm can be used in many datasets to reveal structs of databases when only a raw dataset is available at hand.