A Novel Automatic Relational Database Normalization Method

Acta Informatica Pragensia (IF 0.8, JCR Q4, Computer Science, Interdisciplinary Applications) | Pub Date: 2022-09-09 | DOI: 10.18267/j.aip.193

Emre Akadal, Mehmet Hakan Satman


Citations: 0

Abstract


Growing data diversity and the inherent difficulty of database design make it practically impossible to design a single database schema that fits every dataset encountered. In this paper, we introduce a fully automatic, genetic algorithm-based relational database normalization method that reveals the right database schema from a raw dataset, without the need for any prior knowledge. To measure the performance of the algorithm, we perform a simulation study on 250 datasets generated from 50 well-known databases. A total of 2500 simulations are carried out: ten runs for each of five denormalized variations of every database design, each containing different synthetic contents. The results of the simulation study show that the proposed algorithm exactly discovers 72% of the unknown database schemas. The performance can be improved by fine-tuning the optimization parameters. The results also show that the devised algorithm can be applied to many datasets to reveal the structure of a database when only a raw dataset is at hand.
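The simulation counts in the abstract follow from simple arithmetic, and any data-driven normalization has to test, at some point, whether one set of columns determines another in the raw data. The sketch below illustrates both. It is not the authors' genetic algorithm: the function `holds_fd`, the sample rows, and the column names are hypothetical and chosen only for illustration.

```python
# Simulation design as stated in the abstract.
N_DATABASES = 50   # well-known database designs
N_VARIANTS = 5     # denormalized variations per design
N_REPEATS = 10     # simulation runs per variation

n_datasets = N_DATABASES * N_VARIANTS      # 250 datasets
n_simulations = n_datasets * N_REPEATS     # 2500 simulations


def holds_fd(rows, lhs, rhs):
    """Check whether the functional dependency lhs -> rhs holds in rows.

    rows: list of dicts mapping column name to value (hypothetical format).
    lhs, rhs: tuples of column names.
    """
    seen = {}
    for row in rows:
        key = tuple(row[c] for c in lhs)
        val = tuple(row[c] for c in rhs)
        # The first value seen for a key is stored; any later mismatch
        # means the same lhs maps to two different rhs values.
        if seen.setdefault(key, val) != val:
            return False
    return True


# Hypothetical denormalized table: customer data repeated on every order.
rows = [
    {"order_id": 1, "customer_id": "c1", "customer_city": "Prague"},
    {"order_id": 2, "customer_id": "c1", "customer_city": "Prague"},
    {"order_id": 3, "customer_id": "c2", "customer_city": "Istanbul"},
    {"order_id": 4, "customer_id": "c2", "customer_city": "Istanbul"},
]

city_depends_on_customer = holds_fd(rows, ("customer_id",), ("customer_city",))
order_depends_on_city = holds_fd(rows, ("customer_city",), ("order_id",))
```

In this toy table `customer_id` determines `customer_city`, which suggests the city column could move to a separate customer table, while `customer_city` does not determine `order_id`. How the paper's method scores and searches candidate schemas is not described in the abstract and is not reproduced here.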
