A Novel Automatic Relational Database Normalization Method

Journal: Acta Informatica Pragensia
Pub Date: 2022-09-09
DOI: 10.18267/j.aip.193
Impact factor: 0.8 (JCR Q4, Computer Science, Interdisciplinary Applications)
Emre Akadal, Mehmet Hakan Satman
Citations: 0

Abstract

Growing data diversity, combined with the inherent difficulty of database design, makes it practically impossible to design a single database schema that suits every dataset encountered. In this paper, we introduce a fully automatic, genetic algorithm-based relational database normalization method that reveals the right database schema from a raw dataset without any prior knowledge. To measure the performance of the algorithm, we perform a simulation study using 250 datasets produced from 50 well-known databases. A total of 2500 simulations are carried out: ten runs for each of five denormalized variations of every database design, each variation containing different synthetic content. The results show that the proposed algorithm exactly recovers 72% of the unknown database schemas. Performance can be improved by fine-tuning the optimization parameters. The results also show that the devised algorithm can be applied to many datasets to reveal the structure of the underlying database when only a raw dataset is available.
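The abstract does not describe the internals of the genetic algorithm, so the following is not the authors' method. It is only a minimal sketch of one building block that data-driven normalization generally relies on: testing whether a functional dependency holds in a raw table, and splitting the table when it does. The dataset `ROWS` and the helper names `holds`, `single_column_fds`, and `split_on` are hypothetical and exist only for illustration.

```python
# Hypothetical denormalized dataset: customer details repeat on every order row.
ROWS = [
    {"order_id": 1, "customer_id": 10, "customer_name": "Ada", "city": "Prague"},
    {"order_id": 2, "customer_id": 10, "customer_name": "Ada", "city": "Prague"},
    {"order_id": 3, "customer_id": 11, "customer_name": "Bob", "city": "Brno"},
    {"order_id": 4, "customer_id": 12, "customer_name": "Eve", "city": "Prague"},
]


def holds(rows, lhs, rhs):
    """Return True if the functional dependency lhs -> rhs holds in rows.

    lhs is a tuple of column names, rhs is a single column name.
    """
    seen = {}
    for row in rows:
        key = tuple(row[col] for col in lhs)
        value = row[rhs]
        # The first value seen for a key is remembered; any later mismatch
        # means the same lhs maps to two different rhs values.
        if seen.setdefault(key, value) != value:
            return False
    return True


def single_column_fds(rows):
    """List every dependency A -> B between two distinct single columns."""
    cols = list(rows[0])
    return [(a, b) for a in cols for b in cols
            if a != b and holds(rows, (a,), b)]


def split_on(rows, key, dependents):
    """Split rows into a lookup table keyed by `key` and a remainder table."""
    lookup = {}
    for row in rows:
        lookup[row[key]] = {col: row[col] for col in (key, *dependents)}
    remainder = [
        {col: val for col, val in row.items() if col not in dependents}
        for row in rows
    ]
    return list(lookup.values()), remainder


customers, orders = split_on(ROWS, "customer_id", ["customer_name", "city"])
```

Here `customers` holds one row per customer and `orders` keeps only `order_id` and `customer_id`, so the repeated customer details are stored once. A search-based method such as the one the abstract describes would have to choose among many candidate splits of this kind; how the paper scores and evolves candidates is not stated in the abstract.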
Source journal
Acta Informatica Pragensia (Social Sciences: Library and Information Sciences)
CiteScore: 1.70
Self-citation rate: 0.00%
Articles published: 26
Review time: 12 weeks