Using Probabilistic Relational Models to generate synthetic spatial or non-spatial databases
Rajani Chulyadyo, Philippe Leray
2018 12th International Conference on Research Challenges in Information Science (RCIS), 2018-05-29
DOI: 10.1109/RCIS.2018.8406645 (https://doi.org/10.1109/RCIS.2018.8406645)
Citations: 3
Abstract
When real datasets are difficult to obtain for tasks such as system analysis or algorithm evaluation, synthetic datasets are commonly used. Existing generation techniques often produce random data for a single table, and such datasets are often unsuitable for evaluating data mining or machine learning algorithms that operate on relational data. To address this, our earlier work dealt with generating relational datasets from Probabilistic Relational Models (PRMs), a framework for handling probabilistic uncertainty in relational domains. In this article, we extend that work by proposing more efficient data sampling algorithms and by using a spatial extension of PRMs to generate synthetic spatial datasets. We also present an experimental analysis of three data sampling algorithms applicable to our method and of the quality of the datasets they generate.
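To make the idea of sampling a relational dataset from a probabilistic model concrete, here is a minimal sketch in Python. It is not one of the three sampling algorithms evaluated in the paper, which the abstract does not name. It uses plain ancestral (forward) sampling over a hypothetical two-table schema, and every name in it (City, Shop, size, revenue, the probability tables) is an assumption made for illustration only.

```python
import random


def sample_categorical(dist, rng):
    """Draw one value from a {value: probability} dictionary."""
    values = list(dist)
    weights = [dist[v] for v in values]
    return rng.choices(values, weights=weights, k=1)[0]


# Hypothetical toy schema (not from the paper):
#   City(id, size)   and   Shop(id, city_id, revenue)
# Shop.revenue depends on the size of the city the shop refers to.
P_SIZE = {"small": 0.7, "large": 0.3}
P_REVENUE_GIVEN_SIZE = {
    "small": {"low": 0.8, "high": 0.2},
    "large": {"low": 0.3, "high": 0.7},
}


def generate(n_cities, n_shops, seed=0):
    """Generate two related tables by sampling parents before children."""
    rng = random.Random(seed)

    # Step 1: attributes without parents are sampled first.
    cities = [
        {"id": i, "size": sample_categorical(P_SIZE, rng)}
        for i in range(n_cities)
    ]

    # Step 2: for each child row, choose a foreign key uniformly at random,
    # then sample the dependent attribute conditioned on the referenced row.
    shops = []
    for j in range(n_shops):
        city = rng.choice(cities)
        revenue = sample_categorical(P_REVENUE_GIVEN_SIZE[city["size"]], rng)
        shops.append({"id": j, "city_id": city["id"], "revenue": revenue})

    return cities, shops
```

Calling `generate(5, 20, seed=1)` returns a list of 5 city rows and a list of 20 shop rows whose `city_id` values all reference existing cities. A single-table generator cannot express this kind of cross-table dependency, which is the gap the abstract describes.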