Cross Engine Database Joining
Wesley Leonard, Paul B. Albee
2010 Eighth ACIS International Conference on Software Engineering Research, Management and Applications, 24 May 2010
DOI: 10.1109/SERA.2010.13
Citations: 1
Abstract
A standards-based, open-source middleware system was designed and implemented to facilitate the analysis of large, disparate datasets. The system can access several types of database servers simultaneously, browse remote data, combine datasets, and join tables from remote databases regardless of vendor. It uses an algorithm called Dynamic Merge Cache to handle data caching, query generation, transformations, and joins with minimal operational interference to the source databases. The system can combine any subset of the configured databases and convert the result into XML, which is made available to analysis tools through a web service. When the system connects to a remote database, it creates a metadata catalog of that source. Users can configure which tables and fields to export from the remote dataset, and can also filter, transform, and combine data. The system was tested with a large fish contaminant database and a second database populated with simulated scientific data.
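The abstract does not describe the internals of Dynamic Merge Cache, so the following is not that algorithm. It is a minimal Python sketch, under stated assumptions, of the general pattern the abstract outlines: build a metadata catalog for each source, pull only the configured fields with a filter pushed down to the source, join the rows in the middleware, and emit XML. Two independent in-memory SQLite connections stand in for servers from different vendors. The tables `sites` and `samples`, their columns and rows, and the helpers `build_catalog`, `fetch`, `hash_join`, and `rows_to_xml` are all hypothetical. A plain in-memory hash join replaces the paper's caching strategy, and the web-service layer is omitted.

```python
# Hypothetical sketch: NOT the paper's Dynamic Merge Cache. All table names,
# columns, rows, and helper names below are illustrative assumptions.
import sqlite3
import xml.etree.ElementTree as ET


def build_catalog(conn):
    """Return {table: [column, ...]} for every table in one source database."""
    catalog = {}
    tables = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    ).fetchall()
    for (table,) in tables:
        # PRAGMA table_info yields (cid, name, type, notnull, dflt_value, pk).
        columns = conn.execute(f'PRAGMA table_info("{table}")').fetchall()
        catalog[table] = [col[1] for col in columns]
    return catalog


def fetch(conn, catalog, table, fields, where="", params=()):
    """Pull only the configured fields from one source, with an optional filter."""
    # Identifiers are interpolated into SQL only after checking them against
    # the catalog; filter values travel as bound parameters ("?").
    if table not in catalog:
        raise ValueError(f"unknown table: {table}")
    unknown = [f for f in fields if f not in catalog[table]]
    if unknown:
        raise ValueError(f"unknown fields for {table}: {unknown}")
    sql = f"SELECT {', '.join(fields)} FROM {table}"
    if where:
        sql += f" WHERE {where}"
    cur = conn.execute(sql, params)
    names = [d[0] for d in cur.description]
    return [dict(zip(names, row)) for row in cur]


def hash_join(left, right, key):
    """Inner-join two lists of row dicts on `key` inside the middleware."""
    index = {}
    for row in right:
        index.setdefault(row[key], []).append(row)
    # Columns from `right` overwrite same-named columns from `left`.
    return [{**l, **r} for l in left for r in index.get(l[key], [])]


def rows_to_xml(rows, root_tag="results", row_tag="row"):
    """Serialize joined rows as a flat XML document."""
    root = ET.Element(root_tag)
    for row in rows:
        elem = ET.SubElement(root, row_tag)
        for name, value in row.items():
            ET.SubElement(elem, name).text = "" if value is None else str(value)
    return ET.tostring(root, encoding="unicode")


# Two independent connections stand in for two different database servers.
src_a = sqlite3.connect(":memory:")
src_a.execute("CREATE TABLE sites (site_id INTEGER PRIMARY KEY, name TEXT)")
src_a.executemany("INSERT INTO sites VALUES (?, ?)", [(1, "Lake A"), (2, "River B")])

src_b = sqlite3.connect(":memory:")
src_b.execute(
    "CREATE TABLE samples (sample_id INTEGER PRIMARY KEY, site_id INTEGER,"
    " species TEXT, mercury_ppm REAL)"
)
src_b.executemany(
    "INSERT INTO samples VALUES (?, ?, ?, ?)",
    [(10, 1, "walleye", 0.42), (11, 1, "perch", 0.18), (12, 3, "carp", 0.30)],
)

cat_a, cat_b = build_catalog(src_a), build_catalog(src_b)
sites = fetch(src_a, cat_a, "sites", ["site_id", "name"])
samples = fetch(src_b, cat_b, "samples", ["site_id", "species", "mercury_ppm"],
                where="mercury_ppm > ?", params=(0.2,))
joined = hash_join(sites, samples, "site_id")
xml_text = rows_to_xml(joined)
print(xml_text)
```

Run as-is, the sketch prints a single `<row>` for the walleye sample: the perch sample fails the `mercury_ppm > ?` filter at its source, and the carp sample references `site_id` 3, which has no match in `sites`. Joining in the middleware means each source only has to answer a simple `SELECT`, which keeps the approach largely independent of vendor SQL dialects; a fuller system would likely also need type mapping between engines and a strategy for result sets too large to hold in memory.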