Evaluating Probabilistic Queries over Uncertain Matching

2012 IEEE 28th International Conference on Data Engineering Pub Date : 2012-04-01 DOI:10.1109/ICDE.2012.14

Reynold Cheng, Jian Gong, D. Cheung, Jiefeng Cheng


Citations: 6

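Abstract

A matching between two database schemas, generated by machine learning techniques (e.g., COMA++), is often uncertain. Handling the uncertainty of schema matching has recently attracted considerable research interest, because the quality of applications relies on the matching result. We study query evaluation over an inexact schema matching, which is represented as a set of "possible mappings", each with a probability that it is correct. Since the number of possible mappings can be large, evaluating queries through these mappings can be expensive. Observing that the possible mappings between two schemas often exhibit a high degree of overlap, we develop two efficient solutions. We also present a fast algorithm to compute the answers with the k highest probabilities. An extensive evaluation on real schemas shows that our approaches improve query performance by almost an order of magnitude.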
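To make the setting concrete, the following is a minimal sketch of query evaluation over a set of possible mappings. It is not the paper's algorithm. It assumes that each possible mapping is a dictionary from target attributes to source attributes, that the probability of an answer is the sum of the probabilities of the mappings that produce it, and that the query is a simple selection. All names (`answer_probabilities`, `top_k`, the attributes, the sample rows) are hypothetical. Mappings that rewrite the query identically are grouped so that each distinct rewritten query runs only once, which is one simple way to benefit from overlap among mappings.

```python
from collections import defaultdict
import heapq


def answer_probabilities(possible_mappings, source_rows,
                         select_attr, where_attr, where_value):
    """Evaluate SELECT select_attr WHERE where_attr = where_value,
    posed on the target schema, through every possible mapping.

    possible_mappings: list of (mapping, probability) pairs, where
    mapping is a dict from target attribute to source attribute
    (an assumed representation, not taken from the paper).
    """
    # Group mappings that rewrite this query in the same way, so each
    # distinct rewritten query is evaluated once on the source data.
    rewritten = defaultdict(float)
    for mapping, prob in possible_mappings:
        if select_attr in mapping and where_attr in mapping:
            key = (mapping[select_attr], mapping[where_attr])
            rewritten[key] += prob

    # An answer's probability is the total probability of the mappings
    # whose rewritten query returns it (assumed semantics).
    probs = defaultdict(float)
    for (src_select, src_where), prob in rewritten.items():
        answers = {row[src_select] for row in source_rows
                   if row[src_where] == where_value}
        for answer in answers:
            probs[answer] += prob
    return dict(probs)


def top_k(probs, k):
    """Return the k answers with the highest probabilities."""
    return heapq.nlargest(k, probs.items(), key=lambda item: item[1])


# Hypothetical source data and possible mappings for illustration.
rows = [
    {"name": "Alice", "home_phone": "111", "office_phone": "222"},
    {"name": "Bob", "home_phone": "333", "office_phone": "444"},
]
mappings = [
    ({"person": "name", "phone": "home_phone"}, 0.5),
    ({"person": "name", "phone": "office_phone"}, 0.3),
    ({"person": "name", "phone": "home_phone", "tel": "office_phone"}, 0.2),
]

probs = answer_probabilities(mappings, rows, "phone", "person", "Alice")
best = top_k(probs, 1)
```

In this example the first and third mappings agree on the attributes the query uses, so only two rewritten queries are evaluated instead of three. The top-k step here ranks answers only after all probabilities are known; the paper's fast top-k algorithm is not reproduced.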
