Learning From Query-Answers

Niccolò Meneghetti, Oliver Kennedy, Wolfgang Gatterbauer
{"title":"Learning From Query-Answers","authors":"Niccolò Meneghetti, Oliver Kennedy, Wolfgang Gatterbauer","doi":"10.1145/3277503","DOIUrl":null,"url":null,"abstract":"Tuple-independent and disjoint-independent probabilistic databases (TI- and DI-PDBs) represent uncertain data in a factorized form as a product of independent random variables that represent either tuples (TI-PDBs) or sets of tuples (DI-PDBs). When the user submits a query, the database derives the marginal probabilities of each output-tuple, exploiting the underlying assumptions of statistical independence. While query processing in TI- and DI-PDBs has been studied extensively, limited research has been dedicated to the problems of updating or deriving the parameters from observations of query results. Addressing this problem is the main focus of this article. We first introduce Beta Probabilistic Databases (B-PDBs), a generalization of TI-PDBs designed to support both (i) belief updating and (ii) parameter learning in a principled and scalable way. The key idea of B-PDBs is to treat each parameter as a latent, Beta-distributed random variable. We show how this simple expedient enables both belief updating and parameter learning in a principled way, without imposing any burden on regular query processing. Building on B-PDBs, we then introduce Dirichlet Probabilistic Databases (D-PDBs), a generalization of DI-PDBs with similar properties. We provide the following key contributions for both B- and D-PDBs: (i) We study the complexity of performing Bayesian belief updates and devise efficient algorithms for certain tractable classes of queries; (ii) we propose a soft-EM algorithm for computing maximum-likelihood estimates of the parameters; (iii) we present an algorithm for efficiently computing conditional probabilities, allowing us to efficiently implement B- and D-PDBs via a standard relational engine; and (iv) we support our conclusions with extensive experimental results.","PeriodicalId":6983,"journal":{"name":"ACM Transactions on Database Systems (TODS)","volume":"50 1","pages":"1 - 41"},"PeriodicalIF":0.0000,"publicationDate":"2018-12-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"1","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"ACM Transactions on Database Systems (TODS)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3277503","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 1

Abstract

Tuple-independent and disjoint-independent probabilistic databases (TI- and DI-PDBs) represent uncertain data in a factorized form as a product of independent random variables that represent either tuples (TI-PDBs) or sets of tuples (DI-PDBs). When the user submits a query, the database derives the marginal probabilities of each output-tuple, exploiting the underlying assumptions of statistical independence. While query processing in TI- and DI-PDBs has been studied extensively, limited research has been dedicated to the problems of updating or deriving the parameters from observations of query results. Addressing this problem is the main focus of this article. We first introduce Beta Probabilistic Databases (B-PDBs), a generalization of TI-PDBs designed to support both (i) belief updating and (ii) parameter learning in a principled and scalable way. The key idea of B-PDBs is to treat each parameter as a latent, Beta-distributed random variable. We show how this simple expedient enables both belief updating and parameter learning in a principled way, without imposing any burden on regular query processing. Building on B-PDBs, we then introduce Dirichlet Probabilistic Databases (D-PDBs), a generalization of DI-PDBs with similar properties. We provide the following key contributions for both B- and D-PDBs: (i) We study the complexity of performing Bayesian belief updates and devise efficient algorithms for certain tractable classes of queries; (ii) we propose a soft-EM algorithm for computing maximum-likelihood estimates of the parameters; (iii) we present an algorithm for efficiently computing conditional probabilities, allowing us to efficiently implement B- and D-PDBs via a standard relational engine; and (iv) we support our conclusions with extensive experimental results.
查看原文
分享 分享
微信好友 朋友圈 QQ好友 复制链接
本刊更多论文
从问答中学习
元组独立和不分离独立概率数据库(TI-和DI-PDBs)以分解形式表示不确定数据,作为表示元组(TI- pdbs)或元组集合(DI-PDBs)的独立随机变量的乘积。当用户提交查询时,数据库利用统计独立性的基本假设,派生出每个输出元组的边际概率。虽然TI-和DI-PDBs中的查询处理已经得到了广泛的研究,但很少有研究致力于从查询结果的观察中更新或派生参数的问题。解决这个问题是本文的主要焦点。我们首先介绍Beta概率数据库(B-PDBs),这是TI-PDBs的一种推广,旨在以原则性和可扩展的方式支持(i)信念更新和(ii)参数学习。B-PDBs的关键思想是将每个参数视为潜在的beta分布随机变量。我们将展示这个简单的权宜之计如何以一种有原则的方式实现信念更新和参数学习,而不会给常规查询处理带来任何负担。在B-PDBs的基础上,我们介绍了Dirichlet概率数据库(D-PDBs),它是具有类似性质的DI-PDBs的推广。我们为B-和D-PDBs提供了以下关键贡献:(i)我们研究了执行贝叶斯信念更新的复杂性,并为某些可处理的查询类设计了有效的算法;(ii)我们提出了一种软em算法来计算参数的最大似然估计;(iii)我们提出了一种有效计算条件概率的算法,使我们能够通过标准的关系引擎有效地实现B-和D-PDBs;(iv)我们用广泛的实验结果来支持我们的结论。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 去求助
来源期刊
自引率
0.00%
发文量
0
期刊最新文献
On Finding Rank Regret Representatives Answering (Unions of) Conjunctive Queries using Random Access and Random-Order Enumeration Persistent Summaries Influence Maximization Revisited: Efficient Sampling with Bound Tightened The Space-Efficient Core of Vadalog
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
现在去查看 取消
×
提示
确定
0
微信
客服QQ
Book学术公众号 扫码关注我们
反馈
×
意见反馈
请填写您的意见或建议
请填写您的手机或邮箱
已复制链接
已复制链接
快去分享给好友吧!
我知道了
×
扫码分享
扫码分享
Book学术官方微信
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术
文献互助 智能选刊 最新文献 互助须知 联系我们:info@booksci.cn
Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。
Copyright © 2023 Book学术 All rights reserved.
ghs 京公网安备 11010802042870号 京ICP备2023020795号-1