使用马尔科夫链蒙特卡洛玩琐事

2011 IEEE 27th International Conference on Data Engineering Pub Date : 2011-04-11 DOI:10.1109/ICDE.2011.5767941

Daniel Deutch, Ohad Greenshpan, Boris Kostenko, T. Milo

{"title":"使用马尔科夫链蒙特卡洛玩琐事","authors":"Daniel Deutch, Ohad Greenshpan, Boris Kostenko, T. Milo","doi":"10.1109/ICDE.2011.5767941","DOIUrl":null,"url":null,"abstract":"We introduce in this Demonstration a system called Trivia Masster that generates a very large Database of facts in a variety of topics, and uses it for question answering. The facts are collected from human users (the “crowd”); the system motivates users to contribute to the Database by using a Trivia Game, where users gain points based on their contribution. A key challenge here is to provide a suitable Data Cleaning mechanism that allows to identify which of the facts (answers to Trivia questions) submitted by users are indeed correct / reliable, and consequently how many points to grant users, how to answer questions based on the collected data, and which questions to present to the Trivia players, in order to improve the data quality. As no existing single Data Cleaning technique provides a satisfactory solution to this challenge, we propose here a novel approach, based on a declarative framework for defining recursive and probabilistic Data Cleaning rules. Our solution employs an algorithm that is based on Markov Chain Monte Carlo Algorithms.","PeriodicalId":332374,"journal":{"name":"2011 IEEE 27th International Conference on Data Engineering","volume":null,"pages":null},"PeriodicalIF":0.0000,"publicationDate":"2011-04-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"14","resultStr":"{\"title\":\"Using Markov Chain Monte Carlo to play Trivia\",\"authors\":\"Daniel Deutch, Ohad Greenshpan, Boris Kostenko, T. Milo\",\"doi\":\"10.1109/ICDE.2011.5767941\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"We introduce in this Demonstration a system called Trivia Masster that generates a very large Database of facts in a variety of topics, and uses it for question answering. The facts are collected from human users (the “crowd”); the system motivates users to contribute to the Database by using a Trivia Game, where users gain points based on their contribution. A key challenge here is to provide a suitable Data Cleaning mechanism that allows to identify which of the facts (answers to Trivia questions) submitted by users are indeed correct / reliable, and consequently how many points to grant users, how to answer questions based on the collected data, and which questions to present to the Trivia players, in order to improve the data quality. As no existing single Data Cleaning technique provides a satisfactory solution to this challenge, we propose here a novel approach, based on a declarative framework for defining recursive and probabilistic Data Cleaning rules. Our solution employs an algorithm that is based on Markov Chain Monte Carlo Algorithms.\",\"PeriodicalId\":332374,\"journal\":{\"name\":\"2011 IEEE 27th International Conference on Data Engineering\",\"volume\":null,\"pages\":null},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2011-04-11\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"14\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2011 IEEE 27th International Conference on Data Engineering\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/ICDE.2011.5767941\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2011 IEEE 27th International Conference on Data Engineering","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ICDE.2011.5767941","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 14

摘要

在这个演示中，我们介绍了一个名为Trivia master的系统，它生成了一个非常大的各种主题的事实数据库，并将其用于问答。事实是从人类用户(“人群”)那里收集的;该系统通过一个问答游戏来激励用户为数据库做出贡献，用户可以根据自己的贡献获得积分。这里的一个关键挑战是提供一种合适的数据清理机制，允许识别用户提交的哪些事实(对Trivia问题的回答)确实是正确/可靠的，因此可以给用户多少分，如何根据收集到的数据回答问题，以及向Trivia玩家呈现哪些问题，以提高数据质量。由于没有现有的单一数据清理技术为这一挑战提供满意的解决方案，我们在这里提出了一种新的方法，该方法基于定义递归和概率数据清理规则的声明性框架。我们的解决方案采用了一种基于马尔可夫链蒙特卡罗算法的算法。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文

微信好友朋友圈 QQ好友复制链接

本刊更多论文

Using Markov Chain Monte Carlo to play Trivia

We introduce in this Demonstration a system called Trivia Masster that generates a very large Database of facts in a variety of topics, and uses it for question answering. The facts are collected from human users (the “crowd”); the system motivates users to contribute to the Database by using a Trivia Game, where users gain points based on their contribution. A key challenge here is to provide a suitable Data Cleaning mechanism that allows to identify which of the facts (answers to Trivia questions) submitted by users are indeed correct / reliable, and consequently how many points to grant users, how to answer questions based on the collected data, and which questions to present to the Trivia players, in order to improve the data quality. As no existing single Data Cleaning technique provides a satisfactory solution to this challenge, we propose here a novel approach, based on a declarative framework for defining recursive and probabilistic Data Cleaning rules. Our solution employs an algorithm that is based on Markov Chain Monte Carlo Algorithms.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

2011 IEEE 27th International Conference on Data Engineering

自引率

0.00%

发文量

期刊最新文献

Advanced search, visualization and tagging of sensor metadata Bidirectional mining of non-redundant recurrent rules from a sequence database Web-scale information extraction with vertex Characteristic sets: Accurate cardinality estimation for RDF queries with multiple joins Dynamic prioritization of database queries