
Conference on Automated Knowledge Base Construction: Latest Publications

Scalable Rule Learning in Probabilistic Knowledge Bases
Pub Date : 2019-05-02 DOI: 10.24432/C5MW26
Arcchit Jain, Tal Friedman, Ondřej Kuželka, Guy Van den Broeck, L. D. Raedt
Knowledge Bases (KBs) are becoming increasingly large, sparse and probabilistic. These KBs are typically used to perform query inferences and rule mining. But their efficacy is only as high as their completeness. Efficiently utilizing incomplete KBs remains a major challenge as the current KB completion techniques either do not take into account the inherent uncertainty associated with each KB tuple or do not scale to large KBs. Probabilistic rule learning not only considers the probability of every KB tuple but also tackles the problem of KB completion in an explainable way. For any given probabilistic KB, it learns probabilistic first-order rules from its relations to identify interesting patterns. But current probabilistic rule learning techniques perform grounding to do probabilistic inference when evaluating candidate rules. This does not scale well to large KBs, as the time complexity of inference using grounding is exponential in the size of the KB. In this paper, we present SafeLearner, a scalable solution to probabilistic KB completion that performs probabilistic rule learning using lifted probabilistic inference as a faster alternative to grounding. We compared SafeLearner to the state-of-the-art probabilistic rule learner ProbFOIL+ and to its deterministic contemporary AMIE+ on standard probabilistic KBs of NELL (Never-Ending Language Learner) and Yago. Our results demonstrate that SafeLearner scales as well as AMIE+ when learning simple rules and is also significantly faster than ProbFOIL+.
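To make concrete why grounding-based rule evaluation is the bottleneck, the following minimal Python sketch scores one candidate rule over a toy probabilistic KB by enumerating all groundings and combining them with a noisy-or under an independence assumption. The relation names, tuples and probabilities are invented, and this is not SafeLearner's implementation; the paper's lifted inference replaces exactly this enumeration.

```python
from collections import defaultdict

# Toy probabilistic KB: each tuple carries an independent probability.
# Relation names, constants and probabilities are invented for illustration.
kb = {
    "bornIn": {("alice", "paris"): 0.9, ("bob", "rome"): 0.7},
    "cityIn": {("paris", "france"): 0.95, ("rome", "italy"): 0.85},
}

def score_rule_by_grounding(kb, body):
    """Score the rule head(X, Y) :- body[0](X, Z), body[1](Z, Y) by grounding.

    Every pair of matching body tuples is one grounding; the probability that
    a derived head fact holds is a noisy-or over its groundings (tuple
    probabilities assumed independent).  This explicit enumeration is the step
    that does not scale: the number of groundings grows with the KB.
    """
    groundings = defaultdict(list)
    for (x, z1), p1 in kb[body[0]].items():
        for (z2, y), p2 in kb[body[1]].items():
            if z1 == z2:
                groundings[(x, y)].append(p1 * p2)  # probability of this grounding
    scores = {}
    for fact, probs in groundings.items():
        none_hold = 1.0
        for p in probs:
            none_hold *= 1.0 - p
        scores[fact] = 1.0 - none_hold              # noisy-or over groundings
    return scores

# Evaluate livesIn(X, Y) :- bornIn(X, Z), cityIn(Z, Y) on the toy KB:
# alice -> france with 0.9 * 0.95, bob -> italy with 0.7 * 0.85.
print(score_rule_by_grounding(kb, ("bornIn", "cityIn")))
```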
Citations: 4
Classifying entities into an incomplete ontology
Pub Date : 2013-10-27 DOI: 10.1145/2509558.2509564
Bhavana Dalvi, William W. Cohen, Jamie Callan
Exponential growth of unlabeled web-scale datasets, and of the class hierarchies that represent them, has given rise to new challenges for hierarchical classification. It is costly and time-consuming to create a complete ontology of classes to represent entities on the Web. Hence, there is a need for techniques that can perform hierarchical classification of entities into incomplete ontologies. In this paper we present the Hierarchical Exploratory EM algorithm (an extension of the Exploratory EM algorithm [7]) that takes a seed class hierarchy and seed class instances as input. Our method classifies relevant entities into some of the classes from the seed hierarchy and, along the way, adds newly discovered classes into the hierarchy. Experiments with subsets of the NELL ontology and text datasets derived from the ClueWeb09 corpus show that our Hierarchical Exploratory EM approach improves seed class F1 by up to 21% when compared to its semi-supervised counterpart.
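A rough, flat sketch of the exploratory step described above: entities are assigned to the closest existing class, and a new class is created whenever nothing in the current set fits. The synthetic features, the cosine-similarity threshold and the omission of the class hierarchy are simplifications, not the authors' actual model-selection criterion.

```python
import numpy as np

def exploratory_em(X, seed_centroids, new_class_threshold=0.3, iters=5):
    """Toy exploratory-EM loop over feature vectors X (n x d).

    E-step: assign each point to the most similar class, or spawn a new class
    when the best similarity falls below the threshold.
    M-step: recompute class centroids from the current assignments.
    """
    centroids = [c / np.linalg.norm(c) for c in seed_centroids]
    assignments = []
    for _ in range(iters):
        assignments = []
        for x in X:                                   # E-step (exploratory)
            xn = x / np.linalg.norm(x)
            sims = [float(xn @ c) for c in centroids]
            best = int(np.argmax(sims))
            if sims[best] < new_class_threshold:      # nothing fits: new class
                centroids.append(xn)
                best = len(centroids) - 1
            assignments.append(best)
        for k in range(len(centroids)):               # M-step: recompute centroids
            members = X[[i for i, a in enumerate(assignments) if a == k]]
            if len(members):
                m = members.mean(axis=0)
                centroids[k] = m / np.linalg.norm(m)
    return assignments, centroids

# Two seed classes plus one cluster of entities that matches neither.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal([5, 0, 0], 0.1, (5, 3)),
               rng.normal([0, 5, 0], 0.1, (5, 3)),
               rng.normal([0, 0, 5], 0.1, (5, 3))])
assignments, centroids = exploratory_em(X, [np.array([1.0, 0, 0]), np.array([0, 1.0, 0])])
print(assignments, len(centroids))   # the third block lands in a newly created class
```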
Citations: 8
A joint model for discovering and linking entities
Pub Date : 2013-10-27 DOI: 10.1145/2509558.2509570
Michael L. Wick, Sameer Singh, Harshal Pandya, A. McCallum
Entity resolution, the task of automatically determining which mentions refer to the same real-world entity, is a crucial aspect of knowledge base construction and management. However, performing entity resolution at large scales is challenging because (1) the inference algorithms must cope with unavoidable system scalability issues and (2) the search space grows exponentially in the number of mentions. Current conventional wisdom has been that performing coreference at these scales requires decomposing the problem by first solving the simpler task of entity-linking (matching a set of mentions to a known set of KB entities), and then performing entity discovery as a post-processing step (to identify new entities not present in the KB). However, we argue that this traditional approach is harmful to both entity-linking and overall coreference accuracy. Therefore, we embrace the challenge of jointly modeling entity-linking and entity-discovery as a single entity resolution problem. In order to make progress towards scalability we (1) present a model that reasons over compact hierarchical entity representations, and (2) propose a novel distributed inference architecture that does not suffer from the synchronicity bottleneck which is inherent in map-reduce architectures. We demonstrate that more test-time data actually improves the accuracy of coreference, and show that joint coreference is substantially more accurate than traditional entity-linking, reducing error by 75%.
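The sketch below shows the joint view in its simplest possible form: each mention either links to the best-matching entity, whether that entity came from the KB or was discovered from an earlier mention, or starts a new entity when nothing matches. The bag-of-words contexts, greedy single pass and fixed threshold are simplifications; the paper instead uses compact hierarchical entity representations and distributed sampling-based inference.

```python
from collections import Counter

def cosine(a: Counter, b: Counter) -> float:
    num = sum(a[t] * b[t] for t in a)
    den = (sum(v * v for v in a.values()) ** 0.5) * (sum(v * v for v in b.values()) ** 0.5)
    return num / den if den else 0.0

def resolve(mentions, kb_entities, link_threshold=0.3):
    """Greedy joint linking and discovery over (mention, context-tokens) pairs."""
    entities = {name: Counter(ctx) for name, ctx in kb_entities.items()}
    links, new_id = {}, 0
    for mention, ctx in mentions:
        ctx = Counter(ctx)
        best, best_sim = None, 0.0
        for name, profile in entities.items():
            sim = cosine(ctx, profile)
            if sim > best_sim:
                best, best_sim = name, sim
        if best is None or best_sim < link_threshold:   # discovery: unseen entity
            best = f"NEW_{new_id}"
            new_id += 1
            entities[best] = Counter()
        entities[best].update(ctx)                      # the entity profile grows
        links[mention] = best
    return links

kb = {"Michael_Jordan_(basketball)": ["bulls", "nba", "championship"],
      "Michael_I._Jordan_(professor)": ["berkeley", "machine", "learning"]}
mentions = [("Jordan#1", ["nba", "bulls", "dunk"]),
            ("Jordan#2", ["graphical", "models", "variational"])]
# The second mention shares no context with either KB entity, so it is
# discovered as a new entity rather than forced onto an existing one.
print(resolve(mentions, kb))
```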
Citations: 20
Ontology-aware partitioning for knowledge graph identification
Pub Date : 2013-10-27 DOI: 10.1145/2509558.2509562
J. Pujara, Hui Miao, L. Getoor, William W. Cohen
Knowledge graphs provide a powerful representation of entities and the relationships between them, but automatically constructing such graphs from noisy extractions presents numerous challenges. Knowledge graph identification (KGI) is a technique for knowledge graph construction that jointly reasons about entities, attributes and relations in the presence of uncertain inputs and ontological constraints. Although knowledge graph identification shows promise in scaling to knowledge graphs built from millions of extractions, increasingly powerful extraction engines may soon require knowledge graphs built from billions of extractions. One tool for scaling is partitioning extractions to allow reasoning to occur in parallel. We explore approaches which leverage ontological information and distributional information in partitioning. We compare these techniques with hash-based approaches, and show that using a richer partitioning model that incorporates the ontology graph and distribution of extractions provides superior results. Our results demonstrate that partitioning can result in order-of-magnitude speedups without reducing model performance.
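A small illustration of the difference between hash-based and ontology-aware partitioning, assuming a toy set of extractions and a single ontological link between two predicates. The union-find grouping here is an illustrative stand-in for the richer partitioning model evaluated in the paper.

```python
import hashlib
from collections import defaultdict

def stable_hash(s: str) -> int:
    return int(hashlib.md5(s.encode()).hexdigest(), 16)

def hash_partition(extractions, k):
    """Baseline: hash each extraction's subject into one of k partitions."""
    parts = defaultdict(list)
    for subj, pred, obj in extractions:
        parts[stable_hash(subj) % k].append((subj, pred, obj))
    return parts

def ontology_partition(extractions, related_predicates, k):
    """Ontology-aware variant: predicates linked in the ontology (e.g. by
    domain/range or subsumption constraints) are grouped with union-find, so
    extractions whose constraints interact land in the same partition and the
    joint reasoning never has to cross partition boundaries.
    """
    parent = {}
    def find(p):
        parent.setdefault(p, p)
        while parent[p] != p:
            parent[p] = parent[parent[p]]   # path compression
            p = parent[p]
        return p
    def union(a, b):
        parent[find(a)] = find(b)
    for a, b in related_predicates:
        union(a, b)
    parts = defaultdict(list)
    for subj, pred, obj in extractions:
        parts[stable_hash(find(pred)) % k].append((subj, pred, obj))
    return parts

extractions = [("obama", "bornIn", "honolulu"),
               ("honolulu", "cityInState", "hawaii"),
               ("obama", "profession", "politician")]
ontology_edges = [("bornIn", "cityInState")]   # a constraint links these predicates
print(dict(hash_partition(extractions, 2)))
print(dict(ontology_partition(extractions, ontology_edges, 2)))
# The ontology-aware split keeps bornIn and cityInState extractions together.
```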
Citations: 18
Universal schema for entity type prediction
Pub Date : 2013-10-27 DOI: 10.1145/2509558.2509572
Limin Yao, S. Riedel, A. McCallum
Categorizing entities by their types is useful in many applications, including knowledge base construction, relation extraction and query intent prediction. Fine-grained entity type ontologies are especially valuable, but typically difficult to design because of unavoidable quandaries about level of detail and boundary cases. Automatically classifying entities by type is challenging as well, usually involving hand-labeling data and training a supervised predictor. This paper presents a universal schema approach to fine-grained entity type prediction. The set of types is taken as the union of textual surface patterns (e.g. appositives) and pre-defined types from available databases (e.g. Freebase)---yielding not tens or hundreds of types, but more than ten thousand entity types, such as financier, criminologist, and musical trio. We robustly learn mutual implication among this large union by learning latent vector embeddings from probabilistic matrix factorization, thus avoiding the need for hand-labeled data. Experimental results demonstrate more than 30% reduction in error versus a traditional classification approach on predicting fine-grained entity types.
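The toy sketch below factorizes a small entity-by-type matrix whose columns mix textual surface patterns with database types, using logistic matrix factorization trained by gradient descent. Treating every unobserved cell as a weak negative is a simplification of the paper's training scheme, and the entities, patterns and types are invented for illustration.

```python
import numpy as np

# Toy entity-by-type matrix; 1 marks an observed (entity, type) co-occurrence.
entities = ["warren_buffett", "yo_yo_ma", "lang_lang"]
types = ["appos:financier", "appos:cellist", "db:/music/artist", "db:/business/investor"]
M = np.array([[1, 0, 0, 1],
              [0, 1, 1, 0],
              [0, 0, 1, 0]], dtype=float)

def factorize(M, dim=2, epochs=2000, lr=0.1, seed=0):
    """Logistic matrix factorization: learn low-dimensional entity and type
    embeddings whose dot products reproduce the observed cells."""
    rng = np.random.default_rng(seed)
    E = rng.normal(0, 0.1, (M.shape[0], dim))   # entity embeddings
    T = rng.normal(0, 0.1, (M.shape[1], dim))   # type embeddings
    for _ in range(epochs):
        P = 1.0 / (1.0 + np.exp(-E @ T.T))      # predicted membership probabilities
        G = P - M                               # gradient of the logistic loss
        E, T = E - lr * (G @ T), T - lr * (G.T @ E)
    return E, T

E, T = factorize(M)
scores = 1.0 / (1.0 + np.exp(-E @ T.T))
# Unobserved (entity, type) cells receive graded scores that can be ranked to
# predict additional fine-grained types for each entity.
print(np.round(scores, 2))
```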
Citations: 51
A survey of noise reduction methods for distant supervision
Pub Date : 2013-10-27 DOI: 10.1145/2509558.2509571
Benjamin Roth, Tassilo Barth, Michael Wiegand, D. Klakow
We survey recent approaches to noise reduction in distant supervision learning for relation extraction. We group them according to the principles they are based on: at-least-one constraints, topic-based models, or pattern correlations. Besides describing them, we illustrate the fundamental differences and attempt to give an outlook on potentially fruitful further research. In addition, we identify related work in sentiment analysis which could profit from approaches to noise reduction.
Citations: 57
Exploiting DBpedia for web search results clustering
Pub Date : 2013-10-27 DOI: 10.1145/2509558.2509574
M. Schuhmacher, Simone Paolo Ponzetto
We present a knowledge-rich approach to Web search result clustering which exploits the output of an open-domain entity linker, as well as the types and topical concepts encoded within a wide-coverage ontology. Our results indicate that, thanks to an accurate and compact semantification of the search result snippets, we are able to achieve a competitive performance on a benchmarking dataset for this task.
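A small sketch of the "semantification" idea: each search result snippet is represented by the ontology types of the entities an entity linker found in it, and snippets are grouped by shared types. The linker output, type names and greedy grouping are invented for illustration and are not the clustering algorithm evaluated in the paper.

```python
def semantify(snippet_entities, entity_types):
    """Replace each snippet's surface text with the ontology types of the
    entities a linker found in it (linker output and types are illustrative)."""
    return {sid: {t for e in ents for t in entity_types.get(e, set())}
            for sid, ents in snippet_entities.items()}

def cluster_by_shared_types(type_sets, min_overlap=1):
    """Greedily group snippets that share at least `min_overlap` types."""
    clusters = []
    for sid, types in type_sets.items():
        for cluster in clusters:
            if len(cluster["types"] & types) >= min_overlap:
                cluster["members"].append(sid)
                cluster["types"] |= types
                break
        else:
            clusters.append({"members": [sid], "types": set(types)})
    return [c["members"] for c in clusters]

# Ambiguous query "jaguar": one snippet about the car maker, two about the animal.
entity_types = {"Jaguar_Cars": {"Company", "Automobile"},
                "Land_Rover":  {"Company", "Automobile"},
                "Jaguar":      {"Mammal", "Felidae"},
                "Panthera":    {"Mammal"}}
snippet_entities = {"s1": ["Jaguar_Cars", "Land_Rover"],
                    "s2": ["Jaguar"],
                    "s3": ["Panthera", "Jaguar"]}
print(cluster_by_shared_types(semantify(snippet_entities, entity_types)))
# [['s1'], ['s2', 's3']]
```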
Citations: 15
Extracting meronyms for a biology knowledge base using distant supervision
Pub Date : 2013-10-27 DOI: 10.1145/2509558.2509560
Xiao Ling, Peter Clark, Daniel S. Weld
Knowledge of objects and their parts (meronym relations) is at the heart of many question-answering systems, but manually encoding these facts is impractical. Past researchers have tried hand-written patterns, supervised learning, and bootstrapped methods, but achieving both high precision and recall has proven elusive. This paper reports on a thorough exploration of distant supervision to learn a meronym extractor for the domain of college biology. We introduce a novel algorithm, generalizing the "at least one" assumption of multi-instance learning to handle the case where a fixed (but unknown) percentage of bag members are positive examples. Detailed experiments compare strategies for mention detection, negative example generation, leveraging out-of-domain meronyms, and evaluate the benefit of our multi-instance percentage model.
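A hedged sketch of the instance-labeling step under a percentage-based bag assumption: in each positive bag, the top-scoring fraction of sentence-level candidates is kept as positive rather than only the single best one. The fixed fraction and the scores below are hypothetical; in the paper the percentage is unknown and handled by the learning algorithm.

```python
import math

def select_instance_labels(bags, positive_fraction=0.5):
    """One instance-labeling step under an 'at least a fixed percentage' bag
    assumption.  Each bag holds classifier scores for the sentences mentioning
    one (whole, part) pair; positive bags keep their top-scoring fraction as
    positive training examples, negative bags stay all-negative.
    """
    labels = {}
    for pair, (bag_is_positive, scores) in bags.items():
        if not bag_is_positive:
            labels[pair] = [0] * len(scores)
            continue
        k = max(1, math.ceil(positive_fraction * len(scores)))
        threshold = sorted(scores, reverse=True)[k - 1]
        labels[pair] = [1 if s >= threshold else 0 for s in scores]
    return labels

bags = {
    ("mitochondrion", "inner membrane"): (True,  [0.9, 0.2, 0.6, 0.1]),
    ("cell", "engine"):                  (False, [0.7, 0.3]),
}
print(select_instance_labels(bags))
# {('mitochondrion', 'inner membrane'): [1, 0, 1, 0], ('cell', 'engine'): [0, 0]}
```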
Citations: 9
Knowledge base population and visualization using an ontology based on semantic roles
Pub Date : 2013-10-27 DOI: 10.1145/2509558.2509573
Maryam Siahbani, Ravikiran Vadlapudi, M. Whitney, Anoop Sarkar
This paper extracts facts using "micro-reading" of text in contrast to approaches that extract common-sense knowledge using "macro-reading" methods. Our goal is to extract detailed facts about events from natural language using a predicate-centered view of events (who did what to whom, when and how). We exploit semantic role labels in order to create a novel predicate-centric ontology for entities in our knowledge base. This allows users to find uncommon facts easily. To this end, we tightly couple our knowledge base and ontology to an information visualization system that can be used to explore and navigate events extracted from a large natural language text collection. We use our methodology to create a web-based visual browser of history events in Wikipedia.
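A minimal sketch of the predicate-centered view, mapping one role-labeled frame into a who-did-what-to-whom-when-and-how record. The frame below is written by hand for illustration; a real pipeline would obtain it from a semantic role labeler.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Event:
    """Predicate-centered record for one event: who did what to whom, when,
    and how.  Field names mirror PropBank-style role labels."""
    predicate: str
    agent: Optional[str] = None     # ARG0: who
    patient: Optional[str] = None   # ARG1: to whom / what
    time: Optional[str] = None      # ARGM-TMP: when
    manner: Optional[str] = None    # ARGM-MNR: how

def event_from_srl(frame: dict) -> Event:
    """Map a role-labeled frame into the predicate-centric record."""
    role_map = {"ARG0": "agent", "ARG1": "patient",
                "ARGM-TMP": "time", "ARGM-MNR": "manner"}
    kwargs = {role_map[r]: span for r, span in frame["roles"].items() if r in role_map}
    return Event(predicate=frame["predicate"], **kwargs)

# "Napoleon invaded Russia in 1812 with an army of over 600,000 men."
frame = {"predicate": "invade",
         "roles": {"ARG0": "Napoleon", "ARG1": "Russia",
                   "ARGM-TMP": "in 1812", "ARGM-MNR": "with an army of over 600,000 men"}}
print(event_from_srl(frame))
```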
Citations: 3
Using natural language to integrate, evaluate, and optimize extracted knowledge bases
Pub Date : 2013-10-27 DOI: 10.1145/2509558.2509569
Doug Downey, Chandra Bhagavatula, A. Yates
Web Information Extraction (WIE) systems extract billions of unique facts, but integrating the assertions into a coherent knowledge base and evaluating across different WIE techniques remains a challenge. We propose a framework that utilizes natural language to integrate and evaluate extracted knowledge bases (KBs). In the framework, KBs are integrated by exchanging probability distributions over natural language, and evaluated by how well the output distributions predict held-out text. We describe the advantages of the approach, and detail remaining research challenges.
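A toy illustration, under strong simplifying assumptions, of evaluating a KB by how well language generated from its facts predicts held-out text: facts are verbalized with one template per relation and scored with a smoothed unigram model. The framework in the paper exchanges full probability distributions over natural language rather than single template strings.

```python
import math
from collections import Counter

def verbalize(kb_facts):
    """Turn KB triples into plain-language strings via one template per
    relation (templates and facts are illustrative)."""
    templates = {"bornIn": "{0} was born in {1}", "capitalOf": "{0} is the capital of {1}"}
    return [templates[rel].format(s, o) for s, rel, o in kb_facts if rel in templates]

def held_out_log_likelihood(kb_facts, held_out_sentences, alpha=1.0):
    """Average Laplace-smoothed token log-probability of held-out text under a
    unigram model built from the KB's verbalized facts: a KB whose assertions
    better match the text scores higher."""
    counts = Counter(tok for s in verbalize(kb_facts) for tok in s.lower().split())
    held_tokens = [tok for s in held_out_sentences for tok in s.lower().split()]
    vocab = set(counts) | set(held_tokens)
    total = sum(counts.values())
    log_prob = sum(math.log((counts[t] + alpha) / (total + alpha * len(vocab)))
                   for t in held_tokens)
    return log_prob / len(held_tokens)

kb_a = [("paris", "capitalOf", "france"), ("marie_curie", "bornIn", "warsaw")]
kb_b = [("paris", "capitalOf", "italy")]
held_out = ["marie_curie was born in warsaw", "paris is the capital of france"]
# The KB whose facts agree with the held-out text gets the higher score.
print(held_out_log_likelihood(kb_a, held_out) > held_out_log_likelihood(kb_b, held_out))  # True
```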
Citations: 3