Assessing sparse information extraction using semantic contexts

Proceedings of the 22nd ACM international conference on Information & Knowledge Management. Pub Date: 2013-10-27. DOI: 10.1145/2505515.2505598

Peipei Li, Haixun Wang, Hongsong Li, Xindong Wu


Citations: 4

Abstract

One important assumption of information extraction is that extractions occurring more frequently are more likely to be correct. Sparse information extraction is challenging because, no matter how big a corpus is, some extractions are supported by only a small amount of evidence in it. A pioneering work known as REALM learns HMMs to model the context of a semantic relationship in order to assess the extractions. This is quite costly, and the semantics it reveals for the context are not explicit. In this work, we introduce a lightweight, explicit semantic approach for sparse information extraction. We use a large semantic network consisting of millions of concepts, entities, and attributes to explicitly model the context of semantic relationships. Experiments show that our approach improves the F-score of extraction by at least 11.2% over state-of-the-art HMM-based approaches while also being more efficient.
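As a rough illustration of what "explicitly modeling the context" could look like, the sketch below maps the words around an extraction to concepts and compares a sparse candidate against a concept profile built from well-supported extractions. This is not the authors' method: the tiny `TOY_NETWORK`, the helper names `concept_vector`, `cosine`, and `assess`, and the use of cosine similarity are all assumptions made for illustration only.

```python
from collections import Counter
from math import sqrt

# Hypothetical toy "semantic network": context word -> concepts it belongs to.
# A real network would hold millions of concepts, entities, and attributes.
TOY_NETWORK = {
    "mayor": ["political office"],
    "elected": ["political event"],
    "council": ["government body"],
    "recipe": ["food"],
    "baked": ["cooking"],
}


def concept_vector(context_words):
    """Map context words to a bag of concepts using the toy network."""
    counts = Counter()
    for word in context_words:
        for concept in TOY_NETWORK.get(word.lower(), []):
            counts[concept] += 1
    return counts


def cosine(a, b):
    """Cosine similarity between two sparse concept-count vectors."""
    dot = sum(a[k] * b[k] for k in a if k in b)
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def assess(candidate_context, trusted_contexts):
    """Score a sparse candidate by how well its context concepts match
    the concept profile of frequent (trusted) extractions."""
    profile = Counter()
    for ctx in trusted_contexts:
        profile.update(concept_vector(ctx))
    return cosine(concept_vector(candidate_context), profile)


# Contexts of well-supported extractions for some relation (made-up data).
trusted = [["mayor", "elected"], ["council", "elected", "mayor"]]
plausible = assess(["mayor", "council"], trusted)
implausible = assess(["recipe", "baked"], trusted)
print(plausible > implausible)
```

The point of the sketch is only that the comparison happens over named concepts rather than hidden HMM states, so the reason a candidate scores high or low can be read directly from the overlapping concepts.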
