Semantic Entities

Christophe Van Gysel, M. de Rijke, M. Worring
Proceedings of the Eighth Workshop on Exploiting Semantic Annotations in Information Retrieval, 2015-10-22
DOI: 10.1145/2810133.2810139 (https://doi.org/10.1145/2810133.2810139)
Citations: 3

Abstract

Entity retrieval has attracted considerable interest from the research community over the past decade. Ten years ago, the expertise retrieval task gained popularity in the research community during the TREC Enterprise Track [10]. It has remained relevant ever since, broadening to social media, to tracking the dynamics of expertise [1-5, 8, 11] and, more generally, to a range of entity retrieval tasks.

In the talk, which will be given by the second author, we will point out four key challenges that existing methods for entity or expert retrieval fail to address. (1) Queries and expert documents use different representations to describe the same concepts [6, 7]. Term mismatches between entities and experts [7] occur because widely used maximum-likelihood language models cannot exploit semantic similarities between words [9]. (2) As the amount of available data increases, there is a clear need for approaches with greater learning capacity than smoothed maximum-likelihood language models [13]. (3) Supervised methods for entity or expertise retrieval [5, 8] were introduced at the turn of the last decade. However, as data availability accelerates, supervised methods require manual annotation efforts to grow at a similar rate, which calls for the further development of unsupervised methods. (4) Some entity or expertise retrieval methods construct a language model for every document in the collection. These methods lack efficient query capabilities for large document collections, because each query term must be matched against every document [2].

In the talk we will discuss a recently proposed solution [12] that emphasizes unsupervised model construction, efficient querying and, most importantly, semantic matching between query terms and candidate entities.
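The exact-matching limitation in challenge (1) can be made concrete with a toy example. This is a sketch only. It is not taken from the talk or from [12], and the documents, the two-dimensional word vectors, and the helper names `jm_prob` and `soft_match` are hypothetical. Jelinek-Mercer smoothing stands in for whatever smoothing a particular language-model approach uses. In the example, a document about automobiles gets no credit for the query term "car" under a smoothed maximum-likelihood estimate. A cosine similarity over word vectors does give it credit.

```python
import math

# Hypothetical toy documents: d1 is about cars but never contains the word "car".
DOCS = {
    "d1": "automobile engine repair".split(),
    "d2": "neural network training".split(),
    "d3": "car insurance claims".split(),
}
COLLECTION = [t for tokens in DOCS.values() for t in tokens]

# Hypothetical 2-d word vectors standing in for learned word embeddings.
EMB = {
    "car": (0.9, 0.1), "automobile": (0.88, 0.12), "engine": (0.8, 0.3),
    "repair": (0.6, 0.4), "insurance": (0.5, 0.2), "claims": (0.4, 0.3),
    "neural": (0.1, 0.9), "network": (0.2, 0.85), "training": (0.3, 0.8),
}


def jm_prob(term, doc, lam=0.5):
    """Jelinek-Mercer smoothed maximum-likelihood estimate of p(term | doc)."""
    p_doc = doc.count(term) / len(doc)
    p_col = COLLECTION.count(term) / len(COLLECTION)
    return (1 - lam) * p_doc + lam * p_col


def cosine(u, v):
    """Cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.hypot(*u) * math.hypot(*v))


def soft_match(term, doc):
    """Semantic match: best cosine similarity between the term and any doc term."""
    return max(cosine(EMB[term], EMB[t]) for t in doc)


for name, doc in DOCS.items():
    print(name, round(jm_prob("car", doc), 3), round(soft_match("car", doc), 3))
```

Under the smoothed estimate, d1 and d2 both receive only the background probability of 1/18 for "car". The automobile document is therefore indistinguishable from the unrelated one. Under the vector-based match, d1 scores nearly as high as d3, which contains "car" itself.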
We show that the proposed approach improves retrieval performance over generative language models, mainly because of its ability to perform semantic matching [7]. The method requires no annotations or supervised relevance judgments and learns from raw textual evidence and document-candidate associations alone. The aim is to provide insight into how explicit annotations and feature engineering can be avoided while still obtaining semantically meaningful retrieval results. In the talk we will present a comparative error analysis of the proposed semantic entity retrieval model and traditional generative language models that perform exact matching. This analysis yields important insights into the relative strengths of semantic and exact matching, for expert retrieval in particular and entity retrieval in general. We will also discuss extensions of the proposed model that address scalability and the dynamic aspects of entity and expert retrieval.
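A small log-linear model can illustrate learning from document-candidate associations alone, without relevance judgments. This is a hypothetical sketch in the spirit of [12], not its actual architecture or training procedure. The toy corpus, the embedding size `DIM`, and the helpers `train`, `candidate_probs` and `rank` are all assumptions.

The sketch works as follows:

- Each word occurrence in a document becomes a training example, labelled with that document's associated candidates.
- A softmax over dot products of word and candidate vectors gives p(c | w).
- A query is scored by summing log p(c | w) over its terms.

Query terms are scored against candidate vectors directly. In this sketch, query-time cost therefore grows with the number of candidates rather than the number of documents, which relates to challenge (4).

```python
import math
import random

# Hypothetical toy corpus: each document lists the candidates associated with it.
# These document-candidate associations are the only training signal used.
DOCS = {
    "d1": ("automobile engine repair", ["alice"]),
    "d2": ("neural network training", ["bob"]),
    "d3": ("car engine tuning", ["alice"]),
}
CANDS = sorted({c for _, cs in DOCS.values() for c in cs})
VOCAB = sorted({w for text, _ in DOCS.values() for w in text.split()})
DIM = 4  # hypothetical embedding size


def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def candidate_probs(word, word_vec, cand_vec):
    """p(candidate | word): softmax over dot products with candidate vectors."""
    e = word_vec[word]
    logits = [sum(a * b for a, b in zip(cand_vec[c], e)) for c in CANDS]
    return dict(zip(CANDS, softmax(logits)))


def train(epochs=200, lr=0.1, seed=0):
    """Fit word and candidate vectors by SGD on cross-entropy; no relevance labels."""
    rng = random.Random(seed)
    word_vec = {w: [rng.gauss(0, 0.1) for _ in range(DIM)] for w in VOCAB}
    cand_vec = {c: [rng.gauss(0, 0.1) for _ in range(DIM)] for c in CANDS}
    # One example per (word occurrence, associated candidate) pair.
    pairs = [(w, c) for text, cs in DOCS.values() for w in text.split() for c in cs]
    for _ in range(epochs):
        rng.shuffle(pairs)
        for word, target in pairs:
            e = word_vec[word]
            probs = candidate_probs(word, word_vec, cand_vec)
            grad_e = [0.0] * DIM
            for c in CANDS:
                g = probs[c] - (1.0 if c == target else 0.0)  # dLoss/dlogit_c
                for i in range(DIM):
                    grad_e[i] += g * cand_vec[c][i]  # uses value before update
                    cand_vec[c][i] -= lr * g * e[i]
            for i in range(DIM):
                e[i] -= lr * grad_e[i]  # e aliases word_vec[word]
    return word_vec, cand_vec


def rank(query, word_vec, cand_vec):
    """Score candidates by summed log p(c | w) over in-vocabulary query terms."""
    scores = dict.fromkeys(CANDS, 0.0)
    for w in query.split():
        if w in word_vec:  # out-of-vocabulary terms are skipped in this sketch
            for c, p in candidate_probs(w, word_vec, cand_vec).items():
                scores[c] += math.log(p)
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


word_vec, cand_vec = train()
print(rank("engine repair", word_vec, cand_vec))
```

Here `rank("engine repair", ...)` places alice first, because both words occur only in documents associated with alice. The exact scores depend on the random seed and the training schedule.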