
Proceedings of the Eighth Workshop on Exploiting Semantic Annotations in Information Retrieval: Latest Publications

Hugo: Entity-based News Search and Summarisation
Anaïs Cadilhac, Andrew Chisholm, Ben Hachey, S. Kharazmi
We describe Hugo -- a service initially available on iOS that solicits a structured, semantic query and returns entity-specific news articles. Retrieval is powered by a semantic annotation pipeline that includes named entity linking and automatic summarisation. Search and entity linking use an in-house knowledge base initialised with Wikipedia data and continually curated to include new entities. Hugo delivers timely knowledge about a user's professional network, in particular new people they want to know more about.
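The linking step the abstract describes (resolving a surface mention against a curated knowledge base) can be illustrated with a minimal sketch. The alias table, entity names, and prior scores below are invented for illustration; they are not Hugo's actual data or method.

```python
# Minimal named-entity-linking sketch: a surface mention is looked up in
# an alias table and resolved to the candidate entity with the highest
# prior probability. All data here is illustrative.
KB_ALIASES = {
    "apple": [("Apple_Inc.", 0.7), ("Apple_(fruit)", 0.3)],
    "jobs": [("Steve_Jobs", 0.9), ("Employment", 0.1)],
}

def link_mention(mention):
    """Return the highest-prior entity for a mention, or None if unknown."""
    candidates = KB_ALIASES.get(mention.lower())
    if not candidates:
        return None
    return max(candidates, key=lambda pair: pair[1])[0]
```

A production system would replace the static prior with context features, and the alias table would be backed by the continually curated knowledge base the abstract mentions.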
DOI: 10.1145/2810133.2810144 | Published: 2015-10-22
Citations: 4
Contextualizing Data on a Content Management System
Cátia Moreira, João Taborda, R. Gaudio, Lara dos Santos, Paulo Pereira
Content Management Systems (CMSs) are known for their ability to store both structured and non-structured data. However, they are not able to associate meaning and context with the stored information. Furthermore, these systems do not meet the needs and expectations of their users: as the size of the data increases, the system loses its capacity to retrieve meaningful results. To overcome this issue, we propose a method to implement data contextualization on a CMS. The proposed method consists of enriching the data with semantic information, allowing more accurate retrieval of results. The implementation of this approach was validated by applying the contextualization method to a currently used CMS with real information. With this improved CMS, users are expected to be able to retrieve data related to their initial search.
DOI: 10.1145/2810133.2810134 | Published: 2015-10-22
Citations: 1
Open and Closed Schema for Aligning Knowledge and Text Collections
Matthew Kelcey
When it comes to knowledge bases, most people's first thought is structured sources such as Freebase/Wikidata and their relationship to similarly structured web sources such as Wikipedia. A lot of additional and interesting "knowledge", though, is captured in unstructured databases constructed in a less supervised manner using open information extraction techniques. In this talk we'll discuss some of the differences between open- and closed-schema knowledge bases, including the ideas of objective vs. subjective content as well as freshness and trust. We'll give an overview of approaches to aligning such data sources so that their relative strengths can be combined, and finish with applications of such alignments, particularly around open question-and-answer systems.
DOI: 10.1145/2810133.2810140 | Published: 2015-10-22
Citations: 1
Temporal Reconciliation for Dating Photographs Using Entity Information
Paul Martin, M. Spaniol, A. Doucet
Temporal classification of Web contents requires a "notion" about them. This is particularly relevant when contents contain several dates and a human "interpretation" is required in order to choose the appropriate time point. The dating challenge becomes even more complex when images have to be dated based on the content describing them. In this paper, we present a novel time-stamping approach based on semantics derived from the document. To this end, we will first introduce our experimental dataset and then explain our temporal reconciliation pipeline. In particular, we will explain the process of temporal reconciliation by incorporating information derived from named entities.
DOI: 10.1145/2810133.2810142 | Published: 2015-10-22
Citations: 2
Named Entity Disambiguation for Resource-Poor Languages
Mohamed H. Gad-Elrab, M. Yosef, G. Weikum
Named entity disambiguation (NED) is the task of linking ambiguous names in natural language text to canonical entities, like people, organizations, or places, registered in a knowledge base. The problem is well studied for English text, but few systems have considered resource-poor languages that lack comprehensive name-entity dictionaries, entity descriptions, and large annotated training corpora. In this paper we address the NED problem for languages, such as Arabic, with a limited amount of annotated corpora and structured resources. We present a method that leverages structured English resources to enrich the components of a language-agnostic NED system and enable effective NED for other languages. We achieve this by fusing data from several multilingual resources and the output of automatic translation/transliteration systems. We show the viability and quality of our approach by synthesizing NED systems for Arabic, Spanish, and Italian.
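The fusion idea the abstract sketches (combining English knowledge-base labels with translation/transliteration output to obtain a name-entity dictionary for the target language) can be illustrated as follows. The entity IDs and Arabic surface forms are illustrative placeholders, not taken from the paper's actual resources.

```python
# Sketch of dictionary enrichment for a resource-poor language: merge
# English KB labels with (hypothetical) transliterated forms so both
# surface forms resolve to the same entity ID. All data is invented.
english_labels = {"Cairo": "E1", "Damascus": "E2"}
transliterated = {"Cairo": "القاهرة", "Damascus": "دمشق"}

def build_name_dictionary(labels, translit):
    """Merge English and transliterated surface forms into one lookup."""
    lookup = dict(labels)
    for name, entity_id in labels.items():
        if name in translit:
            lookup[translit[name]] = entity_id
    return lookup
```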
DOI: 10.1145/2810133.2810138 | Published: 2015-10-22
Citations: 10
Applying Semantic Web Technologies for Improving the Visibility of Tourism Data
Fayrouz Soualah-Alila, Cyril Faucher, F. Bertrand, Mickaël Coustaty, A. Doucet
The tourism industry is an extremely information-intensive, complex, and dynamic activity. It can benefit from semantic Web technologies due to the significant heterogeneity of information sources and the high volume of on-line data. The management of semantically diverse annotated tourism data is facilitated by ontologies, which provide methods and standards that allow flexibility and more intelligent access to on-line data. This paper describes some of the early results of the Tourinflux project, which aims to apply semantic Web technologies to support tourist actors in effectively finding and publishing information on the Web.
DOI: 10.1145/2810133.2810137 | Published: 2015-10-22
Citations: 13
Knowledge-Driven Video Information Retrieval with LOD: From Semi-Structured to Structured Video Metadata
L. Sikos, D. Powers
In parallel with the tremendous growth of video content on the Web, many technical specifications and standards have been introduced to store technical details and describe the content of, and add subtitles to, online videos. Some of these specifications are based on unstructured data with limited machine-processability, data reuse, and interoperability, while others are XML-based, representing semi-structured data. While low-level video features can be derived automatically, high-level features are mainly related to a particular knowledge domain and rely heavily on human experience, judgment, and background. One approach to this problem is to map standard, often semi-structured, vocabularies, such as that of MPEG-7, to machine-interpretable ontologies. Another approach is to introduce new multimedia ontologies. While video contents can be annotated efficiently with terms defined by structured LOD datasets, such as DBpedia, ontology standardization would be desirable in the video production and distribution domains. This paper compares the state-of-the-art video annotations in terms of descriptor level and machine-readability, highlights the limitations of the different approaches, and makes suggestions towards standard video annotations.
DOI: 10.1145/2810133.2810141 | Published: 2015-10-22
Citations: 20
Semantic Entities
Christophe Van Gysel, M. de Rijke, M. Worring
Entity retrieval has seen a lot of interest from the research community over the past decade. Ten years ago, the expertise retrieval task gained popularity in the research community during the TREC Enterprise Track [10]. It has remained relevant ever since, while broadening to social media, to tracking the dynamics of expertise [1-5, 8, 11], and, more generally, to a range of entity retrieval tasks. In the talk, which will be given by the second author, we will point out that existing methods for entity or expert retrieval fail to address key challenges: (1) Queries and expert documents use different representations to describe the same concepts [6, 7]. Term mismatches between entities and experts [7] occur due to the inability of widely used maximum-likelihood language models to make use of semantic similarities between words [9]. (2) As the amount of available data increases, the need for more powerful approaches with greater learning capabilities than smoothed maximum-likelihood language models is obvious [13]. (3) Supervised methods for entity or expertise retrieval [5, 8] were introduced at the turn of the last decade. However, the acceleration of data availability has the major disadvantage that, in the case of supervised methods, manual annotation efforts need to sustain a similar order of growth. This calls for the further development of unsupervised methods. (4) In some entity or expertise retrieval methods, a language model is constructed for every document in the collection. These methods lack efficient query capabilities for large document collections, as each query term needs to be matched against every document [2]. In the talk we will discuss a recently proposed solution [12] that has a strong emphasis on unsupervised model construction, efficient query capabilities and, most importantly, semantic matching between query terms and candidate entities. We show that the proposed approach improves retrieval performance compared to generative language models, mainly due to its ability to perform semantic matching [7]. The proposed method does not require any annotations or supervised relevance judgments and is able to learn from raw textual evidence and document-candidate associations alone. The purpose of the proposal is to provide insight into how we avoid explicit annotations and feature engineering and still obtain semantically meaningful retrieval results. In the talk we will provide a comparative error analysis between the proposed semantic entity retrieval model and traditional generative language models that perform exact matching, which yields important insights into the relative strengths of semantic matching and exact matching for the expert retrieval task in particular and entity retrieval in general. We will also discuss extensions of the proposed model that are meant to deal with scalability and dynamic aspects of entity and expert retrieval.
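The exact- vs. semantic-matching contrast at the heart of this talk can be made concrete with a toy example: an exact term match misses "physician" in a document about a "doctor", while an embedding similarity recovers it. The two-dimensional word vectors below are invented for this sketch and stand in for learned representations.

```python
import math

# Toy word vectors (invented) contrasting exact and semantic matching.
VECTORS = {
    "physician": [0.9, 0.1],
    "doctor": [0.85, 0.2],
    "guitar": [0.1, 0.9],
}

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def exact_match(query_term, doc_terms):
    """Exact term matching: succeeds only if the query term occurs."""
    return query_term in doc_terms

def semantic_score(query_term, doc_terms):
    """Semantic matching: best cosine similarity to any document term."""
    return max(cosine(VECTORS[query_term], VECTORS[t]) for t in doc_terms)
```

Here `exact_match("physician", ["doctor", "guitar"])` fails while the cosine score against "doctor" is high, which is the kind of vocabulary-gap case the talk argues maximum-likelihood language models cannot handle.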
DOI: 10.1145/2810133.2810139 | Published: 2015-10-22
Citations: 3
An Interface Sketch for Queripidia: Query-driven Knowledge Portfolios from the Web
Laura Dietz, M. Schuhmacher
We aim to augment textual knowledge resources such as Wikipedia with information from the World Wide Web while focusing on a given information need. We demonstrate a solution based on what we call knowledge portfolios. A knowledge portfolio is a query-specific collection of relevant entities together with associated passages from the Web that explain how each entity is relevant for the query. Knowledge portfolios are extracted through a combination of retrieval from the World Wide Web and Wikipedia with a reasoning process on mutual relevance. A key ingredient is entity link annotations that tie abstract entities from the knowledge base to their context on the Web. We demonstrate the results of our fully automated system Queripidia, which is capable of creating a knowledge portfolio for any web-style query, on data from the TREC Web track. The online demo is available via http://smart-cactus.org/~dietz/knowport/.
DOI: 10.1145/2810133.2810145 | Published: 2015-10-22
Citations: 6
CADEminer: A System for Mining Consumer Reports on Adverse Drug Side Effects
Sarvnaz Karimi, Alejandro Metke-Jimenez, Anthony N. Nguyen
We introduce CADEminer, a system that mines consumer reviews on medications in order to facilitate discovery of drug side effects that may not have been identified in clinical trials. CADEminer utilises search and natural language processing techniques to (a) extract mentions of side effects, and other relevant concepts such as drug names and diseases in reviews; (b) normalise the extracted mentions to their unified representation in ontologies such as SNOMED CT and MedDRA; (c) identify relationships between extracted concepts, such as a drug caused a side effect; (d) search in authoritative lists of known drug side effects to identify whether or not the extracted side effects are new and therefore require further investigation; and finally (e) provide statistics and visualisation of the data.
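Steps (a) and (b) of the pipeline above (spotting side-effect mentions and normalising them to ontology concepts) can be sketched with a small lexicon lookup. The terms and concept codes below are invented placeholders, not real SNOMED CT or MedDRA identifiers, and CADEminer's actual extraction uses richer NLP than this.

```python
import re

# Illustrative mention extraction and normalisation: find lexicon terms
# in a consumer review and map each to a (hypothetical) concept code.
SIDE_EFFECT_LEXICON = {
    "headache": "C-0001",
    "nausea": "C-0002",
    "dizzy": "C-0003",
}

def extract_and_normalise(review):
    """Return (mention, concept code) pairs found in a review."""
    hits = []
    for term, code in SIDE_EFFECT_LEXICON.items():
        if re.search(r"\b" + re.escape(term) + r"\b", review.lower()):
            hits.append((term, code))
    return hits
```

Steps (c)-(e) would then relate each normalised mention to the drug discussed, check it against authoritative side-effect lists, and aggregate the results for visualisation.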
{"title":"CADEminer: A System for Mining Consumer Reports on Adverse Drug Side Effects","authors":"Sarvnaz Karimi, Alejandro Metke-Jimenez, Anthony N. Nguyen","doi":"10.1145/2810133.2810143","DOIUrl":"https://doi.org/10.1145/2810133.2810143","url":null,"abstract":"We introduce CADEminer, a system that mines consumer reviews on medications in order to facilitate discovery of drug side effects that may not have been identified in clinical trials. CADEminer utilises search and natural language processing techniques to (a) extract mentions of side effects, and other relevant concepts such as drug names and diseases in reviews; (b) normalise the extracted mentions to their unified representation in ontologies such as SNOMED CT and MedDRA; (c) identify relationships between extracted concepts, such as a drug caused a side effect; (d) search in authoritative lists of known drug side effects to identify whether or not the extracted side effects are new and therefore require further investigation; and finally (e) provide statistics and visualisation of the data.","PeriodicalId":298747,"journal":{"name":"Proceedings of the Eighth Workshop on Exploiting Semantic Annotations in Information Retrieval","volume":"25 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-10-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116658763","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 10
Journal
Proceedings of the Eighth Workshop on Exploiting Semantic Annotations in Information Retrieval