Hugo: Entity-based News Search and Summarisation
Anaïs Cadilhac, Andrew Chisholm, Ben Hachey, S. Kharazmi
DOI: 10.1145/2810133.2810144

We describe Hugo -- a service initially available on iOS that solicits a structured, semantic query and returns entity-specific news articles. Retrieval is powered by a semantic annotation pipeline that includes named entity linking and automatic summarisation. Search and entity linking use an in-house knowledge base initialised with Wikipedia data and continually curated to include new entities. Hugo delivers timely knowledge about a user's professional network, in particular new people the user wants to know more about.
{"title":"Hugo: Entity-based News Search and Summarisation","authors":"Anaïs Cadilhac, Andrew Chisholm, Ben Hachey, S. Kharazmi","doi":"10.1145/2810133.2810144","DOIUrl":"https://doi.org/10.1145/2810133.2810144","url":null,"abstract":"We describe Hugo -- a service initially available on iOS that solicits a structured, semantic query and returns entity-specific news articles. Retrieval is powered by a semantic annotation pipeline that includes named entity linking and automatic summarisation. Search and entity linking use an in-house knowledge base initialised with Wikipedia data and continually curated to include new entities. Hugo delivers timely knowledge about a user's professional network, in particular new people they want to know more about.","PeriodicalId":298747,"journal":{"name":"Proceedings of the Eighth Workshop on Exploiting Semantic Annotations in Information Retrieval","volume":"42 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-10-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116038875","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Contextualizing Data on a Content Management System
Cátia Moreira, João Taborda, R. Gaudio, Lara dos Santos, Paulo Pereira
DOI: 10.1145/2810133.2810134

Content Management Systems (CMSs) are known for their ability to store both structured and unstructured data. However, they are unable to associate meaning and context with the stored information. Moreover, as the volume of data increases, such systems lose the capacity to retrieve meaningful results and thus fail to meet users' needs and expectations. To overcome this issue, we propose a method for implementing data contextualization in a CMS. The method enriches the data with semantic information, allowing more accurate retrieval of results. We validated this approach by applying the contextualization method to a CMS currently in use, with real information. With this improved CMS, users are expected to be able to retrieve data related to their initial search.
{"title":"Contextualizing Data on a Content Management System","authors":"Cátia Moreira, João Taborda, R. Gaudio, Lara dos Santos, Paulo Pereira","doi":"10.1145/2810133.2810134","DOIUrl":"https://doi.org/10.1145/2810133.2810134","url":null,"abstract":"Content Management Systems (CMSs) are known for their ability for storing data, both structured and non-structured data. However they are not able to associate meaning and context to the stored information. Furthermore, these systems do not meet the needs and expectations of their users, because as the size of data increases, the system loses its capacity of retrieving meaningful results. In order to overcome this issue, we propose a method to implement data contextualization on a CMS. The proposed method consists of enriching the data with semantic information, allowing a more accurate retrieval of results. The implementation of this approach was validated by applying this contextualization method to a currently used CMS with real information. With this improved CMS, it is expected that the users will be able to retrieve data related to their initial search.","PeriodicalId":298747,"journal":{"name":"Proceedings of the Eighth Workshop on Exploiting Semantic Annotations in Information Retrieval","volume":"483 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-10-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123034913","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Open and Closed Schema for Aligning Knowledge and Text Collections
Matthew Kelcey
DOI: 10.1145/2810133.2810140

When it comes to knowledge bases, most people's first thought is structured sources such as Freebase/Wikidata and their relationship to similarly structured web sources such as Wikipedia. A lot of additional and interesting "knowledge", though, is captured in unstructured databases constructed in a less supervised manner using open information extraction techniques. In this talk we'll discuss some of the differences between open- and closed-schema knowledge bases, including the ideas of objective vs. subjective content as well as freshness and trust. We'll give an overview of approaches to aligning such data sources so that their relative strengths can be combined, and finish with applications of such alignments, particularly around open question answering systems.
{"title":"Open and Closed Schema for Aligning Knowledge and Text Collections","authors":"Matthew Kelcey","doi":"10.1145/2810133.2810140","DOIUrl":"https://doi.org/10.1145/2810133.2810140","url":null,"abstract":"When it comes to knowledge bases most people's first thought are structured sources such as Freebase/Wikidata and their relationship to similarly structured web sources such as Wikipedia. A lot of additional and interesting \"knowledge\" though is captured in unstructured databases constructed in a less supervised manner using open information extraction techniques. In this talk we'll discuss some of the differences between open/closed schema knowledge bases including the ideas of objective vs subjective content as well as freshness and trust. We'll give an overview on approaches to aligning such data sources in a way that their relative strengths can be combined and finish with applications of such alignments; particularly around open question and answer systems.","PeriodicalId":298747,"journal":{"name":"Proceedings of the Eighth Workshop on Exploiting Semantic Annotations in Information Retrieval","volume":"9 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-10-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127927021","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Temporal Reconciliation for Dating Photographs Using Entity Information
Paul Martin, M. Spaniol, A. Doucet
DOI: 10.1145/2810133.2810142

Temporal classification of Web contents requires a "notion" of what they are about. This is particularly relevant when contents contain several dates and a human "interpretation" is required in order to choose the appropriate time point. The dating challenge becomes even more complex when images have to be dated based on the content describing them. In this paper, we present a novel time-stamping approach based on semantics derived from the document. To this end, we first introduce our experimental dataset and then explain our temporal reconciliation pipeline. In particular, we explain the process of temporal reconciliation by incorporating information derived from named entities.
{"title":"Temporal Reconciliation for Dating Photographs Using Entity Information","authors":"Paul Martin, M. Spaniol, A. Doucet","doi":"10.1145/2810133.2810142","DOIUrl":"https://doi.org/10.1145/2810133.2810142","url":null,"abstract":"Temporal classification of Web contents requires a \"notion\" about them. This is particularly relevant when contents contain several dates and a human \"interpretation\" is required in order to chose the appropriate time point. The dating challenge becomes even more complex, when images have to be dated based on the content describing them. In this paper, we present a novel time-stamping approach based on semantics derived from the document. To this end, we will first introduce our experimental dataset and then explain our temporal reconciliation pipeline. In particular, we will explain the process of temporal reconciliation by incorporating information derived from named entities.","PeriodicalId":298747,"journal":{"name":"Proceedings of the Eighth Workshop on Exploiting Semantic Annotations in Information Retrieval","volume":"14 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-10-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114725270","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Named Entity Disambiguation for Resource-Poor Languages
Mohamed H. Gad-Elrab, M. Yosef, G. Weikum
DOI: 10.1145/2810133.2810138

Named entity disambiguation (NED) is the task of linking ambiguous names in natural language text to canonical entities, such as people, organizations or places, registered in a knowledge base. The problem is well studied for English text, but few systems have considered resource-poor languages, which lack comprehensive name-entity dictionaries, entity descriptions, and large annotated training corpora. In this paper we address the NED problem for languages, such as Arabic, with limited annotated corpora and structured resources. We present a method that leverages structured English resources to enrich the components of a language-agnostic NED system and enable effective NED for other languages. We achieve this by fusing data from several multilingual resources and the output of automatic translation/transliteration systems. We show the viability and quality of our approach by synthesizing NED systems for Arabic, Spanish and Italian.
{"title":"Named Entity Disambiguation for Resource-Poor Languages","authors":"Mohamed H. Gad-Elrab, M. Yosef, G. Weikum","doi":"10.1145/2810133.2810138","DOIUrl":"https://doi.org/10.1145/2810133.2810138","url":null,"abstract":"Named entity disambiguation (NED) is the task of linking ambiguous names in natural language text to canonical entities like people, organizations or places, registered in a knowledge base. The problem is well-studied for English text, but few systems have considered resource-poor languages that lack comprehensive name-entity dictionaries, entity descriptions, and large annotated training corpora. In this paper we address the NED problem for languages with limited amount of annotated corpora as well as structured resource such as Arabic. We present a method that leverages structured English resources to enrich the components of a language-agnostic NED system and enable effective NED for other languages. We achieve this by fusing data from several multilingual resources and the output of automatic translation/transliteration systems. We show the viability and quality of our approach by synthesizing NED systems for Arabic, Spanish and Italian.","PeriodicalId":298747,"journal":{"name":"Proceedings of the Eighth Workshop on Exploiting Semantic Annotations in Information Retrieval","volume":"44 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-10-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128499885","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Applying Semantic Web Technologies for Improving the Visibility of Tourism Data
Fayrouz Soualah-Alila, Cyril Faucher, F. Bertrand, Mickaël Coustaty, A. Doucet
DOI: 10.1145/2810133.2810137

The tourism industry is an extremely information-intensive, complex and dynamic activity. It can benefit from semantic Web technologies, given the significant heterogeneity of information sources and the high volume of online data. The management of semantically annotated, diverse tourism data is facilitated by ontologies, which provide methods and standards that allow flexibility and more intelligent access to online data. This paper describes some of the early results of the Tourinflux project, which aims to apply semantic Web technologies to help tourism actors find and publish information on the Web effectively.
{"title":"Applying Semantic Web Technologies for Improving the Visibility of Tourism Data","authors":"Fayrouz Soualah-Alila, Cyril Faucher, F. Bertrand, Mickaël Coustaty, A. Doucet","doi":"10.1145/2810133.2810137","DOIUrl":"https://doi.org/10.1145/2810133.2810137","url":null,"abstract":"Tourism industry is an extremely information-intensive, complex and dynamic activity. It can benefit from semantic Web technologies, due to the significant heterogeneity of information sources and the high volume of on-line data. The management of semantically diverse annotated tourism data is facilitated by ontologies that provide methods and standards, which allow flexibility and more intelligent access to on-line data. This paper provides a description of some of the early results of the Tourinflux project which aims to apply semantic Web technologies to support tourist actors in effectively finding and publishing information on the Web.","PeriodicalId":298747,"journal":{"name":"Proceedings of the Eighth Workshop on Exploiting Semantic Annotations in Information Retrieval","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-10-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129990256","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Knowledge-Driven Video Information Retrieval with LOD: From Semi-Structured to Structured Video Metadata
L. Sikos, D. Powers
DOI: 10.1145/2810133.2810141

In parallel with the tremendous increase in the number of videos on the Web, many technical specifications and standards have been introduced to store technical details, describe the content of, and add subtitles to online videos. Some of these specifications are based on unstructured data with limited machine-processability, data reuse, and interoperability, while others are XML-based, representing semi-structured data. While low-level video features can be derived automatically, high-level features are mainly related to a particular knowledge domain and rely heavily on human experience, judgment, and background. One approach to solving this problem is to map standard, often semi-structured, vocabularies, such as that of MPEG-7, to machine-interpretable ontologies. Another approach is to introduce new multimedia ontologies. While video contents can be annotated efficiently with terms defined by structured LOD datasets, such as DBpedia, ontology standardization would be desirable in the video production and distribution domains. This paper compares the state of the art in video annotation in terms of descriptor level and machine-readability, highlights the limitations of the different approaches, and makes suggestions towards standard video annotations.
{"title":"Knowledge-Driven Video Information Retrieval with LOD: From Semi-Structured to Structured Video Metadata","authors":"L. Sikos, D. Powers","doi":"10.1145/2810133.2810141","DOIUrl":"https://doi.org/10.1145/2810133.2810141","url":null,"abstract":"In parallel with the tremendously increasing number of video contents on the Web, many technical specifications and standards have been introduced to store technical details and describe the content of, and add subtitles to, online videos. Some of these specifications are based on unstructured data with limited machine-processability, data reuse, and interoperability, while others are XML-based, representing semi-structured data. While low-level video features can be derived automatically, high-level features are mainly related to a particular knowledge domain and heavily rely on human experience, judgment, and background. One of the approaches to solve this problem is to map standard, often semi-structured, vocabularies, such as that of MPEG-7, to machine-interpretable ontologies. Another approach is to introduce new multimedia ontologies. While video contents can be annotated efficiently with terms defined by structured LOD datasets, such as DBpedia, ontology standardization would be desired in the video production and distribution domains. This paper compares the state-of-the-art video annotations in terms of descriptor level and machine-readability, highlights the limitations of the different approaches, and makes suggestions towards standard video annotations.","PeriodicalId":298747,"journal":{"name":"Proceedings of the Eighth Workshop on Exploiting Semantic Annotations in Information Retrieval","volume":"32 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-10-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128091844","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Semantic Entities
Christophe Van Gysel, M. de Rijke, M. Worring
DOI: 10.1145/2810133.2810139

Entity retrieval has seen a lot of interest from the research community over the past decade. Ten years ago, the expertise retrieval task gained popularity during the TREC Enterprise Track [10]. It has remained relevant ever since, broadening to social media, to tracking the dynamics of expertise [1-5, 8, 11] and, more generally, to a range of entity retrieval tasks. In the talk, which will be given by the second author, we will point out that existing methods for entity and expert retrieval fail to address key challenges: (1) Queries and expert documents use different representations to describe the same concepts [6, 7]. Term mismatches between entities and experts [7] occur because widely used maximum-likelihood language models cannot exploit semantic similarities between words [9]. (2) As the amount of available data increases, the need for approaches with greater learning capabilities than smoothed maximum-likelihood language models becomes obvious [13]. (3) Supervised methods for entity and expertise retrieval [5, 8] were introduced at the turn of the last decade. However, the accelerating availability of data has the major disadvantage that, for supervised methods, manual annotation efforts would need to grow at a similar rate. This calls for the further development of unsupervised methods. (4) In some entity and expertise retrieval methods, a language model is constructed for every document in the collection. These methods lack efficient query capabilities for large document collections, as each query term needs to be matched against every document [2]. In the talk we will discuss a recently proposed solution [12] with a strong emphasis on unsupervised model construction, efficient query capabilities and, most importantly, semantic matching between query terms and candidate entities. We show that the proposed approach improves retrieval performance compared to generative language models, mainly due to its ability to perform semantic matching [7]. The proposed method requires no annotations or supervised relevance judgments and learns from raw textual evidence and document-candidate associations alone. The purpose of the proposal is to provide insight into how we avoid explicit annotations and feature engineering and still obtain semantically meaningful retrieval results. In the talk we will provide a comparative error analysis between the proposed semantic entity retrieval model and traditional generative language models that perform exact matching, which yields important insights into the relative strengths of semantic matching and exact matching for the expert retrieval task in particular and entity retrieval in general. We will also discuss extensions of the proposed model that deal with scalability and the dynamic aspects of entity and expert retrieval.
{"title":"Semantic Entities","authors":"Christophe Van Gysel, M. de Rijke, M. Worring","doi":"10.1145/2810133.2810139","DOIUrl":"https://doi.org/10.1145/2810133.2810139","url":null,"abstract":"Entity retrieval has seen a lot of interest from the research community over the past decade. Ten years ago, the expertise retrieval task gained popularity in the research community during the TREC Enterprise Track [10]. It has remained relevant ever since, while broadening to social media, to tracking the dynamics of expertise [1-5, 8, 11], and, more generally, to a range of entity retrieval tasks. In the talk, which will be given by the second author, we will point out that existing methods to entity or expert retrieval fail to address key challenges: (1) Queries and expert documents use different representations to describe the same concepts [6, 7]. Term mismatches between entities and experts [7] occur due to the inability of widely used maximum-likelihood language models to make use of semantic similarities between words [9]. (2) As the amount of available data increases, the need for more powerful approaches with greater learning capabilities than smoothed maximum-likelihood language models is obvious [13]. (3) Supervised methods for entity or expertise retrieval [5, 8] were introduced at the turn of the last decade. However, the acceleration of data availability has the major disadvantage that, in the case of supervised methods, manual annotation efforts need to sustain a similar order of growth. This calls for the further development of unsupervised methods. (4) According to some entity or expertise retrieval methods, a language model is constructed for every document in the collection. These methods lack efficient query capabilities for large document collections, as each query term needs to be matched against every document [2]. In the talk we will discuss a recently proposed solution [12] that has a strong emphasis on unsupervised model construction, efficient query capabilities and, most importantly, semantic matching between query terms and candidate entities. We show that the proposed approach improves retrieval performance compared to generative language models mainly due to its ability to perform semantic matching [7]. The proposed method does not require any annotations or supervised relevance judgments and is able to learn from raw textual evidence and document-candidate associations alone. The purpose of the proposal is to provide insight in how we avoid explicit annotations and feature engineering and still obtain semantically meaningful retrieval results. In the talk we will provide a comparative error analysis between the proposed semantic entity retrieval model and traditional generative language models that perform exact matching, which yields important insights in the relative strengths of semantic matching and exact matching for the expert retrieval task in particular and entity retrieval in general. 
We will also discuss extensions of the proposed model that are meant to deal with scalability and dynamic aspects of entity and expert retrieval.","PeriodicalId":298747,"journal":{"name":"Proceedings of the Eighth Workshop on Exploiting Semantic Annotations in Information Retrieval","volume":"26 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-10-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126661463","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
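A toy sketch of the semantic matching idea the talk contrasts with exact matching: score a candidate entity by embedding similarity to the query terms, so a query can match an expert even with zero term overlap. The embeddings below are hand-made assumptions; the cited model [12] learns such representations from document-candidate associations.

```python
import numpy as np

EMBED = {
    "networks": np.array([0.9, 0.1]),
    "graphs":   np.array([0.8, 0.2]),   # semantically close to "networks"
    "cooking":  np.array([0.0, 1.0]),
}
ENTITY = {"expert:alice": np.array([0.85, 0.15])}   # learned entity representation

def semantic_score(query_terms, entity_vec):
    """Average cosine similarity between query term vectors and the entity vector."""
    sims = [t_vec @ entity_vec / (np.linalg.norm(t_vec) * np.linalg.norm(entity_vec))
            for t_vec in (EMBED[t] for t in query_terms)]
    return float(np.mean(sims))

# High score despite no shared term between query and entity profile:
print(semantic_score(["graphs"], ENTITY["expert:alice"]))
```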
An Interface Sketch for Queripidia: Query-driven Knowledge Portfolios from the Web
Laura Dietz, M. Schuhmacher
DOI: 10.1145/2810133.2810145

We aim to augment textual knowledge resources such as Wikipedia with information from the World Wide Web while focusing on a given information need. We demonstrate a solution based on what we call knowledge portfolios. A knowledge portfolio is a query-specific collection of relevant entities together with associated passages from the Web that explain how each entity is relevant to the query. Knowledge portfolios are extracted through a combination of retrieval from the World Wide Web and Wikipedia with a reasoning process over mutual relevance. A key ingredient is entity link annotations that tie abstract entities from the knowledge base to their context on the Web. We demonstrate the results of our fully automated system, Queripidia, which can create a knowledge portfolio for any web-style query, on data from the TREC Web track. The online demo is available at http://smart-cactus.org/~dietz/knowport/.
{"title":"An Interface Sketch for Queripidia: Query-driven Knowledge Portfolios from the Web","authors":"Laura Dietz, M. Schuhmacher","doi":"10.1145/2810133.2810145","DOIUrl":"https://doi.org/10.1145/2810133.2810145","url":null,"abstract":"We aim to augment textual knowledge resources such as Wikipedia with information from the World Wide Web and at the same time focus on a given information need. We demonstrate a solution based on what we call knowledge portfolios. A knowledge portfolio is a query-specific collection of relevant entities together with associated passages from the Web that explain how the entity is relevant for the query. Knowledge portfolios are extracted through a combination of retrieval from World Wide Web and Wikipedia with a reasoning process on mutual relevance. A key ingredient are entity link annotations that tie abstract entities from the knowledge base into their context on the Web. We demonstrate the results of our fully automated system Queripidia, which is capable to create a knowledge portfolios for any web-style query, on data from the TREC Web track. The online demo is available via http://smart-cactus.org/~dietz/knowport/.","PeriodicalId":298747,"journal":{"name":"Proceedings of the Eighth Workshop on Exploiting Semantic Annotations in Information Retrieval","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-10-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130067967","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
CADEminer: A System for Mining Consumer Reports on Adverse Drug Side Effects
Sarvnaz Karimi, Alejandro Metke-Jimenez, Anthony N. Nguyen
DOI: 10.1145/2810133.2810143

We introduce CADEminer, a system that mines consumer reviews on medications in order to facilitate the discovery of drug side effects that may not have been identified in clinical trials. CADEminer utilises search and natural language processing techniques to (a) extract mentions of side effects and other relevant concepts, such as drug names and diseases, in reviews; (b) normalise the extracted mentions to their unified representations in ontologies such as SNOMED CT and MedDRA; (c) identify relationships between extracted concepts, such as a drug causing a side effect; (d) search authoritative lists of known drug side effects to determine whether the extracted side effects are new and therefore require further investigation; and finally (e) provide statistics and visualisations of the data.
{"title":"CADEminer: A System for Mining Consumer Reports on Adverse Drug Side Effects","authors":"Sarvnaz Karimi, Alejandro Metke-Jimenez, Anthony N. Nguyen","doi":"10.1145/2810133.2810143","DOIUrl":"https://doi.org/10.1145/2810133.2810143","url":null,"abstract":"We introduce CADEminer, a system that mines consumer reviews on medications in order to facilitate discovery of drug side effects that may not have been identified in clinical trials. CADEminer utilises search and natural language processing techniques to (a) extract mentions of side effects, and other relevant concepts such as drug names and diseases in reviews; (b) normalise the extracted mentions to their unified representation in ontologies such as SNOMED CT and MedDRA; (c) identify relationships between extracted concepts, such as a drug caused a side effect; (d) search in authoritative lists of known drug side effects to identify whether or not the extracted side effects are new and therefore require further investigation; and finally (e) provide statistics and visualisation of the data.","PeriodicalId":298747,"journal":{"name":"Proceedings of the Eighth Workshop on Exploiting Semantic Annotations in Information Retrieval","volume":"25 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-10-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116658763","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}