
Workshop on Biomedical Natural Language Processing: Latest Articles

Improving Correlation with Human Judgments by Integrating Semantic Similarity with Second-Order Vectors
Pub Date : 2016-09-02 DOI: 10.18653/v1/W17-2313
Bridget T. McInnes, Ted Pedersen
Vector space methods that measure semantic similarity and relatedness often rely on distributional information such as co-occurrence frequencies or statistical measures of association to weight the importance of particular co-occurrences. In this paper, we extend these methods by incorporating a measure of semantic similarity based on a human-curated taxonomy into a second-order vector representation. This results in a measure of semantic relatedness that combines the contextual information available in a corpus-based vector space representation with the semantic knowledge found in a biomedical ontology. Our results show that incorporating semantic similarity into second-order co-occurrence matrices improves correlation with human judgments for both similarity and relatedness, and that our method compares favorably to various word embedding methods that have recently been evaluated on the same reference standards we have used.
Citations: 4
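As a rough illustration of the second-order idea in the McInnes and Pedersen abstract above, the sketch below builds a term vector by summing the first-order co-occurrence vectors of the words that describe the term, then compares two terms by cosine. It is not the authors' implementation: the toy counts, the function names, and the optional `weight` hook (standing in for a taxonomy-based similarity score) are all assumptions made for this example.

```python
import math

# Toy first-order co-occurrence counts (hypothetical numbers, not from the paper).
cooc = {
    "heart": {"attack": 4, "muscle": 3, "blood": 5},
    "cardiac": {"attack": 2, "muscle": 4, "blood": 3},
    "renal": {"kidney": 6, "blood": 2},
}


def second_order_vector(context_words, matrix, weight=None):
    """Sum the first-order vectors of the words that describe a term.

    If `weight(word, feature)` is given, its value replaces the raw count.
    Here it is a placeholder for a taxonomy-based similarity score.
    """
    vec = {}
    for word in context_words:
        for feature, count in matrix.get(word, {}).items():
            value = weight(word, feature) if weight else count
            vec[feature] = vec.get(feature, 0.0) + value
    return vec


def cosine(u, v):
    """Cosine between two sparse vectors stored as dicts."""
    dot = sum(u[k] * v[k] for k in u if k in v)
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


# Two terms described by different context words share only the "blood" feature.
term_a = second_order_vector(["heart", "cardiac"], cooc)
term_b = second_order_vector(["renal"], cooc)
relatedness = cosine(term_a, term_b)
```

Swapping the raw counts for similarity scores only changes the values stored in each vector; the cosine comparison stays the same, which is why the `weight` hook is kept separate from the vector construction.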
Overview of the Bacteria Biotope Task at BioNLP Shared Task 2016
Pub Date : 2016-08-13 DOI: 10.18653/v1/W16-3002
Louise Deléger, Robert Bossy, Estelle Chaix, Mouhamadou Ba, Arnaud Ferré, P. Bessières, C. Nédellec
This paper presents the Bacteria Biotope task of the BioNLP Shared Task 2016, which follows the previous 2013 and 2011 editions. The task focuses on the extraction of the locations (biotopes and geographical places) of bacteria from PubMed abstracts and the characterization of bacteria and their associated habitats with respect to reference knowledge sources (NCBI taxonomy, OntoBiotope ontology). The task is motivated by the importance of knowledge about bacterial habitats for fundamental research and applications in microbiology. The paper describes the proposed subtasks, the corpus characteristics, the challenge organization, and the evaluation metrics. We also provide an analysis of the results obtained by participants.
Citations: 86
Overview of the Regulatory Network of Plant Seed Development (SeeDev) Task at the BioNLP Shared Task 2016
Pub Date : 2016-08-01 DOI: 10.18653/v1/W16-3001
Estelle Chaix, B. Dubreucq, Abdelhak Fatihi, Dialekti Valsamou, Robert Bossy, Mouhamadou Ba, Louise Deléger, Pierre Zweigenbaum, P. Bessières, L. Lepiniec, C. Nédellec
This paper presents the SeeDev Task of the BioNLP Shared Task 2016. The purpose of the SeeDev Task is to extract, from scientific articles, descriptions of the genetic and molecular mechanisms involved in seed development of the model plant Arabidopsis thaliana. The SeeDev task consists of extracting many different event types that involve a wide range of entity types, so that they accurately reflect the complexity of the biological mechanisms. The corpus is composed of paragraphs selected from the full texts of relevant scientific articles. In this paper, we describe the organization of the SeeDev task, the corpus characteristics, and the metrics used for the evaluation of participant systems. We analyze and discuss the final results of the seven participant systems on the test set. The best F-score is 0.432, which is comparable to the scores achieved in similar tasks on molecular biology.
Citations: 34
Identification of Mentions and Relations between Bacteria and Biotope from PubMed Abstracts
Pub Date : 2016-08-01 DOI: 10.18653/v1/W16-3008
Cyril Grouin
This paper presents our participation in the Bacteria/Biotope track of the 2016 BioNLP Shared Task. Our methods rely on a combination of distinct machine-learning and rule-based systems. We used CRF and post-processing rules to identify mentions of bacteria and biotopes, a rule-based approach to normalize the concepts in the ontology and the taxonomy, and SVM to identify relations between bacteria and biotopes. On the test datasets, we achieved results similar to those obtained on the development datasets: on the categorization task, a precision of 0.503 (gold standard entities) and an SER of 0.827 (both NER and categorization); on the event relation task, an F-measure of 0.485 (gold standard entities, ranking third out of 11) and of 0.192 (both NER and event relation, ranking first); on the knowledge-based task, mean references of 0.771 (gold standard entities) and of 0.202 (NER, categorization, and event relation).
Citations: 13
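Several Bacteria Biotope results in this listing are reported as a slot error rate (SER), where lower is better. A minimal sketch of the commonly used definition, SER = (S + D + I) / N, follows. The abstracts do not spell out the shared task's exact matching rules, so the function name and the example counts are illustrative assumptions rather than the official scorer.

```python
def slot_error_rate(substitutions, deletions, insertions, n_reference):
    """Commonly used SER: (S + D + I) / N, with N the number of reference slots.

    substitutions: predictions matched to a reference slot but with a wrong value
    deletions:     reference slots with no prediction
    insertions:    predictions with no matching reference slot
    """
    if n_reference <= 0:
        raise ValueError("n_reference must be positive")
    return (substitutions + deletions + insertions) / n_reference


# Hypothetical counts: 10 wrong values, 5 misses, 5 spurious predictions, 40 references.
example_ser = slot_error_rate(10, 5, 5, 40)  # 20 / 40 = 0.5
```

Because insertions are counted in the numerator but not in the denominator, SER can exceed 1.0 for a system that over-predicts, which is one way it differs from an F-measure.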
Ontology-Based Categorization of Bacteria and Habitat Entities using Information Retrieval Techniques
Pub Date : 2016-08-01 DOI: 10.18653/v1/w16-3007
Mert Tiftikci, H. Sahin, Berfu Büyüköz, Alper Yayikçi, Arzucan Özgür
A database which provides information about bacteria and their habitats in a comprehensive and normalized way is crucial for applied microbiology studies. Having this information spread through textual resources such as scientific articles and web pages leads to a need for automatically detecting bacteria and habitat entities in text, semantically tagging them using ontologies, and finally extracting the events among them. These are the challenges set forth by the Bacteria Biotopes Task of the BioNLP Shared Task 2016. This paper describes a system for habitat and bacteria entity normalization through the OntoBiotope ontology and the NCBI taxonomy, respectively. The system, which obtained promising results on the shared task data set, utilizes basic information retrieval techniques.
Citations: 11
DUTIR in BioNLP-ST 2016: Utilizing Convolutional Network and Distributed Representation to Extract Complicate Relations
Pub Date : 2016-08-01 DOI: 10.18653/v1/W16-3012
Honglei Li, Jianhai Zhang, Jian Wang, Hongfei Lin, Zhihao Yang
We participated in two event extraction tasks of the BioNLP 2016 Shared Task: binary relation extraction in the SeeDev task and localization relation extraction in the Bacteria Biotope task. A convolutional neural network (CNN) is employed to model sentences by applying convolution and max-pooling operations to raw input represented with word embeddings. A fully connected neural network is then used to learn high-level, significant features automatically. The proposed model mainly contains two modules: distributed semantic representation building (word embedding, POS embedding, distance embedding, and entity type embedding) and CNN model training. F-scores of 0.370 and 0.478 on the test data sets of the tasks we participated in show that the proposed method is effective for binary relation extraction and can reduce the need for manual feature engineering through automatic feature learning.
Citations: 12
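The convolution and max-pooling step mentioned in the DUTIR abstract can be sketched in plain Python as follows. This is only an illustration of the general operation, not the authors' network: the ReLU activation, the absence of a bias term, and the toy embeddings and filters are assumptions, and a real model would learn the filter weights.

```python
def conv_max_pool(embeddings, filters):
    """Apply 1-D convolution over token embeddings, then max-pool over positions.

    embeddings: list of token vectors, each of length d
    filters:    list of filters; each filter is a list of `width` vectors of length d
    Returns one pooled feature per filter.
    """
    features = []
    for filt in filters:
        width = len(filt)
        # ReLU outputs are never negative, so 0.0 is a safe starting maximum
        # (it is also the result when the sentence is shorter than the filter).
        best = 0.0
        for start in range(len(embeddings) - width + 1):
            window = embeddings[start:start + width]
            activation = sum(
                w * x
                for filt_vec, tok_vec in zip(filt, window)
                for w, x in zip(filt_vec, tok_vec)
            )
            best = max(best, max(0.0, activation))  # ReLU, then running max
        features.append(best)
    return features


# Three tokens with 2-dimensional embeddings and two width-2 filters (toy values).
sentence = [[1, 0], [0, 1], [1, 1]]
toy_filters = [
    [[1, 0], [0, 1]],
    [[-1, -1], [-1, -1]],
]
pooled = conv_max_pool(sentence, toy_filters)
```

Max-pooling yields a fixed-length feature vector (one value per filter) regardless of sentence length, which is what allows a fully connected layer to sit on top of it.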
A dictionary- and rule-based system for identification of bacteria and habitats in text
Pub Date : 2016-08-01 DOI: 10.18653/v1/W16-3006
H. Cook, E. Pafilis, L. Jensen
The number of scientific papers published each year is growing exponentially, and given the rate of this growth, automated information extraction is needed to efficiently extract information from this corpus. A critical first step in this process is to accurately recognize the names of entities in text. Previous efforts, such as SPECIES, have identified bacteria strain names, among other taxonomic groups, but have been limited to those names present in the NCBI taxonomy. We have implemented a dictionary-based named entity tagger, TagIt, that is followed by a rule-based expansion system to identify bacteria strain names and habitats and resolve them to the closest match possible in the NCBI taxonomy and the OntoBiotope ontology, respectively. The rule-based post-processing steps expand acronyms and extend strain names according to a set of rules, which capture additional aliases and strains that are not present in the dictionary. TagIt has the best performance out of three entries to BioNLP-ST BB3 cat+ner, with an overall SER of 0.628 on the independent test set.
Citations: 17
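To make the dictionary-based tagging step in the TagIt abstract concrete, here is a minimal greedy longest-match tagger. It is a generic sketch and not TagIt itself: the whitespace tokenization, the punctuation stripping, and the placeholder identifiers (`TAXON_A`, `TAXON_B`, `HABITAT_A`) are assumptions for illustration, and the rule-based expansion of acronyms and strain names is not modeled.

```python
def tag_entities(text, dictionary):
    """Greedy longest-match dictionary tagging over whitespace tokens.

    dictionary maps a surface name to an identifier.
    Returns (start_token, end_token, matched_name, identifier) tuples.
    """
    tokens = text.split()
    lookup = {name.lower(): ident for name, ident in dictionary.items()}
    max_len = max((len(name.split()) for name in lookup), default=0)
    matches = []
    i = 0
    while i < len(tokens):
        # Try the longest candidate span first so longer names win over prefixes.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            candidate = " ".join(tokens[i:i + n]).lower().strip(".,;:()")
            if candidate in lookup:
                matches.append((i, i + n, candidate, lookup[candidate]))
                i += n
                break
        else:
            i += 1  # no dictionary entry starts at this token
    return matches


# Placeholder identifiers, not real NCBI taxonomy or OntoBiotope IDs.
toy_dictionary = {
    "Escherichia coli": "TAXON_A",
    "Escherichia coli K-12": "TAXON_B",
    "soil": "HABITAT_A",
}
found = tag_entities("Escherichia coli K-12 was isolated from soil.", toy_dictionary)
```

Trying the longest span first means the strain name wins over the species name it contains, which mirrors the goal of resolving a mention to the closest match available.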
LitWay, Discriminative Extraction for Different Bio-Events
Pub Date : 2016-08-01 DOI: 10.18653/v1/W16-3004
Chen Li, Zhiqiang Rao, Xiangrong Zhang
Even a simple biological phenomenon may introduce a complex network of molecular interactions. Scientific literature is one of the trusted resources delivering knowledge of these networks. We propose LitWay, a system for extracting semantic relations from texts. LitWay utilizes a hybrid method that combines a rule-based method and a machine learning-based method. It was tested on the SeeDev task of BioNLP-ST 2016 and achieved state-of-the-art performance with an F-score of 43.2%, ranking first among all participating teams. To further reveal the linguistic characteristics of each event, we test the system with syntactic rules alone, with machine learning alone, and with different combinations of the two methods. We find that it is difficult for one method to achieve good performance for all semantic relation types due to the complexity of bio-events in the literature.
Citations: 13
Statistical Term Profiling for Query Pattern Mining
Pub Date : 2008-06-19 DOI: 10.3115/1572306.1572336
P. Buitelaar, P. Wennerberg, S. Zillner
Through advanced technologies in clinical care and research, especially the rapid progress in imaging technologies, more and more medical imaging data and patient text data are generated by hospitals, pharmaceutical companies, and medical research. To enable advanced access to clinical imaging and text data, it is important to know what kind of knowledge clinicians want and which queries they are interested in. Through intensive interviews and discussions with radiologists and clinicians, we have learned that medical imaging data is analyzed, and hence queried, from three different perspectives: the anatomic perspective addressing the involved body parts, the radiology-specific spatial perspective describing the relationships of located anatomical regions to other anatomical parts, and the disease perspective distinguishing between normal and abnormal imaging features. Our aim is to establish query patterns reflecting those three perspectives that would typically be used by clinicians and radiologists to find patient-specific sets of relevant images.
Citations: 12
Textual Information for Predicting Functional Properties of the Genes
Pub Date : 2008-06-19 DOI: 10.3115/1572306.1572334
Oana Frunza, D. Inkpen
This paper focuses on determining which proteins affect the activity of the Aryl Hydrocarbon Receptor (AHR) system by learning a model that can accurately predict its activity when single genes are knocked out. Experiments and results are presented for models trained on a single source of information: abstracts from Medline (http://medline.cos.com/) that discuss the genes involved in the experiments. The results suggest that an AdaBoost classifier with a binary bag-of-words representation obtains significantly better results.
Citations: 10
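The binary bag-of-words representation named in the Frunza and Inkpen abstract records only whether a word occurs in a document, not how often. A minimal sketch follows; the lowercasing, the whitespace tokenization, and the toy documents are assumptions for illustration, and the AdaBoost classifier that would consume these vectors is left out.

```python
def binary_bag_of_words(documents):
    """Return (vocabulary, vectors) where each vector marks word presence with 0/1."""
    vocab = sorted({tok for doc in documents for tok in doc.lower().split()})
    index = {tok: i for i, tok in enumerate(vocab)}
    vectors = []
    for doc in documents:
        row = [0] * len(vocab)
        for tok in doc.lower().split():
            row[index[tok]] = 1  # presence only: repeated words still give 1
        vectors.append(row)
    return vocab, vectors


# Toy stand-ins for abstracts; "AHR" appears twice in the second one.
toy_docs = ["AHR binds DNA", "AHR AHR signalling"]
vocabulary, doc_vectors = binary_bag_of_words(toy_docs)
```

The repeated "AHR" in the second document still produces a single 1, which is the difference between this representation and a frequency-based one.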