BCC-NER: bidirectional, contextual clues named entity tagger for gene/protein mention recognition

Gurusamy Murugesan, Sabenabanu Abdulkadhar, Balu Bhasuran, Jeyakumar Natarajan
EURASIP Journal on Bioinformatics & Systems Biology, 2017(1): 7. Journal article. Published 2017-12-01 (Epub 2017-05-05).
DOI: 10.1186/s13637-017-0060-6 (https://doi.org/10.1186/s13637-017-0060-6)
Citations: 18

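Abstract

Tagging biomedical entities such as genes, proteins, cells, and cell lines is the first step, and an important prerequisite, in biomedical literature mining. In this paper, we describe our hybrid named entity tagging approach, BCC-NER (bidirectional, contextual clues named entity tagger for gene/protein mention recognition). BCC-NER consists of three modules. The first module handles text processing, which includes basic NLP pre-processing, feature extraction, and feature selection. The second module handles training and model building: bidirectional conditional random fields (CRFs) parse the text in both directions (forward and backward), and the forward- and backward-trained models are integrated using the margin-infused relaxed algorithm (MIRA). The third and final module performs post-processing to improve performance, using surrounding text features, parenthesis mismatch handling, and a two-tier abbreviation algorithm. On the BioCreative II GM test corpus, BCC-NER achieves a precision of 89.95, a recall of 84.15, and an overall F-score of 86.95, which is higher than that of the other currently available open-source taggers.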
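
The reported figures can be sanity-checked against each other. The abstract does not say which F-score variant is used; the sketch below assumes the balanced F1 (the harmonic mean of precision and recall), and the function name `f_score` is purely illustrative.

```python
def f_score(precision: float, recall: float, beta: float = 1.0) -> float:
    """F-beta score: a weighted harmonic mean of precision and recall.

    beta = 1.0 gives the balanced F1 score (an assumption here; the
    abstract only says "overall F-score").
    """
    if precision == 0 and recall == 0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Values reported in the abstract, in percent.
reported_precision = 89.95
reported_recall = 84.15

f1 = f_score(reported_precision, reported_recall)
print(f"{f1:.2f}")  # prints 86.95
```

Under the F1 assumption, the computed value agrees with the reported overall F-score of 86.95.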
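
The abstract names parenthesis mismatching as one of the post-processing steps but does not describe the rule itself. The following is a minimal sketch of one plausible rule, not the authors' implementation: a predicted mention whose parentheses do not balance is first trimmed of a stray edge parenthesis, then extended by one character to pick up a missing partner, and otherwise discarded. The names `is_balanced` and `repair_span`, and the order of the repair attempts, are assumptions made for illustration.

```python
from typing import Optional, Tuple


def is_balanced(text: str) -> bool:
    """Return True if every ')' closes an earlier '(' and none stay open."""
    depth = 0
    for ch in text:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:
                return False
    return depth == 0


def repair_span(sentence: str, start: int, end: int) -> Optional[Tuple[int, int]]:
    """Hypothetical repair rule for a predicted mention sentence[start:end].

    Returns a (start, end) span with balanced parentheses, or None when
    no single-character fix works and the mention should be discarded.
    """
    mention = sentence[start:end]
    if is_balanced(mention):
        return (start, end)
    # 1. Trim a stray parenthesis at either edge of the mention.
    if mention.endswith(")") and is_balanced(mention[:-1]):
        return (start, end - 1)
    if mention.startswith("(") and is_balanced(mention[1:]):
        return (start + 1, end)
    # 2. Extend by one character to capture the missing partner.
    if end < len(sentence) and sentence[end] == ")" and is_balanced(sentence[start:end + 1]):
        return (start, end + 1)
    if start > 0 and sentence[start - 1] == "(" and is_balanced(sentence[start - 1:end]):
        return (start - 1, end)
    return None


sentence = "Expression of interleukin-2 (IL-2) receptor was measured."
# Simulate a tagger output that stops just before the closing parenthesis.
start = sentence.index("interleukin-2")
end = sentence.index("IL-2") + len("IL-2")
fixed = repair_span(sentence, start, end)
if fixed is not None:
    print(sentence[fixed[0]:fixed[1]])
```

Trimming before extending keeps mentions short when the parenthesis is clearly outside the entity (for example a tagger output of `p53)` inside `(p53)`); a real system would likely choose the order based on errors observed on development data.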

