ChEBI的化学实体识别与解析。

ISRN bioinformatics Pub Date : 2012-02-15 eCollection Date: 2012-01-01 DOI:10.5402/2012/619427
Tiago Grego, Catia Pesquita, Hugo P Bastos, Francisco M Couto
{"title":"ChEBI的化学实体识别与解析。","authors":"Tiago Grego,&nbsp;Catia Pesquita,&nbsp;Hugo P Bastos,&nbsp;Francisco M Couto","doi":"10.5402/2012/619427","DOIUrl":null,"url":null,"abstract":"<p><p>Chemical entities are ubiquitous through the biomedical literature and the development of text-mining systems that can efficiently identify those entities are required. Due to the lack of available corpora and data resources, the community has focused its efforts in the development of gene and protein named entity recognition systems, but with the release of ChEBI and the availability of an annotated corpus, this task can be addressed. We developed a machine-learning-based method for chemical entity recognition and a lexical-similarity-based method for chemical entity resolution and compared them with Whatizit, a popular-dictionary-based method. Our methods outperformed the dictionary-based method in all tasks, yielding an improvement in F-measure of 20% for the entity recognition task, 2-5% for the entity-resolution task, and 15% for combined entity recognition and resolution tasks. </p>","PeriodicalId":90877,"journal":{"name":"ISRN bioinformatics","volume":"2012 ","pages":"619427"},"PeriodicalIF":0.0000,"publicationDate":"2012-02-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4393067/pdf/","citationCount":"28","resultStr":"{\"title\":\"Chemical Entity Recognition and Resolution to ChEBI.\",\"authors\":\"Tiago Grego,&nbsp;Catia Pesquita,&nbsp;Hugo P Bastos,&nbsp;Francisco M Couto\",\"doi\":\"10.5402/2012/619427\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<p><p>Chemical entities are ubiquitous through the biomedical literature and the development of text-mining systems that can efficiently identify those entities are required. Due to the lack of available corpora and data resources, the community has focused its efforts in the development of gene and protein named entity recognition systems, but with the release of ChEBI and the availability of an annotated corpus, this task can be addressed. We developed a machine-learning-based method for chemical entity recognition and a lexical-similarity-based method for chemical entity resolution and compared them with Whatizit, a popular-dictionary-based method. Our methods outperformed the dictionary-based method in all tasks, yielding an improvement in F-measure of 20% for the entity recognition task, 2-5% for the entity-resolution task, and 15% for combined entity recognition and resolution tasks. </p>\",\"PeriodicalId\":90877,\"journal\":{\"name\":\"ISRN bioinformatics\",\"volume\":\"2012 \",\"pages\":\"619427\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2012-02-15\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4393067/pdf/\",\"citationCount\":\"28\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"ISRN bioinformatics\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.5402/2012/619427\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"2012/1/1 0:00:00\",\"PubModel\":\"eCollection\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"ISRN bioinformatics","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.5402/2012/619427","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"2012/1/1 0:00:00","PubModel":"eCollection","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 28

摘要

化学实体在生物医学文献中无处不在,需要开发能够有效识别这些实体的文本挖掘系统。由于缺乏可用的语料库和数据资源,社区一直致力于开发基因和蛋白质命名实体识别系统,但随着ChEBI的发布和注释语料库的可用性,这一任务可以得到解决。我们开发了一种基于机器学习的化学实体识别方法和一种基于词汇相似度的化学实体解析方法,并将它们与Whatizit(一种流行的基于词典的方法)进行了比较。我们的方法在所有任务中都优于基于字典的方法,实体识别任务的F-measure提高了20%,实体解析任务的F-measure提高了2-5%,实体识别和解析组合任务的F-measure提高了15%。
本文章由计算机程序翻译,如有差异,请以英文原文为准。

摘要图片

摘要图片

查看原文
分享 分享
微信好友 朋友圈 QQ好友 复制链接
本刊更多论文
Chemical Entity Recognition and Resolution to ChEBI.

Chemical entities are ubiquitous through the biomedical literature and the development of text-mining systems that can efficiently identify those entities are required. Due to the lack of available corpora and data resources, the community has focused its efforts in the development of gene and protein named entity recognition systems, but with the release of ChEBI and the availability of an annotated corpus, this task can be addressed. We developed a machine-learning-based method for chemical entity recognition and a lexical-similarity-based method for chemical entity resolution and compared them with Whatizit, a popular-dictionary-based method. Our methods outperformed the dictionary-based method in all tasks, yielding an improvement in F-measure of 20% for the entity recognition task, 2-5% for the entity-resolution task, and 15% for combined entity recognition and resolution tasks.

求助全文
通过发布文献求助,成功后即可免费获取论文全文。 去求助
来源期刊
自引率
0.00%
发文量
0
期刊最新文献
Hierarchical ensemble methods for protein function prediction. Comparison of merging and meta-analysis as alternative approaches for integrative gene expression analysis. NucVoter: A Voting Algorithm for Reliable Nucleosome Prediction Using Next-Generation Sequencing Data. Discovery of YopE Inhibitors by Pharmacophore-Based Virtual Screening and Docking. Stormbow: A Cloud-Based Tool for Reads Mapping and Expression Quantification in Large-Scale RNA-Seq Studies.
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
现在去查看 取消
×
提示
确定
0
微信
客服QQ
Book学术公众号 扫码关注我们
反馈
×
意见反馈
请填写您的意见或建议
请填写您的手机或邮箱
已复制链接
已复制链接
快去分享给好友吧!
我知道了
×
扫码分享
扫码分享
Book学术官方微信
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术
文献互助 智能选刊 最新文献 互助须知 联系我们:info@booksci.cn
Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。
Copyright © 2023 Book学术 All rights reserved.
ghs 京公网安备 11010802042870号 京ICP备2023020795号-1