Attention model-based and multi-organism driven gene recognition from text: application to a microbial biofilm organism set.

A. Bomgni, Ernest Basile Fotseu Fotseu, Daril Raoul Kengne Wambo, R. Sani, C. Lushbough, Etienne Z. Gnimpieba
Published in: 2022 IEEE International Conference on Bioinformatics and Biomedicine (BIBM), 2022-12-06
DOI: 10.1109/BIBM55620.2022.9995269 (https://doi.org/10.1109/BIBM55620.2022.9995269)
Citations: 1

Abstract

Online databases such as PubMed and PMC are experiencing an explosion of publications in the biomedical sciences. With so much information available online, one of the biggest challenges is managing all that raw, unstructured data and making it machine-readable. Named entity recognition (NER) is now a prerequisite for data identification and extraction in the biosciences, and one of the main approaches to automatic information extraction from the biomedical literature. It simplifies workflow analysis and the automatic extraction of named entities, thus improving the various existing models. Many tools exist in the literature for this purpose, but they are unable to extract microbial genes accurately. Moreover, current gold standard corpora such as BioCreative I to IV have limited representation of microbial knowledge. In this paper, we propose a new method to recognize biofilm gene mentions in free text. This method relies on a context-specific dictionary to annotate a consistent corpus, which is needed to train an efficient recognition model. The method also provides a new workflow for generating dataset collections for microbial biofilm genes. Trained on a set of biofilm organisms, our method achieves a score of up to 94%, outperforming state-of-the-art frameworks.
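
The dictionary-driven annotation step mentioned in the abstract can be pictured with a minimal sketch. The abstract does not describe the actual dictionary, tokenizer, or tagging scheme, so everything below is an assumption made for illustration: the three gene symbols, the `annotate` and `tokenize` helpers, and the `B-GENE`/`O` labels are hypothetical and are not taken from the paper. The sketch only matches single-token gene symbols exactly.

```python
import re

# Illustrative dictionary only (assumption): these entries are NOT the
# paper's context-specific dictionary, just example gene symbols.
GENE_DICTIONARY = {"pelA", "pslA", "algD"}

# Split text into word tokens and standalone punctuation marks.
TOKEN_PATTERN = re.compile(r"\w+|[^\w\s]")


def tokenize(text):
    """Return a list of word and punctuation tokens."""
    return TOKEN_PATTERN.findall(text)


def annotate(text, dictionary=GENE_DICTIONARY):
    """Tag each token as B-GENE on an exact dictionary match, else O.

    Hypothetical helper: a real pipeline would also need to handle
    multi-token names, synonyms, and case variants.
    """
    tagged = []
    for token in tokenize(text):
        tag = "B-GENE" if token in dictionary else "O"
        tagged.append((token, tag))
    return tagged


if __name__ == "__main__":
    for token, tag in annotate("Deletion of pelA reduced biofilm formation."):
        print(token, tag)
```

Sentences tagged this way could then serve as training examples for a sequence-labeling model; how the authors' attention-based model consumes such a corpus is not specified in the abstract.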