A. Bomgni, Ernest Basile Fotseu Fotseu, Daril Raoul Kengne Wambo, R. Sani, C. Lushbough, Etienne Z. Gnimpieba
2022 IEEE International Conference on Bioinformatics and Biomedicine (BIBM). Published 2022-12-06. DOI: 10.1109/BIBM55620.2022.9995269 (https://doi.org/10.1109/BIBM55620.2022.9995269)
Attention model-based and multi-organism driven gene recognition from text: application to a microbial biofilm organism set.
Online databases such as PubMed and PMC are experiencing an explosion of publications in the biomedical sciences. With so much information available online, one of the biggest challenges is managing this raw, unstructured data and making it machine-readable. Named entity recognition (NER) is now a prerequisite for data identification and extraction in the biosciences, and it is one of the main techniques for automatically extracting information from the biomedical literature. It simplifies workflow analysis and the automatic extraction of named entities, thus improving the various existing models. Many tools exist in the literature for this purpose, but they are unable to extract microbial genes accurately. Moreover, current gold standard corpora such as BioCreative I to IV have limited representation of microbial knowledge. In this paper, we propose a new method to recognize biofilm gene mentions in free text. The method relies on a context-specific dictionary to annotate a consistent corpus, which is needed to train an efficient recognition model. It also provides a new workflow for generating dataset collections for microbial biofilm genes. Trained on a set of biofilm organisms, our method achieves a score of up to 94%, outperforming state-of-the-art frameworks.
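The abstract does not describe the annotation procedure in detail, so the following is only a minimal sketch of what dictionary-driven corpus annotation could look like, not the authors' implementation. It assumes a longest-match lookup and the common BIO tagging scheme; the function name `bio_annotate`, the `GENE` label, the case-insensitive matching, and the example gene names are all illustrative assumptions.

```python
import re

# Simple tokenizer: runs of word characters, or single punctuation marks.
TOKEN_PATTERN = re.compile(r"\w+|[^\w\s]")


def bio_annotate(sentence, gene_dictionary):
    """Tag tokens as B-GENE, I-GENE, or O by longest-match dictionary lookup.

    Assumption: matching is case-insensitive. Real gene nomenclature is
    often case-sensitive, so a production system may need stricter rules.
    """
    tokens = TOKEN_PATTERN.findall(sentence)
    lowered = [token.lower() for token in tokens]

    # Tokenize dictionary entries too, so multi-token names can match.
    # Longest entries are tried first to prefer the most specific mention.
    entries = sorted(
        (tuple(TOKEN_PATTERN.findall(name.lower())) for name in gene_dictionary),
        key=len,
        reverse=True,
    )

    tags = ["O"] * len(tokens)
    i = 0
    while i < len(tokens):
        for entry in entries:
            n = len(entry)
            if n and tuple(lowered[i:i + n]) == entry:
                tags[i] = "B-GENE"              # first token of the mention
                for j in range(i + 1, i + n):
                    tags[j] = "I-GENE"          # continuation tokens
                i += n
                break
        else:
            i += 1                              # no entry starts at this token
    return list(zip(tokens, tags))


if __name__ == "__main__":
    # Hypothetical dictionary entries, used only for illustration.
    example_dictionary = {"pelA", "algD"}
    for token, tag in bio_annotate("The pelA gene and algD are required.", example_dictionary):
        print(f"{token}\t{tag}")
```

The token/tag pairs produced this way are in the shape a sequence-labelling model typically consumes for training, which is the role the abstract assigns to the dictionary-annotated corpus.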