Automated ontology-based annotation of scientific literature using deep learning

Proceedings of the International Workshop on Semantic Big Data. Pub Date: 2020-05-25. DOI: 10.1145/3391274.3393636

Prashanti Manda, S. SayedAhmed, S. Mohanty


Citations: 4

Abstract

Representing scientific knowledge using ontologies enables data integration and consistent machine-readable data representation, and allows for large-scale computational analyses. Text mining approaches that can automatically process and annotate scientific literature with ontology concepts are necessary to keep up with the rapid pace of scientific publishing. Here, we present deep learning models (Gated Recurrent Units (GRU) and Long Short-Term Memory (LSTM)) combined with different input encoding formats for automated Named Entity Recognition (NER) of ontology concepts from text. The Colorado Richly Annotated Full Text (CRAFT) gold standard corpus was used to train and test our models. Precision, Recall, F1, and Jaccard semantic similarity were used to evaluate the performance of the models. We found that GRU-based models outperform LSTM models across all evaluation metrics. Surprisingly, considering the top two probabilistic predictions of the model for each instance, instead of only the top one, resulted in a substantial increase in accuracy. Inclusion of ontology semantics via subsumption reasoning yielded a modest performance improvement.
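
The abstract does not spell out how Jaccard semantic similarity or the top-two evaluation is computed, so the following is only a minimal sketch under stated assumptions. It assumes the common definition of Jaccard similarity between two ontology concepts as the number of shared subsumers divided by the number of subsumers in the union, and it treats a prediction as correct under top-k scoring when the gold concept appears among the k highest-ranked predictions. The toy ontology `PARENTS` and the function names are hypothetical and are not taken from the paper or from CRAFT.

```python
from typing import Dict, List, Sequence, Set

# Hypothetical toy ontology (an assumption for illustration only):
# each concept maps to its direct parents.
PARENTS: Dict[str, List[str]] = {
    "root": [],
    "cell": ["root"],
    "neuron": ["cell"],
    "glial_cell": ["cell"],
    "process": ["root"],
}


def subsumers(concept: str) -> Set[str]:
    """Return the concept itself plus all of its transitive parents."""
    seen: Set[str] = set()
    stack = [concept]
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            stack.extend(PARENTS[node])
    return seen


def jaccard_similarity(predicted: str, gold: str) -> float:
    """Shared subsumers divided by the union of subsumers (assumed definition)."""
    a, b = subsumers(predicted), subsumers(gold)
    return len(a & b) / len(a | b)


def top_k_accuracy(ranked: Sequence[Sequence[str]], gold: Sequence[str], k: int) -> float:
    """Fraction of instances whose gold concept is among the first k predictions."""
    hits = sum(1 for preds, g in zip(ranked, gold) if g in preds[:k])
    return hits / len(gold)


if __name__ == "__main__":
    # Sibling concepts share two of four subsumers, so they get partial credit.
    print(jaccard_similarity("neuron", "glial_cell"))

    ranked = [["neuron", "cell"], ["cell", "neuron"], ["process", "root"]]
    gold = ["neuron", "neuron", "cell"]
    print(top_k_accuracy(ranked, gold, k=1))
    print(top_k_accuracy(ranked, gold, k=2))
```

In this sketch a near miss such as predicting a sibling concept earns partial credit through its shared subsumers, which is one way an evaluation can take ontology semantics into account instead of scoring every non-exact match as zero.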
