GOGCN: using deep learning to support insertion of new concepts into gene ontology

Cheng Chen, Lingyun Luo
4th International Conference on Information Science, Electrical and Automation Engineering. Published 2023-08-10. DOI: https://doi.org/10.1117/12.2689526

Abstract

Many biomedical ontologies are updated regularly and change over time. Each new release of an ontology fixes errors in the previous version and adds many new concepts to keep pace with developments in the domain. Inserting new concepts into their proper positions in a terminology is a challenging problem in the automatic enrichment of ontologies. Traditionally, new concepts have been created by domain experts, who then run a traditional classifier or insert the concepts manually into their proper places. More recently, methods based on machine learning (ML) have been proposed to help terminology researchers develop and maintain ontologies. We propose a new approach that requires only the concept name and uses a Graph Convolutional Network (GCN) to learn by aggregating information from sub-string neighbors. We chose a Bidirectional Long Short-Term Memory (Bi-LSTM) network as the classifier for the prediction task. We first tested the method within the January 2020 release of the Gene Ontology (GO), achieving an average precision of 89.68% and an F1 score of 0.9081 on the task of predicting direct IS-A links. We then compared the January 2020 release with the March 2022 release and predicted the links related to new concepts, obtaining an average accuracy of 0.6996.
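
The abstract does not spell out how sub-string neighbor information is aggregated, so the following is only a minimal sketch of the general idea under stated assumptions. It assumes that two concept names are neighbors when one name contains the other as a character sub-string, and it applies one standard GCN propagation step with symmetric degree normalization and a ReLU. The function names `substring_edges` and `gcn_layer`, the neighbor rule, and the toy feature and weight matrices are all hypothetical and are not taken from the paper.

```python
import math


def substring_edges(names):
    """Link two concept names when one contains the other as a sub-string.

    Toy neighbor rule assumed for illustration; the paper's rule may differ.
    """
    edges = set()
    for i, a in enumerate(names):
        for j, b in enumerate(names):
            if i < j and (a in b or b in a):
                edges.add((i, j))
    return edges


def gcn_layer(features, edges, weight):
    """One GCN step: ReLU(D^-1/2 (A + I) D^-1/2 X W), in plain Python lists."""
    n = len(features)
    # Undirected adjacency matrix with self-loops.
    adj = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for i, j in edges:
        adj[i][j] = adj[j][i] = 1.0
    deg = [sum(row) for row in adj]

    out = []
    for i in range(n):
        # Aggregate normalized neighbor features (including the node itself).
        agg = [0.0] * len(features[0])
        for j in range(n):
            if adj[i][j]:
                coef = 1.0 / math.sqrt(deg[i] * deg[j])
                for k, x in enumerate(features[j]):
                    agg[k] += coef * x
        # Linear transform followed by ReLU.
        out.append([
            max(0.0, sum(agg[k] * weight[k][m] for k in range(len(agg))))
            for m in range(len(weight[0]))
        ])
    return out


# Hypothetical concept names, one-hot features, and an identity weight matrix.
names = ["transport", "ion transport", "calcium ion transport", "cell cycle"]
identity = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]
embeddings = gcn_layer(identity, substring_edges(names), identity)
```

In this toy graph the three transport-related names end up sharing information, while "cell cycle" has no sub-string neighbors and keeps only its own features. In a pipeline like the one described, embeddings of this kind would presumably be passed to the Bi-LSTM classifier that predicts IS-A links, but that wiring is not specified in the abstract.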