GOGCN: using deep learning to support insertion of new concepts into gene ontology
Cheng Chen, Lingyun Luo
4th International Conference on Information Science, Electrical and Automation Engineering, published 2023-08-10
DOI: 10.1117/12.2689526 (https://doi.org/10.1117/12.2689526)
Abstract
Many biomedical ontologies are updated regularly and change over time. A new release of an ontology updates its data, fixing errors in the previous version and adding many new concepts to keep pace with developments in the domain. Inserting new concepts into their proper positions in a terminology is a challenging problem in the automatic enrichment of ontologies. In the past, new concepts were always created by domain experts, who then ran a traditional classifier or worked manually to insert them in the proper place. More recently, methods based on machine learning (ML) have been proposed to help terminology researchers develop and maintain ontologies. We propose a new approach that requires only the concept name and uses a Graph Convolutional Network (GCN) to aggregate information from sub-string neighbors. We chose a Bidirectional Long Short-Term Memory (Bi-LSTM) network as the classifier for the prediction task. We first tested this method within the January 2020 release of the Gene Ontology (GO) and achieved an average precision of 89.68% and an F1 score of 0.9081 on the task of predicting direct IS-A links. We then compared the January 2020 release with the March 2022 release and predicted the links related to the new concepts; our average accuracy was 0.6996.
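The abstract does not spell out how the GCN aggregates neighbor information, so the following is only a minimal sketch of one graph-convolution step. It assumes the commonly used symmetric-normalized propagation rule, ReLU(D^-1/2 (A + I) D^-1/2 X W), which may differ from the authors' formulation. The toy graph, the feature matrix, and the names `gcn_layer`, `adj`, `features`, and `weight` are hypothetical and chosen purely for illustration.

```python
import numpy as np


def gcn_layer(adj, features, weight):
    """One graph-convolution step: ReLU(D^-1/2 (A + I) D^-1/2 X W).

    Assumed formulation (not confirmed by the abstract).
    """
    a_hat = adj + np.eye(adj.shape[0])   # add self-loops so each node keeps its own features
    deg = a_hat.sum(axis=1)              # degrees are >= 1 because of the self-loops
    d_inv_sqrt = 1.0 / np.sqrt(deg)
    # Symmetric normalization: scale entry (i, j) by 1 / sqrt(deg_i * deg_j)
    a_norm = a_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]
    # Aggregate neighbor features, project with the weight matrix, apply ReLU
    return np.maximum(a_norm @ features @ weight, 0.0)


# Hypothetical toy graph: node 0 is a concept name, nodes 1 and 2 are
# neighbors that share a sub-string with it.
adj = np.array([[0, 1, 1],
                [1, 0, 0],
                [1, 0, 0]], dtype=float)
features = np.eye(3)        # one-hot node features (illustrative only)
weight = np.ones((3, 2))    # fixed weights; a real model would learn these

out = gcn_layer(adj, features, weight)
print(out.shape)
```

In a pipeline like the one described, node representations of this kind would be passed to a downstream classifier (the abstract names a Bi-LSTM) that predicts whether an IS-A link holds; that stage is not shown here.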