Development of a novel chemoinformatic tool for natural product databases
Paulo Ricardo Viviurka do Carmo, Ricardo Marcacini, Marilia Valli, João Victor Silva-Silva, Leonardo Luiz Gomes Ferreira, Alan Cesar Pilon, Vanderlan da Silva Bolzani, Adriano D Andricopulo, Edgard Marx
Future drug discovery. Published 2023-06-01. DOI: 10.4155/fdd-2023-0007 (https://doi.org/10.4155/fdd-2023-0007)
Abstract
Aim: This study aimed to develop a chemoinformatic tool for extracting natural product information from academic literature.
Materials & methods: Machine learning graph embeddings were used to extract knowledge from a knowledge graph, connecting properties, molecular data and BERTopic topics.
Results: Metapath2Vec performed best in extracting compound names and showed improvement over evaluation stages. Embedding Propagation on Heterogeneous Networks achieved the best performance in extracting bioactivity information. Metapath2Vec excelled in extracting species information, while DeepWalk and Node2Vec performed well in one stage for species location extraction. Embedding Propagation on Heterogeneous Networks consistently improved performance and achieved the best overall scores. Unsupervised embeddings effectively extracted knowledge, with different methods excelling in different scenarios.
Conclusion: This research establishes a foundation for frameworks in knowledge extraction, benefiting sustainable resource use.
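To make the graph-embedding idea more concrete, the following is a minimal sketch of the walk-generation step that Metapath2Vec-style methods rely on: random walks over a heterogeneous graph that are constrained to follow a sequence of node types (a metapath). It is not the authors' implementation. The toy graph, the node names (`paper_1`, `compound_a`, `species_x`, and so on), the node types, and the helper functions `build_adjacency` and `metapath_walk` are all hypothetical and chosen only for illustration.

```python
import random

# Hypothetical toy knowledge graph: every node has a type.
NODE_TYPE = {
    "paper_1": "paper",
    "paper_2": "paper",
    "compound_a": "compound",
    "compound_b": "compound",
    "species_x": "species",
    "species_y": "species",
}

# Hypothetical undirected edges linking papers to the entities they mention.
EDGES = [
    ("paper_1", "compound_a"),
    ("paper_1", "species_x"),
    ("paper_2", "compound_a"),
    ("paper_2", "compound_b"),
    ("paper_2", "species_y"),
]


def build_adjacency(edges):
    """Return an undirected adjacency list: node -> list of neighbours."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, []).append(v)
        adj.setdefault(v, []).append(u)
    return adj


def metapath_walk(adj, node_type, start, metapath, walk_length, rng):
    """Generate one random walk whose node types follow `metapath` cyclically.

    `metapath` must begin and end with the same type, for example
    ["compound", "paper", "compound"], so that it can be repeated.
    The walk stops early if no neighbour of the required type exists.
    """
    if metapath[0] != metapath[-1]:
        raise ValueError("metapath must start and end with the same node type")
    if node_type[start] != metapath[0]:
        raise ValueError("start node type must match the first metapath type")
    cycle = metapath[:-1]  # drop the repeated final type
    walk = [start]
    while len(walk) < walk_length:
        wanted = cycle[len(walk) % len(cycle)]
        candidates = [n for n in adj.get(walk[-1], []) if node_type[n] == wanted]
        if not candidates:
            break
        walk.append(rng.choice(candidates))
    return walk


if __name__ == "__main__":
    adjacency = build_adjacency(EDGES)
    generator = random.Random(42)  # seeded for reproducibility
    for _ in range(3):
        print(metapath_walk(adjacency, NODE_TYPE, "compound_a",
                            ["compound", "paper", "compound"], 5, generator))
```

In a full pipeline, many such walks would typically be collected and passed as "sentences" to a skip-gram model, so that nodes appearing in similar walk contexts receive similar vectors. DeepWalk and Node2Vec follow the same pattern but do not constrain the walk by node type.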