Dynamic Retrieval Augmented Generation of Ontologies using Artificial Intelligence (DRAGON-AI).

IF 2 3区工程技术 Q3 MATHEMATICAL & COMPUTATIONAL BIOLOGY Journal of Biomedical Semantics Pub Date : 2024-10-17 DOI:10.1186/s13326-024-00320-3

Sabrina Toro, Anna V Anagnostopoulos, Susan M Bello, Kai Blumberg, Rhiannon Cameron, Leigh Carmody, Alexander D Diehl, Damion M Dooley, William D Duncan, Petra Fey, Pascale Gaudet, Nomi L Harris, Marcin P Joachimiak, Leila Kiani, Tiago Lubiana, Monica C Munoz-Torres, Shawn O'Neil, David Osumi-Sutherland, Aleix Puig-Barbe, Justin T Reese, Leonore Reiser, Sofia Mc Robb, Troy Ruemping, James Seager, Eric Sid, Ray Stefancsik, Magalie Weber, Valerie Wood, Melissa A Haendel, Christopher J Mungall

{"title":"Dynamic Retrieval Augmented Generation of Ontologies using Artificial Intelligence (DRAGON-AI).","authors":"Sabrina Toro, Anna V Anagnostopoulos, Susan M Bello, Kai Blumberg, Rhiannon Cameron, Leigh Carmody, Alexander D Diehl, Damion M Dooley, William D Duncan, Petra Fey, Pascale Gaudet, Nomi L Harris, Marcin P Joachimiak, Leila Kiani, Tiago Lubiana, Monica C Munoz-Torres, Shawn O'Neil, David Osumi-Sutherland, Aleix Puig-Barbe, Justin T Reese, Leonore Reiser, Sofia Mc Robb, Troy Ruemping, James Seager, Eric Sid, Ray Stefancsik, Magalie Weber, Valerie Wood, Melissa A Haendel, Christopher J Mungall","doi":"10.1186/s13326-024-00320-3","DOIUrl":null,"url":null,"abstract":"Background: Ontologies are fundamental components of informatics infrastructure in domains such as biomedical, environmental, and food sciences, representing consensus knowledge in an accurate and computable form. However, their construction and maintenance demand substantial resources and necessitate substantial collaboration between domain experts, curators, and ontology experts. We present Dynamic Retrieval Augmented Generation of Ontologies using AI (DRAGON-AI), an ontology generation method employing Large Language Models (LLMs) and Retrieval Augmented Generation (RAG). DRAGON-AI can generate textual and logical ontology components, drawing from existing knowledge in multiple ontologies and unstructured text sources.Results: We assessed performance of DRAGON-AI on de novo term construction across ten diverse ontologies, making use of extensive manual evaluation of results. Our method has high precision for relationship generation, but has slightly lower precision than from logic-based reasoning. Our method is also able to generate definitions deemed acceptable by expert evaluators, but these scored worse than human-authored definitions. Notably, evaluators with the highest level of confidence in a domain were better able to discern flaws in AI-generated definitions. We also demonstrated the ability of DRAGON-AI to incorporate natural language instructions in the form of GitHub issues.Conclusions: These findings suggest DRAGON-AI's potential to substantially aid the manual ontology construction process. However, our results also underscore the importance of having expert curators and ontology editors drive the ontology generation process.","PeriodicalId":15055,"journal":{"name":"Journal of Biomedical Semantics","volume":"15 1","pages":"19"},"PeriodicalIF":2.0000,"publicationDate":"2024-10-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11484368/pdf/","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Journal of Biomedical Semantics","FirstCategoryId":"5","ListUrlMain":"https://doi.org/10.1186/s13326-024-00320-3","RegionNum":3,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q3","JCRName":"MATHEMATICAL & COMPUTATIONAL BIOLOGY","Score":null,"Total":0}

引用次数: 0

Abstract

Background: Ontologies are fundamental components of informatics infrastructure in domains such as biomedical, environmental, and food sciences, representing consensus knowledge in an accurate and computable form. However, their construction and maintenance demand substantial resources and necessitate substantial collaboration between domain experts, curators, and ontology experts. We present Dynamic Retrieval Augmented Generation of Ontologies using AI (DRAGON-AI), an ontology generation method employing Large Language Models (LLMs) and Retrieval Augmented Generation (RAG). DRAGON-AI can generate textual and logical ontology components, drawing from existing knowledge in multiple ontologies and unstructured text sources.

Results: We assessed performance of DRAGON-AI on de novo term construction across ten diverse ontologies, making use of extensive manual evaluation of results. Our method has high precision for relationship generation, but has slightly lower precision than from logic-based reasoning. Our method is also able to generate definitions deemed acceptable by expert evaluators, but these scored worse than human-authored definitions. Notably, evaluators with the highest level of confidence in a domain were better able to discern flaws in AI-generated definitions. We also demonstrated the ability of DRAGON-AI to incorporate natural language instructions in the form of GitHub issues.

Conclusions: These findings suggest DRAGON-AI's potential to substantially aid the manual ontology construction process. However, our results also underscore the importance of having expert curators and ontology editors drive the ontology generation process.

Abstract Image

查看原文

微信好友朋友圈 QQ好友复制链接

本刊更多论文

使用人工智能的本体动态检索增强生成（DRAGON-AI）。

背景：本体是生物医学、环境和食品科学等领域信息学基础设施的基本组成部分，以准确和可计算的形式代表共识知识。然而，本体的构建和维护需要大量资源，并且需要领域专家、馆长和本体专家之间的大量协作。我们提出了使用人工智能动态检索增强生成本体（DRAGON-AI），这是一种采用大型语言模型（LLM）和检索增强生成（RAG）的本体生成方法。DRAGON-AI 可以从多个本体和非结构化文本源中的现有知识中生成文本和逻辑本体组件：我们评估了 DRAGON-AI 在十个不同本体中从头构建术语的性能，并对结果进行了广泛的人工评估。我们的方法具有较高的关系生成精度，但精度略低于基于逻辑推理的方法。我们的方法还能生成专家评估员认为可以接受的定义，但这些定义的得分比人工撰写的定义要低。值得注意的是，对某一领域具有最高信任度的评估者能够更好地识别人工智能生成的定义中的缺陷。我们还展示了 DRAGON-AI 以 GitHub 问题的形式纳入自然语言指令的能力：这些发现表明 DRAGON-AI 有潜力为人工构建本体的过程提供实质性帮助。然而，我们的研究结果也强调了由专家策划人和本体编辑来推动本体生成过程的重要性。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

求助全文

约1分钟内获得全文去求助

来源期刊

Journal of Biomedical Semantics MATHEMATICAL & COMPUTATIONAL BIOLOGY-

CiteScore

4.20

自引率

5.30%

发文量

审稿时长

30 weeks

期刊介绍： Journal of Biomedical Semantics addresses issues of semantic enrichment and semantic processing in the biomedical domain. The scope of the journal covers two main areas: Infrastructure for biomedical semantics: focusing on semantic resources and repositories, meta-data management and resource description, knowledge representation and semantic frameworks, the Biomedical Semantic Web, and semantic interoperability. Semantic mining, annotation, and analysis: focusing on approaches and applications of semantic resources; and tools for investigation, reasoning, prediction, and discoveries in biomedicine.