Increased discoverability of rare disease datasets through knowledge graph integration
Ian Braun, Emily Hartley, Daniel Olson, Nicolas Matentzoglu, Kevin Schaper, Ramona Walls, Nicole Vasilevsky
JAMIA Open, 8(1): ooaf001. Published 6 February 2025. DOI: 10.1093/jamiaopen/ooaf001
Abstract
Objectives: Demonstrate a methodology for improving discoverability of rare disease datasets by enriching source data with biological associations.
Materials and methods: We extended the Biolink semantic model to incorporate patient data and generated a knowledge graph (KG) that combines patient data with the associations between biological entities in an existing KG, leveraging existing mappings and mapping standards.
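The abstract does not describe how patient records are joined to the existing KG. The following is a minimal sketch of one plausible shape for that step, assuming source conditions are stored as local codes and that SSSOM-style mapping rows (subject, predicate, object) link them to KG disease identifiers. All identifiers, the `ext:` prefix standing in for the patient-side model extension, and the `enrich` helper are hypothetical; only `skos:exactMatch`, `skos:broadMatch`, `biolink:has_phenotype`, and `biolink:subclass_of` are real vocabulary terms. It is not the authors' implementation.

```python
from collections import defaultdict

# Hypothetical toy inputs: every identifier below is an illustrative placeholder,
# not a real ontology term.
PATIENT_CONDITIONS = [
    # (dataset, patient, local condition code as recorded in the source data)
    ("dataset_A", "patient_1", "LOCAL:C01"),
    ("dataset_A", "patient_2", "LOCAL:C02"),
    ("dataset_B", "patient_3", "LOCAL:C01"),
    ("dataset_B", "patient_4", "LOCAL:C99"),  # no mapping available for this code
]

# SSSOM-style mapping rows: (subject_id, predicate_id, object_id)
MAPPINGS = [
    ("LOCAL:C01", "skos:exactMatch", "EX:DISEASE_1"),
    ("LOCAL:C02", "skos:exactMatch", "EX:DISEASE_2"),
    ("LOCAL:C02", "skos:broadMatch", "EX:DISEASE_GROUP"),
]

# Edges taken from an existing biological KG, using Biolink predicates
KG_EDGES = [
    ("EX:DISEASE_1", "biolink:has_phenotype", "EX:PHENO_SEIZURE"),
    ("EX:DISEASE_2", "biolink:subclass_of", "EX:DISEASE_GROUP"),
]


def exact_mapping_index(mappings):
    """Map each source code to the KG identifiers it exactly matches."""
    index = defaultdict(set)
    for subject, predicate, obj in mappings:
        # This sketch keeps only exact matches; broad/narrow matches are ignored.
        if predicate == "skos:exactMatch":
            index[subject].add(obj)
    return index


def enrich(patient_conditions, mappings, kg_edges):
    """Link patient data into the KG; return (merged edge set, unmapped codes)."""
    index = exact_mapping_index(mappings)
    edges = set()
    unmapped = set()
    for dataset, patient, code in patient_conditions:
        # Hypothetical predicate recording which dataset a patient belongs to.
        edges.add((patient, "ext:member_of", dataset))
        targets = index.get(code)  # .get() does not create empty entries
        if not targets:
            unmapped.add(code)
            continue
        for disease in targets:
            # Hypothetical predicate linking a patient to a KG disease node.
            edges.add((patient, "ext:has_condition", disease))
    # The merged graph is the new patient edges plus the existing KG edges.
    return edges | set(kg_edges), unmapped


graph, unmapped = enrich(PATIENT_CONDITIONS, MAPPINGS, KG_EDGES)
```

Keeping only exact matches is a conservative choice for this sketch; broader matches could be added as separately labeled edges, and unmapped codes are returned so they can be reviewed rather than silently dropped.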
Results: The enriched patient data model can support a search application that is aware of biological associations, providing a semantic search interface for discovering and summarizing patient datasets within their broader biological context.
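As an illustration of what "aware of biological associations" can mean in practice, here is a minimal sketch that answers a query term against an already-enriched graph. A dataset matches when one of its patients has a condition that equals the query term, is transitively a subclass of it, or is directly annotated with it as a phenotype. The edge list, the `ext:` predicates, and the `search_datasets` and `expand` helpers are hypothetical; the paper's actual search application is not shown here.

```python
from collections import defaultdict, deque

# Hypothetical enriched graph of (subject, predicate, object) triples in which
# patients are already linked to datasets and to KG disease nodes.
# All identifiers are illustrative placeholders.
EDGES = [
    ("patient_1", "ext:member_of", "dataset_A"),
    ("patient_2", "ext:member_of", "dataset_A"),
    ("patient_3", "ext:member_of", "dataset_B"),
    ("patient_1", "ext:has_condition", "EX:DISEASE_1"),
    ("patient_2", "ext:has_condition", "EX:DISEASE_2"),
    ("patient_3", "ext:has_condition", "EX:DISEASE_3"),
    ("EX:DISEASE_1", "biolink:subclass_of", "EX:DISEASE_GROUP"),
    ("EX:DISEASE_2", "biolink:subclass_of", "EX:DISEASE_GROUP"),
    ("EX:DISEASE_1", "biolink:has_phenotype", "EX:PHENO_SEIZURE"),
    ("EX:DISEASE_3", "biolink:has_phenotype", "EX:PHENO_SEIZURE"),
]


def build_index(edges):
    """Index objects by (subject, predicate) for quick lookups."""
    index = defaultdict(list)
    for s, p, o in edges:
        index[(s, p)].append(o)
    return index


def expand(disease, index):
    """Terms a disease is associated with: itself, its transitive
    subclass_of ancestors, and phenotypes annotated directly to it."""
    terms = {disease}
    queue = deque([disease])
    while queue:  # breadth-first walk up the subclass hierarchy
        node = queue.popleft()
        for parent in index.get((node, "biolink:subclass_of"), ()):
            if parent not in terms:
                terms.add(parent)
                queue.append(parent)
    terms.update(index.get((disease, "biolink:has_phenotype"), ()))
    return terms


def search_datasets(query, edges):
    """Count, per dataset, the patients whose conditions are associated with `query`."""
    index = build_index(edges)
    hits = defaultdict(set)
    for s, p, o in edges:
        if p != "ext:has_condition":
            continue
        if query in expand(o, index):
            for dataset in index.get((s, "ext:member_of"), ()):
                hits[dataset].add(s)
    return {dataset: len(patients) for dataset, patients in sorted(hits.items())}
```

For example, `search_datasets("EX:DISEASE_GROUP", EDGES)` returns `{'dataset_A': 2}`: neither patient in `dataset_A` was recorded with the group term itself, but both of their conditions are subclasses of it, so a plain string match on source codes would have missed them.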
Discussion and conclusion: Our methodology enriches datasets with additional biological knowledge, improving their discoverability. Using condition concepts, we illustrate techniques that could also be applied to other entities in source data, such as measurements and observations. This work provides a foundational framework for modeling source data to improve the accuracy of upstream language models for natural language querying.