Increased discoverability of rare disease datasets through knowledge graph integration.

JAMIA Open, vol. 8, no. 1, ooaf001. Published 2025-02-06 (eCollection 2025-02-01). DOI: 10.1093/jamiaopen/ooaf001
Journal impact factor: 3.4 (JCR Q2, Health Care Sciences & Services)
Ian Braun, Emily Hartley, Daniel Olson, Nicolas Matentzoglu, Kevin Schaper, Ramona Walls, Nicole Vasilevsky
Citations: 0

Abstract

Objectives: Demonstrate a methodology for improving discoverability of rare disease datasets by enriching source data with biological associations.

Materials and methods: We developed an extension of the Biolink semantic model to incorporate patient data and generated a knowledge graph (KG) comprising patient data and associations between biological entities in an existing KG, leveraging existing mappings and mapping standards.
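The mapping step can be pictured as turning each patient's source condition code into knowledge-graph edges. The following is a minimal sketch, not the authors' pipeline. It assumes mapping rows that use the column names of the SSSOM (Simple Standard for Sharing Ontological Mappings) format as the "mapping standard". The abstract does not name the classes or predicates of its Biolink extension, so `ex:has_condition` and `ex:in_dataset` are hypothetical placeholders, and every code and identifier below is invented for illustration.

```python
from typing import NamedTuple


class Edge(NamedTuple):
    subject: str
    predicate: str
    object_id: str


# Hypothetical SSSOM-style rows: placeholder codes and IDs, not real mappings.
MAPPINGS = [
    {"subject_id": "SRC:C001", "predicate_id": "skos:exactMatch", "object_id": "DISEASE:1"},
    {"subject_id": "SRC:C002", "predicate_id": "skos:exactMatch", "object_id": "DISEASE:2"},
    {"subject_id": "SRC:C003", "predicate_id": "skos:broadMatch", "object_id": "DISEASE:3"},
]

# Hypothetical source records: (dataset, patient, source condition code).
RECORDS = [
    ("DS:1", "PAT:1", "SRC:C001"),
    ("DS:1", "PAT:2", "SRC:C003"),
    ("DS:2", "PAT:3", "SRC:C002"),
]


def build_patient_edges(records, mappings, accepted=("skos:exactMatch",)):
    """Convert patient records into KG edges via mapping rows.

    Only mappings whose predicate_id is in `accepted` are used; codes with
    no accepted mapping are reported as unmapped instead of being dropped.
    """
    # Index accepted mappings: source code -> list of target concept IDs.
    index = {}
    for row in mappings:
        if row["predicate_id"] in accepted:
            index.setdefault(row["subject_id"], []).append(row["object_id"])

    edges, unmapped = set(), []
    for dataset_id, patient_id, code in records:
        # Every patient is linked to the dataset it came from.
        edges.add(Edge(patient_id, "ex:in_dataset", dataset_id))
        # Link the patient to each mapped condition concept.
        for disease_id in index.get(code, []):
            edges.add(Edge(patient_id, "ex:has_condition", disease_id))
        if code not in index:
            unmapped.append(code)
    return sorted(edges), unmapped


if __name__ == "__main__":
    edges, unmapped = build_patient_edges(RECORDS, MAPPINGS)
    for edge in edges:
        print(edge)
    print("unmapped:", unmapped)
```

Here `SRC:C003` is reported as unmapped because only `skos:exactMatch` rows are accepted by default. Whether broader matches should also be accepted is a policy choice that the abstract does not specify.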

Results: The enriched model of patient data can support a search application that is aware of biological associations and provides a semantic search interface to discover and summarize patient datasets within the broader biological context.
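A small in-memory graph can illustrate what it means for a search to be "aware of biological associations." This is a minimal sketch, not the authors' search application. It assumes the KG is a list of (subject, predicate, object) triples, that `subclass_of` links a disease to its parent and `has_phenotype` links a disease to a phenotype, and that all identifiers are hypothetical placeholders. The query concept is expanded by following those edges backwards until no new concepts appear. Each dataset is then summarized by the number of its patients whose conditions fall in the expanded set.

```python
from collections import defaultdict, deque

# Hypothetical KG triples; predicates and IDs are illustrative placeholders.
TRIPLES = [
    ("DISEASE:1", "subclass_of", "DISEASE:GROUP"),
    ("DISEASE:2", "subclass_of", "DISEASE:GROUP"),
    ("DISEASE:1", "has_phenotype", "PHENO:X"),
    ("DISEASE:3", "has_phenotype", "PHENO:X"),
    ("PAT:1", "has_condition", "DISEASE:1"),
    ("PAT:2", "has_condition", "DISEASE:3"),
    ("PAT:3", "has_condition", "DISEASE:2"),
    ("PAT:1", "in_dataset", "DS:1"),
    ("PAT:2", "in_dataset", "DS:2"),
    ("PAT:3", "in_dataset", "DS:3"),
]

# Edge types followed (in reverse) when expanding a query concept.
EXPAND_PREDICATES = ("subclass_of", "has_phenotype")


def incoming(triples):
    """Index triples as (predicate, object) -> set of subjects."""
    idx = defaultdict(set)
    for s, p, o in triples:
        idx[(p, o)].add(s)
    return idx


def expand_concepts(query, idx, predicates=EXPAND_PREDICATES):
    """Breadth-first search from the query along incoming edges of the given predicates."""
    seen = {query}
    queue = deque([query])
    while queue:
        node = queue.popleft()
        for pred in predicates:
            # .get avoids inserting empty sets into the defaultdict.
            for subj in idx.get((pred, node), ()):
                if subj not in seen:
                    seen.add(subj)
                    queue.append(subj)
    return seen


def find_datasets(query, triples):
    """Count matching patients per dataset for an expanded query concept."""
    concepts = expand_concepts(query, incoming(triples))
    patients = {s for s, p, o in triples if p == "has_condition" and o in concepts}
    counts = defaultdict(int)
    for s, p, o in triples:
        if p == "in_dataset" and s in patients:
            counts[o] += 1
    return dict(sorted(counts.items()))


if __name__ == "__main__":
    print(find_datasets("DISEASE:GROUP", TRIPLES))
    print(find_datasets("PHENO:X", TRIPLES))
```

A query for `DISEASE:GROUP` returns DS:1 and DS:3, even though no patient is annotated with the group concept itself. A query for `PHENO:X` returns DS:1 and DS:2 through the disease-phenotype associations.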

Discussion and conclusion: Our methodology enriches datasets with a wealth of additional biological knowledge, improving discoverability. Using condition concepts, we illustrate techniques that could be applied to other entities within source data such as measurements and observations. This work provides a foundational framework for how source data can be modeled to improve accuracy of upstream language models for natural language querying.

Latest articles in this journal
Patient perspectives about deployment of artificial intelligence decision support tools in a safety-net healthcare system.
Real-time automated billing for tobacco treatment: performance evaluation of the CigStopper machine learning framework.
Synergy of diagnosis coding between administrative claims and electronic health records of large patient populations across multiple healthcare organizations.
Characterization and comparison of structured and unstructured electronic health record data mapped to MedDRA for post-marketing surveillance.
Clinical validation of MyCog Mobile: development of a parsimonious and clinically interpretable prediction model for mild cognitive impairment.