Automated Shared Phenotype Discovery in Undiagnosed Cohorts for Rare Disease Research.

Aaron J Masino, Ranga Baminiwatte
Published in: Proceedings of the ... International Conference on Machine Learning and Applications, vol. 2024, pp. 1025-1030.
Publication date: 2024-12-01 (Epub 2025-03-04).
DOI: https://doi.org/10.1109/icmla61862.2024.00154
Full text (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11967416/pdf/

Abstract

Rare disease diagnosis is challenging in large part because knowledge of gene-to-phenotype associations is incomplete. One way to address this is to adopt a gene-to-patient paradigm: select an in-silico predicted pathogenic variant, identify individuals with the variant, and then determine whether those individuals have a shared phenotype. Most studies following this paradigm determine the presence of a shared phenotype through manual review of ontology terms in the patient record. We propose a novel automated method that identifies the shared phenotype via genetic search, using a fitness function that compares the similarity of phenotype term embeddings generated by advanced NLP models applied to the terms' text descriptions. Leveraging Human Phenotype Ontology resources, we generated a library of simulated patients across 5,076 Mendelian diseases. Applying our approach to these simulated disease cohorts, we found that the solution phenotypes included a closely matching term for the majority of terms in the disease phenotype under variable conditions of annotation imprecision and noise. We anticipate that these methods can aid gene-to-phenotype association discovery for rare diseases by enabling a scalable gene-to-patient research paradigm.
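
To make the idea of a genetic search with an embedding-similarity fitness function concrete, here is a minimal Python sketch. It is not the authors' implementation: the abstract does not specify the fitness function, the genetic operators, or the embedding model. The term names, the 3-dimensional vectors in `TERM_VECS`, the `COHORT` records, and every function and parameter name are hypothetical, chosen only for illustration. The sketch assumes fixed-size candidate phenotypes and scores a candidate by the mean, over patients and candidate terms, of the best cosine similarity to any term in the patient record.

```python
import math
import random

# Toy 3-d vectors standing in for term-description embeddings.
# Values are made up for illustration, not produced by any NLP model.
TERM_VECS = {
    "seizure": (1.0, 0.1, 0.0),
    "focal seizure": (0.9, 0.2, 0.0),
    "hypotonia": (0.0, 1.0, 0.1),
    "muscle weakness": (0.1, 0.9, 0.2),
    "short stature": (0.0, 0.1, 1.0),
    "growth delay": (0.1, 0.2, 0.9),
    "hearing loss": (0.6, 0.6, 0.6),
}
TERMS = sorted(TERM_VECS)

# Hypothetical cohort: each patient is a set of annotated terms,
# some imprecise ("focal seizure") and some unrelated noise.
COHORT = [
    {"focal seizure", "hypotonia", "hearing loss"},
    {"seizure", "muscle weakness"},
    {"seizure", "hypotonia", "short stature"},
]


def cosine(u, v):
    """Cosine similarity of two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def fitness(candidate, cohort):
    """Mean best-match similarity of each candidate term to each patient."""
    scores = [
        max(cosine(TERM_VECS[c], TERM_VECS[t]) for t in patient)
        for patient in cohort
        for c in candidate
    ]
    return sum(scores) / len(scores)


def genetic_search(cohort, k=2, pop_size=20, generations=30,
                   mutation_rate=0.2, seed=0):
    """Search for a k-term phenotype (1 <= k < len(TERMS)) with high fitness."""
    rng = random.Random(seed)

    def score(candidate):
        return fitness(candidate, cohort)

    # Candidates are sorted tuples of k distinct terms.
    pop = [tuple(sorted(rng.sample(TERMS, k))) for _ in range(pop_size)]
    for _ in range(generations):
        # Selection: keep the fitter half unchanged (elitism).
        pop.sort(key=score, reverse=True)
        parents = pop[: pop_size // 2]
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            # Crossover: draw k terms from the union of both parents.
            child = set(rng.sample(sorted(set(a) | set(b)), k))
            # Mutation: swap one term for a term not currently in the child.
            if rng.random() < mutation_rate:
                child.remove(rng.choice(sorted(child)))
                child.add(rng.choice([t for t in TERMS if t not in child]))
            children.append(tuple(sorted(child)))
        pop = parents + children
    return max(pop, key=score)


if __name__ == "__main__":
    best = genetic_search(COHORT)
    print(best, round(fitness(best, COHORT), 3))
```

Because similarity is computed on embeddings rather than exact term identity, a candidate term such as "seizure" still scores well against a patient annotated with the nearby term "focal seizure", which is the property that makes this kind of fitness function tolerant of imprecise annotation. A realistic version would replace the toy vectors with embeddings of Human Phenotype Ontology term descriptions and would likely need a fitness term that also rewards coverage of the cohort's terms.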
