PopGenAdapt：在代表性不足的人群中进行基因型到表型预测的半监督领域适应。

Q2 Computer Science Pacific Symposium on Biocomputing. Pacific Symposium on Biocomputing Pub Date : 2024-01-01

Marçal Comajoan Cara, Daniel Mas Montserrat, Alexander G Ioannidis

{"title":"PopGenAdapt：在代表性不足的人群中进行基因型到表型预测的半监督领域适应。","authors":"Marçal Comajoan Cara, Daniel Mas Montserrat, Alexander G Ioannidis","doi":"","DOIUrl":null,"url":null,"abstract":"The lack of diversity in genomic datasets, currently skewed towards individuals of European ancestry, presents a challenge in developing inclusive biomedical models. The scarcity of such data is particularly evident in labeled datasets that include genomic data linked to electronic health records. To address this gap, this paper presents PopGenAdapt, a genotype-to-phenotype prediction model which adopts semi-supervised domain adaptation (SSDA) techniques originally proposed for computer vision. PopGenAdapt is designed to leverage the substantial labeled data available from individuals of European ancestry, as well as the limited labeled and the larger amount of unlabeled data from currently underrepresented populations. The method is evaluated in underrepresented populations from Nigeria, Sri Lanka, and Hawaii for the prediction of several disease outcomes. The results suggest a significant improvement in the performance of genotype-to-phenotype models for these populations over state-of-the-art supervised learning methods, setting SSDA as a promising strategy for creating more inclusive machine learning models in biomedical research.Our code is available at https://github.com/AI-sandbox/PopGenAdapt.","PeriodicalId":34954,"journal":{"name":"Pacific Symposium on Biocomputing. Pacific Symposium on Biocomputing","volume":null,"pages":null},"PeriodicalIF":0.0000,"publicationDate":"2024-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10906137/pdf/","citationCount":"0","resultStr":"{\"title\":\"PopGenAdapt: Semi-Supervised Domain Adaptation for Genotype-to-Phenotype Prediction in Underrepresented Populations.\",\"authors\":\"Marçal Comajoan Cara, Daniel Mas Montserrat, Alexander G Ioannidis\",\"doi\":\"\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"The lack of diversity in genomic datasets, currently skewed towards individuals of European ancestry, presents a challenge in developing inclusive biomedical models. The scarcity of such data is particularly evident in labeled datasets that include genomic data linked to electronic health records. To address this gap, this paper presents PopGenAdapt, a genotype-to-phenotype prediction model which adopts semi-supervised domain adaptation (SSDA) techniques originally proposed for computer vision. PopGenAdapt is designed to leverage the substantial labeled data available from individuals of European ancestry, as well as the limited labeled and the larger amount of unlabeled data from currently underrepresented populations. The method is evaluated in underrepresented populations from Nigeria, Sri Lanka, and Hawaii for the prediction of several disease outcomes. The results suggest a significant improvement in the performance of genotype-to-phenotype models for these populations over state-of-the-art supervised learning methods, setting SSDA as a promising strategy for creating more inclusive machine learning models in biomedical research.Our code is available at https://github.com/AI-sandbox/PopGenAdapt.\",\"PeriodicalId\":34954,\"journal\":{\"name\":\"Pacific Symposium on Biocomputing. Pacific Symposium on Biocomputing\",\"volume\":null,\"pages\":null},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2024-01-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10906137/pdf/\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Pacific Symposium on Biocomputing. Pacific Symposium on Biocomputing\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q2\",\"JCRName\":\"Computer Science\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Pacific Symposium on Biocomputing. Pacific Symposium on Biocomputing","FirstCategoryId":"1085","ListUrlMain":"","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"Computer Science","Score":null,"Total":0}

引用次数: 0

摘要

基因组数据集目前偏重于欧洲血统的个体，缺乏多样性，这给开发包容性生物医学模型带来了挑战。此类数据的稀缺性在包含与电子健康记录相关联的基因组数据的标记数据集中尤为明显。为了弥补这一不足，本文介绍了一种基因型到表型预测模型 PopGenAdapt，它采用了最初为计算机视觉提出的半监督领域适应（SSDA）技术。PopGenAdapt 的设计目的是利用欧洲血统个体的大量标注数据，以及目前代表性不足人群的有限标注数据和大量未标注数据。该方法在来自尼日利亚、斯里兰卡和夏威夷的代表性不足人群中进行了评估，以预测几种疾病的结果。结果表明，与最先进的监督学习方法相比，针对这些人群的基因型到表型模型的性能有了显著提高，这使得 SSDA 成为在生物医学研究中创建更具包容性的机器学习模型的一种有前途的策略。我们的代码可在 https://github.com/AI-sandbox/PopGenAdapt 上获取。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

微信好友朋友圈 QQ好友复制链接

本刊更多论文

PopGenAdapt: Semi-Supervised Domain Adaptation for Genotype-to-Phenotype Prediction in Underrepresented Populations.

The lack of diversity in genomic datasets, currently skewed towards individuals of European ancestry, presents a challenge in developing inclusive biomedical models. The scarcity of such data is particularly evident in labeled datasets that include genomic data linked to electronic health records. To address this gap, this paper presents PopGenAdapt, a genotype-to-phenotype prediction model which adopts semi-supervised domain adaptation (SSDA) techniques originally proposed for computer vision. PopGenAdapt is designed to leverage the substantial labeled data available from individuals of European ancestry, as well as the limited labeled and the larger amount of unlabeled data from currently underrepresented populations. The method is evaluated in underrepresented populations from Nigeria, Sri Lanka, and Hawaii for the prediction of several disease outcomes. The results suggest a significant improvement in the performance of genotype-to-phenotype models for these populations over state-of-the-art supervised learning methods, setting SSDA as a promising strategy for creating more inclusive machine learning models in biomedical research.Our code is available at https://github.com/AI-sandbox/PopGenAdapt.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

Pacific Symposium on Biocomputing. Pacific Symposium on Biocomputing Medicine-Medicine (all)

CiteScore

4.50

自引率

0.00%

发文量