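High-dimensional supervised classification in a context of non-independence of observations to identify the determining SNPs in a phenotype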

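Infectious Disease Modelling · published 2023-09-09 · DOI: 10.1016/j.idm.2023.09.002 · IF 8.8 · CAS Zone 3 (Medicine) · JCR Q1 (Medicine)
Aboubacry Gaye, Abdou Ka Diongue, Lionel Nanguep Komen, Amadou Diallo, Seydou Nourou Sylla, Maryam Diarra, Cheikh Talla, Cheikh Loucoubar
Article page: https://www.sciencedirect.com/science/article/pii/S2468042723000842
Full text (PMC PDF): https://ftp.ncbi.nlm.nih.gov/pub/pmc/oa_pdf/a7/8f/main.PMC10505671.pdf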
Citations: 0

Abstract



This work addresses supervised classification of highly correlated, high-dimensional data describing non-independent observations, with the aim of identifying SNPs related to a phenotype. We use a general penalized linear mixed model with a single random effect that performs SNP selection and population-structure adjustment simultaneously in high-dimensional prediction models. Specifically, the model selects variables and estimates their effects in a single fit, while accounting for correlations between individuals.
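A penalized linear mixed model with a single random effect is commonly written as follows. This is a generic formulation given for illustration only, assuming a lasso penalty, a known kinship (relatedness) matrix Φ and a mixing parameter η in [0, 1); the paper's exact parameterization and penalty may differ.

```latex
\begin{aligned}
\mathbf{y} &= X\boldsymbol{\beta} + \mathbf{b} + \boldsymbol{\varepsilon}, \qquad
\mathbf{b} \sim \mathcal{N}\!\left(\mathbf{0},\, \eta\sigma^2\Phi\right), \qquad
\boldsymbol{\varepsilon} \sim \mathcal{N}\!\left(\mathbf{0},\, (1-\eta)\sigma^2 I_n\right),\\
V &= \sigma^2\left(\eta\Phi + (1-\eta) I_n\right),\\
Q_\lambda(\boldsymbol{\beta}, \eta, \sigma^2) &= \tfrac{n}{2}\log(2\pi) + \tfrac{1}{2}\log\lvert V\rvert
 + \tfrac{1}{2}(\mathbf{y} - X\boldsymbol{\beta})^\top V^{-1}(\mathbf{y} - X\boldsymbol{\beta})
 + \lambda \sum_{j=1}^{p} \lvert\beta_j\rvert .
\end{aligned}
```

In this form the random effect b absorbs the correlation between individuals encoded in Φ (population structure), while the ℓ1 term sets most SNP effects β_j exactly to zero, so selection and effect estimation happen in the same minimization.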

Single nucleotide polymorphisms (SNPs) are a type of genetic variation in which each SNP represents a difference in a single DNA building block, a nucleotide. Previous research has shown that SNPs can be used to identify an individual's source population and can act alone or jointly to affect a phenotype. Studying the genetic contribution to infectious disease phenotypes is therefore of great importance.

In this study, we used uncorrelated variables, taken from the blocks of correlated variables constructed in previous work, to describe the most closely related observations in the dataset. The model was trained on 90% of the observations and tested on the remaining 10%. The best model, selected with the generalized information criterion (GIC), identified the SNP rs2493311, located in the PRDM16 (PR/SET domain 16) gene on chromosome 1, as the most decisive factor in malaria attacks.
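The training and selection procedure summarized above (a 90%/10% split, then choosing the penalty by GIC) can be illustrated with a minimal NumPy sketch. It is not the authors' code. Assumptions: the kinship matrix and the mixing parameter η are known and fixed (the real model estimates them), the penalty is a lasso, the data are centred with no intercept, and the criterion is GIC = n·log(RSS/n) + a_n·df with a_n = log(log n)·log p, one choice proposed in the literature. All function names (whiten, lasso_cd, gic, select_by_gic, split_90_10) are hypothetical.

```python
import numpy as np


def whiten(X, y, kinship, eta):
    """Rotate and rescale (X, y) so that the errors become i.i.d.

    Assumes Var(y) is proportional to eta * kinship + (1 - eta) * I, with eta in
    [0, 1) known. Writing kinship = U diag(d) U^T, premultiplying by
    diag(eta * d + 1 - eta)^(-1/2) U^T gives uncorrelated, equal-variance rows.
    """
    d, U = np.linalg.eigh(kinship)
    d = np.clip(d, 0.0, None)  # drop tiny negative eigenvalues caused by round-off
    w = 1.0 / np.sqrt(eta * d + (1.0 - eta))
    return w[:, None] * (U.T @ X), w * (U.T @ y)


def lasso_cd(X, y, lam, beta=None, n_iter=50):
    """Cyclic coordinate descent for (1/2n)||y - X b||^2 + lam * ||b||_1 (no intercept)."""
    n, p = X.shape
    beta = np.zeros(p) if beta is None else beta.copy()
    col_sq = (X ** 2).sum(axis=0) / n
    resid = y - X @ beta
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            # Univariate least-squares target for coordinate j, then soft-threshold it.
            z = X[:, j] @ resid / n + col_sq[j] * beta[j]
            new = np.sign(z) * max(abs(z) - lam, 0.0) / col_sq[j]
            resid += X[:, j] * (beta[j] - new)
            beta[j] = new
    return beta


def gic(X, y, beta):
    """GIC = n*log(RSS/n) + a_n*df, with the assumed weight a_n = log(log n)*log p."""
    n, p = X.shape
    rss = float(np.sum((y - X @ beta) ** 2))
    a_n = np.log(np.log(n)) * np.log(p)
    return n * np.log(rss / n) + a_n * np.count_nonzero(beta)


def select_by_gic(X, y, kinship, eta, n_lambda=30, ratio=0.01):
    """Fit a warm-started lasso path on whitened data; return the GIC-minimising (lam, beta)."""
    Xt, yt = whiten(X, y, kinship, eta)
    n = Xt.shape[0]
    lam_max = np.max(np.abs(Xt.T @ yt)) / n  # smallest lam for which beta = 0
    best_score, best_lam, best_beta = np.inf, None, None
    beta = np.zeros(X.shape[1])
    for lam in lam_max * np.logspace(0.0, np.log10(ratio), n_lambda):
        beta = lasso_cd(Xt, yt, lam, beta)
        score = gic(Xt, yt, beta)
        if score < best_score:
            best_score, best_lam, best_beta = score, lam, beta.copy()
    return best_lam, best_beta


def split_90_10(n, seed=0):
    """Random 90% / 10% split of row indices, returned as (train, test)."""
    idx = np.random.default_rng(seed).permutation(n)
    n_train = int(round(0.9 * n))
    return idx[:n_train], idx[n_train:]


if __name__ == "__main__":
    # Synthetic toy data only (not the study's data): one SNP-like column has a real effect.
    rng = np.random.default_rng(1)
    X = rng.standard_normal((200, 50))
    y = 3.0 * X[:, 0] + 0.5 * rng.standard_normal(200)
    kinship = np.eye(200)  # placeholder: unrelated individuals
    train, test = split_90_10(len(y))
    lam, beta = select_by_gic(X[train], y[train], kinship[np.ix_(train, train)], eta=0.3)
    print("selected columns:", np.flatnonzero(beta))
    # Fixed-effect predictions only; a full LMM would also add a BLUP of the random effect.
    print("test MSE:", np.mean((y[test] - X[test] @ beta) ** 2))
```

Fixing η lets the whitening step turn the mixed model into an ordinary lasso, which keeps the sketch short; a complete implementation would alternate between updating β and re-estimating η and σ². Because a_n grows with log p, each selected SNP is penalized more heavily when many candidate SNPs are screened, which pushes the criterion toward small SNP sets.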

Source journal
Infectious Disease Modelling (Mathematics-Applied Mathematics)
CiteScore: 17.00
Self-citation rate: 3.40%
Articles published: 73
Review time: 17 weeks
About the journal: Infectious Disease Modelling is a peer-reviewed open access journal. Its main objective is to facilitate research that combines mathematical modelling, retrieval and analysis of infectious disease data, and public health decision support. The journal actively encourages original research that improves this interface, as well as review articles that highlight innovative methodologies for data collection, informatics, and policy making in public health.
Latest articles in this journal
Flexible regression model for predicting the dissemination of Candidatus Liberibacter asiaticus under variable climatic conditions
A heterogeneous continuous age-structured model of mumps with vaccine
Assessing the impact of disease incidence and immunization on the resilience of complex networks during epidemics
Exploring the influencing factors of scrub typhus in Gannan region, China, based on spatial regression modelling and geographical detector
Regional variations in HIV diagnosis in Japan before and during the COVID-19 pandemic