Authors: Aboubacry Gaye, Abdou Ka Diongue, Lionel Nanguep Komen, Amadou Diallo, Seydou Nourou Sylla, Maryam Diarra, Cheikh Talla, Cheikh Loucoubar
Journal: Infectious Disease Modelling. Published: 2023-09-09. DOI: 10.1016/j.idm.2023.09.002
URL: https://www.sciencedirect.com/science/article/pii/S2468042723000842
High-dimensional supervised classification in a context of non-independence of observations to identify the determining SNPs in a phenotype
This work addresses the problem of supervised classification for highly correlated high-dimensional data describing non-independent observations to identify SNPs related to a phenotype. We use a general penalized linear mixed model with a single random effect that performs simultaneous SNP selection and population structure adjustment in high-dimensional prediction models. Specifically, the model simultaneously selects variables and estimates their effects, taking into account correlations between individuals.
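As a reading aid, a penalized linear mixed model with a single random effect can be written in the following generic form. This is a sketch under stated assumptions, not the authors' exact formulation: the abstract does not give the penalty or the covariance structure, so the lasso penalty, the kinship (relatedness) matrix $K$, and the symbols $\sigma_u^2$, $\sigma_e^2$ and $\lambda$ are all assumptions introduced here.

```latex
\begin{aligned}
y &= X\beta + u + \varepsilon,
  \qquad u \sim \mathcal{N}(0, \sigma_u^2 K),
  \quad \varepsilon \sim \mathcal{N}(0, \sigma_e^2 I_n), \\
(\hat\beta, \hat\sigma_u^2, \hat\sigma_e^2)
  &= \arg\min_{\beta,\, \sigma_u^2,\, \sigma_e^2}
     \; -\ell(\beta, \sigma_u^2, \sigma_e^2) + \lambda \sum_{j=1}^{p} |\beta_j| .
\end{aligned}
```

Here $y$ is the phenotype vector for $n$ individuals, $X$ is the $n \times p$ SNP matrix, and $\ell$ is the log-likelihood of $y \sim \mathcal{N}(X\beta, \sigma_u^2 K + \sigma_e^2 I_n)$. The random effect $u$ carries the correlation between individuals, while the penalty sets some $\beta_j$ exactly to zero, which is how selection and effect estimation happen in one fit.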
Single nucleotide polymorphisms (SNPs) are a type of genetic variation; each SNP represents a difference in a single DNA building block, namely a nucleotide. Previous research has shown that SNPs can be used to identify the correct source population of an individual and can act alone or jointly to affect a phenotype. In this regard, studying the contribution of genetics to infectious disease phenotypes is of great importance.
In this study, we used uncorrelated variables, obtained from blocks of correlated variables constructed in previous work, to describe the most closely related observations in the dataset. The model was trained on 90% of the observations and tested on the remaining 10%. The best model, selected with the generalized information criterion (GIC), identified the SNP rs2493311, located on chromosome 1 in the PRDM16 gene (PR/SET domain 16), as the most decisive factor in malaria attacks.
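The 90%/10% split and GIC-based model choice described above can be sketched as follows. This is a minimal illustration, not the authors' code: the function names are hypothetical, the candidate log-likelihoods and parameter counts are made-up numbers, and the weight a_n = log(log(n)) * log(p) is one common high-dimensional choice for the GIC that the abstract does not confirm.

```python
import math
import random


def train_test_split_indices(n, test_fraction=0.10, seed=0):
    """Shuffle indices 0..n-1 and return (train_indices, test_indices)."""
    rng = random.Random(seed)
    indices = list(range(n))
    rng.shuffle(indices)
    n_test = max(1, round(n * test_fraction))
    return indices[n_test:], indices[:n_test]


def gic(log_likelihood, n_params, n_obs, n_features):
    """GIC = -2 * loglik + a_n * df.

    Assumption: a_n = log(log(n)) * log(p); a_n = 2 would give AIC
    and a_n = log(n) would give BIC.
    """
    a_n = math.log(math.log(n_obs)) * math.log(n_features)
    return -2.0 * log_likelihood + a_n * n_params


def select_by_gic(candidates, n_obs, n_features):
    """Return the candidate fit with the smallest GIC."""
    return min(
        candidates,
        key=lambda c: gic(c["loglik"], c["df"], n_obs, n_features),
    )


# Hypothetical fits along a penalty path (illustrative numbers only).
candidates = [
    {"lambda": 1.0, "loglik": -540.0, "df": 1},
    {"lambda": 0.5, "loglik": -505.0, "df": 3},
    {"lambda": 0.1, "loglik": -495.0, "df": 12},
]

train_idx, test_idx = train_test_split_indices(1000)
best = select_by_gic(candidates, n_obs=len(train_idx), n_features=1000)
```

In this sketch each candidate corresponds to one penalty value, and `df` counts the parameters that the fit estimates as non-zero. The held-out indices in `test_idx` would then be used only to assess the selected model.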
About the journal:
Infectious Disease Modelling is a peer-reviewed open access journal. Its main objective is to facilitate research that combines mathematical modelling, retrieval and analysis of infectious disease data, and public health decision support. The journal actively encourages original research that improves this interface, as well as review articles that highlight innovative methodologies relevant to data collection, informatics, and policy making in the field of public health.