Identification of the associations between genes and quantitative traits using entropy-based kernel density estimation.

Genomics and Informatics (Q2, Agricultural and Biological Sciences). Pub Date: 2022-06-01. Epub Date: 2022-06-30. DOI: 10.5808/gi.22033

Jaeyong Yee, Taesung Park, Mira Park

Citations: 0

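Abstract

Genetic associations have been quantified using a number of statistical measures. Entropy-based mutual information may be one of the more direct ways of estimating the association, in the sense that it does not depend on the parametrization. For this purpose, both the entropy and the conditional entropy of the phenotype distribution should be obtained. Quantitative traits, however, do not usually allow an exact evaluation of entropy. The estimation of entropy needs a probability density function, which can be approximated by kernel density estimation. We have investigated the proper sequence of procedures for combining kernel density estimation and entropy estimation with a probability density function in order to calculate mutual information. Genotypes and their interactions were constructed to set the conditions for conditional entropy. Extensive simulation data created using three types of generating functions were analyzed using two different kernels, as well as two types of multifactor dimensionality reduction and another probability density approximation method called m-spacing. The statistical power in terms of correct detection rates was compared. Using kernels was found to be most useful when the trait distributions were more complex than simple normal or gamma distributions. A full-scale genomic dataset was explored to identify associations using the 2-h oral glucose tolerance test results and γ-glutamyl transpeptidase levels as phenotypes. Clearly distinguishable single-nucleotide polymorphisms (SNPs) and interacting SNP pairs associated with these phenotypes were found and listed with empirical p-values.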
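The quantities named in the abstract can be illustrated with a minimal sketch. It is not the authors' procedure: the abstract does not say which two kernels, which bandwidth, or which order of operations they used. The sketch assumes a Gaussian kernel, Silverman's rule-of-thumb bandwidth shared across genotype groups, a resubstitution entropy estimate, and a permutation test for the empirical p-value. All function names are hypothetical. The mutual information is taken as I(Y; G) = H(Y) - Σ_g p(g) H(Y | G = g), where Y is the trait and G the genotype.

```python
import math
import random
from collections import defaultdict


def silverman_bandwidth(values):
    """Rule-of-thumb bandwidth 1.06 * sd * n^(-1/5) (an assumed choice)."""
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))
    return 1.06 * sd * n ** (-0.2)


def kde_entropy(values, bandwidth):
    """Resubstitution estimate -(1/n) * sum_i log f_hat(x_i), Gaussian kernel."""
    n = len(values)
    norm = 1.0 / (n * bandwidth * math.sqrt(2.0 * math.pi))
    total = 0.0
    for x in values:
        density = norm * sum(
            math.exp(-0.5 * ((x - y) / bandwidth) ** 2) for y in values
        )
        total += math.log(density)
    return -total / n


def mutual_information(trait, genotype):
    """I(Y; G) = H(Y) - sum_g p(g) * H(Y | G = g), entropies from the KDE."""
    h = silverman_bandwidth(trait)  # one bandwidth for all groups (assumption)
    groups = defaultdict(list)
    for y, g in zip(trait, genotype):
        groups[g].append(y)
    n = len(trait)
    conditional = sum(len(ys) / n * kde_entropy(ys, h) for ys in groups.values())
    return kde_entropy(trait, h) - conditional


def empirical_p_value(trait, genotype, n_perm=99, seed=0):
    """Permutation p-value: share of label shuffles with MI >= the observed MI."""
    rng = random.Random(seed)
    observed = mutual_information(trait, genotype)
    shuffled = list(genotype)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if mutual_information(trait, shuffled) >= observed:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)
```

Genotype labels only need to be hashable, so an interacting SNP pair can be passed as tuples such as `(g1, g2)`; each distinct combination then defines one condition for the conditional entropy.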
