{"title":"Potential Benefits and Challenges of Quantifying Pseudoreplication in Genomic Data with Entropy Statistics.","authors":"Eric J Ward, Robin S Waples","doi":"10.3390/e26090805","DOIUrl":null,"url":null,"abstract":"<p><p>Generating vast arrays of genetic markers for evolutionary ecology studies has become routine and cost-effective. However, analyzing data from large numbers of loci associated with a small number of finite chromosomes introduces a challenge: loci on the same chromosome do not assort independently, leading to pseudoreplication. Previous studies have demonstrated that pseudoreplication can substantially reduce precision of genetic analyses (and make confidence intervals wider), such as <i>F<sub>ST</sub></i> and linkage disequilibrium (LD) measures between pairs of loci. In LD analyses, another type of dependency (overlapping pairs of the same loci) also creates pseudoreplication. Building on previous work, we explore the potential of entropy metrics to improve the status quo, particularly total correlation (TC), to assess pseudoreplication in LD studies. Our simulations, performed on a monoecious population with a range of effective population sizes (<i>N<sub>e</sub></i>) and numbers of loci, attempted to isolate the overlapping-pairs-of-loci effect by considering unlinked loci and using entropy to quantify inter-locus relationships. We hypothesized a positive correlation between TC and the number of loci (L), and a negative correlation between TC and <i>N<sub>e</sub></i>. Results from our statistical models predicting TC demonstrate a strong effect of the number of loci, and muted effects of <i>N<sub>e</sub></i> and other predictors, adding support to the use of entropy-based metrics as a tool for estimating the statistical information of complex genetic datasets. Our results also highlight a challenge regarding scalability; computational limitations arise as the number of loci grows, making our current approach limited to smaller datasets. Despite these challenges, this work further refines our understanding of entropy measures, and offers insights into the complex dynamics of genetic information in evolutionary ecology research.</p>","PeriodicalId":11694,"journal":{"name":"Entropy","volume":"26 9","pages":""},"PeriodicalIF":2.1000,"publicationDate":"2024-09-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11431677/pdf/","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Entropy","FirstCategoryId":"101","ListUrlMain":"https://doi.org/10.3390/e26090805","RegionNum":3,"RegionCategory":"物理与天体物理","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"PHYSICS, MULTIDISCIPLINARY","Score":null,"Total":0}
引用次数: 0
Abstract
Generating vast arrays of genetic markers for evolutionary ecology studies has become routine and cost-effective. However, analyzing data from large numbers of loci associated with a small number of finite chromosomes introduces a challenge: loci on the same chromosome do not assort independently, leading to pseudoreplication. Previous studies have demonstrated that pseudoreplication can substantially reduce precision of genetic analyses (and make confidence intervals wider), such as FST and linkage disequilibrium (LD) measures between pairs of loci. In LD analyses, another type of dependency (overlapping pairs of the same loci) also creates pseudoreplication. Building on previous work, we explore the potential of entropy metrics to improve the status quo, particularly total correlation (TC), to assess pseudoreplication in LD studies. Our simulations, performed on a monoecious population with a range of effective population sizes (Ne) and numbers of loci, attempted to isolate the overlapping-pairs-of-loci effect by considering unlinked loci and using entropy to quantify inter-locus relationships. We hypothesized a positive correlation between TC and the number of loci (L), and a negative correlation between TC and Ne. Results from our statistical models predicting TC demonstrate a strong effect of the number of loci, and muted effects of Ne and other predictors, adding support to the use of entropy-based metrics as a tool for estimating the statistical information of complex genetic datasets. Our results also highlight a challenge regarding scalability; computational limitations arise as the number of loci grows, making our current approach limited to smaller datasets. Despite these challenges, this work further refines our understanding of entropy measures, and offers insights into the complex dynamics of genetic information in evolutionary ecology research.
期刊介绍:
Entropy (ISSN 1099-4300), an international and interdisciplinary journal of entropy and information studies, publishes reviews, regular research papers and short notes. Our aim is to encourage scientists to publish as much as possible their theoretical and experimental details. There is no restriction on the length of the papers. If there are computation and the experiment, the details must be provided so that the results can be reproduced.