PAGER: A novel genotype encoding strategy for modeling deviations from additivity in complex trait association studies.

IF 6.1 | CAS Zone 3, Biology | Q1 MATHEMATICAL & COMPUTATIONAL BIOLOGY | Biodata Mining | Pub Date: 2024-10-11 | DOI: 10.1186/s13040-024-00393-x

Philip J Freda, Attri Ghosh, Priyanka Bhandary, Nicholas Matsumoto, Apurva S Chitre, Jiayan Zhou, Molly A Hall, Abraham A Palmer, Tayo Obafemi-Ajayi, Jason H Moore

Biodata Mining 17(1): 41. Full text (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11468469/pdf/

Citations: 0

Abstract

Background: The additive model of inheritance assumes that heterozygotes (Aa) are exactly intermediate with respect to the homozygotes (AA and aa). While this model is commonly used in single-locus genetic association studies, significant deviations from additivity are well documented and contribute to phenotypic variance across many traits and systems. The additivity assumption can introduce type I and type II errors by overestimating or underestimating the effects of variants that deviate from it. Alternative genotype encoding strategies have been explored to account for different inheritance patterns, but they often incur significant computational or methodological costs. To address these challenges, we introduce PAGER (Phenotype Adjusted Genotype Encoding and Ranking), an efficient pre-processing method that encodes each genetic variant based on normalized mean phenotypic differences between diallelic genotype classes (AA, Aa, and aa). This approach more accurately reflects each variant's true inheritance model, improving model precision while minimizing the costs associated with alternative encoding strategies.
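The encoding step described above can be illustrated with a short Python sketch. It is not the published PAGER implementation: it assumes genotypes are given as copies of allele a (0 = AA, 1 = Aa, 2 = aa), that all three classes are observed, and that "normalized mean phenotypic differences" means each class mean minus the AA mean, divided by the largest absolute difference. The function name `pager_style_encode` and the toy data are hypothetical.

```python
import numpy as np


def pager_style_encode(genotypes, phenotype):
    """Re-encode one biallelic SNP using per-genotype mean phenotypes.

    Hypothetical sketch, not the published PAGER code. Assumes genotypes
    count copies of allele a (0 = AA, 1 = Aa, 2 = aa) and that all three
    classes are present in the sample.
    """
    genotypes = np.asarray(genotypes, dtype=int)
    phenotype = np.asarray(phenotype, dtype=float)

    # Mean phenotype of each genotype class (for a 0/1 trait: case proportion).
    class_means = np.array([phenotype[genotypes == g].mean() for g in (0, 1, 2)])

    # Difference of each class mean from the homozygous AA class.
    diffs = class_means - class_means[0]

    # Assumed normalization: divide by the largest absolute difference so
    # AA stays at 0 and the most deviant class has magnitude 1.
    scale = np.abs(diffs).max()
    encoding = diffs / scale if scale > 0 else np.zeros(3)

    # Replace each sample's allele count with its class's encoded value.
    return encoding[genotypes]


if __name__ == "__main__":
    g = [0, 0, 1, 1, 2, 2]
    y = [1.0, 1.2, 3.1, 2.9, 1.1, 0.9]  # toy heterosis-like trait
    print(pager_style_encode(g, y))
```

Under these assumptions a heterosis-like SNP is encoded close to (0, 1, 0) instead of the additive (0, 1, 2), and the re-encoded column can be fed to an ordinary single-locus regression. Because the encoding is estimated from the phenotype itself, estimating the class means on a held-out split before testing is a sensible safeguard in this sketch.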

Results: Through extensive benchmarking on SNPs simulated with both binary and continuous phenotypes, we demonstrate that PAGER accurately represents various inheritance patterns (including additive, dominant, recessive, and heterosis), achieves levels of statistical power that meet or exceed other encoding strategies, and attains computation speeds up to 55 times faster than a similar method, EDGE. We also apply PAGER to publicly available real-world data and identify a novel, relevant putative QTL associated with body mass index in rats (Rattus norvegicus) that is not detected with the additive model.
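Why an additive 0/1/2 coding can miss dominant, recessive, or heterotic variants can be made concrete with a short variance calculation. The sketch below assumes Hardy-Weinberg genotype frequencies with frequency q for allele a and idealized class means for each pattern; the `PATTERNS` values and the `additive_r2` name are illustrative, not taken from the paper. It computes the share of single-locus genetic variance that a linear regression on the additive coding captures; an encoding that gives each genotype class its own value, such as one built from class mean phenotypes, captures all of it by construction.

```python
import numpy as np

# Illustrative class means (AA, Aa, aa) for each inheritance pattern;
# a is the allele counted by the additive 0/1/2 coding.
PATTERNS = {
    "additive": (0.0, 0.5, 1.0),
    "dominant": (0.0, 1.0, 1.0),
    "recessive": (0.0, 0.0, 1.0),
    "heterosis": (0.0, 1.0, 0.0),
}


def additive_r2(class_means, q):
    """Fraction of single-locus genetic variance explained by regressing
    the class means on the additive 0/1/2 coding, under Hardy-Weinberg
    genotype frequencies with allele-a frequency q."""
    freqs = np.array([(1 - q) ** 2, 2 * q * (1 - q), q**2])
    x = np.array([0.0, 1.0, 2.0])  # copies of allele a
    y = np.asarray(class_means, dtype=float)

    # Frequency-weighted centering, covariance, and variances.
    x_c = x - freqs @ x
    y_c = y - freqs @ y
    cov = freqs @ (x_c * y_c)
    var_x = freqs @ x_c**2
    var_y = freqs @ y_c**2
    return cov**2 / (var_x * var_y)


if __name__ == "__main__":
    for name, means in PATTERNS.items():
        print(f"{name}: {additive_r2(means, 0.5):.3f}")
    # additive: 1.000, dominant: 0.667, recessive: 0.667, heterosis: 0.000
```

At q = 0.5 the additive coding recovers about two thirds of the genetic variance for a dominant or recessive SNP and none for a symmetric heterosis SNP, which is the case where an additive test has essentially no power.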

Conclusions: Overall, we show that PAGER is an efficient genotype encoding approach that can uncover sources of missing heritability and reveal novel insights in the study of complex traits while incurring minimal costs.




Source journal: Biodata Mining (MATHEMATICAL & COMPUTATIONAL BIOLOGY)

CiteScore: 7.90
Self-citation rate: 0.00%
Review time: 23 weeks

About the journal: BioData Mining is an open access, open peer-reviewed journal encompassing research on all aspects of data mining applied to high-dimensional biological and biomedical data, focusing on computational aspects of knowledge discovery from large-scale genetic, transcriptomic, genomic, proteomic, and metabolomic data. Topical areas include, but are not limited to:
- Development, evaluation, and application of novel data mining and machine learning algorithms.
- Adaptation, evaluation, and application of traditional data mining and machine learning algorithms.
- Open-source software for the application of data mining and machine learning algorithms.
- Design, development, and integration of databases, software, and web services for the storage, management, retrieval, and analysis of data from large-scale studies.
- Pre-processing, post-processing, modeling, and interpretation of data mining and machine learning results for biological interpretation and knowledge discovery.