Characterizing substructure via mixture modeling in large-scale genetic summary statistics.
Hayley R Stoneman, Adelle M Price, Nikole Scribner Trout, Riley Lamont, Souha Tifour, Nikita Pozdeyev, Kristy Crooks, Meng Lin, Nicholas Rafaels, Christopher R Gignoux, Katie M Marker, Audrey E Hendricks
American Journal of Human Genetics. Published 2025-01-11. DOI: 10.1016/j.ajhg.2024.12.007 (https://doi.org/10.1016/j.ajhg.2024.12.007)
Abstract
Genetic summary data are broadly accessible and highly useful, including for risk prediction, causal inference, fine mapping, and incorporation of external controls. However, collapsing individual-level data into summary data, such as allele frequencies, masks intra- and inter-sample heterogeneity, leading to confounding, reduced power, and bias. Ultimately, unaccounted-for substructure limits summary data usability, especially for understudied or admixed populations. There is a need for methods to enable the harmonization of summary data where the underlying substructure is matched between datasets. Here, we present Summix2, a comprehensive set of methods and software based on a computationally efficient mixture model to enable the harmonization of genetic summary data by estimating and adjusting for substructure. In extensive simulations and application to public data, we show that Summix2 characterizes finer-scale population structure, identifies ascertainment bias, and scans for potential regions of selection due to local substructure deviation. Summix2 increases the robust use of diverse, publicly available summary data, resulting in improved and more equitable research.
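To illustrate the general idea of estimating substructure from allele frequencies with a mixture model, the following is a minimal sketch and not the Summix2 implementation. It assumes that an observed allele frequency at each variant is modeled as a weighted average of reference-group allele frequencies, and that the weights (mixture proportions) are found by least squares under the constraints that they are non-negative and sum to one. The objective, the projected-gradient solver, the function names, and the simulated data are all assumptions made for illustration; the abstract does not specify any of them.

```python
import random


def project_to_simplex(values):
    """Euclidean projection onto {p : p_k >= 0, sum(p) = 1}."""
    ordered = sorted(values, reverse=True)
    cumulative = 0.0
    theta = 0.0
    for i, v in enumerate(ordered, start=1):
        cumulative += v
        candidate = (cumulative - 1.0) / i
        if v - candidate > 0:
            theta = candidate
    return [max(v - theta, 0.0) for v in values]


def estimate_proportions(observed, references, steps=500):
    """Hypothetical helper: least-squares mixture proportions.

    observed   -- allele frequencies of the summary dataset, one per variant
    references -- one list of allele frequencies per reference group,
                  aligned to the same variants
    """
    k = len(references)
    n = len(observed)
    props = [1.0 / k] * k  # start from equal proportions
    # Conservative step size: the squared Frobenius norm of the reference
    # matrix bounds its largest eigenvalue, so this step cannot overshoot.
    step = 1.0 / (2.0 * sum(f * f for ref in references for f in ref))
    for _ in range(steps):
        fitted = [sum(props[g] * references[g][j] for g in range(k))
                  for j in range(n)]
        resid = [fitted[j] - observed[j] for j in range(n)]
        grad = [2.0 * sum(resid[j] * references[g][j] for j in range(n))
                for g in range(k)]
        props = project_to_simplex(
            [props[g] - step * grad[g] for g in range(k)]
        )
    return props


# Simulated, noise-free example (illustrative data, not from the paper).
rng = random.Random(0)
true_props = [0.6, 0.3, 0.1]
references = [[rng.random() for _ in range(200)] for _ in true_props]
observed = [sum(p * ref[j] for p, ref in zip(true_props, references))
            for j in range(200)]
estimated = estimate_proportions(observed, references)
print([round(p, 3) for p in estimated])
```

In this toy setting the estimated proportions should land close to the simulated ones. Real summary data would add sampling noise and require variants matched between the observed and reference datasets.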
About the journal:
The American Journal of Human Genetics (AJHG) is a monthly journal published by Cell Press, chosen by The American Society of Human Genetics (ASHG) as its premier publication starting from January 2008. AJHG represents Cell Press's first society-owned journal, and both ASHG and Cell Press anticipate significant synergies between AJHG content and that of other Cell Press titles.