X.Q. Wang, L.G. Wang, L.Y. Shi, J.J. Tian, M.Y. Li, L.X. Wang, F.P. Zhao
Animal, volume 18, issue 9, Article 101258. Published 2024-07-25. DOI: 10.1016/j.animal.2024.101258. Full text: https://www.sciencedirect.com/science/article/pii/S1751731124001897
Imputation strategies for low-coverage whole-genome sequencing data and their effects on genomic prediction and genome-wide association studies in pigs
Missing genotypes in low-coverage whole-genome sequencing (LCWGS) data introduce uncertainty that complicates genotype imputation. This study aimed to identify an optimal strategy for accurately imputing LCWGS data and to assess its effectiveness for genomic prediction (GP) and genome-wide association studies (GWAS) on economically important traits of Large White pigs. The LCWGS data of 1 423 Large White pigs were imputed using three strategies: (1) using the high-coverage whole-genome sequencing (HCWGS) data of 30 key progenitors as the reference panel (Ref_LG); (2) mixing the HCWGS data of the key progenitors with the LCWGS data (Mix_HLG); and (3) self-imputation within the LCWGS data (Within_LG). For comparison with LCWGS, SNP chip data of 1 423 Large White pigs were also imputed to the whole-genome sequence level using the reference panel of key progenitors (Ref_SNP). To evaluate the imputed sequence data, we compared the accuracy of GP and the statistical power of GWAS for four reproductive traits across three data sets: the chip data, the sequence data imputed from chip data, and the sequence data imputed from LCWGS using the optimal strategy. The average imputation accuracies of Within_LG, Ref_LG and Mix_HLG were 0.9893, 0.9899 and 0.9875, respectively, all higher than that of Ref_SNP (0.8522). With the sequence data imputed from LCWGS using the Ref_LG strategy, GP accuracies for the four traits improved by approximately 0.31–1.04% compared with the chip data, and by 0.7–1.05% compared with the sequence data imputed from chip data. Furthermore, using the sequence data imputed from LCWGS with Ref_LG, 18 candidate genes were identified as associated with the four reproductive traits of interest in Large White pigs: total number of piglets born (EPC2, MBD5, ORC4 and ACVR2A); number of piglets born healthy (IKBKE); total litter weight of piglets born alive (HSPA13 and CPA1); and gestation length (GTF2H5, ITGAV, NFE2L2, CALCRL, ITGA4, STAT1, HOXD10, MSTN, COL5A2 and STAT4). With the exception of EPC2, ORC4, ACVR2A and MSTN, these genes are novel candidates. Our findings can provide a reference for the application of LCWGS data in livestock and poultry.
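The abstract reports imputation accuracies without stating how they were computed. As a rough illustration only, the sketch below shows two metrics commonly used for this purpose: genotype concordance and the Pearson correlation between true genotypes and imputed dosages. It is an assumption that either corresponds to the study's measure; the function names and the toy genotype vectors are hypothetical and are not taken from the paper.

```python
from math import sqrt
from typing import Optional, Sequence


def genotype_concordance(true_g: Sequence[Optional[int]],
                         imputed_g: Sequence[Optional[int]]) -> float:
    """Fraction of sites where the imputed genotype (0/1/2) equals the true one.

    Sites where either value is missing (None) are skipped.
    """
    pairs = [(t, i) for t, i in zip(true_g, imputed_g)
             if t is not None and i is not None]
    if not pairs:
        raise ValueError("no comparable genotypes")
    return sum(t == i for t, i in pairs) / len(pairs)


def dosage_correlation(true_g: Sequence[Optional[float]],
                       imputed_dosage: Sequence[Optional[float]]) -> float:
    """Pearson correlation between true genotypes and imputed allele dosages."""
    pairs = [(t, d) for t, d in zip(true_g, imputed_dosage)
             if t is not None and d is not None]
    n = len(pairs)
    if n < 2:
        raise ValueError("need at least two comparable sites")
    mean_t = sum(t for t, _ in pairs) / n
    mean_d = sum(d for _, d in pairs) / n
    cov = sum((t - mean_t) * (d - mean_d) for t, d in pairs)
    var_t = sum((t - mean_t) ** 2 for t, _ in pairs)
    var_d = sum((d - mean_d) ** 2 for _, d in pairs)
    if var_t == 0 or var_d == 0:
        raise ValueError("correlation undefined for a constant vector")
    return cov / sqrt(var_t * var_d)


# Hypothetical toy data: one animal, eight SNPs, one true genotype missing.
true_genotypes = [0, 1, 2, 1, 0, 2, None, 1]
imputed_genotypes = [0, 1, 2, 2, 0, 2, 1, 1]

concordance = genotype_concordance(true_genotypes, imputed_genotypes)
correlation = dosage_correlation(true_genotypes, imputed_genotypes)
print(f"concordance = {concordance:.4f}, correlation = {correlation:.4f}")
```

In practice such metrics are usually averaged over animals or over SNPs, and correlation is often preferred for low-frequency variants because concordance is inflated when most genotypes are homozygous for the major allele.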
About the journal:
animal attracts the best research in animal biology and animal systems from across the spectrum of the agricultural, biomedical, and environmental sciences. It is the central element in an exciting collaboration between the British Society of Animal Science (BSAS), Institut National de la Recherche Agronomique (INRA) and the European Federation of Animal Science (EAAP) and represents a merging of three scientific journals: Animal Science; Animal Research; Reproduction, Nutrition, Development. animal publishes original cutting-edge research, "hot" topics and horizon-scanning reviews on animal-related aspects of the life sciences at the molecular, cellular, organ, whole animal and production system levels. The main subject areas include: breeding and genetics; nutrition; physiology and functional biology of systems; behaviour, health and welfare; farming systems, environmental impact and climate change; product quality, human health and well-being. Animal models and papers dealing with the integration of research between these topics and their impact on the environment and people are particularly welcome.