Congcong Zhu, Tong Tong, John J Farrell, Eden R Martin, William S Bush, Margaret A Pericak-Vance, Li-San Wang, Gerard D Schellenberg, Jonathan L Haines, Kathryn L Lunetta, Lindsay A Farrer, Xiaoling Zhang
Journal of Alzheimer's Disease Reports, vol. 8, no. 1, pp. 575-587. Published 2024-04-08. DOI: 10.3233/ADR-230120 (https://doi.org/10.3233/ADR-230120). PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11091720/pdf/
MitoH3: Mitochondrial Haplogroup and Homoplasmic/Heteroplasmic Variant Calling Pipeline for Alzheimer's Disease Sequencing Project.
Background: Mitochondrial DNA (mtDNA) is a double-stranded circular DNA and has multiple copies in each cell. Excess heteroplasmy, the coexistence of distinct variants in copies of mtDNA within a cell, may lead to mitochondrial impairments. Accurate determination of heteroplasmy in whole-genome sequencing (WGS) data has posed a significant challenge because mitochondria carrying heteroplasmic variants cannot be distinguished during library preparation. Moreover, sequencing errors, contamination, and nuclear mtDNA segments can reduce the accuracy of heteroplasmic variant calling.
Objective: To efficiently and accurately call mtDNA homoplasmic and heteroplasmic variants from the large-scale WGS data generated from the Alzheimer's Disease Sequencing Project (ADSP), and test their association with Alzheimer's disease (AD).
Methods: In this study, we present MitoH3, a comprehensive computational pipeline for calling mtDNA homoplasmic and heteroplasmic variants and inferring haplogroups in the ADSP WGS data. We first applied MitoH3 to 45 technical replicates from 6 subjects to define a threshold for detecting heteroplasmic variants. Then, using the threshold of 5% ≤ variant allele fraction ≤ 95%, we applied MitoH3 to call heteroplasmic variants from a total of 16,113 DNA samples, including 6,742 samples from cognitively normal controls and 6,183 from AD cases.
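The thresholding step described in the Methods can be illustrated with a minimal sketch. This is not the MitoH3 implementation: the `AlleleCounts` record, its field names, and the `classify` function are hypothetical. Treating a variant allele fraction (VAF) above 95% as homoplasmic and below 5% as not called is an assumption; the abstract states only the 5% to 95% heteroplasmy window.

```python
from dataclasses import dataclass

# Heteroplasmy window stated in the abstract: 5% <= VAF <= 95%.
HET_MIN = 0.05
HET_MAX = 0.95


@dataclass(frozen=True)
class AlleleCounts:
    """Read counts at one mtDNA site (hypothetical record, for illustration)."""
    position: int
    ref_reads: int
    alt_reads: int


def variant_allele_fraction(site: AlleleCounts) -> float:
    """Fraction of reads at the site that support the alternate allele."""
    depth = site.ref_reads + site.alt_reads
    if depth == 0:
        raise ValueError(f"no coverage at position {site.position}")
    return site.alt_reads / depth


def classify(site: AlleleCounts) -> str:
    """Label a site by its VAF.

    Only the heteroplasmic window comes from the abstract. The labels for
    VAF < 5% ("not_called") and VAF > 95% ("homoplasmic") are assumptions.
    """
    vaf = variant_allele_fraction(site)
    if vaf < HET_MIN:
        return "not_called"
    if vaf <= HET_MAX:
        return "heteroplasmic"
    return "homoplasmic"


if __name__ == "__main__":
    for site in (
        AlleleCounts(position=73, ref_reads=900, alt_reads=100),
        AlleleCounts(position=263, ref_reads=10, alt_reads=990),
        AlleleCounts(position=310, ref_reads=990, alt_reads=10),
    ):
        print(site.position, round(variant_allele_fraction(site), 3), classify(site))
```

Both bounds are inclusive, matching the "≤" signs in the stated threshold. A real pipeline would also need to account for sequencing errors, contamination, and nuclear mtDNA segments, which this sketch ignores.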
Results: This pipeline is available through the Singularity container engine. For 4,311 heteroplasmic variants identified from 16,113 samples, no significant variant count difference was observed between AD cases and controls.
Conclusions: Our streamlined pipeline, MitoH3, enables computationally efficient and accurate analysis of a large number of samples.