Katharine S Walter, Ted Cohen, Barun Mathema, Caroline Colijn, Benjamin Sobkowiak, Iñaki Comas, Galo A Goig, Julio Croda, Jason R Andrews
Signatures of transmission in within-host Mycobacterium tuberculosis complex variation: a retrospective genomic epidemiology study
Lancet Microbe, article 100936. Published Jan 1, 2025 (Epub Nov 28, 2024). DOI: 10.1016/j.lanmic.2024.06.003 (https://doi.org/10.1016/j.lanmic.2024.06.003)
Abstract
Background: Mycobacterium tuberculosis complex (MTBC) species evolve slowly, so isolates from individuals linked in transmission often have identical or nearly identical genomes, making it difficult to reconstruct transmission chains. Finding additional sources of shared MTBC variation could help overcome this problem. Previous studies have reported MTBC diversity within infected individuals; however, whether within-host variation improves transmission inferences remains unclear. Here, we aimed to quantify within-host MTBC variation and assess whether such information improves transmission inferences.
Methods: We conducted a retrospective genomic epidemiology study in which we reanalysed publicly available sequence data from household transmission studies indexed in PubMed from database inception until Jan 31, 2024, for which both genomic and epidemiological contact data were available, using household membership as a proxy for transmission linkage. We quantified minority variants (ie, positions with two or more alleles each supported by at least five-fold coverage and with a minor allele frequency of 1% or more) outside of PE and PPE genes, within individual samples and shared across samples. We used receiver operating characteristic (ROC) curves to compare the performance of a general linear model for household membership that included shared minority variants and one that included only fixed genetic differences.
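The minority variant definition above can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the input layout (a dictionary of per-position allele read counts), the function names, and the toy read counts are all hypothetical, PE and PPE positions are assumed to have been masked beforehand, and "shared" is assumed here to mean that the same position carries a minority variant in both samples.

```python
from typing import Dict, Set

# Hypothetical input layout: position -> {allele: read count} for one sample.
# Positions in PE/PPE genes are assumed to have been removed upstream.
AlleleCounts = Dict[int, Dict[str, int]]

MIN_READS_PER_ALLELE = 5  # each allele supported by at least five-fold coverage
MIN_MINOR_FREQ = 0.01     # minor allele frequency of 1% or more


def minority_variant_positions(counts: AlleleCounts) -> Set[int]:
    """Positions with two or more supported alleles and minor allele frequency >= 1%."""
    positions = set()
    for pos, alleles in counts.items():
        depth = sum(alleles.values())  # all reads at this position
        supported = sorted(
            (n for n in alleles.values() if n >= MIN_READS_PER_ALLELE),
            reverse=True,
        )
        if len(supported) < 2:
            continue
        # Frequency of the second most common supported allele.
        if supported[1] / depth >= MIN_MINOR_FREQ:
            positions.add(pos)
    return positions


def shared_minority_variants(a: AlleleCounts, b: AlleleCounts) -> int:
    """Count positions that carry a minority variant in both samples (assumed definition)."""
    return len(minority_variant_positions(a) & minority_variant_positions(b))


# Toy read counts, invented for illustration only.
sample_a = {100: {"A": 95, "G": 5}, 200: {"C": 100}, 300: {"T": 60, "C": 40}}
sample_b = {100: {"A": 90, "G": 10}, 200: {"C": 96, "T": 4}, 300: {"T": 1000, "C": 6}}

print(sorted(minority_variant_positions(sample_a)))  # [100, 300]
print(sorted(minority_variant_positions(sample_b)))  # [100]
print(shared_minority_variants(sample_a, sample_b))  # 1
```

In the toy data, position 200 in the second sample fails the five-read threshold, and position 300 fails the 1% frequency threshold (6 of 1006 reads), so only position 100 counts as shared.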
Findings: We identified three MTBC household transmission studies with publicly available whole-genome sequencing data and epidemiological linkages: a household transmission study in Vitória, Brazil (Colangeli et al), a retrospective population-based study of paediatric tuberculosis in British Columbia, Canada (Guthrie et al), and a retrospective population-based study in Oxfordshire, England (Walker et al). We found moderate levels of minority variation in MTBC sequence data from cultured isolates, and these levels varied significantly across studies: mean 168·6 minority variants (95% CI 151·4-185·9) for the Colangeli et al dataset, 5·8 (1·5-10·2) for Guthrie et al (p<0·0001, Wilcoxon rank sum test, vs Colangeli et al), and 7·1 (2·4-11·9) for Walker et al (p<0·0001, Wilcoxon rank sum test, vs Colangeli et al). Isolates from household pairs shared more minority variants than did randomly selected pairs of isolates: mean 97·7 shared minority variants (79·1-116·3) versus 9·8 (8·6-11·0) in Colangeli et al, 0·8 (0·1-1·5) versus 0·2 (0·1-0·2) in Guthrie et al, and 0·7 (0·1-1·3) versus 0·2 (0·2-0·2) in Walker et al (all p<0·0001, Wilcoxon rank sum test). Shared within-host variation was significantly associated with household membership (odds ratio 1·51 [95% CI 1·30-1·71] per one standard deviation increase in shared minority variants, p<0·0001). Models that included shared within-host variation predicted household membership more accurately than models without within-host variation in all three studies: area under the ROC curve 0·95 versus 0·92 for the Colangeli et al study, 0·99 versus 0·95 for the Guthrie et al study, and 0·93 versus 0·91 for the Walker et al study.
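As a rough illustration of how two models can be compared by area under the ROC curve, the sketch below computes the area as the probability that a randomly chosen household pair receives a higher predicted score than a randomly chosen non-household pair, with ties counted as one half. The function name, the labels, and both sets of scores are invented for illustration and are not data from the three studies, and the abstract does not state how the authors computed the area.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as P(score of a positive > score of a negative), ties counted as 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy example: 1 = pair from the same household, 0 = randomly selected pair.
labels = [1, 1, 1, 0, 0, 0]
# Hypothetical predicted probabilities from a model using fixed differences only ...
fixed_only = [0.9, 0.6, 0.4, 0.5, 0.3, 0.2]
# ... and from a model that also includes shared minority variants.
with_shared = [0.9, 0.7, 0.6, 0.5, 0.3, 0.2]

print(round(roc_auc(labels, fixed_only), 3))   # 0.889
print(round(roc_auc(labels, with_shared), 3))  # 1.0
```

The pairwise loop is quadratic in the number of pairs, which is acceptable for a sketch; a rank-based calculation would be the usual choice for larger datasets.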
Interpretation: Within-host MTBC variation persists through culture of sputum and could enhance the resolution of transmission inferences. The substantial differences in minority variation recovered across studies highlight the need to optimise approaches to recover and incorporate within-host variation into automated phylogenetic and transmission inference.
About the journal:
The Lancet Microbe is a gold open access journal committed to publishing content relevant to clinical microbiologists worldwide, with a focus on studies that advance clinical understanding, challenge the status quo, and advocate change in health policy.