Authors: João Luís Reis-Cunha, Daniel Charlton Jeffares
Journal: BMC Genomics, published 2024-10-29
DOI: 10.1186/s12864-024-10862-6 (https://doi.org/10.1186/s12864-024-10862-6)
Full text (PDF): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11520695/pdf/
Citations: 0
Abstract
Detecting complex infections in trypanosomatids using whole genome sequencing.
Background: Trypanosomatid parasites are a group of protozoans that cause devastating diseases, which disproportionately affect developing countries. These protozoans have developed several adaptations that help them survive in the mammalian host. Examples include extensive expansion of multigene families involved in host-parasite interaction, adaptations for invading and modulating host cells, and the presence of aneuploidy and polyploidy. Two mechanisms may give rise to "complex" isolates, in which more than two haplotypes are present in a single sample: multiplicity of infection (MOI) and polyploidy. We have developed and validated a method that identifies multiclonal infections and polyploidy from whole genome sequencing reads. It is based on fluctuations in allelic read depth at heterozygous positions, and it can be easily applied in studies ranging from a single sequenced sample to large population surveys.
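Why allelic read depth separates clonal from complex samples can be illustrated with expected allele fractions. In a clonal diploid, reads at a heterozygous site should carry the alternative allele about half the time. Polyploid clones and mixtures of clones produce additional expected fractions. The following is a minimal sketch that assumes idealized sampling, with no sequencing error, mapping bias or copy number variation. The function names are hypothetical and are not taken from the authors' R script.

```python
from fractions import Fraction


def single_clone_fractions(ploidy):
    """Alt-allele fractions possible at a heterozygous site in one clone of the given ploidy."""
    return [Fraction(j, ploidy) for j in range(1, ploidy)]


def two_diploid_clone_fractions(f):
    """Alt-allele fractions expected when two diploid clones are mixed and the
    secondary clone contributes a share f of the reads. Each clone's genotype
    contributes an alt dosage of 0, 1/2 or 1; totals of 0 or 1 (sites that look
    homozygous) are dropped because they are uninformative here."""
    dosages = (Fraction(0), Fraction(1, 2), Fraction(1))
    fractions = {(1 - f) * a + f * b for a in dosages for b in dosages}
    return sorted(p for p in fractions if 0 < p < 1)


if __name__ == "__main__":
    print(single_clone_fractions(2))  # clonal diploid: only 1/2
    print(single_clone_fractions(3))  # clonal triploid: 1/3 and 2/3
    print(two_diploid_clone_fractions(Fraction(1, 10)))  # 10% secondary clone
```

With a 10% secondary clone, the expected fractions are 1/20, 1/10, 9/20, 1/2, 11/20, 9/10 and 19/20. Under idealized binomial sampling, the standard deviation of the observed allele fraction at 25 reads and a true fraction of 0.5 is sqrt(0.5 × 0.5 / 25) = 0.1. A shift from 0.5 to 0.45 or 0.55 at a single site is therefore well within sampling noise. This gives some intuition for why such a method needs adequate coverage and many heterozygous sites.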
Results: The method estimates the complexity index (CI) of an isolate. It compares real samples with simulated clonal infections at both the individual and the population level, excluding regions with variation in somy (chromosome copy number) or gene copy number. It was validated primarily with simulated MOI in Leishmania and with known polyploid isolates of Trypanosoma cruzi. We then used the approach to assess the complexity of infection in genome-wide SNP data from 497 trypanosomatid samples. These came from four clades (L. donovani/L. infantum, L. braziliensis, T. cruzi and T. brucei) and provide an overview of multiclonal infection and polyploidy in these cultured parasites. We show that our method robustly detects complex infections in samples with at least 25x coverage and at least 100 heterozygous SNPs, where 5-10% of the reads correspond to the secondary clone. We find that only a small proportion (≤ 7%) of cultured trypanosomatid isolates are complex.
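The comparison against simulated clonal infections can be sketched as follows. The abstract does not define the CI, so this example uses a stand-in statistic instead: the mean absolute deviation of the alt-allele fraction from 0.5. It is not the paper's CI.

The sketch works in three steps:

1. Simulate clonal diploids with binomial read sampling at the observed depths.
2. Score the observed sample and each simulated sample with the stand-in statistic.
3. Report the share of simulations that score below the observed sample.

Some choices are assumptions rather than the authors' method. The 25x and 100-SNP thresholds come from the abstract, but applying the depth cut per site is an assumption. Excluding regions with somy or gene copy number variation is assumed to have happened upstream. All function names are hypothetical.

```python
import random
import statistics

MIN_DEPTH = 25       # coverage threshold quoted in the abstract; applied per site here (assumption)
MIN_HET_SITES = 100  # minimum number of heterozygous SNPs quoted in the abstract


def dispersion_score(alt_counts, depths):
    """Hypothetical stand-in statistic (NOT the paper's CI): mean absolute
    deviation of the alt-allele fraction from the diploid expectation of 0.5."""
    return statistics.fmean(abs(a / d - 0.5) for a, d in zip(alt_counts, depths))


def simulate_clonal(depths, rng):
    """Alt-allele read counts for an idealized clonal diploid: every read at a
    heterozygous site carries the alt allele with probability 0.5."""
    return [sum(rng.random() < 0.5 for _ in range(d)) for d in depths]


def clonal_percentile(alt_counts, depths, n_sim=200, seed=1):
    """Share of simulated clonal samples whose score is below the observed score.
    Values near 1.0 mean the sample is more dispersed than clonal simulations."""
    sites = [(a, d) for a, d in zip(alt_counts, depths) if d >= MIN_DEPTH]
    if len(sites) < MIN_HET_SITES:
        raise ValueError("too few heterozygous sites with sufficient depth")
    alts, ds = zip(*sites)
    observed = dispersion_score(alts, ds)
    rng = random.Random(seed)
    simulated = [dispersion_score(simulate_clonal(ds, rng), ds) for _ in range(n_sim)]
    return sum(s < observed for s in simulated) / n_sim


if __name__ == "__main__":
    depths = [40] * 200
    print(clonal_percentile([20] * 200, depths))       # perfectly balanced: 0.0
    print(clonal_percentile([13, 27] * 100, depths))   # triploid-like 1/3, 2/3 pattern: 1.0
```

A percentile against simulations that use the sample's own depths means the reference distribution reflects how much noise that particular coverage produces. Consistent with the abstract, the simulations compare real samples against clonal expectations rather than against a fixed cutoff.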
Conclusions: The method accurately identifies polyploid isolates and can identify multiclonal infections when genome read coverage is sufficient. We package the method as a single R script that requires only a standard variant call format (VCF) file as input (https://github.com/jaumlrc/Complex-Infections). Our analyses indicate that multiclonality and polyploidy occur in all clades but are infrequent in cultured trypanosomatids. We caution that our estimates are lower bounds, owing to the limitations of current laboratory and bioinformatic methods.
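The abstract says only that the R script needs a standard VCF file, not which fields it reads. A plausible minimal input for an allelic-depth method is the per-allele read depth at heterozygous calls. In a VCF these come from the standard `AD` and `GT` FORMAT fields. The sketch below extracts them in Python rather than R and is not taken from the repository. The function name `het_allele_depths` is hypothetical, and multiallelic sites are skipped for simplicity.

```python
def het_allele_depths(vcf_lines, sample_index=0):
    """Yield (chrom, pos, ref_depth, alt_depth) at biallelic heterozygous sites.

    Uses only the standard GT and AD FORMAT keys. sample_index=0 selects the
    first sample column (the 10th column of the VCF).
    """
    for line in vcf_lines:
        if line.startswith("#") or not line.strip():
            continue  # skip meta-information, header and blank lines
        cols = line.rstrip("\n").split("\t")
        chrom, pos, alt = cols[0], int(cols[1]), cols[4]
        if "," in alt:
            continue  # multiallelic site: skipped for simplicity
        keys = cols[8].split(":")
        values = dict(zip(keys, cols[9 + sample_index].split(":")))
        gt = values.get("GT", "").replace("|", "/")  # treat phased calls like unphased
        if gt not in ("0/1", "1/0") or "AD" not in values:
            continue  # keep heterozygous calls that report allele depths
        ad = values["AD"].split(",")
        if "." in ad:
            continue  # missing allele depth
        yield chrom, pos, int(ad[0]), int(ad[1])


if __name__ == "__main__":
    example = [
        "##fileformat=VCFv4.2",
        "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tS1",
        "chr1\t100\t.\tA\tG\t50\tPASS\t.\tGT:AD:DP\t0/1:12,13:25",
        "chr1\t200\t.\tC\tT\t50\tPASS\t.\tGT:AD:DP\t1/1:0,30:30",
        "chr2\t50\t.\tT\tC\t50\tPASS\t.\tGT:AD\t0|1:20,9",
    ]
    for chrom, pos, ref_depth, alt_depth in het_allele_depths(example):
        print(chrom, pos, alt_depth / (ref_depth + alt_depth))
```

The alt-allele fraction `alt_depth / (ref_depth + alt_depth)` at each retained site is the quantity whose distribution a clonal-versus-complex comparison would examine.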
About the journal:
BMC Genomics is an open access, peer-reviewed journal that considers articles on all aspects of genome-scale analysis, functional genomics, and proteomics.
BMC Genomics is part of the BMC series which publishes subject-specific journals focused on the needs of individual research communities across all areas of biology and medicine. We offer an efficient, fair and friendly peer review service, and are committed to publishing all sound science, provided that there is some advance in knowledge presented by the work.