The effect of missing data on evolutionary analysis of sequence capture bycatch, with application to an agricultural pest.
Authors: Leo A Featherstone, Angela McGaughran
Journal: Molecular Genetics and Genomics, 299(1): 11 (Journal Article)
Published: 2024-02-21
DOI: 10.1007/s00438-024-02097-7 (https://doi.org/10.1007/s00438-024-02097-7)
Full text (PDF): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10881687/pdf/
Journal metrics (as listed): impact factor 2.3, JCR Q3 (Biochemistry & Molecular Biology)
Citations: 0
Abstract
Sequence capture is a genomic technique that selectively enriches target sequences before high-throughput next-generation sequencing, to generate specific sequences of interest. Off-target or 'bycatch' data are often discarded from capture experiments, but under some circumstances they can be leveraged to address evolutionary questions. Here, we investigated the effects of missing data on a variety of evolutionary analyses using bycatch from an exon capture experiment on the global pest moth Helicoverpa armigera. We added more than 200 new samples from across Australia, in the form of mitogenomes obtained as bycatch from targeted sequence capture, and combined these into a second, larger dataset totalling more than 1000 mitochondrial cytochrome c oxidase subunit I (COI) sequences from across the species' global distribution. Using discriminant analysis of principal components (DAPC) and Bayesian coalescent analyses, we showed that mitogenomes assembled from bycatch with up to 75% missing data yielded evolutionary inferences consistent with higher-coverage datasets and with the broader literature on H. armigera. For example, low-coverage sequences broadly supported the delineation of two H. armigera subspecies and also provided new insights into the potential for geographic turnover among these subspecies. However, we also identified key effects of dataset coverage and composition on our results. Thus, low-coverage bycatch data can offer valuable information for population genetic and phylodynamic analyses, but caution is required to ensure that the reduced information does not introduce confounding factors, such as sampling biases, that drive inference.
We encourage more researchers to maximize the potential of the targeted sequence approach by examining evolutionary questions with their off-target bycatch where possible (especially in cases where no previous mitochondrial data exist), but recommend stratifying data at different genome coverage thresholds to separate sampling effects from genuine genomic signals and to understand their implications for evolutionary research.
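The recommendation to stratify data at coverage thresholds can be illustrated with a minimal sketch. It is not the authors' pipeline. The Python below computes each aligned sequence's missing-data fraction (coverage is taken here as 1 minus that fraction) and builds nested sample subsets at several thresholds, so that the same analysis (for example DAPC or a coalescent run) could be repeated on each tier and the results compared. Treating `N`, `-` and `?` as missing characters, the threshold values, and the helper names `missing_fraction` and `stratify` are illustrative assumptions.

```python
# Hypothetical helpers for coverage-stratified subsets; not from the paper.
# Assumption: in the aligned mitogenomes, 'N', '-' and '?' mark missing data.
MISSING = set("N-?")


def missing_fraction(seq: str) -> float:
    """Return the fraction of alignment positions in seq that are missing."""
    if not seq:
        raise ValueError("empty sequence")
    seq = seq.upper()  # so lowercase 'n' also counts as missing
    return sum(ch in MISSING for ch in seq) / len(seq)


def stratify(alignment: dict[str, str], thresholds: list[float]) -> dict[float, list[str]]:
    """For each threshold t, list the sample IDs whose missing fraction is <= t.

    The tiers are nested: a stricter (lower) threshold keeps a subset of the
    samples kept by a looser one, so downstream results can be compared
    tier by tier to separate sampling effects from genuine signal.
    """
    fractions = {sid: missing_fraction(s) for sid, s in alignment.items()}
    return {
        t: sorted(sid for sid, f in fractions.items() if f <= t)
        for t in sorted(thresholds)
    }


# Toy alignment (invented for illustration only).
alignment = {
    "sample_A": "ACGTACGTAC",  # 0% missing
    "sample_B": "ACGTNNNNAC",  # 40% missing
    "sample_C": "NNNNNNNNAC",  # 80% missing
}
print(stratify(alignment, [0.25, 0.5, 0.75]))
# {0.25: ['sample_A'], 0.5: ['sample_A', 'sample_B'], 0.75: ['sample_A', 'sample_B']}
```

The 0.75 tier mirrors the "up to 75% missing data" figure in the abstract; in practice the tiers would be chosen to suit the dataset at hand.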
About the journal:
Molecular Genetics and Genomics (MGG) publishes peer-reviewed articles covering all areas of genetics and genomics. Any approach to the study of genes and genomes is considered, be it experimental, theoretical or synthetic. MGG publishes research on all organisms that is of broad interest to those working in the fields of genetics, genomics, biology, medicine and biotechnology.
The journal covers a broad range of topics. Examples from recent issues include mechanisms for extending longevity in a variety of organisms, screening of yeast metal homeostasis genes involved in mitochondrial functions, and molecular mapping of cultivar-specific avirulence genes in the rice blast fungus.