Oliveira, V., A. R. M. Polónia, D. F. R. Cleary, et al. 2021. “Characterization of putative circular plasmids in sponge-associated bacterial communities using a selective multiply-primed rolling circle amplification.” Molecular Ecology Resources 21, no. 1: 110–121. https://doi.org/10.1111/1755-0998.13248.
The authors of the above article noticed an error in the DNA concentration which is detailed in the ‘Methods’ section, section 2.3 (‘Selective multiply-primed rolling circle amplification’), paragraph 2. The correct text should read as ‘1 μL template DNA (ca. 200 ng)’.
The authors apologise for this error and any inconvenience it may have caused.
{"title":"Correction to “Characterisation of Putative Circular Plasmids in Sponge-Associated Bacterial Communities Using a Selective Multiply-Primed Rolling Circle Amplification”","authors":"","doi":"10.1111/1755-0998.14043","DOIUrl":"10.1111/1755-0998.14043","url":null,"abstract":"<p>Oliveira, V., A. R. M. Polónia, D. F. R. Cleary, et al. 2021. “Characterization of putative circular plasmids in sponge-associated bacterial communities using a selective multiply-primed rolling circle amplification.” <i>Molecular Ecology Resources</i> <b>21</b>, no. 1: 110–121. https://doi.org/10.1111/1755-0998.13248.</p><p>The authors of the above article noticed an error in the DNA concentration which is detailed in the ‘Methods’ section, section 2.3 (‘Selective multiply-primed rolling circle amplification’), paragraph 2. The correct text should read as ‘1 μL template DNA (ca. 200 ng)’.</p><p>The authors apologise for this error and any inconvenience it may have caused.</p>","PeriodicalId":211,"journal":{"name":"Molecular Ecology Resources","volume":"25 1","pages":""},"PeriodicalIF":5.5,"publicationDate":"2024-11-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1111/1755-0998.14043","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142567134","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Magnolia sieboldii K. Koch (M. sieboldii) stands as an elegant tree species within the Magnoliaceae family, esteemed for its exquisite beauty, cultural significance and economic advantages. The species faces challenges in seed germination under natural conditions, primarily attributed to morphological dormancy. Despite its significance, the molecular mechanisms governing M. sieboldii seed germination remain elusive, compounded by the absence of genomic resources specific to this species. In this study, we present the first chromosome-scale genome assembly of M. sieboldii, with a total genome size of 2.01 Gb, including 1096 scaffolds assigned to 19 chromosomes (N50 = 102.4 Mb). Phylogenetic analyses, incorporating 13 plant species, illuminate the evolutionary independence of Magnoliids from monocots and eudicots, positioning them as a sister clade. Through RNA-seq analysis, we identify pivotal genes and pathways contributing to seed dormancy and germination. In addition, our investigation delves into the far-red-impaired response (FAR1) transcription factor gene family, revealing its enrichment throughout evolution and its involvement in the intricate process of seed germination. This comprehensive genome sequencing initiative offers invaluable insights into the biological attributes of M. sieboldii, with a specific emphasis on unravelling the complexities of seed dormancy and germination.
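The N50 quoted in this abstract is a contiguity statistic: the length L such that sequences of length L or longer together cover at least half of the assembly. A minimal Python sketch of that definition follows; the function name and the toy lengths are illustrative assumptions and are not taken from the study.

```python
from typing import Iterable


def n50(lengths: Iterable[int]) -> int:
    """Return the N50 of a collection of sequence lengths.

    Hypothetical helper for illustration only: sort lengths from longest
    to shortest and return the length at which the running total first
    reaches half of the summed length.
    """
    ordered = sorted(lengths, reverse=True)
    if not ordered:
        raise ValueError("at least one sequence length is required")
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length
    return ordered[-1]  # unreachable for positive lengths; kept for safety


# Toy lengths (not real data): total is 32, so half is 16.
toy_lengths = [2, 2, 2, 3, 3, 4, 8, 8]
toy_n50 = n50(toy_lengths)
```

With the toy lengths above, the two longest sequences already reach half of the total, so the N50 is the length of the second one.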
{"title":"The Chromosome-Scale Genome of Magnolia sieboldii K. Koch Provides Insight Into the Evolutionary Position of Magnoliids and Seed Germination","authors":"Xiujun Lu, Mei Mei, Lin Liu, Xin Xu, Wanfeng Ai","doi":"10.1111/1755-0998.14030","DOIUrl":"10.1111/1755-0998.14030","url":null,"abstract":"<div>\u0000 \u0000 <p><i>Magnolia sieboldii</i> K. Koch (<i>M. sieboldii</i>) stands as an elegant tree species within the Magnoliaceae family, esteemed for its exquisite beauty, cultural significance and economic advantages. The species faces challenges in seed germination under natural conditions, primarily attributed to morphological dormancy. Despite its significance, the molecular mechanisms governing <i>M. sieboldii</i> seed germination remain elusive, compounded by the absence of genomic resources specific to this species. In this study, we present the first chromosome-scale genome assembly of <i>M. sieboldii</i>, with a total genome size of 2.01 Gb, including 1096 scaffolds assigned to 19 chromosomes (N50 = 102.4 Mb). Phylogenetic analyses, incorporating 13 plant species, illuminate the evolutionary independence of Magnoliids from monocots and eudicots, positioning them as a sister clade. Through RNA-seq analysis, we identify pivotal genes and pathways contributing to seed dormancy and germination. In addition, our investigation delves into the the far-red-impaired response (FAR1) transcription factor gene family, revealing their enrichment throughout evolution and their involvement in the intricate process of seed germination. This comprehensive genome sequencing initiative offers invaluable insights into the biological attributes of <i>M. 
sieboldii</i>, with a specific emphasis on unravelling the complexities of seed dormancy and germination.</p>\u0000 </div>","PeriodicalId":211,"journal":{"name":"Molecular Ecology Resources","volume":"25 1","pages":""},"PeriodicalIF":5.5,"publicationDate":"2024-10-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142542399","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Oliver W. White, Andie Hall, Ben W. Price, Suzanne T. Williams, Matthew D. Clark
Low-coverage ‘genome-skims’ are often used to assemble organelle genomes and ribosomal gene sequences for cost-effective phylogenetic and barcoding studies. Natural history collections hold invaluable biological information, yet poor preservation resulting in degraded DNA often hinders polymerase chain reaction-based analyses. However, it is possible to generate libraries and sequence the short fragments typical of degraded DNA to generate genome-skims from museum collections. Here we introduce a Snakemake toolkit comprising three pipelines, skim2mito, skim2rrna and gene2phylo, designed to unlock the genomic potential of historical museum specimens using genome skimming. Specifically, skim2mito and skim2rrna perform the batch assembly, annotation and phylogenetic analysis of mitochondrial genomes and nuclear ribosomal genes, respectively, from low-coverage genome skims. The third pipeline, gene2phylo, takes a set of gene alignments and performs phylogenetic analysis of individual genes, partitioned analysis of concatenated alignments and a phylogenetic analysis based on gene trees. We benchmark our pipelines with simulated data, followed by testing with a novel genome skimming dataset from both recent and historical solariellid gastropod samples. We show that the toolkit can recover mitochondrial and ribosomal genes from poorly preserved museum specimens of the gastropod family Solariellidae, and the phylogenetic analysis is consistent with our current understanding of taxonomic relationships. The generation of bioinformatic pipelines that facilitate processing large quantities of sequence data from the vast repository of specimens held in natural history museum collections will greatly aid species discovery and exploration of biodiversity over time, ultimately aiding conservation efforts in the face of a changing planet.
{"title":"A Snakemake Toolkit for the Batch Assembly, Annotation and Phylogenetic Analysis of Mitochondrial Genomes and Ribosomal Genes From Genome Skims of Museum Collections","authors":"Oliver W. White, Andie Hall, Ben W. Price, Suzanne T. Williams, Matthew D. Clark","doi":"10.1111/1755-0998.14036","DOIUrl":"10.1111/1755-0998.14036","url":null,"abstract":"<p>Low coverage ‘genome-skims’ are often used to assemble organelle genomes and ribosomal gene sequences for cost-effective phylogenetic and barcoding studies. Natural history collections hold invaluable biological information, yet poor preservation resulting in degraded DNA often hinders polymerase chain reaction-based analyses. However, it is possible to generate libraries and sequence the short fragments typical of degraded DNA to generate genome-skims from museum collections. Here we introduce a snakemake toolkit comprised of three pipelines <i>skim2mito</i>, <i>skim2rrna</i> and <i>gene2phylo</i>, designed to unlock the genomic potential of historical museum specimens using genome skimming. Specifically, <i>skim2mito</i> and <i>skim2rrna</i> perform the batch assembly, annotation and phylogenetic analysis of mitochondrial genomes and nuclear ribosomal genes, respectively, from low-coverage genome skims. The third pipeline <i>gene2phylo</i> takes a set of gene alignments and performs phylogenetic analysis of individual genes, partitioned analysis of concatenated alignments and a phylogenetic analysis based on gene trees. We benchmark our pipelines with simulated data, followed by testing with a novel genome skimming dataset from both recent and historical solariellid gastropod samples. We show that the toolkit can recover mitochondrial and ribosomal genes from poorly preserved museum specimens of the gastropod family Solariellidae, and the phylogenetic analysis is consistent with our current understanding of taxonomic relationships. 
The generation of bioinformatic pipelines that facilitate processing large quantities of sequence data from the vast repository of specimens held in natural history museum collections will greatly aid species discovery and exploration of biodiversity over time, ultimately aiding conservation efforts in the face of a changing planet.</p>","PeriodicalId":211,"journal":{"name":"Molecular Ecology Resources","volume":"25 1","pages":""},"PeriodicalIF":5.5,"publicationDate":"2024-10-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1111/1755-0998.14036","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142491750","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Describing naturally occurring genetic variation is a fundamental goal of molecular phylogeography and population genetics. Popular methods for this task include STRUCTURE, a model-based algorithm that assigns individuals to genetic clusters, and principal component analysis (PCA), a parameter-free method. The ability of STRUCTURE to infer mixed ancestry makes it popular for documenting natural hybridisation, which is of considerable interest to evolutionary biologists, given that such systems provide a window into the speciation process. Yet, STRUCTURE can produce misleading results when its underlying assumptions are violated, such as when genetic variation is distributed continuously across geographic space. To test the ability of STRUCTURE and PCA to accurately distinguish admixture from continuous variation, we use forward-time simulations to generate population genetic data under three demographic scenarios: two involving admixture and one with isolation by distance (IBD). STRUCTURE and PCA alone cannot distinguish admixture from IBD, but complementing these analyses with triangle plots, which visualise hybrid index against interclass heterozygosity, provides more accurate inference of demographic history, especially in cases of recent admixture. We demonstrate that triangle plots are robust to missing data, while STRUCTURE and PCA are not, and show that setting a low allele frequency difference threshold for ancestry-informative marker (AIM) identification can accurately characterise the relationship between hybrid index and interclass heterozygosity across demographic histories of admixture and range expansion. While STRUCTURE and PCA provide useful summaries of genetic variation, results should be paired with triangle plots before admixture is inferred.
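The two axes of a triangle plot can be sketched in a few lines of Python. This is an illustrative sketch under stated assumptions, not the authors' implementation: AIMs are selected as loci whose allele frequency difference between two reference groups meets a threshold, genotypes are coded as the number of copies (0, 1 or 2) of the allele associated with the second parental group, and missing calls are `None`. The function names and the coding scheme are hypothetical.

```python
from typing import Dict, List, Optional, Sequence, Tuple


def select_aims(freq_p1: Dict[str, float],
                freq_p2: Dict[str, float],
                threshold: float) -> List[str]:
    """Return loci whose allele frequency difference between the two
    reference groups is at least `threshold` (assumed selection rule)."""
    return [locus for locus in freq_p1
            if locus in freq_p2
            and abs(freq_p1[locus] - freq_p2[locus]) >= threshold]


def triangle_coordinates(genotypes: Sequence[Optional[int]]) -> Tuple[float, float]:
    """Return (hybrid index, interclass heterozygosity) for one individual.

    genotypes: copies of the parent-2 allele at each AIM (0, 1, 2), or
    None for a missing call. Missing calls are skipped rather than imputed.
    """
    called = [g for g in genotypes if g is not None]
    if not called:
        raise ValueError("individual has no called genotypes")
    # Hybrid index: proportion of alleles derived from parent 2.
    hybrid_index = sum(called) / (2 * len(called))
    # Interclass heterozygosity: proportion of loci carrying one allele
    # from each parental class.
    interclass_het = sum(1 for g in called if g == 1) / len(called)
    return hybrid_index, interclass_het


# Toy individuals (not real data).
f1_like = triangle_coordinates([1, 1, None, 1])       # heterozygous everywhere
parental = triangle_coordinates([0, 0, 0, 0])         # pure parent 1
mosaic = triangle_coordinates([0, 2, 0, 2])           # mixed, no heterozygotes
```

Under this coding, an individual heterozygous at every called AIM sits at the apex of the triangle (0.5, 1.0), while an individual with the same hybrid index but no heterozygous loci sits on the base (0.5, 0.0); a cluster-assignment proportion alone would not separate these two cases.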
{"title":"That's Not a Hybrid: How to Distinguish Patterns of Admixture and Isolation By Distance","authors":"Ben J. Wiens, Jocelyn P. Colella","doi":"10.1111/1755-0998.14039","DOIUrl":"10.1111/1755-0998.14039","url":null,"abstract":"<div>\u0000 \u0000 <p>Describing naturally occurring genetic variation is a fundamental goal of molecular phylogeography and population genetics. Popular methods for this task include <i>STRUCTURE</i>, a model-based algorithm that assigns individuals to genetic clusters, and principal component analysis (PCA), a parameter-free method. The ability of <i>STRUCTURE</i> to infer mixed ancestry makes it popular for documenting natural hybridisation, which is of considerable interest to evolutionary biologists, given that such systems provide a window into the speciation process. Yet, <i>STRUCTURE</i> can produce misleading results when its underlying assumptions are violated, like when genetic variation is distributed continuously across geographic space. To test the ability of <i>STRUCTURE</i> and PCA to accurately distinguish admixture from continuous variation, we use forward-time simulations to generate population genetic data under three demographic scenarios: two involving admixture and one with isolation by distance (IBD). <i>STRUCTURE</i> and PCA alone cannot distinguish admixture from IBD, but complementing these analyses with triangle plots, which visualise hybrid index against interclass heterozygosity, provides more accurate inference of demographic history, especially in cases of recent admixture. We demonstrate that triangle plots are robust to missing data, while <i>STRUCTURE</i> and PCA are not, and show that setting a low allele frequency difference threshold for ancestry-informative marker (AIM) identification can accurately characterise the relationship between hybrid index and interclass heterozygosity across demographic histories of admixture and range expansion. 
While <i>STRUCTURE</i> and PCA provide useful summaries of genetic variation, results should be paired with triangle plots before admixture is inferred.</p>\u0000 </div>","PeriodicalId":211,"journal":{"name":"Molecular Ecology Resources","volume":"25 3","pages":""},"PeriodicalIF":5.5,"publicationDate":"2024-10-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142520551","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Across the tree of life, many organisms are able to reproduce clonally, via vegetative spread, budding or parthenogenesis. In population genetic analyses of clonally reproducing organisms, it is common practice to retain only a single representative per multilocus genotype. Though this practice of clone correction is widespread, its theoretical justification has received very little study. Here, I use individual-based simulations to study the effect of clone correction on the estimation of the genetic summary statistics HO, HS, FIS, FST, F′′ST and Dest. The simulations follow the standard finite island model, consisting of a set of populations connected by gene flow, but with a variable rate of sexual versus asexual reproduction. The results of the simulations show that by itself, the inclusion of replicated genotypes does not lead to a deviation in the values of the summary statistics, except when the rate of sexual reproduction is less than about one in a thousand. However, clone correction can introduce a strong deviation in the values of most of the statistics, when compared to a scenario of full sexual reproduction. For HS and FIS, this deviation can be informative about the process of asexual reproduction, but for FST, F′′ST and Dest, clone correction can lead to incorrect conclusions. I therefore argue that clone correction is not strictly necessary, but can in some cases be insightful. However, when clone correction is applied, it is imperative that results for both the corrected and uncorrected data are presented.
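Clone correction as described here, keeping a single representative per multilocus genotype, can be illustrated with a short Python sketch that also reports observed heterozygosity for both the uncorrected and the corrected data. The data layout (one pair of alleles per locus), the function names and the toy population are assumptions made for illustration; they are not the simulation code used in the study.

```python
from typing import List, Tuple

# One individual = one (allele, allele) pair per locus.
Genotype = Tuple[Tuple[str, str], ...]


def clone_correct(population: List[Genotype]) -> List[Genotype]:
    """Keep the first individual seen for each distinct multilocus genotype.

    Alleles within a locus are sorted so that ("A", "B") and ("B", "A")
    count as the same genotype.
    """
    seen = set()
    kept = []
    for individual in population:
        key = tuple(tuple(sorted(locus)) for locus in individual)
        if key not in seen:
            seen.add(key)
            kept.append(individual)
    return kept


def observed_heterozygosity(population: List[Genotype]) -> float:
    """Proportion of heterozygous genotypes over all individuals and loci."""
    total = sum(len(individual) for individual in population)
    if total == 0:
        raise ValueError("population has no genotypes")
    het = sum(1 for individual in population
              for a, b in individual if a != b)
    return het / total


# Toy population (not real data): three copies of one clone plus one
# distinct individual, scored at two loci.
clone = (("A", "B"), ("C", "C"))
other = (("A", "A"), ("C", "C"))
population = [clone, clone, clone, other]

corrected = clone_correct(population)
ho_uncorrected = observed_heterozygosity(population)   # 3 of 8 genotypes
ho_corrected = observed_heterozygosity(corrected)      # 1 of 4 genotypes
```

Reporting both values side by side, as in the last two lines, follows the recommendation that results for corrected and uncorrected data be presented together.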
{"title":"Correcting for Replicated Genotypes May Introduce More Problems Than it Solves","authors":"Patrick G. Meirmans","doi":"10.1111/1755-0998.14041","DOIUrl":"10.1111/1755-0998.14041","url":null,"abstract":"<p>Across the tree of life, many organisms are able to reproduce clonally, via vegetative spread, budding or parthenogenesis. In population genetic analyses of clonally reproducing organisms, it is common practice to retain only a single representative per multilocus genotype. Though this practice of clone correction is widespread, the theoretical justification behind it has been very little studied. Here, I use individual-based simulations to study the effect of clone correction on the estimation of the genetic summary statistics <i>H</i><sub>O</sub>, <i>H</i><sub>S</sub>, <i>F</i><sub>IS</sub>, <i>F</i><sub>ST</sub>, <i>F</i>′′<sub>ST</sub> and <i>D</i><sub>est</sub>. The simulations follow the standard finite island model, consisting of a set of populations connected by gene flow, but with a variable rate of sexual versus asexual reproduction. The results of the simulations show that by itself, the inclusion of replicated genotypes does not lead to a deviation in the values of the summary statistics, except when the rate of sexual reproduction is less than about one in thousand. However, clone correction can introduce a strong deviation in the values of most of the statistics, when compared to a scenario of full sexual reproduction. For <i>H</i><sub>S</sub> and <i>F</i><sub>IS</sub>, this deviation can be informative about the process of asexual reproduction, but for <i>F</i><sub>ST</sub>, <i>F</i>′′<sub>ST</sub> and <i>D</i><sub>est</sub>, clone correction can lead to incorrect conclusions. I therefore argue that clone correction is not strictly necessary, but can in some cases be insightful. 
However, when clone correction is applied, it is imperative that results for both the corrected and uncorrected data are presented.</p>","PeriodicalId":211,"journal":{"name":"Molecular Ecology Resources","volume":"25 3","pages":""},"PeriodicalIF":5.5,"publicationDate":"2024-10-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1111/1755-0998.14041","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142491751","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
In the realm of genome assembly, even minor errors can send researchers down rabbit holes of unintended misinterpretation. Enter Klumpy, a tool designed to help detect these elusive mistakes before they cause significant problems. By providing detailed, region-specific assessments and an intuitive visualisation platform, Klumpy (Madrigal et al. 2024) empowers researchers to pinpoint and resolve potential issues with precision, paving the way for more reliable downstream analyses and discoveries.
{"title":"Detecting Assembly Errors With Klumpy: Building Confidence in Your Daily Genomic Analysis","authors":"Isheng Jason Tsai","doi":"10.1111/1755-0998.14037","DOIUrl":"10.1111/1755-0998.14037","url":null,"abstract":"<p>In the realm of genome assembly, even minor errors can send researchers down to rabbit holes of unintended misinterpretation. Enter Klumpy—a tool designed to help detecting these elusive mistakes before they cause significant problems. By providing detailed, region-specific assessments and an intuitive visualisation platform, Klumpy (Madrigal, et al. 2024) empowers researchers to pinpoint and resolve potential issues with precision, paving the way for more reliable downstream analyses and discoveries.</p>","PeriodicalId":211,"journal":{"name":"Molecular Ecology Resources","volume":"25 1","pages":""},"PeriodicalIF":5.5,"publicationDate":"2024-10-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142491752","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Noa Yaffa Kan-Lingwood, Liran Sagi, Shahar Mazie, Naama Shahar, Lilith Zecherle Bitton, Alan Templeton, Daniel Rubenstein, Amos Bouskila, Shirli Bar-David
A major challenge in analysing single-nucleotide polymorphism (SNP) genotype datasets is detecting and filtering errors that bias analyses and lead to misinterpretation of ecological and evolutionary processes. Here, we present a comprehensive method to estimate and minimise genotyping error rates (deviations from the ‘true’ genotype) in any SNP dataset using triplicates (three repeats of the same sample) in a four-step filtration pipeline. The approach involves: (1) SNP filtering by missing data; (2) SNP filtering by error rates; (3) sample filtering by missing data and (4) detection of recaptured individuals by using estimated SNP error rates. The modular pipeline is provided in an R script that allows customised adjustments. We demonstrate the applicability of the method using non-invasive sampling from the Asiatic wild ass (Equus hemionus) population in Israel. We genotyped 756 samples using 625 SNPs, of which 255 were triplicates of 85 samples. The average SNP error rate, calculated based on the number of mismatching genotypes across triplicates before filtration, was 0.0034 and was reduced to 0.00174 following filtration. Evaluating genetic distance (GD) and relatedness (r) between triplicates before and after filtration (expected to be at the minimum and maximum respectively) showed a significant reduction in the average GD, from 58.1 to 25.3 (p = 0.0002) and a significant increase in relatedness, from r = 0.98 to r = 0.991 (p = 0.00587). We demonstrate how error rate estimation enhances recapture detection and improves genotype quality.
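The idea of estimating a per-SNP error rate from triplicates and then filtering on it can be sketched as follows. The abstract does not give the exact formula, so this Python sketch makes an explicit assumption: within each triplicate the most common call is treated as the consensus, and every called replicate that differs from it counts as one mismatch. The published pipeline is an R script; the function names, the threshold and the toy calls below are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Optional, Sequence

Call = Optional[str]  # e.g. "AA", "AG"; None marks a missing call


def snp_error_rate(triplicates: Sequence[Sequence[Call]]) -> float:
    """Estimate the error rate of one SNP from replicate calls.

    triplicates: one entry per replicated sample, each holding that
    sample's replicate calls at this SNP. Assumed rule: mismatches are
    counted against the most common call within each triplicate.
    """
    mismatches = 0
    compared = 0
    for calls in triplicates:
        called = [c for c in calls if c is not None]
        if len(called) < 2:
            continue  # fewer than two calls: nothing to compare
        consensus, _ = Counter(called).most_common(1)[0]
        mismatches += sum(1 for c in called if c != consensus)
        compared += len(called)
    if compared == 0:
        raise ValueError("no comparable replicate calls for this SNP")
    return mismatches / compared


def filter_snps(error_rates: Dict[str, float], max_error: float) -> List[str]:
    """Keep SNPs whose estimated error rate does not exceed max_error."""
    return [snp for snp, rate in error_rates.items() if rate <= max_error]


# Toy calls for one SNP across three replicated samples (not real data).
toy_snp = [
    ("AA", "AA", "AA"),   # fully concordant
    ("AG", "AG", "GG"),   # one replicate disagrees
    ("AA", None, "AA"),   # one missing call, remaining two agree
]
toy_rate = snp_error_rate(toy_snp)  # 1 mismatch over 8 called genotypes
kept = filter_snps({"snp1": toy_rate, "snp2": 0.0}, max_error=0.05)
```

In this toy case one mismatch over eight called genotypes gives a rate of 0.125, so `snp1` is dropped at the illustrative 0.05 threshold while `snp2` is retained.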
{"title":"Genotyping Error Detection and Customised Filtration for SNP Datasets","authors":"Noa Yaffa Kan-Lingwood, Liran Sagi, Shahar Mazie, Naama Shahar, Lilith Zecherle Bitton, Alan Templeton, Daniel Rubenstein, Amos Bouskila, Shirli Bar-David","doi":"10.1111/1755-0998.14033","DOIUrl":"10.1111/1755-0998.14033","url":null,"abstract":"<div>\u0000 \u0000 <p>A major challenge in analysing single-nucleotide polymorphism (SNP) genotype datasets is detecting and filtering errors that bias analyses and misinterpret ecological and evolutionary processes. Here, we present a comprehensive method to estimate and minimise genotyping error rates (deviations from the ‘true’ genotype) in any SNP datasets using triplicates (three repeats of the same sample) in a four-step filtration pipeline. The approach involves: (1) SNP filtering by missing data; (2) SNP filtering by error rates; (3) sample filtering by missing data and (4) detection of recaptured individuals by using estimated SNP error rates. The modular pipeline is provided in an R script that allows customised adjustments. We demonstrate the applicability of the method using non-invasive sampling from the Asiatic wild ass (<i>Equus hemionus</i>) population in Israel. We genotyped 756 samples using 625 SNPs, of which 255 were triplicates of 85 samples. The average SNP error rate, calculated based on the number of mismatching genotypes across triplicates before filtration, was 0.0034 and was reduced to 0.00174 following filtration. Evaluating genetic distance (GD) and relatedness (<i>r</i>) between triplicates before and after filtration (expected to be at the minimum and maximum respectively) showed a significant reduction in the average GD, from 58.1 to 25.3 (<i>p</i> = 0.0002) and a significant increase in relatedness, from <i>r</i> = 0.98 to <i>r =</i> 0.991 (<i>p</i> = 0.00587). 
We demonstrate how error rate estimation enhances recapture detection and improves genotype quality.</p>\u0000 </div>","PeriodicalId":211,"journal":{"name":"Molecular Ecology Resources","volume":"25 1","pages":""},"PeriodicalIF":5.5,"publicationDate":"2024-10-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142454253","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Yannis Schöneberg, Tracy Lynn Audisio, Alexander Ben Hamadou, Martin Forman, Jiří Král, Tereza Kořínková, Eva Líznarová, Christoph Mayer, Lenka Prokopcová, Henrik Krehenwinkel, Stefan Prost, Susan Kennedy
Spiders are a hyperdiverse taxon and among the most abundant predators in nearly all terrestrial habitats. Their success is often attributed to key developments in their evolution such as silk and venom production and major apomorphies such as a whole-genome duplication. Resolving deep relationships within the spider tree of life has been historically challenging, making it difficult to measure the relative importance of these novelties for spider evolution. Whole-genome data offer an essential resource for these efforts, as well as for functional genomic studies. Here, we present de novo assemblies for three spider species: Ryuthela nishihirai (Liphistiidae), a representative of the ancient Mesothelae, the suborder that is sister to all other extant spiders; Uloborus plumipes (Uloboridae), a cribellate orbweaver whose phylogenetic placement is especially challenging; and Cheiracanthium punctorium (Cheiracanthiidae), which represents only the second family to be sequenced in the hyperdiverse Dionycha clade. These genomes fill critical gaps in the spider tree of life. Using these novel genomes along with 25 previously published ones, we examine the evolutionary history of spidroin gene and structural hox cluster diversity. Our assemblies provide critical genomic resources to facilitate deeper investigations into spider evolution. The near chromosome-level genome of the ‘living fossil’ R. nishihirai represents an especially important step forward, offering new insights into the origins of spider traits.
{"title":"Three Novel Spider Genomes Unveil Spidroin Diversification and Hox Cluster Architecture: Ryuthela nishihirai (Liphistiidae), Uloborus plumipes (Uloboridae) and Cheiracanthium punctorium (Cheiracanthiidae)","authors":"Yannis Schöneberg, Tracy Lynn Audisio, Alexander Ben Hamadou, Martin Forman, Jiří Král, Tereza Kořínková, Eva Líznarová, Christoph Mayer, Lenka Prokopcová, Henrik Krehenwinkel, Stefan Prost, Susan Kennedy","doi":"10.1111/1755-0998.14038","DOIUrl":"10.1111/1755-0998.14038","url":null,"abstract":"<p>Spiders are a hyperdiverse taxon and among the most abundant predators in nearly all terrestrial habitats. Their success is often attributed to key developments in their evolution such as silk and venom production and major apomorphies such as a whole-genome duplication. Resolving deep relationships within the spider tree of life has been historically challenging, making it difficult to measure the relative importance of these novelties for spider evolution. Whole-genome data offer an essential resource in these efforts, but also for functional genomic studies. Here, we present de novo assemblies for three spider species: <i>Ryuthela nishihirai</i> (Liphistiidae), a representative of the ancient Mesothelae, the suborder that is sister to all other extant spiders; <i>Uloborus plumipes</i> (Uloboridae), a cribellate orbweaver whose phylogenetic placement is especially challenging; and <i>Cheiracanthium punctorium</i> (Cheiracanthiidae), which represents only the second family to be sequenced in the hyperdiverse Dionycha clade. These genomes fill critical gaps in the spider tree of life. Using these novel genomes along with 25 previously published ones, we examine the evolutionary history of spidroin gene and structural hox cluster diversity. Our assemblies provide critical genomic resources to facilitate deeper investigations into spider evolution. The near chromosome-level genome of the ‘living fossil’ <i>R. 
nishihirai</i> represents an especially important step forward, offering new insights into the origins of spider traits.</p>","PeriodicalId":211,"journal":{"name":"Molecular Ecology Resources","volume":"25 1","pages":""},"PeriodicalIF":5.5,"publicationDate":"2024-10-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1111/1755-0998.14038","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142454257","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Anders K. Krabberød, Embla Stokke, Ella Thoen, Inger Skrede, Håvard Kauserud
Current rDNA reference sequence databases are tailored towards shorter DNA markers, such as parts of the 16S/18S marker or the internal transcribed spacer (ITS) region. However, due to advances in long-read DNA sequencing technologies, longer stretches of the rDNA operon are increasingly used in environmental sequencing studies to increase the phylogenetic resolution. There is, therefore, a growing need for longer rDNA reference sequences. Here, we present the ribosomal operon database (ROD), which includes eukaryotic full-length rDNA operons extracted from publicly available genome assemblies. Full-length operons were detected in 34.1% of the 34,701 examined eukaryotic genome assemblies from NCBI. In most cases (53.1%), more than one operon variant was detected, which can be due to intragenomic operon copy variability, allelic variation in non-haploid genomes, or technical errors from the sequencing and assembly process. The highest copy number found was 5947 in Zea mays. In total, 453,697 unique operons were detected, with 69,480 operon variant clusters remaining after intragenomic clustering at 99% sequence identity. The operon length varied extensively across eukaryotes, ranging from 4136 to 16,463 bp, which will lead to considerable polymerase chain reaction (PCR) bias during amplification of the entire operon. Clustering the full-length operons revealed that the different parts (i.e., 18S, 28S, and the hypervariable regions V4 and V9 of 18S) provide divergent taxonomic resolution, with 18S and its V4 and V9 regions being the most conserved. The ROD will be updated regularly to provide an increasing number of full-length rDNA operons to the scientific community.
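The intragenomic clustering at 99% sequence identity mentioned above can be illustrated with a toy greedy centroid clustering. This is only a sketch under stated assumptions: the abstract does not name the clustering tool or the identity definition used for ROD, so the function names `identity` and `greedy_cluster` are hypothetical, and Python's `difflib.SequenceMatcher` ratio stands in for a proper alignment-based identity.

```python
from difflib import SequenceMatcher


def identity(a: str, b: str) -> float:
    """Crude similarity score in [0, 1].

    Assumption: this ratio is a stand-in for alignment-based identity.
    autojunk=False is needed because the default heuristic would treat
    the four nucleotides as "popular" junk in sequences >= 200 characters.
    """
    return SequenceMatcher(None, a, b, autojunk=False).ratio()


def greedy_cluster(seqs, threshold=0.99):
    """Greedy centroid clustering (hypothetical helper).

    Sequences are visited longest first. Each one joins the first cluster
    whose centroid it matches at >= threshold; otherwise it founds a new
    cluster and becomes that cluster's centroid.
    """
    clusters = []  # list of (centroid, members)
    for seq in sorted(seqs, key=len, reverse=True):
        for centroid, members in clusters:
            if identity(centroid, seq) >= threshold:
                members.append(seq)
                break
        else:
            clusters.append((seq, [seq]))
    return clusters


if __name__ == "__main__":
    base = "ACGT" * 50                        # 200 nt toy "operon"
    variant = base[:100] + "T" + base[101:]   # one substitution
    unrelated = "T" * 150
    for centroid, members in greedy_cluster([base, variant, unrelated]):
        print(len(centroid), "nt centroid,", len(members), "member(s)")
```

In this toy input the single-substitution variant collapses into the cluster of its parent sequence, while the unrelated sequence founds its own cluster. Real operon data would call for an alignment-aware tool rather than a string-similarity ratio.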
{"title":"The Ribosomal Operon Database: A Full-Length rDNA Operon Database Derived From Genome Assemblies","authors":"Anders K. Krabberød, Embla Stokke, Ella Thoen, Inger Skrede, Håvard Kauserud","doi":"10.1111/1755-0998.14031","DOIUrl":"10.1111/1755-0998.14031","url":null,"abstract":"<p>Current rDNA reference sequence databases are tailored towards shorter DNA markers, such as parts of the 16/18S marker or the internally transcribed spacer (ITS) region. However, due to advances in long-read DNA sequencing technologies, longer stretches of the rDNA operon are increasingly used in environmental sequencing studies to increase the phylogenetic resolution. There is, therefore, a growing need for longer rDNA reference sequences. Here, we present the ribosomal operon database (ROD), which includes eukaryotic full-length rDNA operons fished from publicly available genome assemblies. Full-length operons were detected in 34.1% of the 34,701 examined eukaryotic genome assemblies from NCBI. In most cases (53.1%), more than one operon variant was detected, which can be due to intragenomic operon copy variability, allelic variation in non-haploid genomes, or technical errors from the sequencing and assembly process. The highest copy number found was 5947 in Zea mays. In total, 453,697 unique operons were detected, with 69,480 operon variant clusters remaining after intragenomic clustering at 99% sequence identity. The operon length varied extensively across eukaryotes, ranging from 4136 to 16,463 bp, which will lead to considerable polymerase chain reaction (PCR) bias during amplification of the entire operon. Clustering the full-length operons revealed that the different parts (i.e., 18S, 28S, and the hypervariable regions V4 and V9 of 18S) provide divergent taxonomic resolution, with 18S, the V4 and V9 regions being the most conserved. 
The ROD will be updated regularly to provide an increasing number of full-length rDNA operons to the scientific community.</p>","PeriodicalId":211,"journal":{"name":"Molecular Ecology Resources","volume":"25 1","pages":""},"PeriodicalIF":5.5,"publicationDate":"2024-10-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1111/1755-0998.14031","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142454256","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
One essential initial step in the analysis of ancient DNA is to verify that the DNA sequencing reads actually derive from ancient DNA. This is done by assessing whether the reads exhibit typical characteristics of post-mortem damage (PMD), including cytosine deamination and nicks. We present a novel statistical method, implemented in a fast multithreaded programme, ngsBriggs, that enables rapid quantification of PMD by estimation of the Briggs ancient damage model parameters (Briggs parameters). Using a multinomial model with maximum likelihood fit, ngsBriggs accurately estimates the parameters of the Briggs model, quantifying the PMD signal from single and double-stranded DNA regions. We extend the original Briggs model to capture PMD signals for contemporary sequencing platforms and show that ngsBriggs accurately estimates the Briggs parameters across a variety of contamination levels. Classification of reads into ancient or modern reads, for the purpose of decontamination, is significantly more accurate using ngsBriggs than using other methods available. Furthermore, ngsBriggs is substantially faster than other state-of-the-art methods. ngsBriggs offers a practical and accurate method for researchers seeking to authenticate ancient DNA and improve the quality of their data.
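To make the PMD signal concrete, the sketch below tabulates the C-to-T mismatch frequency by position from the 5' end of aligned reads, since cytosine deamination shows up as an excess of such mismatches near read ends. This is not the ngsBriggs method or its interface: it is a plain counting summary, not the multinomial maximum likelihood fit described above, and the function name `c_to_t_profile` and the gap-free `(reference, read)` input format are assumptions made for illustration.

```python
def c_to_t_profile(pairs, n_positions=5):
    """Per-position C->T mismatch frequency from the 5' end.

    pairs: iterable of (reference, read) strings, aligned without gaps
           (an assumed, simplified input format).
    Returns a list of length n_positions; an entry is None when no
    reference cytosine was observed at that position.
    """
    c_total = [0] * n_positions   # reference C seen at each position
    c_to_t = [0] * n_positions    # of those, how many read as T
    for ref, read in pairs:
        aligned = zip(ref[:n_positions], read[:n_positions])
        for pos, (ref_base, read_base) in enumerate(aligned):
            if ref_base == "C":
                c_total[pos] += 1
                if read_base == "T":
                    c_to_t[pos] += 1
    return [t / c if c else None for t, c in zip(c_to_t, c_total)]


if __name__ == "__main__":
    toy_pairs = [
        ("CACGT", "TACGT"),  # C->T at the first position
        ("CCGTA", "TCGTA"),  # C->T at the first position
        ("CGCAT", "CGCAT"),  # no mismatch
        ("ACCGT", "ACCGT"),  # no mismatch
    ]
    print(c_to_t_profile(toy_pairs))
```

A frequency that is high at the first position and drops towards the read interior is the qualitative pattern that a damage model such as the Briggs model parameterises.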
{"title":"Revisiting the Briggs Ancient DNA Damage Model: A Fast Maximum Likelihood Method to Estimate Post-Mortem Damage","authors":"Lei Zhao, Rasmus Amund Henriksen, Abigail Ramsøe, Rasmus Nielsen, Thorfinn Sand Korneliussen","doi":"10.1111/1755-0998.14029","DOIUrl":"10.1111/1755-0998.14029","url":null,"abstract":"<p>One essential initial step in the analysis of ancient DNA is to authenticate that the DNA sequencing reads are actually from ancient DNA. This is done by assessing if the reads exhibit typical characteristics of post-mortem damage (PMD), including cytosine deamination and nicks. We present a novel statistical method implemented in a fast multithreaded programme, ngsBriggs that enables rapid quantification of PMD by estimation of the Briggs ancient damage model parameters (Briggs parameters). Using a multinomial model with maximum likelihood fit, ngsBriggs accurately estimates the parameters of the Briggs model, quantifying the PMD signal from single and double-stranded DNA regions. We extend the original Briggs model to capture PMD signals for contemporary sequencing platforms and show that ngsBriggs accurately estimates the Briggs parameters across a variety of contamination levels. Classification of reads into ancient or modern reads, for the purpose of decontamination, is significantly more accurate using ngsBriggs than using other methods available. Furthermore, ngsBriggs is substantially faster than other state-of-the-art methods. 
ngsBriggs offers a practical and accurate method for researchers seeking to authenticate ancient DNA and improve the quality of their data.</p>","PeriodicalId":211,"journal":{"name":"Molecular Ecology Resources","volume":"25 1","pages":""},"PeriodicalIF":5.5,"publicationDate":"2024-10-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1111/1755-0998.14029","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142454254","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}