Molecular Ecology Resources: Latest Articles

Historical Collections of Tropical Marine Mammals Are an Excellent Resource for Ancient DNA
IF 5.5 | Zone 1 (Biology) | Q1 Biochemistry & Molecular Biology | Pub Date: 2025-07-17 | DOI: 10.1111/1755-0998.70015
Lydia Hildebrand Furness, Richard Sabin, Marianne Strand Torvanger, Oliver Kersten, James H. Barrett, Bastiaan Star

The ability to predict ancient DNA (aDNA) sequencing success in natural history collections is critical to reducing the amount of destructive sampling of a finite resource. So far, studies investigating such success have predominantly focused on taxa with ranges restricted to temperate or cold environments at northern latitudes, which likely aids DNA preservation. Here, we report remarkably high aDNA sequencing success in Sirenia, herbivorous marine mammals whose distribution is currently constrained to the global tropics. We investigate 91 samples from 85 specimens comprising all four contemporary species and one extinct species, comparing different sample types (cranial/post-cranial bone, skin and cartilage), species, collections, and material age. We obtained remarkably high (e.g., > 20%) endogenous DNA preservation for the majority (~57%) of samples. Sequencing success was linked to sample type, with cranial bones (including petrous and tympanic bones) yielding significantly higher endogenous DNA. Additionally, we obtained variable, but potentially superior, DNA results for preserved cartilage and hide samples that can be associated with historical bone. Although such tissue is not always present, this type of material is easy to sample, with very limited destructive impacts on the associated bones, and we therefore highlight its untapped potential as a source of DNA. Overall, our results show the high success of ancient DNA retrieval from historical collections of species with a tropical distribution, expanding on the types of specimens that are available for temporal genomic analyses.

Citations: 0
Multi-Tool Marine Metabarcoding Bioassessment for Baselining and Monitoring Species and Communities in Kelp Habitats
IF 5.5 | Zone 1 (Biology) | Q1 Biochemistry & Molecular Biology | Pub Date: 2025-07-17 | DOI: 10.1111/1755-0998.70010
Giulia Maiello, Marilla R. Lippert, Erika F. Neave, Erik A. Hanson, Stephen R. Palumbi, Stefano Mariani

The astonishing biological diversity found in Californian kelp forests requires efficient and robust monitoring tools to better understand ecological trends and mitigate against loss or disruption of ecosystem services due to human pressure and climate change. With environmental DNA (eDNA) metabarcoding becoming a popular biodiversity assessment approach, we set out to evaluate a combination of powerful, rapid and sustainable eDNA solutions for characterising marine community composition in kelp-dominated habitats along the central California coast, in the newly proposed Chumash Heritage National Marine Sanctuary. We employed and compared the efficiency of several eDNA collection approaches, including ‘traditional’ surface water filtration, the collection of organisms encrusting cobble rocks and various deployments of an artificial passive sampler, the metaprobe (i.e., attached to divers, dangled from a boat and cast from the shore using a fishing rod). By combining the information from fish-specific (Tele02 12S) and universal metazoan (COI) markers, we ‘captured’ 501 unique marine taxa, belonging to at least 36 phyla, over 400 of which were identified to genus/species level, and including 52 vertebrate species typical of Californian kelp forest ecosystems. Despite differences in the type of biodiversity returned by the tested sampling methods, the overall community structure of the surveyed area was highly spatially structured and strongly influenced by the biogeographic break around Point Conception (Humqaq). We discuss the benefits of integrating eDNA metabarcoding in existing monitoring programs and devising a reproducible approach to monitor faunal changes in kelp forest habitats and beyond.

Citations: 0
Conserved Sequence Identification Within Large Genomic Datasets Using ‘Unikseq2’: Application in Environmental DNA Assay Development
IF 5.5 | Zone 1 (Biology) | Q1 Biochemistry & Molecular Biology | Pub Date: 2025-07-11 | DOI: 10.1111/1755-0998.70014
Mark Louie D. Lopez, René L. Warren, Michael J. Allison, Lauren Coombe, Jacob J. Imbery, Inanc Birol, Caren C. Helbing

Identification of conserved genomic sequences and their utilisation as anchor points for clade detection and/or characterisation is a mainstay in ecological studies. For environmental DNA (eDNA) assays, effective processing of large genomic datasets is crucial for reliable species detection in biodiversity monitoring. While considerable focus has been on developing robust species-targeted assays, eDNA assays with broader taxonomic coverage (e.g., detecting any species within a taxonomic group such as fish) can significantly streamline environmental monitoring, especially when detecting individual species' DNA proves challenging. Designing such assays requires identifying conserved regions representing the target taxonomic group, a chiefly manual task that is often labor-intensive and error-prone, particularly when working with large sequence datasets. To address these challenges, we present unikseq2, an enhanced, alignment-free, k-mer-based tool for identifying unique and conserved sequences. It introduces new functionality to identify sequence conservation among target species, enabling more informed marker selection for applications such as universal primer design. This automates sequence selection in large-scale mitochondrial genome datasets, eliminating the need for manual inspection of computationally costly multiple sequence alignments. Herein, we demonstrate unikseq2's capabilities by developing and validating eDNA assays for various taxa, including Osteichthyes (bony fishes), the Salmonidae family (salmon and trout), Myotis bats and Cervus deer. Unikseq2-based eDNA assays allow for accurate detection across multiple taxonomic levels, from genus to class, enhancing the flexibility, scalability and reliability of eDNA tools in environmental monitoring. By leveraging genomic data from public repositories, unikseq2 supports efficient, reproducible assay design, making it an invaluable tool for a wide range of ecological and biodiversity research applications.
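
The alignment-free, k-mer-based idea described in the abstract above can be illustrated with a toy sketch. This is not unikseq2's actual algorithm, interface, or parameters, none of which are given in the abstract; the function names, the `min_fraction` threshold, and the sequences are all hypothetical. The sketch simply keeps k-mers that occur in a chosen fraction of target sequences and in no non-target sequence.

```python
from collections import Counter


def kmers(seq, k):
    """Return the set of all k-length substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def conserved_unique_kmers(targets, non_targets, k, min_fraction=1.0):
    """Return k-mers found in at least min_fraction of targets and in no non-target."""
    counts = Counter()
    for seq in targets:
        # kmers() returns a set, so each target counts a given k-mer at most once.
        counts.update(kmers(seq, k))
    excluded = set()
    for seq in non_targets:
        excluded |= kmers(seq, k)
    needed = min_fraction * len(targets)
    return {km for km, n in counts.items() if n >= needed and km not in excluded}


# Toy sequences (hypothetical, not real mitochondrial data).
targets = ["ACGTTGCAAC", "TTACGTTGCA", "ACGTTGCATT"]
non_targets = ["GGACGTTAGG"]
result = conserved_unique_kmers(targets, non_targets, k=5)
print(sorted(result))
```

Because the comparison works on exact k-mer sets, no multiple sequence alignment is needed; the trade-off in this naive version is that a single mismatch removes every k-mer overlapping it, which a real tool would presumably handle with more care.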

Citations: 0
Genomes of Two Monophagous Weevils and Their Host Plant Provide Insights Into Evolution of Plant Defence and Insect Counter-Defence
IF 5.5 | Zone 1 (Biology) | Q1 Biochemistry & Molecular Biology | Pub Date: 2025-07-11 | DOI: 10.1111/1755-0998.70009
Wei Song, Li-Jun Cao, Jin-Cui Chen, Wen-Juan Guo, Hui-Juan Li, Xue-Wen Sun, Ary Anthony Hoffmann, Jun-Bao Wen, Shu-Jun Wei

Plant secondary metabolites play important roles in defence against herbivorous insects. However, many insects can overcome plant defences even when the plants produce a rich toxin load, and an arms race between plants evolving new toxins and insects evolving to counter them is expected. Here, we deciphered genomic features linked to a potential arms race between the tree of heaven and two monophagous weevils that feed only on this tree species, with the tree of heaven producing a rich set of secondary metabolites involving about 745 compounds. We first assembled chromosome-level genomes for the tree of heaven and the two weevils. Comparative genomics showed an expansion of genes related to synthesising secondary metabolites in the tree, while in the weevils, genes related to detoxification and chemosensing expanded. The expansion of core genes involved in quassinoid biosynthesis in the tree was linked to tandem duplication and whole genome duplication, while the expansion of detoxifying GST and chemosensing SNMP genes in the two weevils was linked to tandem duplication and novel genes, respectively. The results indicate that plants and insect herbivores both reshaped their genomes through gene expansion, while the host tree also underwent whole genome duplication and the two weevils evolved novel genes. These changes likely reflect an arms race of defence and counter-defence, providing an understanding of genome evolution driven by trophic interactions.

Citations: 0
‘Highly-Informative’ Genetic Markers Can Bias Conclusions: Examples and General Solutions
IF 5.5 | Zone 1 (Biology) | Q1 Biochemistry & Molecular Biology | Pub Date: 2025-07-11 | DOI: 10.1111/1755-0998.70011
Andy Lee, William Hemstrom, Natalie Molea, Gordon Luikart, Mark R. Christie

High-grading bias is the overestimation of power in a subset of loci caused by model overfitting. Using both empirical and simulated datasets, we show that high-grading bias can cause severe overestimation of population structure, and thus mislead investigators, whenever highly informative or high-FST markers are chosen (i.e., ascertained) and used for subsequent assessments, a common practice in population genetic studies. This problem can occur in panmictic populations with no local adaptation. Biased results from choosing high-FST markers may have severe downstream implications for management and conservation, such as erroneous conservation unit delineation, which could squander limited conservation resources to protect incorrectly defined ‘populations’. Furthermore, we caution that high-grading is not limited to FST approaches; high-grading bias is a concern whenever a small subset of markers is first chosen to explain differences among groups based on their degree of difference and is subsequently reused to estimate the degree of difference among those groups. For example, selecting high-FST loci for use in a GT-seq panel or using differentially expressed genes to plot sample membership in multivariate space can both result in spurious structure when none exists. We illustrate that using statistically based outlier tests in place of arbitrary FST cut-offs can reduce bias. Alternatively, permutation tests or cross-evaluation can be used to detect high-grading bias. We provide an R package, PCAssess, to help researchers detect and prevent high-grading bias in genetic datasets by automating permutation tests and principal component analyses (https://github.com/hemstrow/PCAssess).
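
The effect described in the abstract above can be reproduced with a small simulation. The sketch below is an illustration under stated assumptions, not the authors' analysis and not the PCAssess API: it simulates one panmictic population, splits it into two arbitrary groups, uses a simple (HT - HS) / HT estimator of FST averaged across loci, and then re-estimates FST from only the highest-FST loci. All names, sample sizes and the estimator choice are hypothetical.

```python
import random


def fst(p1, p2):
    """Simple two-group FST = (HT - HS) / HT from two allele frequencies."""
    p_bar = (p1 + p2) / 2
    h_t = 2 * p_bar * (1 - p_bar)
    if h_t == 0:
        return 0.0  # locus is monomorphic across both groups
    h_s = (2 * p1 * (1 - p1) + 2 * p2 * (1 - p2)) / 2
    return (h_t - h_s) / h_t


def locus_fst(genotypes, labels):
    """Per-locus FST between groups 0 and 1; genotypes[i][j] is an allele count (0-2)."""
    values = []
    for j in range(len(genotypes[0])):
        freqs = []
        for group in (0, 1):
            dosages = [ind[j] for ind, lab in zip(genotypes, labels) if lab == group]
            freqs.append(sum(dosages) / (2 * len(dosages)))
        values.append(fst(freqs[0], freqs[1]))
    return values


def mean(xs):
    return sum(xs) / len(xs)


def top_mean(values, n):
    """Mean of the n largest values, i.e. the 'high-graded' subset."""
    return mean(sorted(values, reverse=True)[:n])


rng = random.Random(1)
n_ind, n_loci, n_top = 40, 200, 20

# One panmictic population: every individual draws from the same allele frequencies.
allele_freqs = [rng.uniform(0.1, 0.9) for _ in range(n_loci)]
genotypes = [[(rng.random() < p) + (rng.random() < p) for p in allele_freqs]
             for _ in range(n_ind)]
labels = [0] * (n_ind // 2) + [1] * (n_ind // 2)  # arbitrary "populations"

per_locus = locus_fst(genotypes, labels)
all_fst = mean(per_locus)
top_fst = top_mean(per_locus, n_top)  # selection and estimation on the same samples

# Permutation check: repeat the whole select-then-estimate procedure on shuffled labels.
null_top = []
for _ in range(50):
    shuffled = labels[:]
    rng.shuffle(shuffled)
    null_top.append(top_mean(locus_fst(genotypes, shuffled), n_top))
p_value = sum(v >= top_fst for v in null_top) / len(null_top)

print(f"mean FST, all loci:     {all_fst:.3f}")
print(f"mean FST, top {n_top} loci:  {top_fst:.3f}")
print(f"permutation p-value:    {p_value:.2f}")
```

The top-locus estimate is always at least as large as the all-locus estimate because the loci were chosen for being extreme in these same samples. The permutation step applies the identical selection to shuffled labels, so when no real structure exists the observed top-locus value should look like a typical draw from that null distribution rather than an outlier.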

Citations: 0
Pan-Genome of Jasminum sambac Reveals the Genetic Diversity of Different Petal Morphology and Aroma-Related Genes
IF 5.5 | Zone 1 (Biology) | Q1 Biochemistry & Molecular Biology | Pub Date: 2025-07-09 | DOI: 10.1111/1755-0998.70013
Wenmin Fan, Zhenyang Liao, Mengya Gu, Yuhang Zhang, Wenlong Lei, Yingao Zhang, Huike Li, Jiawei Yan, Yezi Xiao, Hongzheng Lin, Shan Jin, Youben Yu, Jingping Fang, Naixing Ye, Pengjie Wang

Jasmine (Jasminum sambac) is globally renowned for its distinct fragrance and ornamental appeal, existing primarily in three floral morphologies: single-petal, double-petal and multi-petal. De novo sequencing and chromosome-level genome assembly were performed on two distinct jasmine varieties: ‘Yuanye’ double-petal and ‘Bijian’ multi-petal jasmines. These assemblies, along with three previously published genomes, were integrated to construct a pan-genome framework that comprehensively encompasses both the core and variable genomic components of jasmine. A substantial number of structural variations (SVs) and single nucleotide polymorphisms (SNPs) were identified, of which 89.5% were insertions/deletions (size ≥ 50 bp), whereas gene families also exhibited significant contractions and expansions, revealing the high complexity and dynamics of the jasmine genomes. Comparative genomic approaches further revealed multiple transcription factor families associated with aromatic biosynthesis, floral organogenesis and environmental adaptability. Key genes involved in the formation of jasmine scent were characterised, with a particular focus on the variation in copy number and expression levels of critical enzyme genes responsible for the production of four major volatile terpenoids and benzyl acetate, thereby elucidating the genetic basis of jasmine aroma diversity. Additionally, within the MADS-box gene family, the PI and AP3 subfamilies are hypothesized to play crucial roles in the development of floral organs. Through the integration of these comprehensive data, a pan-genome website for jasmine was developed to facilitate data download and visualise genomic variations via a genome browser (https://www.pan-jasmine.cn/). In summary, this work provides valuable genomic resources for the genetic enhancement and marker-assisted breeding of jasmine.

Citations: 0
Enrichment of Helminth Mitochondrial Genomes From Faecal Samples Using Hybridisation Capture
IF 5.5 Zone 1 Biology Q1 BIOCHEMISTRY & MOLECULAR BIOLOGY Pub Date: 2025-07-09 DOI: 10.1111/1755-0998.70005
Marina Papaiakovou, Andrea Waeschenbach, Roy M. Anderson, Piet Cools, Zeleke Mekonnen, D. Timothy J. Littlewood, Cinzia Cantacessi, Stephen R. Doyle

New approaches are urgently needed to enrich rare or low-abundance DNA in complex samples. Soil-transmitted helminths (STHs) inhabit heterogeneous environments, including the gastrointestinal tract of their host as adults and are excreted as eggs and larvae in faeces, complicating our understanding of their biology and the use of genetic tools for species monitoring and population tracking. We have developed a hybridisation capture approach to enrich mitochondrial genome sequences of two STH species, the roundworm Ascaris lumbricoides and whipworm Trichuris trichiura, from extracted DNA from faecal material and worm specimens. Employing ~1000 targeted probes, we achieved > 6000-fold and > 12,000-fold enrichment for A. lumbricoides and T. trichiura, respectively, relative to direct whole genome shotgun (WGS) sequencing. Sequencing coverage was highly concordant with probe targets and correlated with the number of eggs per gram (EPG) of parasites present; DNA from samples with as few as 336 EPG for Ascaris and 48 EPG for Trichuris was efficiently captured and was sufficient to provide effective mitochondrial genome data. Finally, allele frequencies were highly concordant between WGS and hybridisation capture, suggesting little genetic information is lost with additional sample processing required for enrichment. Our hybridisation capture design and approach enable sensitive and flexible STH mitochondrial genome sampling from faecal DNA extracts and pave the way for broader hybridisation capture-based genome-wide applications and molecular epidemiology studies of STHs.
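
As a rough illustration of the fold-enrichment figures quoted above, one common way to express enrichment is the ratio of the on-target read proportion after capture to the on-target proportion in WGS. The abstract does not state the exact normalisation the authors used, so the function name, its arguments and the read counts below are all hypothetical.

```python
def fold_enrichment(capture_on_target, capture_total, wgs_on_target, wgs_total):
    """Ratio of on-target read proportions: capture versus WGS.

    Assumed definition for illustration only; the study may normalise differently.
    """
    if min(capture_total, wgs_total, wgs_on_target) <= 0:
        raise ValueError("totals and WGS on-target count must be positive")
    # Cross-multiplied form of (capture_on/capture_total) / (wgs_on/wgs_total).
    return (capture_on_target * wgs_total) / (capture_total * wgs_on_target)


# Hypothetical read counts, not data from the study.
example = fold_enrichment(
    capture_on_target=600_000, capture_total=1_000_000,
    wgs_on_target=100, wgs_total=1_000_000,
)
print(f"{example:.0f}-fold enrichment")
```

Because the result is a ratio of proportions, it is independent of sequencing depth, which is what makes capture and WGS libraries of different sizes comparable.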

Citations: 0
Benchmarking Imputed Low Coverage Genomes in a Human Population Genetics Context
IF 5.5 Zone 1 Biology Q1 BIOCHEMISTRY & MOLECULAR BIOLOGY Pub Date: 2025-07-08 DOI: 10.1111/1755-0998.70007
Gludhug A. Purnomo, João C. Teixeira, Herawati Sudoyo, Bastien Llamas, Raymond Tobler

Ongoing advances in population genomic methodologies have recently enabled the study of millions of loci across hundreds of genomes at a relatively low cost, by leveraging a combination of low-coverage shotgun sequencing and innovative genotype imputation methods. This approach has the potential to provide abundant genotype information at low costs comparable to another widely used cost-effective genotyping approach—that is, SNP panels—while avoiding potential issues related to loci being ascertained in distantly related populations. Nonetheless, the wide adoption of imputation methods in humans and other species is currently constrained by the lack of publicly available reference panels that capture diversity representative of the target genomes—though the recent development of ‘joint’ imputation approaches, which allow genetic information from the target population to be used in genotype calling, may potentially mitigate this shortcoming. Here, we assess the performance of multiple genotyping approaches on eight low coverage genomes (range ~3× to ~5×) sourced from different Indonesian populations—including a joint imputation approach that leverages 248 additional low coverage genomes (mean ~2.4×) from related populations. The inclusion of these related genomes in the joint imputation process resulted in more accurate genotype calls and produced population genetic inferences with similar accuracy but improved precision compared to pseudohaploid calls—even though the reference panel was only weakly representative of the target genomes. These results highlight the enormous potential of joint imputation to enable economical population genetic research for taxa that are currently poorly represented in publicly available reference panels.

Citations: 0
RDAforest: Identifying Environmental Drivers of Polygenic Adaptation
IF 5.5 Zone 1 Biology Q1 BIOCHEMISTRY & MOLECULAR BIOLOGY Pub Date: 2025-07-04 DOI: 10.1111/1755-0998.70002
Mikhail V. Matz, Kristina L. Black

Identifying environmental gradients driving genetic adaptation is one of the major goals of ecological genomics. We present RDAforest, a methodology that leverages the predominantly polygenic nature of adaptation and harnesses the versatility of random forest regression to solve this problem. Instead of computing individual SNP-environment associations, RDAforest seeks to explain the overall genetic covariance structure based on multiple environmental predictors. By relying on random forest instead of linear regression, this method can detect non-linear and non-monotonous dependencies as well as all possible interactions between predictors. It also incorporates a novel procedure to select the best predictor out of several correlated ones, and uses jackknifing to model uncertainty of genetic structure determination. Lastly, our methodology incorporates delineation and plotting of “adaptive neighbourhoods”—areas on the landscape that are predicted to harbour differentially adapted individuals. Such maps can be used as a guide for planning conservation and ecological restoration efforts. We demonstrate the use of RDAforest in two simulated scenarios and one real dataset (North American grey wolves).

Citations: 0
Seeing the Forest Despite the Trees in Repeat-Rich Genomic Regions
IF 5.5 Zone 1 Biology Q1 BIOCHEMISTRY & MOLECULAR BIOLOGY Pub Date: 2025-07-03 DOI: 10.1111/1755-0998.70008
Amanda M. Larracuente, John S. Sproul

Technological advances are producing genome assemblies of increasing quality at steadily decreasing costs. These assemblies enable the extraction of rich biological information from previously inaccessible genomic regions (e.g., repeat-rich regions) and from diverse organisms underrepresented in genomic research. Gaining functional insights from new assemblies often requires generating additional data sets, experimental approaches and complex analysis. Novel analytical methods that substantially shorten the path to biological insights are valuable, particularly if they draw conclusions from the direct analysis of assemblies. In this issue of Molecular Ecology Resources, Elphinstone et al. (2025) present RepeatOBserver—a tool to visualise repeat organisation through direct analysis of chromosome-scale assemblies. This tool facilitates the summary and visualisation of large- and fine-scale patterns of repetitive DNA sequence structure across assemblies. Their approach borrows metrics from information theory, which have found uses in ecology (i.e., the Shannon Diversity Index), to help infer functional regions within repetitive sequences including putative centromeres. Importantly, RepeatOBserver does not require annotations, repeat libraries or functional genomic data—just a high-quality assembly. This type of tool addresses ongoing challenges in mapping the structure and functions of repeat-rich chromosomal regions, which remain the least well-understood components of genomes.

The availability of chromosome-scale genome assemblies is growing rapidly, as advances in long-read sequencing technology make assembly-based approaches accessible to more taxa. These genome assemblies can reveal important insights into genome biology, biomedicine and biodiversity. Our ability to extract these insights from assemblies is built on decades of hard-won work in early genomic model organisms. For example, early work on gene structure, regulation and evolution provided a knowledge base for ab initio gene prediction from nothing more than a DNA sequence. While annotation tools for non-coding sequences like the abundant repetitive DNAs found in most eukaryotic genomes are now accessible, the methods to extract insights from these regions are less mature. Repetitive DNAs evolve rapidly: their composition, organisation and abundance vary across species (Yunis and Yasmineh 1971), making predictions based on sequence conservation difficult. Many insights require functional genomic data (e.g., ChIP-seq, methylation and ATAC-seq), which may be challenging to access in non-model systems. Despite recent progress in resolving repeats in chromosome-scale assemblies, their assembly and annotation remain non-trivial problems (Lower et al. 2018).

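Some genome regions with critical functions are enriched in, or entirely composed of, repeated DNA sequences. Centromeres, the essential structures that guide chromosome segregation during cell division, are often embedded in repetitive regions and may remain among the most challenging genomic regions to resolve. Centromere prediction is especially challenging because (1) centromeres are typically defined by the presence of a centromere-specific histone variant (CENP-A) rather than by a specific DNA sequence, and (2) they tend to occur in repeat-dense chromosomal regions enriched in satellite DNA and/or transposable elements, which can be arranged into higher-order repeats (Allshire and Karpen 2008). Centromere size and organisation vary widely among species (reviewed in Hartley and O'Neill 2019). Genomic studies have only recently begun to reveal the detailed organisation of centromeres in some fungi, plants and animals. Some interesting patterns emerge from these studies: functional centromere cores may be associated with regions of low repeat diversity and with dips in DNA methylation (e.g., Altemose et al. 2022). However, inferring that a repetitive DNA is part of one of these functional domains remains a challenge, as it requires a high-quality assembly and additional supporting data sets (e.g., DNA-protein interaction and methylation data).

RepeatOBserver (Elphinstone et al. 2025) is a tool for visualising and analysing genomic repeat patterns that requires no prior repeat annotation or additional genomic data (https://github.com/celphin/RepeatOBserverV1). Visualising the distribution and diversity of repeats across the genome can flag features consistent with functional regions such as centromeres and help identify structural patterns in other repeat-rich regions. This tool can help leverage the growing number of chromosome-scale assemblies to provide important insights into genome biology.

RepeatOBserver's visualisations are built around Fourier transforms of DNA at the chromosome scale (Figure 1). The Fourier transform is a mathematical method that decomposes a complex signal into its component parts. Applied to genomic sequence data, it can help identify biological patterns, including gene structure, phylogenetic relationships and repeat motifs. The ability to detect hidden periodicities in DNA sequence data makes Fourier transforms a good choice for repeat identification (e.g., Sharma et al. 2004). RepeatOBserver uses DNA walks that slide along the chromosome, converting nucleotides to numerical values based on sequence composition. The tool then applies a fast Fourier transform (FFT) and plots the resulting spectra as heatmaps (Figure 1). These plots reveal the repeat organisation of each chromosome, highlighting the positions and lengths of repeats as well as their abundance. Contrasting patterns of repeat diversity and abundance can reveal the structure of genomic regions that are otherwise difficult to analyse.

By borrowing an information-theoretic approach often used in ecology (Shannon 1948), RepeatOBserver summarises patterns of repeat diversity. The Shannon Diversity Index (SDI) is typically used to describe species diversity, but it is finding uses beyond ecological research. Here, instead of describing species diversity within a geographic area, the genomic SDI summarises the diversity of different repeats within a sequence window based on the Fourier spectra. Plotting SDI and repeat abundance can identify trends in repeat diversity across chromosomes. One exciting application of SDI values is predicting functional chromosomal regions such as centromeres from DNA sequence data alone (Figure 1). Monocentric chromosomes rich in tandem satellite DNAs, like those found in humans, tend to form homogeneous regions of young repeats within larger repeat arrays (Altemose et al. 2022; reviewed in Naish and Henderson 2024). Centromeric regions should therefore have low repeat diversity and appear as local minima in SDI plots. Alternatively, in species with monocentric chromosomes whose centromeres are rich in transposable elements, such as wheat, centromeres should appear in regions of high repeat abundance. Holocentric chromosomes, which have many dispersed centromeres, can show local SDI minima at repeat clusters scattered along the chromosome. RepeatOBserver's summaries of repeat diversity and abundance can therefore provide a starting point for identifying putative centromeres across a range of known centromere types. The authors validated many centromere predictions in 12 plant and animal species (159 chromosomes) representing different centromere sizes and organisations, each with experimental data supporting the hypothesised centromere location.

The overall patterns of repeat distribution reported by RepeatOBserver largely match expectations for well-characterised plant and animal genomes, such as the location of pericentromeric satellites and the boundaries between centromeric, heterochromatic and euchromatic domains. RepeatOBserver output can help identify other structural patterns, including pericentric and subtelomeric repeats, multi-copy gene families, some structural rearrangements and higher-order repeats. The tool even works on imperfect higher-order repeats, which can be difficult to identify with standard annotation tools. The approach has some limitations, as functional predictions such as centromere identification still need to be validated with other tools. Nevertheless, RepeatOBserver can provide initial insights that serve as a springboard for addressing questions about genome structure, function and evolution. Genome visualisation tools like RepeatOBserver are timely and can accelerate discovery in difficult genomic regions at a time when repeat-resolved assemblies are increasingly accessible across taxa.

Cross-pollination of theory and methods between scientific fields can drive rapid advances in biology. Genomics as a field represents a fusion of approaches assembled from multiple disciplines. Ecological principles show excellent promise for understanding the complex interactions among genomes, their regulators and their products (e.g., Brookfield 2005). Continued integration of methods, tools and thinking across disciplines may help us leverage the enormous diversity of available genomes, and their astonishing complexity …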
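
The pipeline described in the commentary (nucleotides converted to numbers, a Fourier transform per window, and a Shannon index summarising each spectrum) can be sketched in a few lines. This is a toy illustration under stated assumptions, not RepeatOBserver's implementation: the purine/pyrimidine `ENCODING` is a hypothetical stand-in for the tool's DNA walks, a naive discrete Fourier transform replaces the FFT so that only the standard library is needed, and all function names are invented for this example.

```python
import cmath
import math
import random

# Hypothetical purine/pyrimidine encoding; RepeatOBserver's own mapping may differ.
ENCODING = {"A": 1.0, "G": 1.0, "C": -1.0, "T": -1.0}


def encode(seq):
    """Convert nucleotides to numbers (unknown bases become 0)."""
    return [ENCODING.get(base, 0.0) for base in seq.upper()]


def power_spectrum(signal):
    """Naive DFT power for frequencies 1..n/2 after removing the mean."""
    n = len(signal)
    mean = sum(signal) / n
    centred = [x - mean for x in signal]
    powers = []
    for k in range(1, n // 2 + 1):
        coeff = sum(
            x * cmath.exp(-2j * cmath.pi * k * t / n)
            for t, x in enumerate(centred)
        )
        powers.append(abs(coeff) ** 2)
    return powers


def shannon_index(powers):
    """Shannon index of a spectrum, treating normalised power as proportions."""
    total = sum(powers)
    if total == 0:
        return 0.0
    index = 0.0
    for p in powers:
        if p > 0:
            share = p / total
            index -= share * math.log(share)
    return index


def windowed_sdi(seq, window=64):
    """Shannon index for consecutive non-overlapping windows."""
    return [
        shannon_index(power_spectrum(encode(seq[start:start + window])))
        for start in range(0, len(seq) - window + 1, window)
    ]


rng = random.Random(0)
tandem = "AATT" * 16  # a perfect tandem repeat
random_seq = "".join(rng.choice("ACGT") for _ in range(64))
sdi_tandem, sdi_random = windowed_sdi(tandem + random_seq)
print(f"tandem window SDI: {sdi_tandem:.3f}")
print(f"random window SDI: {sdi_random:.3f}")
```

In the tandem window, spectral power concentrates at a single frequency, so the index is close to zero, while the random window spreads power across many frequencies and scores higher. This mirrors the commentary's expectation that homogeneous repeat arrays appear as local SDI minima.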
Citations: 0