
Molecular Ecology Resources: latest articles

Genomic and transcriptomic analyses of a social hemipteran provide new insights into insect sociality
IF 7.7, Zone 1 (Biology), Q1 BIOCHEMISTRY & MOLECULAR BIOLOGY, Pub Date: 2024-09-12, DOI: 10.1111/1755-0998.14019
Hui Zhang, Qian Liu, Jianjun Lu, Liying Wu, Zhentao Cheng, Gexia Qiao, Xiaolei Huang
The origin of sociality represents one of the most important evolutionary transitions. Insect sociality evolved in some hemipteran aphids, which can produce soldiers and normal nymphs with distinct morphology and behaviour through parthenogenesis. The lack of genomic data resources has hindered investigations into the molecular mechanisms underlying their social evolution. Herein, we generated the first chromosomal‐level genome of a social hemipteran (Pseudoregma bambucicola) with highly specialized soldiers and performed comparative genomic and transcriptomic analyses to elucidate the molecular signatures and regulatory mechanisms of caste differentiation. P. bambucicola has one of the larger known aphid genomes, at 582.2 Mb with an N50 length of 11.24 Mb, and about 99.6% of the assembly was anchored to six chromosomes with a scaffold N50 of 98.27 Mb. A total of 14,027 protein‐coding genes were predicted and 37.33% of the assembly was identified as repeat sequences. Social evolution is accompanied by a variety of changes in genome organization, including expansion of gene families related to transcription factors, transposable elements, as well as species‐specific expansions of certain sugar transporters and UGPases involved in carbohydrate metabolism. We also characterized large candidate gene sets linked to caste differentiation and found evidence of expression regulation and positive selection acting on energy metabolism and muscle structure, explaining the soldier‐specific traits including morphological and behavioural specialization, developmental arrest and infertility. Overall, this study offers new insights into the molecular basis of social aphids and the evolution of insect sociality and also provides valuable data resources for further comparative and functional studies.
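The abstract above reports contiguity as N50 values (for example, a scaffold N50 of 98.27 Mb). As a reminder of what that statistic measures, here is a minimal sketch of the standard N50 definition. The function name `n50` and the toy lengths are illustrative assumptions and are not taken from the paper.

```python
def n50(lengths):
    """Return the N50: the largest length L such that sequences of
    length >= L together cover at least half of the total assembly."""
    if not lengths:
        raise ValueError("need at least one sequence length")
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


# Hypothetical scaffold lengths (arbitrary units), not data from the study.
toy_scaffolds = [80, 70, 50, 40, 30, 20, 10]
print(n50(toy_scaffolds))  # prints 70: 80 + 70 = 150 reaches half of 300
```

Because N50 depends only on the sorted length distribution, anchoring contigs into chromosome-scale scaffolds raises it sharply even when the total assembly size is unchanged.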
Citations: 0
Whole-genome resequencing improves the utility of otoliths as a critical source of DNA for fish stock research and monitoring.
IF 5.5, Zone 1 (Biology), Q1 BIOCHEMISTRY & MOLECULAR BIOLOGY, Pub Date: 2024-09-05, DOI: 10.1111/1755-0998.14013
Jilda Alicia Caccavo, Larissa S Arantes, Enrique Celemín, Susan Mbedi, Sarah Sparmann, Camila J Mazzoni

Fish ear bones, known as otoliths, are often collected in fisheries to assist in management, and are a common sample type in museum and national archives. Beyond their utility for ageing, morphological and trace element analysis, otoliths are a repository of valuable genomic information. Previous work has shown that DNA can be extracted from the trace quantities of tissue remaining on the surface of otoliths, despite the fact that they are often stored dry at room temperature. However, much of this work has used reduced representation sequencing methods in clean lab conditions, to achieve adequate yields of DNA, libraries and ultimately single-nucleotide polymorphisms (SNPs). Here, we pioneer the use of small-scale (spike-in) sequencing to screen contemporary otolith samples prepared in regular molecular biology (in contrast to clean) laboratories for contamination and quality levels, submitting for whole-genome resequencing only samples above a defined endogenous DNA threshold. Despite the typically low quality and quantity of DNA extracted from otoliths, we are able to produce whole-genome libraries and ultimately sets of filtered, unlinked and even putatively adaptive SNPs of ample numbers for downstream uses in population, climate and conservation genomics. By comparing with a set of tissue samples from the same species, we are able to highlight the quality and efficacy of otolith samples from DNA extraction and library preparation, to bioinformatic preprocessing and SNP calling. We provide detailed schematics, protocols and scripts of our approach, such that it can be adopted widely by the community, improving the use of otoliths as a source of valuable genomic data.

Citations: 0
Deep estimation of the intensity and timing of natural selection from ancient genomes.
IF 5.5, Zone 1 (Biology), Q1 BIOCHEMISTRY & MOLECULAR BIOLOGY, Pub Date: 2024-08-31, DOI: 10.1111/1755-0998.14015
Guillaume Laval, Etienne Patin, Lluis Quintana-Murci, Gaspard Kerner

Leveraging past allele frequencies has proven to be key for identifying the impact of natural selection across time. However, this approach suffers from imprecise estimations of the intensity (s) and timing (T) of selection, particularly when ancient samples are scarce in specific epochs. Here, we aimed to bypass the computation of allele frequencies across arbitrarily defined past epochs and refine the estimations of selection parameters by implementing convolutional neural network (CNN) algorithms that directly use ancient genotypes sampled across time. Using computer simulations, we first show that genotype-based CNNs consistently outperform an approximate Bayesian computation (ABC) approach based on past allele frequency trajectories, regardless of the selection model assumed and the number of available ancient genotypes. When applying this method to empirical data from modern and ancient Europeans, we replicated the reported increased number of selection events in post-Neolithic Europe, independently of the continental subregion studied. Furthermore, we substantially refined the ABC-based estimations of s and T for a set of positively and negatively selected variants, including iconic cases of positive selection and experimentally validated disease-risk variants. Our CNN predictions support a history of recent positive and negative selection targeting variants associated with host defence against pathogens, aligning with previous work that highlights the significant impact of infectious diseases, such as tuberculosis, in Europe. These findings collectively demonstrate that detecting the footprints of natural selection on ancient genomes is crucial for unravelling the history of severe human diseases.

Citations: 0
The accuracy of predicting maladaptation to new environments with genomic data.
IF 5.5, Zone 1 (Biology), Q1 BIOCHEMISTRY & MOLECULAR BIOLOGY, Pub Date: 2024-08-30, DOI: 10.1111/1755-0998.14008
Brandon M Lind, Katie E Lotterhos

Rapid environmental change poses unprecedented challenges to species persistence. To understand the impact that continued change could have, genomic offset methods have been used to forecast maladaptation of natural populations to future environmental change. However, while their use has become increasingly common, little is known regarding their predictive performance across a wide array of realistic and challenging scenarios. Here, we evaluate the performance of currently available offset methods (gradientForest, the Risk-Of-Non-Adaptedness, redundancy analysis with and without structure correction and LFMM2) using an extensive set of simulated data sets that vary demography, adaptive architecture and the number and spatial patterns of adaptive environments. For each data set, we train models using either all, adaptive or neutral marker sets and evaluate performance using in silico common gardens by correlating known fitness with projected offset. Using over 4,849,600 such evaluations, we find that (1) method performance is largely due to the degree of local adaptation across the metapopulation (LA), (2) adaptive marker sets provide minimal performance advantages, (3) performance within the species range is variable across gardens and declines when offset models are trained using additional non-adaptive environments and (4) despite (1), performance declines more rapidly in globally novel climates (i.e. a climate without an analogue within the species range) for metapopulations with greater LA than lesser LA. We discuss the implications of these results for management, assisted gene flow and assisted migration.
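The evaluation step described above, correlating known fitness with projected offset in a common garden, can be illustrated with a rank correlation. The abstract does not state which correlation coefficient the authors used, so the choice of Spearman's rho here is an assumption, and the function names and toy values are hypothetical.

```python
def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j across a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman(x, y):
    """Spearman's rho: Pearson correlation computed on average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical values for five populations grown in one simulated garden.
projected_offset = [0.1, 0.4, 0.2, 0.9, 0.6]
known_fitness = [0.95, 0.70, 0.90, 0.20, 0.50]
print(spearman(projected_offset, known_fitness))  # prints -1.0 for this toy data
```

A well-performing offset method should give a strongly negative value, since populations with larger projected offset are expected to have lower fitness in the garden.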

Citations: 0
Taxonomic and abundance biases affect the record of marine eukaryotic plankton communities in sediment DNA archives.
IF 5.5, Zone 1 (Biology), Q1 BIOCHEMISTRY & MOLECULAR BIOLOGY, Pub Date: 2024-08-26, DOI: 10.1111/1755-0998.14014
Ngoc-Loi Nguyen, Joanna Pawłowska, Marek Zajaczkowski, Agnes K M Weiner, Tristan Cordier, Danielle M Grant, Stijn De Schepper, Jan Pawłowski

Environmental DNA (eDNA) preserved in marine sediments is increasingly being used to study past ecosystems. However, little is known about how accurately marine biodiversity is recorded in sediment eDNA archives, especially for planktonic taxa. Here, we address this question by comparing eukaryotic diversity in 273 eDNA samples from three water depths and the surface sediments of 24 stations in the Nordic Seas. Analysis of 18S-V9 metabarcoding data reveals distinct eukaryotic assemblages between water and sediment eDNA. Only 40% of Amplicon Sequence Variants (ASVs) detected in water were also found in sediment eDNA. Remarkably, the ASVs shared between water and sediment accounted for 80% of total sequence reads, suggesting that a large amount of plankton DNA is transported to the seafloor, predominantly from abundant phytoplankton taxa. However, not all plankton taxa were equally archived on the seafloor. The plankton DNA deposited in the sediments was dominated by diatoms and showed an underrepresentation of certain nano- and picoplankton taxa (Picozoa or Prymnesiophyceae). Our study offers the first insights into the patterns of plankton diversity recorded in sediment in relation to seasonality and spatial variability of environmental conditions in the Nordic Seas. Our results suggest that the genetic composition and structure of the plankton community vary considerably throughout the water column and differ from what accumulates in the sediment. Hence, the interpretation of sedimentary eDNA archives should take into account potential taxonomic and abundance biases when reconstructing past changes in marine biodiversity.
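The contrast reported above, where a minority of ASVs is shared yet those ASVs carry most of the reads, follows from a skewed abundance distribution. The sketch below illustrates it with toy counts chosen only to mimic the 40% and 80% figures. The ASV names, read counts and the function `shared_fractions` are hypothetical, and computing the read fraction relative to water reads is an assumption, since the abstract does not specify the denominator.

```python
def shared_fractions(water_reads, sediment_asvs):
    """Return (fraction of water ASVs also in sediment,
    fraction of water reads belonging to those shared ASVs)."""
    shared = set(water_reads) & set(sediment_asvs)
    asv_fraction = len(shared) / len(water_reads)
    read_fraction = sum(water_reads[asv] for asv in shared) / sum(water_reads.values())
    return asv_fraction, read_fraction


# Toy read counts per ASV in water samples (not data from the study).
water_reads = {"asv1": 500, "asv2": 300, "asv3": 100, "asv4": 60, "asv5": 40}
# Toy set of ASVs detected in sediment; "asv9" is sediment-only.
sediment_asvs = {"asv1", "asv2", "asv9"}

asv_fraction, read_fraction = shared_fractions(water_reads, sediment_asvs)
print(asv_fraction, read_fraction)  # prints 0.4 0.8
```

Here two of five water ASVs are shared, but because they are the two most abundant, they account for 800 of 1000 reads.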

Citations: 0
The answer, my friend, is blowin' in the wind: Blow sampling provides a new dimension to whale population monitoring.
IF 5.5, Zone 1 (Biology), Q1 BIOCHEMISTRY & MOLECULAR BIOLOGY, Pub Date: 2024-08-26, DOI: 10.1111/1755-0998.14012
Elena Valsecchi
Citations: 0
Stability of environmental DNA methylation and its utility in tracing spawning in fish.
IF 5.5, Zone 1 (Biology), Q1 BIOCHEMISTRY & MOLECULAR BIOLOGY, Pub Date: 2024-08-19, DOI: 10.1111/1755-0998.14011
Itsuki T Hirayama, Luhan Wu, Toshifumi Minamoto

The use of environmental DNA (eDNA) is becoming prevalent as a novel method of ecological monitoring. Although eDNA can provide critical information on the distribution and biomass of particular taxa, the DNA sequences of an organism remain unaltered throughout its existence, which complicates the accurate identification of crucial events, including spawning. Therefore, we examined DNA methylation as a novel source of information from eDNA, considering that the methylation patterns in eggs and sperm released during spawning differ from those of somatic tissues. Despite its potential applications, little is known about eDNA methylation, including its stability and methods for detection and quantification. Therefore, we conducted tank experiments and performed methylation analysis targeting 18S rDNA through bisulphite amplicon sequencing. In the target region, eDNA methylation was not affected by degradation and was equivalent to the methylation rate of genomic DNA from somatic tissues. Unmethylated DNA, abundant in the ovaries, was detected in the eDNA released during fish spawning. These results indicate that eDNA methylation is a stable signal reflecting targeted gene methylation and further demonstrate that germ cell-specific methylation patterns can be used as markers for detecting fish spawning.
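The methylation rate measured by bisulphite amplicon sequencing rests on a simple rule: unmethylated cytosines are converted and read as T, while methylated cytosines remain C. The sketch below shows how a pooled CpG methylation rate could be tallied under strong simplifying assumptions (reads already aligned to the forward strand, same length as the reference, no indels). The function name, the reference sequence and the reads are all hypothetical and do not reproduce the authors' pipeline.

```python
def methylation_rate(reference, reads):
    """Fraction of CpG cytosine observations that remain 'C' after
    bisulphite conversion, pooled over all CpG sites and reads."""
    # Positions of the cytosine in every CG dinucleotide of the reference.
    cpg_positions = [i for i in range(len(reference) - 1) if reference[i:i + 2] == "CG"]
    methylated = unmethylated = 0
    for read in reads:
        for pos in cpg_positions:
            if read[pos] == "C":      # protected from conversion: methylated
                methylated += 1
            elif read[pos] == "T":    # converted: unmethylated
                unmethylated += 1
    total = methylated + unmethylated
    if total == 0:
        raise ValueError("no informative CpG observations")
    return methylated / total


reference = "ACGTTCGA"  # hypothetical amplicon with CpG cytosines at indexes 1 and 5
somatic_like = ["ACGTTCGA", "ACGTTCGA", "ACGTTTGA"]  # mostly methylated
germ_like = ["ATGTTTGA", "ATGTTTGA", "ACGTTTGA"]     # mostly unmethylated

print(round(methylation_rate(reference, somatic_like), 3))  # prints 0.833
print(round(methylation_rate(reference, germ_like), 3))     # prints 0.167
```

Under this toy model, a drop in the pooled methylation rate of water-sample reads relative to the somatic baseline would be the kind of signal that unmethylated, gamete-derived DNA could produce.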

Citations: 0
High-quality genome assemblies for nine non-model North American insect species representing six orders (Insecta: Coleoptera, Diptera, Hemiptera, Hymenoptera, Lepidoptera, Neuroptera).
IF 5.5, Zone 1 (Biology), Q1 BIOCHEMISTRY & MOLECULAR BIOLOGY, Pub Date: 2024-08-18, DOI: 10.1111/1755-0998.14010
Kimberly K O Walden, Yanghui Cao, Christopher J Fields, Alvaro G Hernandez, Gloria A Rendon, Gene E Robinson, Rachel K Skinner, Jeffrey A Stein, Christopher H Dietrich

Field-collected specimens were used to obtain nine high-quality genome assemblies from a total of 10 insect species native to prairies and savannas of central Illinois (USA): Mellilla xanthometata (Lepidoptera: Geometridae), Stenolophus ochropezus (Coleoptera: Carabidae), Forcipata loca (Hemiptera: Cicadellidae), Coelinius sp. (Hymenoptera: Braconidae), Thaumatomyia glabra (Diptera: Chloropidae), Brachynemurus abdominalus (Neuroptera: Myrmeleontidae), Catonia carolina (Hemiptera: Achilidae), Oncometopia orbona (Hemiptera: Cicadellidae), Flexamia atlantica (Hemiptera: Cicadellidae) and Stictocephala bisonia (Hemiptera: Membracidae). Sequencing library preparation from single specimens was successful despite extremely small DNA yields (<0.1 μg) for some samples. Additional sequencing and assembly workflows were adapted to each sample depending on the initial DNA yield. PacBio circular consensus (CCS/HiFi) or continuous long reads (CLR) libraries were used to sequence DNA fragments up to 50 kb in length, with Illumina sequenced linked-reads (TellSeq libraries) and Omni-C libraries used for scaffolding and gap-filling. Assembled genome sizes ranged from 135 MB to 3.2 GB. The number of assembled scaffolds ranged from 47 to >13,000, with the longest scaffold per assembly ranging from ~23 to 439 Mb. Genome completeness was high, with BUSCO scores ranging from 85.5% completeness for the largest genome (Stictocephala bisonia) to 98.8% completeness for the smallest genome (Coelinius sp.). The unique content was estimated using RepeatMasker and GenomeScope2, which ranged from 50.7% to 75.8% and roughly decreased with increasing genome size. Structural annotation predicted a range of 19,281-72,469 protein models for sequenced species. Sequencing costs per genome at the time ranged from US$3-5k, averaged ~1600 CPU-hours on a high-performance cluster and required approximately 14 h of bioinformatics analyses with samples using PacBio HiFi data. Most assemblies would benefit from further manual curation to correct possible scaffold misjoins and translocations suggested by off-diagonal or depleted signals in Omni-C contact maps.
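
The abstract reports contiguity figures such as scaffold count, longest scaffold and assembled size. As a minimal sketch of how such figures can be derived from a list of scaffold lengths, the Python below computes them together with N50, a common contiguity statistic that this abstract does not itself report. The function name `assembly_summary` and the toy lengths are assumptions made for illustration and are not taken from the paper or its pipeline.

```python
from typing import Dict, List


def assembly_summary(scaffold_lengths: List[int]) -> Dict[str, int]:
    """Summarise an assembly from its scaffold lengths (in base pairs)."""
    if not scaffold_lengths:
        raise ValueError("at least one scaffold length is required")
    lengths = sorted(scaffold_lengths, reverse=True)
    total = sum(lengths)
    # N50: length of the scaffold at which the running total, counted from
    # the longest scaffold downwards, first reaches half the assembly size.
    running = 0
    n50 = lengths[-1]
    for length in lengths:
        running += length
        if 2 * running >= total:
            n50 = length
            break
    return {
        "n_scaffolds": len(lengths),
        "total_bp": total,
        "longest_bp": lengths[0],
        "n50_bp": n50,
    }


# Hypothetical toy lengths, not data from the paper.
toy_lengths = [50, 40, 30, 20, 10]
summary = assembly_summary(toy_lengths)
print(summary)
```

In practice the lengths would be read from a FASTA index or similar file; that step is left out here to keep the sketch self-contained.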

Citations: 0
Estimation of spatial demographic maps from polymorphism data using a neural network
IF 5.5 Zone 1 (Biology) Q1 BIOCHEMISTRY & MOLECULAR BIOLOGY Pub Date: 2024-08-16 DOI: 10.1111/1755-0998.14005
Chris C. R. Smith, Gilia Patterson, Peter L. Ralph, Andrew D. Kern

A fundamental goal in population genetics is to understand how variation is arrayed over natural landscapes. From first principles we know that common features such as heterogeneous population densities and barriers to dispersal should shape genetic variation over space; however, there are few tools currently available that can deal with these ubiquitous complexities. Geographically referenced single nucleotide polymorphism (SNP) data are increasingly accessible, presenting an opportunity to study genetic variation across geographic space in myriad species. We present a new inference method that uses geo-referenced SNPs and a deep neural network to estimate spatially heterogeneous maps of population density and dispersal rate. Our neural network trains on simulated input and output pairings, where the input consists of genotypes and sampling locations generated from a continuous space population genetic simulator, and the output is a map of the true demographic parameters. We benchmark our tool against existing methods and discuss qualitative differences between the different approaches; in particular, our program is unique because it infers the magnitude of both dispersal and density as well as their variation over the landscape, and it does so using SNP data. Similar methods are constrained to estimating relative migration rates, or require identity-by-descent blocks as input. We applied our tool to empirical data from North American grey wolves, for which it estimated mostly reasonable demographic parameters, but was affected by incomplete spatial sampling. Genetic-based methods like ours complement other, direct methods for estimating past and present demography, and we believe they will serve as valuable tools for applications in conservation, ecology and evolutionary biology. An open source software package implementing our method is available from https://github.com/kr-colab/mapNN.
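
The training scheme described in the abstract (simulate data under known parameters, then learn to map data back to parameters) can be illustrated with a deliberately tiny stand-in. The sketch below is not the mapNN method: it swaps the continuous-space genetic simulator for a one-dimensional Gaussian displacement model, swaps the deep neural network for a least-squares line, and estimates a single number rather than a map. All names (`simulate_summary`, `fit_line`) and all parameter values are assumptions chosen only to show the simulate-then-train pattern.

```python
import random
from typing import List, Tuple


def simulate_summary(sigma: float, n: int, rng: random.Random) -> float:
    """Toy simulator: mean absolute displacement of n Gaussian dispersal steps."""
    return sum(abs(rng.gauss(0.0, sigma)) for _ in range(n)) / n


def fit_line(xs: List[float], ys: List[float]) -> Tuple[float, float]:
    """Ordinary least squares for y = slope * x + intercept."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx


rng = random.Random(0)  # fixed seed so the run is repeatable

# Training set: draw a "true" parameter, simulate data under it, keep the pair.
true_sigmas = [rng.uniform(0.5, 3.0) for _ in range(500)]
summaries = [simulate_summary(s, 200, rng) for s in true_sigmas]

# Learn the inverse mapping from simulated data back to the parameter.
slope, intercept = fit_line(summaries, true_sigmas)

# Apply the fitted model to a new data set whose parameter is treated as unknown.
observed = simulate_summary(2.0, 200, rng)
estimate = slope * observed + intercept
print(f"estimated sigma: {estimate:.2f}")
```

The design point carried over from the abstract is that the model never sees real data during training; its accuracy on empirical data therefore depends on how well the simulator resembles the real system.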

Citations: 0
MycoAI: Fast and accurate taxonomic classification for fungal ITS sequences.
IF 5.5 Zone 1 (Biology) Q1 BIOCHEMISTRY & MOLECULAR BIOLOGY Pub Date: 2024-08-16 DOI: 10.1111/1755-0998.14006
Luuk Romeijn, Andrius Bernatavicius, Duong Vu

Efficient and accurate classification of DNA barcode data is crucial for large-scale fungal biodiversity studies. However, existing methods are either computationally expensive or lack accuracy. Previous research has demonstrated the potential of deep learning in this domain, successfully training neural networks for biological sequence classification. We introduce the MycoAI Python package, featuring various deep learning models such as BERT and CNN tailored for fungal Internal Transcribed Spacer (ITS) sequences. We explore different neural architecture designs and encoding methods to identify optimal models. By employing a multi-head output architecture and multi-level hierarchical label smoothing, MycoAI effectively generalizes across the taxonomic hierarchy. Using over 5 million labelled sequences from the UNITE database, we develop two models: MycoAI-BERT and MycoAI-CNN. While we emphasize the necessity of verifying classification results by AI models due to insufficient reference data, MycoAI still exhibits substantial potential. When benchmarked against existing classifiers such as DNABarcoder and RDP on two independent test sets with labels present in the training dataset, MycoAI models demonstrate high accuracy at the genus and higher taxonomic levels, with MycoAI-CNN being the fastest and most accurate. In terms of efficiency, MycoAI models can classify over 300,000 sequences within 5 min. We publicly release the MycoAI models, enabling mycologists to classify their ITS barcode data efficiently. Additionally, MycoAI serves as a platform for developing further deep learning-based classification methods. The source code for MycoAI is available under the MIT Licence at https://github.com/MycoAI/MycoAI.
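
The abstract mentions hierarchical label smoothing without giving its form. One generic way such smoothing can work, sketched below, is to move a small share of the target probability away from the true class and give most of it to classes that share the same parent taxon. This is an illustrative assumption, not MycoAI's implementation: the function `hierarchical_smooth`, its `epsilon` and `sibling_share` parameters, and the three-species toy taxonomy are all hypothetical.

```python
from typing import Dict, List


def hierarchical_smooth(
    target: str,
    parent_of: Dict[str, str],
    epsilon: float = 0.1,
    sibling_share: float = 0.8,
) -> Dict[str, float]:
    """Return a smoothed target distribution over every class in parent_of.

    A total mass of epsilon is taken from the true class. A fraction
    sibling_share of it is spread evenly over classes with the same parent,
    and the remainder is spread evenly over all other classes.
    """
    classes: List[str] = sorted(parent_of)
    parent = parent_of[target]
    siblings = [c for c in classes if c != target and parent_of[c] == parent]
    others = [c for c in classes if c != target and parent_of[c] != parent]

    # Only hand out mass to groups that actually contain classes.
    sibling_mass = epsilon * sibling_share if siblings else 0.0
    other_mass = epsilon - sibling_mass if others else 0.0

    probs = {c: 0.0 for c in classes}
    probs[target] = 1.0 - sibling_mass - other_mass
    for c in siblings:
        probs[c] = sibling_mass / len(siblings)
    for c in others:
        probs[c] = other_mass / len(others)
    return probs


# Toy species -> genus mapping, used only for illustration.
toy_taxonomy = {
    "Amanita muscaria": "Amanita",
    "Amanita phalloides": "Amanita",
    "Boletus edulis": "Boletus",
}
smoothed = hierarchical_smooth("Amanita muscaria", toy_taxonomy)
print(smoothed)
```

Compared with uniform label smoothing, a scheme like this penalises a wrong species in the right genus less than a wrong species in the wrong genus, which is one plausible route to the generalization across taxonomic levels that the abstract describes.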

Citations: 0