Arabica coffee (Coffea arabica) dominates global coffee production, accounting for over 60% of the world's coffee trade. The Mundo Novo cultivar, predominantly grown in Yunnan, China, represents a significant germplasm resource. However, the absence of a high-quality reference genome has hindered comprehensive genetic research and in-depth investigation of secondary metabolic pathways in Arabica. In this study, we present the first near telomere-to-telomere (T2T) genome assembly of Arabica, achieved through the integration of PacBio HiFi, Oxford Nanopore ultra-long, and Hi-C sequencing technologies, representing the highest-quality Arabica genome to date. Phylogenetic analysis of N-methyltransferases (NMTs), the key enzymes responsible for caffeine biosynthesis, revealed their independent evolution across caffeine-producing clades including coffee, cacao, and tea. Furthermore, GO enrichment analysis of expanded gene families at the Arabica ancestral node, combined with fruit-specific transcriptomic profiling, revealed that glycosyltransferases likely play a critical role in the secondary metabolism of Arabica. Notably, functional characterisation demonstrated that a uridine diphosphate glycosyltransferase (UGT) from the UGT29 subfamily, which has a higher gene copy number in Arabica subgenome C than in its ancestor, can directly convert Rebaudioside A (Reb A) into Rebaudioside M (Reb M) through a single-step enzymatic glycosylation. This direct pathway represents a crucial advancement over conventional multi-UGT biosynthetic routes to Reb M, a highly desirable sweetener with limited natural abundance. Taken together, this study not only provides a valuable genomic resource for studying the unique secondary metabolic processes in C. arabica but also opens new avenues for the synthetic biological production of the valuable sweetener Reb M.
Many studies have proposed various comparative genomic methods to probe the molecular basis for adaptive functional convergence between species, conventionally by detecting the convergence of amino acid states between orthologous protein sequences of these species or lineages. However, different amino acids with similar physicochemical properties at a site may contribute to the functional similarity of the protein. Hence, could the convergence of amino acid physicochemical properties, in addition to state convergence, also contribute to adaptive convergence of organismal functions? Here we grouped amino acids into physicochemically similar classes, and developed computational pipelines to detect the Convergence of Amino Acid Properties (CAAP, https://github.com/shanschen33/CAAP) by modifying previous state convergence detection methods. Investigating three organismal convergence cases including echolocating mammals, marine mammals and woody mangroves, we found genes with CAAP that likely contribute to the respective functional adaptation, supported by orthogonal evidence such as functional enrichment and positive selection analyses. Our findings in multiple cases corroborate the hypothesis that CAAP may underlie adaptive convergent evolution of organismal functions, emphasising the importance of considering sequence features more complex than amino acid states when studying adaptive sequence convergence.
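The distinction between state convergence and property convergence can be illustrated with a minimal Python sketch. It is not the CAAP pipeline: the six-class grouping, the function name `convergent_sites`, and the toy sequences are all assumptions made for illustration, and the sketch compares extant sequences directly rather than using the ancestral-state reconstruction on which convergence detection methods normally rely.

```python
# Hypothetical physicochemical grouping (an assumption; the grouping
# used by the CAAP pipeline may differ).
PROPERTY_CLASSES = {
    "hydrophobic": "AVLIMC",
    "aromatic": "FWYH",
    "polar": "STNQ",
    "positive": "KR",
    "negative": "DE",
    "special": "GP",
}
AA_TO_CLASS = {aa: name for name, aas in PROPERTY_CLASSES.items() for aa in aas}


def convergent_sites(focal, background):
    """Flag alignment columns where all focal species share one property
    class that is absent from every background species.

    focal, background: dicts mapping species name -> aligned protein sequence.
    Returns a list of (site_index, kind), where kind is "state" if the focal
    species also share the identical amino acid, else "property".
    """
    length = len(next(iter(focal.values())))
    hits = []
    for i in range(length):
        focal_states = {seq[i] for seq in focal.values()}
        bg_states = {seq[i] for seq in background.values()}
        focal_classes = {AA_TO_CLASS.get(s) for s in focal_states}
        bg_classes = {AA_TO_CLASS.get(s) for s in bg_states}
        # Skip columns with gaps or unknown residues.
        if None in focal_classes or None in bg_classes:
            continue
        if len(focal_classes) == 1 and focal_classes.isdisjoint(bg_classes):
            kind = "state" if len(focal_states) == 1 else "property"
            hits.append((i, kind))
    return hits


# Toy alignment (invented sequences, not real data).
focal = {"bat": "MKDLW", "dolphin": "MRDIW"}
background = {"cow": "MEDGS", "mouse": "MEDGS"}
print(convergent_sites(focal, background))
```

In this toy case, sites 1 (K/R) and 3 (L/I) differ in amino acid state between the focal species but share a property class, so a state-only test would miss them.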
DNA technologies have many advantages for biomonitoring and biodiversity analyses, but these depend on the availability of relevant reference DNA barcodes. To be most useful, a DNA barcode should be linked to a taxonomic name, which can in turn be connected to ecological information. This linking can be achieved by DNA barcoding of taxonomically identified specimens. Museums are a promising source of such specimens, but the DNA in museum specimens is often degraded, necessitating carefully optimised DNA extraction methods. In this issue of Molecular Ecology Resources, Holmquist et al. (2025) present a DNA extraction protocol for museum insect specimens, using in-house formulated Solid Phase Reversible Immobilisation (SPRI) beads. The authors carried out several experiments with statistical evaluation to determine optimal DNA extraction parameters, before testing the protocol on a large and diverse pool of museum-held insect specimens. The result is a low-cost and effective DNA extraction protocol for diverse museum insect specimens.
Insects are vitally important components of Earth's biodiversity, but monitoring these communities is challenging due to the huge diversity of species that exist. DNA sequencing technologies enable efficient molecular characterisation of insect diversity, but the resulting molecular taxonomic units are typically disconnected from species or functional information (Meier et al. 2024). This makes ecological insights difficult to achieve for wide swathes of biodiversity. Reference DNA barcodes from taxonomically identified species can bridge this gap (Kress et al. 2015), but the process of taxonomically identifying insect specimens is very difficult due to a scarcity of suitable taxonomic expertise. New workflows that combine machine learning with mass DNA barcoding of trapped insect samples have the potential to resolve this challenge over time (Meier et al. 2024). On the other hand, it is important to consider existing resources such as museum collections as sources of reference DNA barcodes.
Museums often hold rich collections of biological specimens, usually with taxonomic identifications, accumulated over long periods of time (Figure 1). In theory, these collections represent a compelling source of DNA barcodes (Raxworthy and Smith 2021). The DNA in museum specimens is often degraded due to specimen age and suboptimal preservation, however; this makes it difficult to recover DNA barcode sequences from some specimens (Hebert et al. 2013). Assembly of multiple shorter amplicons can be effective in cases where DNA fragmentation makes PCR amplification of standard barcode regions unfeasible (D'Ercole et al. 2021; Prosser et al. 2016), but this is more complex and costly than conventional DNA barcode generation. Therefore, it is important to develop optimal methods of DNA extraction from museum insect collections to
Rapid DNA/eDNA-based ID tools, which detect specific genetic patterns without requiring sequencing, are essential for biodiversity and wildlife trade monitoring, particularly for species of conservation concern. However, the practical application of these methods remains limited by the availability of standardised protocols, accessibility of resources, and coverage across diverse taxa. This challenge is especially pronounced for Chondrichthyes, a group that is heavily overexploited through fishing and illegal trade and for which the data needed for conservation assessments are scarce. Despite their ecological and economic importance, many species lack reference sequences in databases and other molecular resources, hindering the development of molecular tools for species identification and trade regulation. This review synthesises the current state of rapid DNA/eDNA-based ID tools for the detection of chondrichthyan species, including established and emerging methods. It also compiles available taxon-specific primers to facilitate efficient species identification and recommends the most suitable methods. We identify key gaps in taxonomic and geographic coverage, emphasising the need for further research to expand these tools to under-represented species and regions. Additionally, we highlight the importance of integrating genetic approaches into enforcement frameworks to enhance conservation strategies and regulatory compliance. By providing an accessible reference for time- and cost-effective genetic monitoring, this work will support evidence-based decision-making and improve the practical application of rapid DNA/eDNA-based ID tools in the conservation and management of Chondrichthyes species worldwide.
Advances in DNA sequencing technology have stimulated the rapid uptake of protocols—such as eDNA analysis and metabarcoding—that infer the species composition of environmental samples from DNA sequences. DNA barcode reference libraries play a critical role in the interpretation of sequences gathered through such protocols, but many of these libraries lack a taxonomic consensus, include redundant records, do not support end-user analytical pipelines, and are not permanently archived. Furthermore, because DNA sequencers are outpacing Moore's Law and reference libraries are growing, the computational power required to assign sequences to source taxa is rapidly increasing. This paper introduces an algorithmic approach to construct DNA barcode reference libraries that addresses these issues. Hosted online, ‘BOLDistilled’ libraries are comprehensive but compact, because the algorithm distills genetic variation into a minimal set of records. We provide a BOLDistilled library for the barcode region of the cytochrome c oxidase 1 gene (COI) based on data in the Barcode of Life Data System (BOLD). It contains 1.7 M records versus the 15.7 M in the complete library, a compression that reduced the time required for sequence analysis of metabarcoded samples by ≥ 98% with no reduction in the accuracy of taxonomic placements. BOLDistilled libraries will be updated regularly, with current and previous versions available at https://boldsystems.org/data/boldistilled. By providing access to persistent, comprehensive, and high-quality reference data, these libraries strengthen the capacity of DNA-based identification systems to advance biodiversity science.
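The idea of distilling a library into a smaller set of records, and the size of the reported compression, can be sketched in Python. The abstract does not describe the BOLDistilled algorithm itself, so the rule used here (keep one record per unique taxon and sequence pair) is only a stand-in assumption, and `distill` and `record_reduction` are hypothetical names. The only figures taken from the text are the record counts of 1.7 M and 15.7 M.

```python
def distill(records):
    """Keep the first record for each unique (taxon, sequence) pair.

    records: iterable of (taxon, sequence) tuples.
    This exact-duplicate rule is an illustrative assumption, not the
    published BOLDistilled procedure.
    """
    seen = set()
    kept = []
    for taxon, seq in records:
        key = (taxon, seq.upper())  # treat case differences as duplicates
        if key not in seen:
            seen.add(key)
            kept.append((taxon, seq))
    return kept


def record_reduction(full_count, distilled_count):
    """Fraction of records removed by distillation."""
    return 1 - distilled_count / full_count


# Record counts reported in the abstract.
full, distilled = 15.7e6, 1.7e6
print(f"records removed: {record_reduction(full, distilled):.1%}")
print(f"fold compression: {full / distilled:.1f}x")
```

On the reported counts, this is roughly an 89% reduction in records, about a 9-fold compression. It is a separate figure from the ≥ 98% reduction in analysis time.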
Miniature inverted-repeat transposable elements (MITEs) are short, non-autonomous class II transposable elements prevalent in eukaryotic genomes, contributing to various genomic and genic functions in plants. However, research on MITEs mainly targets a few species, limiting a comprehensive understanding and systematic comparison of MITEs in plants. Here, we developed a highly sensitive MITE annotation pipeline with a low false positive rate and applied it to 207 high-quality plant genomes. We found over a 20,000-fold variation in MITE copy numbers among species. The Mutator superfamily accounted for 41.5% of MITEs, whereas the Tc1/Mariner and PIF/Harbinger superfamilies expanded rapidly in monocots, particularly in Poaceae. Insertion time analysis revealed a general pattern of a single amplification wave, with initial insertions occurring around 30 million years ago (Mya) and peaking at 0–9 Mya. In addition, some species exhibited evidence of another ancient, slower expansion phase. In three representative families, we identified many more species-specific MITE loci than shared MITE loci, underscoring MITEs' significant role in genome diversity. Phylogenomic analyses indicate that MITEs accumulated gradually and specifically during speciation, primarily through recent insertions rather than the retention of ancient elements. MITEs preferentially insert near genes and are often associated with enhanced gene expression. Furthermore, we identified 985 MITE-derived miRNAs from 392 families across 56 species, mainly from Mutator, Tc1/Mariner, and PIF/Harbinger, targeting a variety of gene functions. This study enhances our understanding of the evolution and functional roles of MITEs in plants and provides a basis for exploring their function in further research.
Global efforts to standardise methodologies benefit greatly from open-source procedures that enable the generation of comparable data. Here, we present a modular, high-throughput nucleic acid extraction protocol standardised within the Earth Hologenome Initiative to generate both genomic and microbial metagenomic data from faecal samples of vertebrates. The procedure enables the purification of RNA and DNA either in separate fractions (DREX1) or as total nucleic acids (DREX2). We demonstrate the effectiveness of both variants across faecal samples from amphibians, reptiles and mammals, with reduced performance observed on bird guano. Despite some variation in laboratory performance metrics, DREX1 and DREX2 yielded highly similar microbial community profiles, as well as comparable depth and breadth of host genome coverage. Benchmarking against a commercial kit widely used in microbiome research showed comparable recovery of host genomic data and microbial community complexity. Our open-source method offers a robust, cost-effective, scalable and automation-friendly nucleic acid extraction procedure to generate high-quality hologenomic data across vertebrate taxa. The method enhances research comparability and reproducibility by providing standardised, high-throughput, open-access protocols with fully transparent reagents. It is designed for integration into automated pipelines, and its modular structure also supports continuous development and improvement.

