A nuclear phylogenomic tree of grasses (Poaceae) recovers current classification despite gene tree incongruence
New Phytologist. Published 2024-11-20. DOI: 10.1111/nph.20263 (https://doi.org/10.1111/nph.20263)
Abstract
Introduction
With almost 11 800 species in 791 genera (Soreng et al., 2022), grasses (Poaceae) are one of the largest plant families and among the most important for humans. Grasses include the primary food crops rice, maize and wheat, sources of fibre and building materials such as reed and bamboo, and biofuel crops such as sugarcane and switchgrass. Much of the global land surface is covered by grass-dominated ecosystems, where grasses impact productivity, nutrient cycling and vegetation structure by mediating fire and herbivory (Edwards et al., 2010; Bond, 2016). Grasses are also overrepresented among the world's most damaging agricultural weeds (Holm et al., 1977) and invasive plants (Linder et al., 2018). Understanding functional diversification, adaptation and novel crop breeding in this important plant group requires a solid understanding of its evolutionary relationships.
Efforts to uncover the phylogenetic history of grasses have tracked the development of new technology and analytical tools, beginning with cladistic analysis of morphology (e.g. Campbell & Kellogg, 1987). Almost as soon as nucleotide sequencing became possible, it was used to investigate grasses (rRNA sequencing, Hamby & Zimmer, 1988, and chloroplast DNA, Clark et al., 1995), and the results were interpreted in the light of known morphology and classification. Since then, hundreds of papers have used nucleic acids, most recently DNA, to assess grass phylogeny at all taxonomic levels and to assemble information from all three genomes in the cell (plastid, mitochondrial, and nuclear). These efforts have been punctuated by two major phylogenetic analyses, Grass Phylogeny Working Group I (GPWG, 2001) and GPWG II (2012); together with many other detailed phylogenetic analyses, these enabled family-wide classifications (Kellogg, 2015; Soreng et al., 2022).
The major outlines of grass phylogeny have now been known for several decades and corroborated by accumulating data, with major lineages recognised as subfamilies (Kellogg, 2015; Soreng et al., 2022). The earliest divergences in the grass family gave rise to three successive lineages, Anomochlooideae, Pharoideae, and Puelioideae, each comprising just a few species. After the divergence of those three, however, the remaining grasses gave rise to two sister lineages, known as BOP and PACMAD, each of which became a species-rich clade with several robust subclades. This sturdy phylogenetic framework is reflected in a strong subfamilial classification, with subfamilies divided into equally robust tribes. Attention in recent years has largely shifted to relationships of tribes, subtribes, and genera.
Reticulate evolution is common in the grasses. Allopolyploidy is widespread in the family, particularly among closely related species and genera, with as many as 80% of species estimated to be of recent polyploid origin (Stebbins, 1985). The textbook example is bread wheat (Triticum aestivum) and its ruderal annual ancestors, the history of which was determined in the first part of the 20th century using cytogenetic tools (Kihara, 1982; Tsunewaki, 2018). Nucleotide sequence data have verified the hybrid origin of wheat and gone on to show that reticulate evolution is the norm in the entire tribe Triticeae (Feldman & Levy, 2023; Mason-Gamer & White, 2024). We have also learned that three of the four major clades of Bambusoideae are of allopolyploid origin (Triplett et al., 2014; Guo et al., 2019; Chalopin et al., 2021; Ma et al., 2024), as are at least one third of the species in Andropogoneae (Estep et al., 2014). Large-scale lateral gene transfer has also been demonstrated in Alloteropsis semialata (Dunning et al., 2019) and for a number of genomes across the family (Hibdige et al., 2021), although it remains unclear how common such genetic exchanges are. Network-like reticulations are therefore expected throughout Poaceae.
Data relevant to grass phylogeny continue to accumulate in the genomic era, but in an uneven pattern. Major recent studies have inferred family trees based on the plastid genome (Saarela et al., 2018; Gallaher et al., 2022; Hu et al., 2023) or large parts of the nuclear genome (Huang et al., 2022). In addition, a wealth of full-genome assemblies is now available for grasses, mainly for groups that have been studied intensively, such as major crops and their congeners including rice (Wang & Han, 2022), maize (Hufford et al., 2021), wheat (Walkowiak et al., 2020) and sugarcane (Healey et al., 2024), among many others. At the same time, some genera and many species remain virtually unknown beyond a scientific name and general morphology. While the poorly known taxa may be represented in major herbaria, fresh material can be hard to obtain, weakening attempts to fully sample the grass tree of life with phylogenomic technologies.
Fortunately, we are now experiencing the confluence of: (1) global sources of diversity data including plant specimens held in herbaria world-wide, (2) widespread use of short-read sequencing that can accommodate even fragmented DNA, (3) analytical tools for assembling and interpreting massive amounts of sequence data, and (4) technical tools for efficient sequencing, such as target capture. For example, the development of a universal probe set for flowering plants, Angiosperms353 (Johnson et al., 2019; Baker et al., 2021), has enabled initiatives to sequence all angiosperm plant genera (Baker et al., 2022; Zuntini et al., 2024) or entire continental floras such as that of Australia (https://www.genomicsforaustralianplants.com/). It became apparent that an updated synthesis of existing and new data for grasses, similar to the previous Grass Phylogeny Working Group efforts (GPWG, 2001; GPWG II, 2012), would be timely and make possible a phylogeny that incorporates representatives of most of the 791 genera of the family using genome-scale data. In the process, we will gain a broader assessment of congruence among nuclear gene histories, including insights on the frequency and impact of incomplete lineage sorting (ILS) and reticulation.
Accordingly, here we present the most comprehensive nuclear phylogenomic tree of the grass family to date. Via a large community effort, we maximised taxon sampling by combining whole-genome, transcriptome, target capture and shotgun datasets. Based on the Angiosperms353 gene set, we inferred a nuclear multigene species tree using a coalescent-based method that accounts for incongruence due to ILS and uses information from multicopy gene trees. We also inferred a plastome tree and tested for incongruence between plastome and nuclear trees. Finally, we used gene tree–species tree reconciliation analyses to explore the signal for reticulation in the nuclear data.
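The notion of gene tree incongruence referred to above can be illustrated with a toy calculation. The following is a minimal sketch, not the method used in this study: it assumes rooted, topology-only Newick strings over the same set of taxa, and simply reports the fraction of gene trees that contain each clade of a reference species tree. The function names `clades` and `clade_support` and the four-taxon trees (A to D) are hypothetical and chosen only for illustration.

```python
from typing import Dict, FrozenSet, List, Set


def clades(newick: str) -> Set[FrozenSet[str]]:
    """Return the leaf sets of all internal nodes except the root.

    Assumption: a rooted, topology-only Newick string such as
    "((A,B),(C,D));" with no branch lengths or internal labels.
    """
    found: Set[FrozenSet[str]] = set()
    stack: List[Set[str]] = []  # one leaf set per open parenthesis
    token = ""
    for ch in newick.strip().rstrip(";"):
        if ch == "(":
            stack.append(set())
        elif ch in ",)":
            if token:  # a leaf name has just ended
                stack[-1].add(token)
                token = ""
            if ch == ")":  # an internal node is complete
                done = stack.pop()
                found.add(frozenset(done))
                if stack:
                    stack[-1].update(done)
        elif not ch.isspace():
            token += ch
    root = max(found, key=len)  # the root clade holds every leaf
    return {c for c in found if c != root}


def clade_support(species_tree: str, gene_trees: List[str]) -> Dict[FrozenSet[str], float]:
    """Fraction of gene trees that contain each clade of the species tree."""
    per_gene = [clades(g) for g in gene_trees]
    return {
        clade: sum(clade in gc for gc in per_gene) / len(gene_trees)
        for clade in clades(species_tree)
    }


# Toy data (hypothetical): one reference tree and four gene trees.
species = "(((A,B),C),D);"
genes = [
    "(((A,B),C),D);",
    "(((A,C),B),D);",  # conflicts with clade {A, B}
    "(((A,B),C),D);",
    "((A,B),(C,D));",  # conflicts with clade {A, B, C}
]
support = clade_support(species, genes)
for clade in sorted(support, key=len):
    print(sorted(clade), support[clade])
```

A support value below 1 for a clade indicates that some gene trees disagree with the reference topology at that node. Such a count cannot by itself distinguish ILS from reticulation or from gene tree estimation error, which is why dedicated coalescent and reconciliation methods are needed.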
About the journal:
New Phytologist is an international electronic journal published 24 times a year. It is owned by the New Phytologist Foundation, a non-profit-making charitable organization dedicated to promoting plant science. The journal publishes excellent, novel, rigorous, and timely research and scholarship in plant science and its applications. The articles cover topics in five sections: Physiology & Development, Environment, Interaction, Evolution, and Transformative Plant Biotechnology. These sections span topics from intracellular processes to global environmental change and encourage cross-disciplinary approaches. The journal recognizes the use of techniques from molecular and cell biology, functional genomics, modeling, and system-based approaches in plant science. Abstracting and indexing services for New Phytologist include Academic Search, AgBiotech News & Information, Agroforestry Abstracts, Biochemistry & Biophysics Citation Index, Botanical Pesticides, CAB Abstracts®, Environment Index, Global Health, Plant Breeding Abstracts, and others.