Near-complete assembly and comprehensive annotation of the wheat Chinese Spring genome
Authors: Zijian Wang, Lingfeng Miao, Kaiwen Tan, Weilong Guo, Beibei Xin, Rudi Appels, Jizeng Jia, Jinsheng Lai, Fei Lu, Zhongfu Ni, Xiangdong Fu, Qixin Sun, Jian Chen
Journal: Molecular Plant (impact factor 17.1; JCR Q1, Biochemistry & Molecular Biology)
Publication type: Journal Article
Published: 2025-02-13
DOI: 10.1016/j.molp.2025.02.002 (https://doi.org/10.1016/j.molp.2025.02.002)
Citations: 0
Abstract
A complete reference genome is crucial for biological research and genetic improvement. Owing to its large size and highly repetitive nature, the globally used wheat Chinese Spring (CS) genome still contains numerous gaps. Here, we generated a 14.46 Gb near-complete assembly of the CS genome, with a contig N50 over 266 Mb and an overall base accuracy of 99.9963%. Among the 290 gaps that remained (26, 257 and 7 gaps from the A, B and D subgenomes, respectively), 278 were extremely high-copy tandem repeats, whereas the remaining 12 were TE-associated gaps. Four chromosomes (chr1D, chr3D, chr4D and chr5D) were completely gap-free. Extensive annotation of the near-complete genome revealed 151,405 high-confidence genes, of which 59,180 were newly annotated, including 7,602 newly assembled genes. Except for the centromere of chr1B, which has a gap associated with superlong GAA repeat arrays, the centromeric sequences of the remaining 20 chromosomes were completely assembled. Our near-complete assembly revealed that the extent of tandem repeats, such as simple sequence repeats (SSRs), was highly uneven among the subgenomes. Similarly, the repeat compositions of the centromeres varied among the three subgenomes. With the genome sequences of all six types of seed storage proteins (SSPs) fully assembled, the expression of ω-gliadin was found to be contributed entirely by the B subgenome, whereas the expression of the other five types of SSPs was most abundant from the D subgenome. The near-complete CS genome will serve as a valuable resource for the research and breeding of wheat as well as its related species.
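For readers unfamiliar with the two assembly metrics quoted in the abstract, the following is a minimal sketch of how they are conventionally computed. It is not the authors' pipeline: the function names `n50` and `phred_qv` and the toy contig lengths are hypothetical, chosen only for illustration. Contig N50 is the length of the contig at which the longest contigs, taken in descending order, first cover at least half of the assembly, and base accuracy is commonly restated as a Phred-scaled quality value.

```python
import math


def n50(lengths):
    """Return the N50 of a collection of contig lengths (hypothetical helper)."""
    if not lengths:
        raise ValueError("need at least one contig length")
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        # The first contig that pushes the running sum to half the total is the N50.
        if running >= half_total:
            return length


def phred_qv(accuracy_percent):
    """Convert a percent base accuracy into a Phred-scaled quality value."""
    error_rate = 1.0 - accuracy_percent / 100.0
    return -10.0 * math.log10(error_rate)


# Toy contig lengths, not data from the paper.
toy_contigs = [2, 2, 2, 3, 3, 4, 8, 8]
print("toy N50:", n50(toy_contigs))

# The abstract's 99.9963% accuracy expressed as a quality value.
print("QV:", round(phred_qv(99.9963), 1))

# The gap counts reported in the abstract are internally consistent.
assert 26 + 257 + 7 == 290
assert 278 + 12 == 290
```

On this scale, the reported accuracy of 99.9963% corresponds to a quality value of roughly 44, or about one error per 27,000 bases.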
About the journal:
Molecular Plant is dedicated to serving the plant science community by publishing novel and exciting findings with high significance in plant biology. The journal focuses broadly on cellular biology, physiology, biochemistry, molecular biology, genetics, development, plant-microbe interaction, genomics, bioinformatics, and molecular evolution.
Molecular Plant publishes original research articles, reviews, Correspondence, and Spotlights on the most important developments in plant biology.