With the decreasing cost of sequencing and the availability of larger numbers of sequenced genomes, comparative genomics is becoming increasingly attractive as a complement to experimental techniques for transcription factor (TF) binding site identification. In this study, we redesigned BLSSpeller, a motif discovery algorithm, to cope with larger sequence datasets. BLSSpeller was used to identify novel motifs in Zea mays in a comparative genomics setting with 16 monocot lineages. We discovered 61 motifs, of which 20 matched previously described motif models in Arabidopsis. In addition, novel, as yet uncharacterized motifs were detected, several of which are supported by available sequence-based and/or functional data. Instances of the predicted motifs were enriched around transcription start sites and contained signatures of selection. Moreover, the enrichment of the predicted motif instances in open chromatin and TF binding sites indicates that they are functional, a conclusion supported by the finding that genes carrying instances of these motifs were often co-expressed and/or enriched in similar GO functions. Overall, our study unveiled several novel candidate motifs that might help our understanding of the genotype-to-phenotype association in crops.
Gymnocypris przewalskii, a cyprinid fish endemic to the Qinghai-Tibetan Plateau, has evolved unique morphological, physiological and genetic characteristics to adapt to the highland environment. Herein, we assembled a high-quality G. przewalskii tetraploid genome with a size of 2.03 Gb and a scaffold N50 of 44.93 Mb, which was anchored onto 46 chromosomes. Comparative analysis suggested that gene families related to highland adaptation were significantly expanded in G. przewalskii. Using the G. przewalskii genome, we evaluated the phylogenetic relationships of 13 schizothoracine fishes and inferred that the demographic history of G. przewalskii was strongly associated with geographic and eco-environmental alterations. We found that G. przewalskii experienced whole-genome duplication, and that genes preserved after duplication were functionally associated with adaptation to high salinity and alkalinity. In conclusion, the chromosome-scale G. przewalskii genome provides an important genomic resource for teleost fish, and will particularly promote our understanding of the molecular evolution and speciation of fish in the highland environment.
The Asian citrus psyllid, Diaphorina citri, is the insect vector of the causal agent of huanglongbing (HLB), a devastating bacterial disease of commercial citrus. Presently, few genomic resources exist for D. citri. In this study, we utilized PacBio HiFi and chromatin conformation capture (Hi-C) sequencing to sequence, assemble, and compare three high-quality, chromosome-scale genome assemblies of D. citri collected from California, Taiwan, and Uruguay. Our assemblies had final sizes of 282.67 Mb (California), 282.89 Mb (Taiwan), and 266.67 Mb (Uruguay), assembled into 13 pseudomolecules. This represents a reduction in assembly size of 41-45% compared with previous assemblies, which we validated using flow cytometry. We identified the X chromosome in D. citri and annotated each assembly for repetitive elements, protein-coding genes, transfer RNAs, ribosomal RNAs, piwi-interacting RNA clusters, and endogenous viral elements. Between 19,083 and 20,357 protein-coding genes were predicted. Repetitive DNA accounts for 36.87-38.26% of each assembly. Comparative analyses and mitochondrial haplotype networks suggest that Taiwan and Uruguay D. citri are more closely related, while California D. citri are closely related to Florida D. citri. These high-quality, chromosome-scale assemblies provide new genomic resources for researchers to further D. citri and HLB research.
A common approach to estimating the strength and direction of selection acting on protein-coding sequences is to calculate the dN/dS ratio. Since its proposal by Nei and Gojobori in 1986, the method for calculating dN/dS has been widely used, and its application has been the subject of many critical reviews. However, the method is still evolving to account for non-uniform substitution rates and pretermination codons. In our study of SNPs in 586 genes across 156 Escherichia coli strains, synonymous polymorphism was higher in 2-fold degenerate codons than in 4-fold degenerate codons, which could be attributed to the difference between transition (Ti) and transversion (Tv) substitution rates, where the average rate of a transition is in general four times that of a transversion. We considered both the Ti/Tv ratio and nonsense mutations in pretermination codons to improve estimates of synonymous (S) and non-synonymous (NS) sites, and thereby the accuracy of dN/dS estimation. We showed that applying the modified approach based on the Ti/Tv ratio and pretermination codons results in higher values of dN/dS in 29 common genes with equal reading frames between E. coli and Salmonella enterica. This study emphasizes the robustness of amino acid composition with varying codon degeneracy, as well as the role of pretermination codons, when calculating dN/dS values.
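To make the site-counting idea concrete, the following is a minimal sketch of how synonymous and non-synonymous sites might be counted for a single codon, in the spirit of Nei and Gojobori's approach. It is not the authors' implementation. The function name `count_sites`, the parameter `kappa` (a Ti/Tv rate ratio used as a weight), and the flag `exclude_nonsense` are hypothetical names introduced here for illustration. In particular, the way nonsense changes are handled (dropped from both counts while the per-position normalization stays fixed) is an assumption, not a detail taken from the abstract.

```python
from itertools import product

# Standard genetic code, built from the conventional TCAG ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

PURINES = {"A", "G"}


def is_transition(a, b):
    """True if a -> b stays within purines or within pyrimidines (a != b)."""
    return (a in PURINES) == (b in PURINES)


def count_sites(codon, kappa=1.0, exclude_nonsense=False):
    """Return (synonymous, non-synonymous) site counts for one sense codon.

    kappa: assumed Ti/Tv rate ratio; kappa=1.0 weights all changes equally.
    exclude_nonsense: if True, changes to a stop codon are counted in
        neither category (an illustrative assumption).
    """
    aa = CODON_TABLE[codon]
    if aa == "*":
        raise ValueError("count_sites expects a sense codon")
    syn = nonsyn = 0.0
    total = kappa + 2.0  # one transition and two transversions per position
    for pos in range(3):
        for alt in BASES:
            if alt == codon[pos]:
                continue
            mutant = codon[:pos] + alt + codon[pos + 1:]
            weight = (kappa if is_transition(codon[pos], alt) else 1.0) / total
            mutant_aa = CODON_TABLE[mutant]
            if mutant_aa == "*":
                if not exclude_nonsense:
                    nonsyn += weight
            elif mutant_aa == aa:
                syn += weight
            else:
                nonsyn += weight
    return syn, nonsyn


if __name__ == "__main__":
    for codon in ("GAT", "GCT", "TTA"):
        print(codon, count_sites(codon), count_sites(codon, kappa=4.0))
```

Under this sketch, the synonymous count of a 2-fold degenerate codon such as GAT rises when transitions are weighted more heavily, whereas that of a 4-fold degenerate codon such as GCT does not change. This is consistent with the reasoning above about why 2-fold and 4-fold degenerate codons behave differently. A full dN/dS estimate would additionally require counting observed differences between aligned sequences and applying a multiple-hit correction, neither of which is shown here.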

