Base editing technology is increasingly applied in genome engineering, but the current strategy for designing guide RNAs (gRNAs) relies largely on empirical experience rather than on dependable and efficient in silico design. Furthermore, the pleiotropic effects of base editing in disease treatment remain unexplored, which hinders its clinical use. Here, we present BExplorer, an integrated and comprehensive computational pipeline that optimizes the design of gRNAs for 26 existing types of base editors in silico. We report BExplorer results for two mainstream base editors, BE3 and ABE7.10, and evaluate the pleiotropic effects of the corresponding base editing loci. BExplorer revealed 524 and 900 editable pathogenic single nucleotide polymorphism (SNP) loci in the human genome for BE3 and ABE7.10, respectively, together with the selected optimized gRNAs. In addition, the impact of base editing at 707 pathogenic SNP loci on 131 diseases was systematically explored by revealing their pleiotropic effects, indicating that base editing should be applied with care given these potential pleiotropic effects. Collectively, the systematic exploration of optimized base editing gRNA design and the corresponding pleiotropic effects with BExplorer provides a computational basis for applying base editing in disease treatment.
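To make the gRNA design problem concrete, the following minimal sketch enumerates candidate guides for a cytosine base editor on one strand. It is not BExplorer's algorithm, which the abstract does not describe. The function name `find_cbe_guides`, the 20-nt protospacer, the NGG PAM, and the editing window at protospacer positions 4 to 8 are all assumptions chosen for illustration, and ranking by bystander count is only one possible criterion.

```python
def find_cbe_guides(seq, target, window=(4, 8), spacer_len=20):
    """List guides that place seq[target] (a C) inside the editing window.

    Assumptions (illustrative only): NGG PAM directly 3' of the protospacer,
    positions counted 1..spacer_len from the PAM-distal end, forward strand only.
    """
    seq = seq.upper()
    if seq[target] != "C":
        raise ValueError("target base must be a C on the given strand")

    guides = []
    # Slide a protospacer + 3-nt PAM frame along the sequence.
    for start in range(len(seq) - spacer_len - 3 + 1):
        pam = seq[start + spacer_len:start + spacer_len + 3]
        if pam[1:] != "GG":
            continue  # not an NGG PAM
        position = target - start + 1  # 1-based position within the protospacer
        if not (window[0] <= position <= window[1]):
            continue  # target falls outside the assumed editing window
        spacer = seq[start:start + spacer_len]
        window_bases = spacer[window[0] - 1:window[1]]
        guides.append({
            "spacer": spacer,
            "pam": pam,
            "position": position,
            # Other Cs in the window may be edited as well (bystanders).
            "bystanders": window_bases.count("C") - 1,
        })
    # Prefer guides with the fewest bystander cytosines.
    return sorted(guides, key=lambda g: g["bystanders"])
```

A fuller design would also scan the reverse complement, support other PAMs and window definitions per editor, and score off-target risk, none of which is attempted here.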
Since its initial release in 2001, the human reference genome has undergone continuous improvement in quality. The recently released telomere-to-telomere (T2T) version, T2T-CHM13, reaches the highest level of continuity and accuracy after 20 years of effort, but it does so by working on a simplified, nearly homozygous genome of a hydatidiform mole cell line. Here, to provide an authentic complete diploid human genome reference for the Han Chinese, the largest population in the world, we assembled the genome of a male Han Chinese individual, T2T-YAO, which includes T2T assemblies of all chromosomes in both haplotypes (22 + X + M and 22 + Y). The quality of T2T-YAO is much better than that of all currently available diploid assemblies, and its haploid version, T2T-YAO-hp, generated by selecting the better assembly for each autosome, reaches a top quality of fewer than one error per 29.5 Mb, even higher than that of T2T-CHM13. Derived from an individual living in the aboriginal region of the Han population, T2T-YAO shows clear ancestry and potential genetic continuity from the ancient ancestors. Each haplotype of T2T-YAO possesses ∼330 Mb of exclusive sequences, ∼3100 unique genes, and tens of thousands of nucleotide and structural variations as compared with CHM13, highlighting the necessity of a population-stratified reference genome. The construction of T2T-YAO, an accurate and authentic representative of the Chinese population, will enable precise delineation of genomic variations and advance our understanding of the heritability of diseases and phenotypes, especially in the context of the unique variations of the Chinese population.
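For readers who think in Phred-scaled consensus quality values (QV), the error rate quoted above can be converted with the standard relation QV = 10 × log10(bases per error). The sketch below only performs that arithmetic; the abstract does not say how the error rate was estimated, and the helper name `phred_qv` is hypothetical.

```python
import math


def phred_qv(bases_per_error):
    """Convert 'one error per N bases' into a Phred-scaled quality value."""
    if bases_per_error <= 0:
        raise ValueError("bases_per_error must be positive")
    return 10 * math.log10(bases_per_error)


# One error per 29.5 Mb, the bound quoted for T2T-YAO-hp.
qv = phred_qv(29.5e6)
print(f"QV = {qv:.1f}")
```

Because the abstract states "fewer than one error per 29.5 Mb", the value computed this way is a lower bound on the QV.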
Exploring the natural diversity of functional genes/proteins from environmental DNA at high throughput remains challenging. In this study, we developed a sequence-based functional metagenomics procedure for mining the diversity of the copper (Cu) resistance gene copA in global microbiomes by combining metagenomic assembly, local BLAST, evolutionary trace analysis (ETA), chemical synthesis, and conventional functional genomics. In total, 87 metagenomes were collected from a public database and subjected to copA detection, resulting in 93,899 hits. Manual curation of 1214 high-confidence hits led to the retrieval of 517 unique CopA candidates, which were further subjected to ETA. Eventually, 175 novel copA sequences of high quality were discovered. Phylogenetic analysis showed that almost all these putative CopA proteins were distantly related to known CopA proteins, with 55 sequences from totally unknown species. Ten novel and three known copA genes were chemically synthesized for further functional genomic tests using the Cu-sensitive Escherichia coli (ΔcopA). The growth test and Cu uptake determination showed that five novel clones had positive effects on host Cu resistance and uptake. One recombinant harboring copA-like 15 (copAL15) successfully restored Cu resistance of the host with a substantially enhanced Cu uptake. Two novel copA genes were fused with the gfp gene and expressed in E. coli for microscopic observation. Imaging results showed that they were successfully expressed and that their proteins were localized to the membrane. These results greatly expand the diversity of known CopA proteins, and the sequence-based procedure developed here overcomes the biases of conventional functional metagenomics in length, screening methods, and abundance.
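The step from raw BLAST hits to a smaller high-confidence set can be illustrated with a short filter over tabular BLAST output. This is a sketch under stated assumptions, not the authors' procedure: it assumes the default 12-column tabular format of BLAST+ (`-outfmt 6`), and the function name `high_confidence_hits` and all thresholds are hypothetical placeholders, since the abstract does not report the criteria used.

```python
import csv
import io

# Default column order of BLAST+ tabular output (-outfmt 6).
FIELDS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
          "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


def high_confidence_hits(tsv_text, min_pident=30.0, min_length=200,
                         max_evalue=1e-10):
    """Keep the best-scoring hit per subject sequence that passes all cutoffs.

    The cutoff values are illustrative placeholders, not those of the study.
    """
    best = {}
    reader = csv.DictReader(io.StringIO(tsv_text), fieldnames=FIELDS,
                            delimiter="\t")
    for row in reader:
        hit = {
            "qseqid": row["qseqid"],
            "pident": float(row["pident"]),
            "length": int(row["length"]),
            "evalue": float(row["evalue"]),
            "bitscore": float(row["bitscore"]),
        }
        if (hit["pident"] < min_pident or hit["length"] < min_length
                or hit["evalue"] > max_evalue):
            continue  # fails at least one cutoff
        previous = best.get(row["sseqid"])
        # Retain only the highest bit score seen for each subject sequence.
        if previous is None or hit["bitscore"] > previous["bitscore"]:
            best[row["sseqid"]] = hit
    return best
```

Collapsing to one hit per subject sequence is a simple way to remove redundant alignments before manual curation; a real workflow would likely also check alignment coverage and conserved motifs.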