The editing efficiencies of prime editing (PE) using ribonucleoprotein (RNP) and RNA delivery are not optimal because long PE guide RNA (pegRNA) (>125 nt) is difficult to produce by solid-phase synthesis. Here, we develop an efficient, rapid and cost-effective method for generating chemically modified pegRNA (125–145 nt) and engineered pegRNA (epegRNA) (170–190 nt). We use an optimized splint ligation approach and achieve approximately 90% production efficiency for these RNAs, referred to as L-pegRNA and L-epegRNA. Compared to epegRNA produced by in vitro transcription, L-epegRNA demonstrates enhanced editing efficiencies across various cell lines and human primary cells, with improvements reaching more than tenfold with RNP delivery and several hundredfold with RNA delivery of PE. L-epegRNA-mediated RNP delivery also outperforms plasmid-encoded PE in most comparisons. Our study provides a solution to obtaining high-quality pegRNA and epegRNA with desired chemical modifications, paving the way for the use of PE in therapeutics and various other fields.
In primary cell types, intracellular deoxynucleotide triphosphate (dNTP) levels are tightly regulated in a cell cycle-dependent manner. We report that prime editing efficiency is increased by mutations that improve the enzymatic properties of Moloney murine leukemia virus reverse transcriptase and treatments that increase intracellular dNTP levels. In combination, these modifications produce substantial increases in precise editing rates.
Protein denoising diffusion probabilistic models are used for the de novo generation of protein backbones but are limited in their ability to guide generation of proteins with sequence-specific attributes and functional properties. To overcome this limitation, we developed ProteinGenerator (PG), a sequence space diffusion model based on RoseTTAFold that simultaneously generates protein sequences and structures. Beginning from a noised sequence representation, PG generates sequence and structure pairs by iterative denoising, guided by desired sequence and structural protein attributes. We designed thermostable proteins with varying amino acid compositions and internal sequence repeats, as well as proteins that cage bioactive peptides such as melittin. By averaging sequence logits between diffusion trajectories with distinct structural constraints, we designed multistate parent–child protein triples in which the same sequence folds to different supersecondary structures when intact in the parent versus split into two child domains. PG design trajectories can be guided by experimental sequence–activity data, providing a general approach for integrated computational and experimental optimization of protein function.
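The logit-averaging step mentioned in the abstract above can be illustrated with a minimal sketch. It assumes each trajectory yields a per-position logit array of shape (length, 20) at a given denoising step; the denoising network itself is not modeled, and the names `average_logits`, `decode` and the `weight` parameter are hypothetical, not taken from ProteinGenerator.

```python
import numpy as np

# The 20 canonical amino acids; the column order is an assumption of this sketch.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def softmax(logits):
    """Numerically stable softmax over the last axis."""
    shifted = logits - logits.max(axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)


def average_logits(logits_a, logits_b, weight=0.5):
    """Blend per-position logits from two trajectories.

    `weight` is the contribution of trajectory A; 0.5 is a plain average.
    """
    return weight * logits_a + (1.0 - weight) * logits_b


def decode(logits):
    """Pick the highest-scoring amino acid at every position."""
    return "".join(AMINO_ACIDS[i] for i in logits.argmax(axis=-1))
```

In a real sampler the blended logits would be fed back into both trajectories at each step so that a single sequence is pushed to satisfy both structural constraints; here the blend is shown for one step only.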
We introduce SPLASH2, a fast, scalable implementation of SPLASH based on an efficient k-mer counting approach for regulated sequence variation detection in massive datasets from a wide range of sequencing technologies and biological contexts. We demonstrate biological discovery by SPLASH2 in single-cell RNA sequencing (RNA-seq) data and in bulk RNA-seq data from the Cancer Cell Line Encyclopedia, including unannotated alternative splicing in cancer transcriptomes and sensitive detection of circular RNA.
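As a rough illustration of the k-mer counting idea in the abstract above, the sketch below tallies pairs of adjacent k-mers per sample. It assumes an anchor/target formulation in which a fixed k-mer (the anchor) is followed, after an optional gap, by a variable k-mer (the target); the function names and the `k` and `gap` parameters are hypothetical, and this is not the SPLASH2 implementation.

```python
from collections import Counter, defaultdict


def count_anchor_targets(reads, k=4, gap=0):
    """Count (anchor, target) k-mer pairs in a list of reads.

    The anchor is a k-mer; the target is the k-mer that starts
    `gap` bases after the anchor ends.
    """
    counts = Counter()
    span = 2 * k + gap
    for read in reads:
        for i in range(len(read) - span + 1):
            anchor = read[i:i + k]
            target = read[i + k + gap:i + span]
            counts[(anchor, target)] += 1
    return counts


def contingency_table(samples, k=4, gap=0):
    """Build a nested mapping: anchor -> target -> {sample name: count}."""
    table = defaultdict(lambda: defaultdict(dict))
    for name, reads in samples.items():
        pairs = count_anchor_targets(reads, k, gap)
        for (anchor, target), n in pairs.items():
            table[anchor][target][name] = n
    return table
```

An anchor whose target distribution differs strongly between samples is a candidate for sample-dependent sequence variation, such as alternative splicing; the statistical test applied to each per-anchor table is outside the scope of this sketch.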
Tissue-level and organism-level biological processes often involve the coordinated action of multiple distinct cell types. The recent application of single-cell assays to many individuals should enable the study of how donor-level variation in one cell type is linked to that in other cell types. Here we introduce a computational approach called single-cell interpretable tensor decomposition (scITD) to identify common axes of interindividual variation by considering joint expression variation across multiple cell types. scITD combines expression matrices from each cell type into a higher-order tensor and factorizes the result using the Tucker tensor decomposition. Applying scITD to single-cell RNA-sequencing data on 115 persons with lupus and 83 persons with coronavirus disease 2019, we identify patterns of coordinated cellular activity linked to disease severity and specific phenotypes, such as lupus nephritis. scITD results also implicate specific signaling pathways likely mediating coordination between cell types. Overall, scITD offers a tool for understanding the covariation of cell states across individuals, which can yield insights into the complex processes that define and stratify disease.
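The stacking and factorization step described in the abstract above can be sketched with NumPy. The sketch assumes the per-cell-type matrices are stacked into a donors × genes × cell types array, and it computes a Tucker decomposition with the truncated higher-order SVD (HOSVD), which is one standard way to obtain Tucker factors and not necessarily the procedure scITD uses. All function names are hypothetical.

```python
import numpy as np


def unfold(tensor, mode):
    """Matricize `tensor` so that axis `mode` indexes the rows."""
    return np.moveaxis(tensor, mode, 0).reshape(tensor.shape[mode], -1)


def mode_product(tensor, matrix, mode):
    """Multiply `tensor` by `matrix` (shape J x I_mode) along axis `mode`."""
    moved = np.moveaxis(tensor, mode, -1)   # bring the target axis last
    out = moved @ matrix.T                  # (..., I_mode) @ (I_mode, J)
    return np.moveaxis(out, -1, mode)       # put the new axis back in place


def hosvd(tensor, ranks):
    """Truncated HOSVD: return (core, factors) of a Tucker decomposition."""
    factors = []
    for mode, rank in enumerate(ranks):
        # Leading left singular vectors of each unfolding form the factor.
        u, _, _ = np.linalg.svd(unfold(tensor, mode), full_matrices=False)
        factors.append(u[:, :rank])
    core = tensor
    for mode, u in enumerate(factors):
        core = mode_product(core, u.T, mode)  # project onto each factor basis
    return core, factors


def reconstruct(core, factors):
    """Rebuild the (approximate) tensor from its Tucker core and factors."""
    out = core
    for mode, u in enumerate(factors):
        out = mode_product(out, u, mode)
    return out
```

Under the assumed axis order, `factors[0]` holds one score per donor for each retained component, while the core and the remaining factors describe how genes and cell types load on those components. With full ranks the reconstruction is exact; with smaller ranks it is a low-rank approximation.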
Protein and vaccine therapies based on mRNA would benefit from an increase in translation capacity. Here, we report a method to augment translation named ligation-enabled mRNA–oligonucleotide assembly (LEGO). We systematically screen different chemotopological motifs and find that a branched mRNA cap effectively initiates translation on linear or circular mRNAs without internal ribosome entry sites. Two types of chemical modification, locked nucleic acid (LNA) N7-methylguanosine modifications on the cap and LNA + 5 × 2′-O-methyl on the 5′ untranslated region, enhance RNA–eukaryotic translation initiation factor (eIF4E–eIF4G) binding and RNA stability against decapping in vitro. Through multidimensional chemotopological engineering of dual-capped mRNA and capped circular RNA, we enhance mRNA protein production by up to tenfold in vivo, resulting in 17-fold and 3.7-fold higher antibody production after prime and boost doses, respectively, in a severe acute respiratory syndrome coronavirus 2 vaccine setting. The LEGO platform opens possibilities to design unnatural RNA structures and topologies beyond canonical linear and circular RNAs for both basic research and therapeutic applications.