CAZymes (Carbohydrate Active EnZymes) degrade, synthesize, and modify all complex carbohydrates on Earth. CAZymes are extremely important to research in human health, nutrition, gut microbiome, bioenergy, plant disease, and global carbon recycling. Current CAZyme annotation tools are all based on sequence similarity. A more powerful approach is to detect protein structural similarity between query proteins and known CAZymes, which is indicative of distant homology. Here, we developed CAZyme3D (https://pro.unl.edu/CAZyme3D/) to fill a research gap: no dedicated 3D structure database is currently available for CAZymes. CAZyme3D contains a total of 870,740 AlphaFold-predicted 3D structures (named the Whole dataset). A subset of CAZyme 3D structures from 188,574 nonredundant sequences (named the ID50 dataset) was subjected to structural similarity-based clustering analyses. Such clustering allowed us to organize all CAZyme structures using a hierarchical classification, which includes existing levels defined by the CAZy database (class, clan, family, subfamily) and newly defined levels (subclasses, structural cluster [SC] groups, and SCs). The inter-family structural clustering successfully grouped CAZy families and clans with the same structural folds in the same subclasses. The intra-family structural clustering classified structurally similar CAZymes into SCs, which were further classified into SC groups. SCs and SC groups differed from sequence similarity-based CAZy subfamilies. With CAZyme structures as the search database, we created job submission pages, where users can submit query protein sequences or PDB structures for a structural similarity search. CAZyme3D will be a useful new tool to assist the discovery of novel CAZymes by providing a comprehensive database of CAZyme 3D structures.
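To illustrate the general idea of structural similarity-based clustering described above, the following is a minimal sketch and not the CAZyme3D pipeline. It assumes that pairwise structural similarity scores in the range 0 to 1 are already available from some structure alignment tool, and it groups structures by single-linkage clustering at a hypothetical threshold of 0.5. The function name, protein names, scores, and threshold are all illustrative assumptions.

```python
def cluster_by_similarity(names, scores, threshold=0.5):
    """Single-linkage clustering: two structures end up in the same cluster
    if they are connected by a chain of pairs scoring >= threshold."""
    parent = {n: n for n in names}

    def find(x):
        # Follow parent links to the cluster root, shortening the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for (a, b), score in scores.items():
        if score >= threshold:
            parent[find(a)] = find(b)  # merge the two clusters

    clusters = {}
    for n in names:
        clusters.setdefault(find(n), []).append(n)
    return sorted(sorted(members) for members in clusters.values())


# Hypothetical pairwise similarity scores (illustrative values only).
names = ["p1", "p2", "p3", "p4", "p5"]
scores = {
    ("p1", "p2"): 0.82,
    ("p2", "p3"): 0.64,
    ("p1", "p3"): 0.41,
    ("p4", "p5"): 0.77,
    ("p3", "p4"): 0.22,
}
clusters = cluster_by_similarity(names, scores)
print(clusters)
```

In this toy input, p1 and p3 fall in the same cluster even though their direct score is below the threshold, because both are similar to p2. This is a property of single linkage; other linkage criteria would behave differently.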
RNA is a master regulator of cellular processes and binds to many different proteins throughout its life cycle. Dysregulation of RNA and RNA-binding proteins can lead to various diseases, including cancer. To better understand the molecular mechanisms of these cellular processes, it is important to characterize protein-RNA interactions at the structural level. Few experimental structures are available for protein-RNA complexes because the inherent flexibility of RNA complicates experimental structure determination. The scarcity of structures can be compensated for by computational modeling. Dockground is a resource for development and benchmarking of structure-based modeling of protein interactions. It contains datasets focusing on different aspects of protein recognition. The foundation of all the datasets is the database of experimentally determined protein complexes, which previously contained only protein-protein assemblies. To further expand the utility of the Dockground resource, we extended the database to protein-RNA interactions. The new functionalities are available on the Dockground website at https://dockground.compbio.ku.edu/. The database can be searched using a number of criteria, including removal of redundancies at various sequence and structure similarity thresholds. The database is updated with new structures from the Protein Data Bank on a weekly basis.
tRNA-guanine transglycosylases (TGT) occur in all domains of life. They are unique among RNA-modifying enzymes as they exchange a guanine base in the primary RNA transcript for various 7-substituted 7-deazaguanines, leading to the modified nucleosides queuosine and archaeosine. Archaeosine is found in the D-loop of archaeal tRNAs, queuosine in the anticodon of bacterial and eukaryotic tRNAs specific for Asp, Asn, His and Tyr. Structural and functional studies revealed a common base-exchange mechanism for all TGTs. Nonetheless, there are also significant differences between TGTs, which will be discussed here. These concern the specificity for different 7-deazaguanine substrates as well as the recognition of substrate tRNAs. For queuosine TGT, an anticodon stem-loop containing the UGU recognition motif is a minimal substrate sufficient for binding to the active site; however, full-length tRNA is bound with higher affinity due to multiple interactions with the dimeric enzyme. Archaeal TGT also binds tRNAs as a homodimer, even though the interaction pattern is very different and results in a large change of tRNA conformation. Interestingly, a closely related enzyme, DpdA, exchanges guanine for 7-cyano-7-deazaguanine (preQ0) in double-stranded DNA of several bacteria. Bacterial TGT is a target for structure-based drug design, as the virulence of Shigella depends on TGT activity, and mammalian TGT has been used for the treatment of murine experimental autoimmune encephalomyelitis, a model for chronic multiple sclerosis. Furthermore, TGT has become a valuable tool in nucleic acid chemistry, as it facilitates the incorporation of non-natural bases in tRNA molecules, e.g. for labelling or cross-linking purposes.
Protein Data Bank Japan (PDBj, https://pdbj.org/) is the Asian hub of three-dimensional macromolecular structure data, and a founding member of the worldwide Protein Data Bank. We have accepted, processed, and distributed experimentally determined biological macromolecular structures for over two decades. Although we collaborate with RCSB PDB and BMRB in the United States, PDBe and EMDB in Europe, and recently PDBc in China for our data-in activities, we have developed our own unique services and tools for searching, exploring, visualizing, and analyzing protein structures. We have also developed novel archives for computational data and raw crystal diffraction images. Recently, we introduced the Sequence Navigator Pro service to explore proteins using experimental and computational approaches, which helps experimental structural biologists gain insight and design their experimental studies more efficiently. In addition, we introduced a new UniProt-integrated portal that gives users a quick overview of their target protein, shows a recommended structure, and integrates data from various internal and external resources. With these new additions, we have enhanced our service portfolio to benefit both experimental and computational structural biologists in their efforts to interpret protein structures, their dynamics, and function.
Queuosine (Q), a 7-deazaguanosine derivative, is among the most intricate tRNA modifications, and is located at position 34 (the Wobble position) of tRNAs with a GUN anticodon. Found in most eukaryotes and many bacteria, Q is unique among tRNA modifications because its full biosynthetic pathway exists only in bacteria. In contrast, eukaryotes are auxotrophic for Q, relying on dietary sources and gut microbiota to acquire Q and the nucleobase queuine. This dependency creates a nutritional link to translation in the host. Q enhances Wobble base pairing with U and helps balance translational speed between Q codons ending in C and U in eukaryotes. The absence of Q modification impacts oxidative stress response, impairs mitochondrial function and protein folding, and has been associated with neurodegeneration, cancer, and inflammation. This review discusses our current understanding of the cellular and organismal impacts of Q deficiency in eukaryotes. Additionally, it examines recent advancements in technologies for detecting Q modifications at single-base resolution and explores the potential applications of the Q modification system in biotechnology.
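The set of "Q codons" mentioned above can be derived from the anticodon rule alone. The following minimal sketch enumerates the codons read by tRNAs with a 5'-GUN-3' anticodon, assuming standard Watson-Crick pairing at anticodon positions 35 and 36 and the classical wobble rule that position 34 (G or Q) reads codons ending in C or U. The function name is illustrative, and the amino acid assignments come from the standard genetic code.

```python
# Watson-Crick complements for RNA bases.
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}

# Standard genetic code entries for the NAC/NAU codon boxes.
AMINO_ACID = {
    "AAC": "Asn", "AAU": "Asn",
    "CAC": "His", "CAU": "His",
    "GAC": "Asp", "GAU": "Asp",
    "UAC": "Tyr", "UAU": "Tyr",
}


def q_codons():
    """Return codons (5'->3') decoded by tRNAs with a 5'-GUN-3' anticodon."""
    codons = []
    for n in "ACGU":
        # Anticodon position 36 (N) pairs with codon position 1.
        first = COMPLEMENT[n]
        # Anticodon position 35 (U) pairs with codon position 2 (A).
        second = COMPLEMENT["U"]
        # Anticodon position 34 (G or Q) reads codon position 3: C or U.
        for third in "CU":
            codons.append(first + second + third)
    return sorted(codons)


codons = q_codons()
for codon in codons:
    print(codon, AMINO_ACID[codon])
```

The eight codons fall into four C-ending/U-ending pairs, one each for Asn, His, Asp, and Tyr, which are the pairs whose translational speed Q is described as balancing.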
We present RNAproDB (https://rnaprodb.usc.edu/), a new webserver, analysis pipeline, database, and highly interactive visualization tool, designed for protein-RNA complexes and applicable to all forms of nucleic acid-containing structures. RNAproDB computes several mapping schemes to place nucleic acid components and present protein-RNA interactions appropriately. Various structural annotations are computed, including non-canonical base-pairing geometries, hydrogen bonds, and protein-RNA and RNA-RNA water-mediated interactions. This information is presented through integrated visualization and data tools. Subgraph selection facilitates studying smaller components of the interface. Molecular surface electrostatic potential can be visualized. RNAproDB enables analyzing and exploring experimentally determined, predicted, and designed protein-nucleic acid complexes. We present a quantitative analysis of pre-analyzed protein-RNA structures in RNAproDB, revealing statistical patterns of molecular binding and recognition.
Protein function relies on accurate and densely packed constellations of amino acids within the active site. The high density in the active site optimizes activity but reduces tolerance to mutations, thereby frustrating efforts to engineer or design new or dramatically improved activity. Introducing new activities may therefore require simultaneous multipoint mutations. Still, in a phenomenon known as epistasis, the outcome of combinations of mutations can differ significantly from, and even reverse, the impact of the individual mutations, limiting predictability. To address these challenges, we previously developed FuncLib, a method for the computational design of multipoint mutants in active sites. We recently extended FuncLib to enable the design of large combinatorial mutation libraries for high-throughput screening in a method called htFuncLib, which generates compatible sets of mutations likely to yield functional multipoint mutants. htFuncLib enables scalable library design and experimental screening of hundreds to millions of active-site variants. This approach has generated thousands of active enzymes and fluorescent proteins with diverse functional properties. We have updated the FuncLib web server (https://FuncLib.weizmann.ac.il/) to support htFuncLib and introduced an electronic notebook (https://github.com/Fleishman-Lab/htFuncLib-web-server) for customizable library design, making these tools easily accessible for protein engineering and design. The new FuncLib web server enables reliable and scalable design of function for low-, medium- and high-throughput experiments through a single computational platform. We envision that this server will accelerate the optimization and discovery of function in enzymes, antibodies, and other proteins.
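The range from hundreds to millions of variants follows from simple combinatorics: if each designed position allows a small set of amino acid identities, the full combinatorial library is the product of the set sizes. The following minimal sketch shows only this counting and enumeration step, not how htFuncLib selects compatible mutations. The position labels and allowed residues are hypothetical and are not taken from any htFuncLib output.

```python
from itertools import product
from math import prod

# Hypothetical allowed amino acid identities per active-site position
# (wild-type residue listed first); illustrative values only.
allowed = {
    "pos1": ["A", "G", "S"],
    "pos2": ["L", "I", "V", "M"],
    "pos3": ["D", "E"],
}


def library_size(allowed):
    """Number of variants in the full combinatorial library."""
    return prod(len(options) for options in allowed.values())


def enumerate_variants(allowed):
    """Yield each variant as a dict mapping position to amino acid."""
    positions = list(allowed)
    for combo in product(*(allowed[p] for p in positions)):
        yield dict(zip(positions, combo))


size = library_size(allowed)
variants = list(enumerate_variants(allowed))
print(size, variants[0])

# Ten positions with four options each already exceed one million variants.
large = library_size({i: "ACDE" for i in range(10)})
print(large)
```

Because the library grows multiplicatively with each added position, a handful of extra positions or residues moves a design from a low-throughput to a high-throughput screening regime.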