PrePPI is a structure-based pipeline that predicts protein-protein interactions (PPIs) between two structured domains and between structured domains and short linear motifs (SLiMs) on a proteome-wide scale. Since the 2023 Computational Resource Issue of JMB, the PrePPI website has been significantly expanded and redesigned. The resource now includes interactomes for the human, yeast, and E. coli proteomes, with 3D models for high-confidence domain-level complexes and PDB templates for most of the predicted SLiM-mediated interactions. A key new addition is derived from clustering the PrePPI interactomes based entirely on the structure-based likelihood of an interaction. Remarkably, these clusters exhibit functional coherence and provide an unprecedented proteome-wide depiction of the subnetworks of PPIs that underlie biological phenomena. The new website - https://honigcomplab.c2b2.columbia.edu/PrePPI - provides convenient access to these clusters, to structural models for each pairwise complex, and to functional annotations for individual proteins, enabling multiple modes of biological discovery.
Nonsense-mediated mRNA decay (NMD) is a conserved eukaryotic surveillance pathway that eliminates transcripts containing premature termination codons (PTCs). Substantial progress has been made in defining the transcript features that mark aberrant translation termination for NMD activation, yet key mechanistic steps remain incompletely understood - including how recruitment of the central NMD factor UPF1 is coupled to the downstream effector phase in which targeted mRNAs are nucleolytically degraded. In metazoans, NMD employs an endonucleolytic route mediated by SMG6, a PIN-domain nuclease, alongside SMG5 and SMG7, which act downstream of PTC recognition. SMG5 has recently been proposed to license SMG6 activity, yet the molecular basis of this licensing has remained elusive. Here, we combine AlphaFold structural predictions with biochemical assays to investigate interactions among human SMG5, SMG6, and SMG7. Structural models predict a high-confidence interface between the SMG5 and SMG6 PIN domains that forms a composite active site: a conserved SMG5 aspartate (D893) complements the SMG6 acidic triad to reinstate the canonical tetrad required for PIN-domain catalysis. In vitro, SMG6 alone exhibits weak endonucleolytic activity, which is enhanced ∼10-fold by the SMG5 PIN domain. Mutational analyses confirm that conserved residues from both proteins are essential for this composite configuration. Our findings reveal that the SMG5 PIN domain, previously considered catalytically inert, plays a critical role in activating SMG6 by completing its active site. This work provides mechanistic insight into the SMG5-dependent licensing step and uncovers a composite PIN nuclease architecture at the heart of the metazoan NMD effector phase.
Cell lines are essential tools for studying biological mechanisms, advancing pre-clinical drug discovery and supporting biologics production. To further research in these fields, we introduce the Cell Lines CoCoPUTs (Codon and Codon Pair Usage Tables, https://dnahive.fda.gov/hivecuts/cell-lines/), a comprehensive resource of transcriptomic-weighted codon and codon-pair usages for 1866 unique cell lines derived from two cancer databases, Catalogue of Somatic Mutations in Cancer (COSMIC) and Cancer Cell Line Encyclopedia (CCLE), and the Human Protein Atlas (HPA) database. Despite differences in the number of cell lines in each database and in the platforms used for the analysis (microarray vs RNA-Seq), codon usage distributions were broadly similar for all overlapping cell lines across the three databases. Application of unsupervised machine learning approaches, including hierarchical and spectral clustering, to 1355 cell lines of non-metastatic origin yielded more distinct clusters with codon-pair usage than with codon usage. However, distance-based comparisons indicated that codon usage often yields equal or smaller within-group distances than codon-pair usage and that cell lines are, on average, closer to their site of origin than to their disease phenotype.
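To make the notion of transcriptomic-weighted codon and codon-pair usage concrete, the following is a minimal sketch, not the CoCoPUTs implementation. It assumes that each coding sequence's codon and codon-pair counts are simply multiplied by that transcript's expression level (for example, TPM) and then summed; the abstract does not specify the weighting scheme. The function names `weighted_codon_usage` and `to_frequencies` and the toy sequences are hypothetical.

```python
from collections import Counter


def weighted_codon_usage(transcripts, expression):
    """Sum expression-weighted codon and codon-pair counts.

    transcripts: dict mapping transcript name -> in-frame coding sequence
    expression:  dict mapping transcript name -> expression weight (e.g. TPM)
    Assumption: weight = expression level; other schemes are possible.
    """
    codons = Counter()
    pairs = Counter()
    for name, cds in transcripts.items():
        weight = expression.get(name, 0.0)
        if weight <= 0:
            continue  # unexpressed transcripts contribute nothing
        seq = cds.upper()
        usable = len(seq) - len(seq) % 3  # drop any trailing partial codon
        triplets = [seq[i:i + 3] for i in range(0, usable, 3)]
        for codon in triplets:
            codons[codon] += weight
        # A codon pair is two adjacent in-frame codons (a hexamer).
        for first, second in zip(triplets, triplets[1:]):
            pairs[first + second] += weight
    return codons, pairs


def to_frequencies(counts):
    """Normalize weighted counts so that they sum to 1."""
    total = sum(counts.values())
    return {key: value / total for key, value in counts.items()} if total else {}


# Toy example with two made-up transcripts.
toy_transcripts = {"tx1": "ATGGCCTAA", "tx2": "ATGGCCGCCTAA"}
toy_expression = {"tx1": 2.0, "tx2": 1.0}
codon_counts, pair_counts = weighted_codon_usage(toy_transcripts, toy_expression)
codon_freqs = to_frequencies(codon_counts)
```

Frequency vectors of this kind, one per cell line, could then serve as input to the hierarchical or spectral clustering and the distance-based comparisons described above.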

