Only a fraction of the >11 million missense variants identified in humans has known clinical significance. Post-translational modifications (PTMs), such as phosphorylation, glycosylation and ubiquitination, are key regulators of protein function and structure. PTMs depend on correct protein folding and on the recognition and binding of enzymes to specific amino acid motifs near modification sites. AlphaFold models provide an unprecedented opportunity to explore variants on 3D structures, enabling systematic identification of amino acid substitutions that could affect PTMs and should be further investigated experimentally. We present Missense3D-PTMdb, a "one-stop-shop" interactive web tool that provides a user-friendly sequence-structure mapping of 20,235 human proteins, 11.5 million naturally occurring human missense variants, >60 PTM types, and 203,775 PTM residues and their neighbours in sequence and 3D structure space, using AlphaFold models of the human proteome. The resource also supports visualisation of novel variants that are not in the database. Missense3D-PTMdb is freely available at https://missense3d.bc.ic.ac.uk/ptmdb.
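The idea of relating a variant to PTM residues that are neighbours in sequence and in 3D space can be illustrated with a minimal sketch. This is not the Missense3D-PTMdb implementation: the function name `ptm_neighbours`, the sequence window of 5 residues, the 8 Å C-alpha distance cutoff, and the toy coordinates are all assumptions chosen for illustration only.

```python
import math


def ptm_neighbours(variant_pos, ptm_positions, ca_coords,
                   seq_window=5, dist_cutoff=8.0):
    """List PTM residues close to a variant in sequence and in 3D space.

    variant_pos   : residue number of the missense variant
    ptm_positions : residue numbers carrying a PTM
    ca_coords     : dict mapping residue number -> (x, y, z) of its C-alpha
    seq_window    : hypothetical sequence window (residues), an assumption
    dist_cutoff   : hypothetical C-alpha distance cutoff in angstroms, an assumption
    """
    in_sequence = []
    in_structure = []
    for ptm in sorted(ptm_positions):
        # Sequence neighbourhood: residue numbers within the window.
        if abs(ptm - variant_pos) <= seq_window:
            in_sequence.append(ptm)
        # Structural neighbourhood: Euclidean C-alpha distance within the cutoff.
        if math.dist(ca_coords[variant_pos], ca_coords[ptm]) <= dist_cutoff:
            in_structure.append(ptm)
    return {"sequence": in_sequence, "structure": in_structure}


# Toy coordinates (not taken from any real AlphaFold model).
toy_coords = {
    10: (0.0, 0.0, 0.0),
    12: (3.8, 0.0, 0.0),
    40: (5.0, 5.0, 0.0),
    90: (40.0, 0.0, 0.0),
}
result = ptm_neighbours(10, [12, 40, 90], toy_coords)
```

In this toy case residue 40 is far from the variant in sequence but close in space, which is the kind of relationship that only a structure-based mapping can reveal.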
Proteins from thermophilic organisms exhibit remarkable stability under extreme thermal conditions. Understanding the molecular mechanisms underlying thermostability is essential for studying protein evolution and engineering robust enzymes. In this study, we systematically analyzed four independent datasets of mesophilic-thermophilic protein pairs to investigate the molecular basis of thermal adaptation. The datasets are defined by sequence identity and optimal growth temperature (OGT): (a) >90% identity and 60-80 °C OGT, (b) 50-90% identity and >80 °C OGT, (c) 50-90% identity and 60-80 °C OGT, and (d) 50-90% identity and 40-60 °C OGT. Mutational analysis revealed that thermophilic proteins consistently contain fewer polar, uncharged residues and are enriched in charged, hydrophobic, and aromatic residues, particularly in extreme thermophiles (>80 °C). Further, by integrating multiple known protein features into a hierarchical rule-based classifier, we identified the thermostable protein in each pair of sequences and assessed the relative importance of features across datasets to provide interpretable insights into protein thermostability. The hierarchical rule-based method, which correctly classified 99% of thermophilic proteins, identified stabilizing residues as the primary distinguishing factor, followed by electrostatic energy, volume, and localized electrical effects. A bagging model trained on the same features achieved a balanced accuracy of 92% in 5-fold cross-validation and 91% on the 20% hold-out test set. Furthermore, independent validation on proteins carrying multiple mutations correctly identified 94% of stabilizing and destabilizing mutations. These results provide valuable insights into the thermal adaptation of proteins and a means to reliably identify thermostable proteins.
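The four dataset definitions (a) to (d) above can be expressed as a small assignment rule. The sketch below is only an illustration: the function name `assign_dataset` is hypothetical, the OGT is assumed to be that of the thermophilic partner, and the handling of boundary values (for example, an OGT of exactly 60 °C going to the 60-80 °C bin) is an assumption, since the text does not state which intervals are inclusive.

```python
def assign_dataset(identity_pct, ogt_celsius):
    """Assign a mesophilic-thermophilic pair to dataset 'a', 'b', 'c' or 'd'.

    identity_pct : pairwise sequence identity in percent
    ogt_celsius  : optimal growth temperature, assumed to be that of the
                   thermophilic partner (an assumption, not stated in the text)
    Returns None when the pair fits none of the four definitions.
    """
    if identity_pct > 90:
        # (a) >90% identity and 60-80 °C OGT
        return "a" if 60 <= ogt_celsius <= 80 else None
    if 50 <= identity_pct <= 90:
        if ogt_celsius > 80:
            return "b"  # (b) 50-90% identity and >80 °C OGT
        if 60 <= ogt_celsius <= 80:
            return "c"  # (c) 50-90% identity and 60-80 °C OGT
        if 40 <= ogt_celsius < 60:
            return "d"  # (d) 50-90% identity and 40-60 °C OGT
    return None


# Hypothetical pairs given as (identity %, OGT in °C).
example_pairs = [(95, 70), (70, 85), (70, 65), (70, 50), (30, 70)]
labels = [assign_dataset(identity, ogt) for identity, ogt in example_pairs]
```

Pairs that fall outside all four definitions, such as one with 30% identity, return `None` rather than being forced into a dataset.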

