PDB-IHM is a branch of the Protein Data Bank (PDB), a Worldwide Protein Data Bank (wwPDB) Core Archive, that expands its scope by allowing for additional biomolecular structure representations and types of experimental information (i.e., integrative/hybrid structure models). As of October 2025, PDB-IHM contained 374 entries, benefitting from multi-scale and multi-state representations and 17 types of experimental data. These structure models are assigned PDB accession codes and are archived alongside other experimental structures in the PDB. Rigorous interpretation of a structure model requires assessment of underlying data quality, consistency with the input data, and estimates of positional uncertainty of its components. Herein, we present the IHMValidation pipeline (https://validate.pdb-ihm.org; https://github.com/salilab/IHMValidation) based on recommendations from the wwPDB Integrative Methods Task Force plus the small-angle scattering (SAS), chemical crosslinking mass spectrometry (crosslinking-MS), and cryo-electron microscopy and tomography (3DEM) communities. The IHMValidation report (available in both PDF and HTML formats) comprises six sections: (i) overview; (ii) model details; (iii) data quality assessments; (iv) local geometry assessments (i.e., model quality); (v) fit of the model to the data used to generate it; and (vi) fit of the model to the data used for validation. Future expansions of the IHMValidation pipeline will: (i) reflect recommendations coming from additional experimental communities, including Förster resonance energy transfer (FRET) and hydrogen/deuterium exchange MS (HDX-MS); (ii) include other validation criteria, such as Bayesian likelihoods for the data; and (iii) represent estimates of structure model uncertainty based on the variation among alternative models satisfying input data.
Only a fraction of the >11 million missense variants identified in humans have known clinical significance. Post-translational modifications (PTMs), such as phosphorylation, glycosylation and ubiquitination, are key regulators of protein function and structure. PTMs depend on correct protein folding and on the recognition and binding of enzymes to specific amino acid motifs near modification sites. AlphaFold models provide an unprecedented opportunity to explore variants on 3D structures, enabling systematic identification of amino acid substitutions that could affect PTMs and should be further investigated experimentally. We present Missense3D-PTMdb, a "one-stop-shop" interactive web tool that provides a user-friendly sequence-structure mapping of 20,235 human proteins, 11.5 million naturally occurring human missense variants, >60 PTM types and 203,775 PTM residues and their neighbours in sequence and 3D structure space using AlphaFold models of the human proteome. The resource also supports visualisation of novel variants not in the database. Missense3D-PTMdb is freely available at https://missense3d.bc.ic.ac.uk/ptmdb.
Proteins from thermophilic organisms exhibit remarkable stability under extreme thermal conditions. Understanding the molecular mechanisms underlying thermostability is essential for studying protein evolution and engineering robust enzymes. In this study, we systematically analyzed four sets of mesophilic-thermophilic protein pairs to investigate the molecular basis of thermal adaptation. We constructed four independent datasets of mesophilic-thermophilic protein pairs defined by sequence identity and optimal growth temperature (OGT): (a) >90% identity and 60-80 °C OGT, (b) 50-90% identity and >80 °C OGT, (c) 50-90% identity and 60-80 °C OGT, and (d) 50-90% identity and 40-60 °C OGT. Mutational analysis revealed that thermophilic proteins consistently contained fewer polar, uncharged residues and more charged, hydrophobic, and aromatic residues, particularly in extreme thermophiles (>80 °C). Further, by integrating multiple known protein features into a hierarchical rule-based classifier, we identified the thermostable protein in each pair of sequences and assessed the relative importance of features across datasets to provide interpretable insights into protein thermostability. The hierarchical rule-based method identified stabilizing residues as the primary distinguishing factor, followed by electrostatic energy, volume, and localized electrical effects, and it correctly classified 99% of thermophilic proteins. A bagging model trained on the same features achieved a balanced accuracy of 92% in 5-fold cross-validation and 91% on the 20% hold-out test set. Furthermore, independent validation using proteins with multiple mutations correctly identified 94% of stabilizing and destabilizing mutations. The results of this work provide valuable insights into the thermal adaptation of proteins and a reliable means of identifying thermostable proteins.
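
The four dataset definitions (a) to (d) above can be read as a simple binning rule on sequence identity and OGT. The following is a minimal Python sketch of that rule, not the authors' code. The function name `assign_dataset` is hypothetical, the OGT is assumed to be that of the thermophilic partner, and the treatment of boundary values (for example, whether exactly 60 °C falls in the 60-80 or the 40-60 range) is an assumption, since the abstract does not specify it.

```python
from typing import Optional


def assign_dataset(identity_pct: float, ogt_celsius: float) -> Optional[str]:
    """Map a protein pair to dataset 'a', 'b', 'c' or 'd', or None if it fits none.

    identity_pct: sequence identity of the pair, in percent.
    ogt_celsius:  optimal growth temperature in degrees Celsius
                  (assumed here to be that of the thermophilic partner).
    Boundary handling is an assumption: 60 and 80 are placed in the
    60-80 range, and identity of exactly 90 is placed in the 50-90 range.
    """
    if identity_pct > 90:
        # (a) >90% identity and 60-80 degC OGT
        return "a" if 60 <= ogt_celsius <= 80 else None
    if 50 <= identity_pct <= 90:
        if ogt_celsius > 80:
            return "b"  # (b) 50-90% identity and >80 degC OGT
        if 60 <= ogt_celsius <= 80:
            return "c"  # (c) 50-90% identity and 60-80 degC OGT
        if 40 <= ogt_celsius < 60:
            return "d"  # (d) 50-90% identity and 40-60 degC OGT
    return None


if __name__ == "__main__":
    # Illustrative inputs only; these are not values from the study.
    for identity, ogt in [(95, 70), (70, 85), (70, 70), (70, 50), (40, 70)]:
        print(identity, ogt, assign_dataset(identity, ogt))
```

Under this reading the four bins are mutually exclusive, so each pair lands in at most one dataset; pairs with >90% identity and an OGT outside 60-80 °C, or with identity below 50%, are left unassigned.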

