Allosteric proteins play a central role in biological processes and systems. Uncovering the biological effects of mutations in allosteric proteins and their role in disease progression remains a significant challenge. In principle, computational approaches could enable large-scale interpretation of genetic variants in allosteric proteins. However, general-purpose variant effect prediction (VEP) methods overlook the characteristic differences between genes. More critically, individual tools frequently show inconsistencies, biases, and variable quality. Consequently, predictions from existing VEP approaches are considered insufficiently reliable. In this study, we constructed a multifaceted-feature-based ensemble learning approach to predict the pathogenicity of missense mutations in allosteric proteins. The proposed method uses categorical boosting to integrate four types of features: sequence information, AlphaFold2-extracted biochemical properties, prediction scores from other VEP methods, and allele frequency from gnomAD. When tested on a benchmark allosteric protein dataset, our method achieved an AUC of 0.912, outperforming 22 general VEP methods. To facilitate the identification of pathogenic mutations among the many rare variants discovered as sequencing studies expand, we provide pathogenicity probabilities for all possible amino acid substitutions in 202 allosteric-protein-encoding genes. In summary, our research indicates that multifaceted-feature-based ensemble learning models can offer valuable independent evidence for interpreting missense mutations in allosteric proteins, which should be broadly applicable in both research and clinical contexts.
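For readers less familiar with the metric reported above, AUC can be read as the probability that a randomly chosen pathogenic variant receives a higher score than a randomly chosen benign one. The following is a minimal sketch in plain Python under that definition; the function name `roc_auc` and the toy labels and scores are illustrative assumptions, not code or data from this study.

```python
def roc_auc(labels, scores):
    """Compute ROC AUC by pairwise comparison; tied scores count as 0.5."""
    # Split predicted scores by true class (1 = pathogenic, 0 = benign).
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")

    # Count how often a pathogenic variant outranks a benign one.
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data, for illustration only.
toy_labels = [1, 1, 1, 0, 0, 0]
toy_scores = [0.9, 0.8, 0.4, 0.5, 0.3, 0.1]
toy_auc = roc_auc(toy_labels, toy_scores)
```

The pairwise form is quadratic in the number of variants, so it suits small illustrations; a rank-based implementation would be preferable for a full benchmark set.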
