Accurate identification of ABL1 tyrosine-kinase mutations associated with therapeutic resistance can support timely adjustment of anticancer regimens. However, tyrosine-kinase inhibitor (TKI) mutation datasets are typically small and strongly imbalanced, which can bias model training and inflate performance estimates if data processing is not strictly separated between training and testing. This study proposes an end-to-end, leakage-controlled machine-learning framework for ABL1 TKI-resistance prediction in which all data-driven operations, including feature selection and the Synthetic Minority Oversampling Technique (SMOTE), are performed within cross-validation training folds only, so that no information from the validation folds or the test set influences model development. Multiple base learners were independently tuned using metaheuristic hyperparameter optimization and then combined in a stacked-ensemble architecture to reduce overfitting and improve generalization. On a held-out test set, the final ensemble achieved 91.9% accuracy, 75.0% precision, 96.9% specificity, 60.0% sensitivity, a 66.7% F1-score, an MCC of 0.626, an AUROC of 0.938, and a PR-AUC of 0.729, only a modest decline relative to the cross-validation estimates. Post-hoc interpretation with Shapley additive explanations (SHAP) identified binding-score terms, mutation physicochemical descriptors, ligand flexibility, and local mutation-environment features as the main contributors, consistent with established principles of protein–ligand recognition. Overall, the results support a methodologically disciplined and interpretable approach to mutation-level resistance prediction, while motivating external validation and downstream evaluation of clinical utility.
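The central methodological point above is ordering: feature selection and SMOTE are fitted on each training fold and never touch the validation fold. The following is a minimal pure-Python sketch of that ordering, not the authors' implementation. It assumes a binary label (1 = resistant, treated as the minority class) and numeric feature vectors. The helpers `stratified_folds`, `select_features`, `smote`, and `leakage_controlled_cv` are hypothetical, and the mean-difference filter is a simplified stand-in for whatever selector the study used. In practice, `imblearn.pipeline.Pipeline` with `imblearn.over_sampling.SMOTE` enforces the same fold-local fitting.

```python
import random
from typing import Dict, List, Sequence, Tuple

Vector = List[float]


def stratified_folds(y: Sequence[int], k: int, seed: int = 0) -> List[List[int]]:
    """Assign sample indices to k folds, preserving class proportions."""
    rng = random.Random(seed)
    folds: List[List[int]] = [[] for _ in range(k)]
    for label in sorted(set(y)):
        idx = [i for i, v in enumerate(y) if v == label]
        rng.shuffle(idx)
        for j, i in enumerate(idx):
            folds[j % k].append(i)
    return folds


def select_features(X: List[Vector], y: List[int], n_keep: int) -> List[int]:
    """Hypothetical filter: rank features by |mean(class 1) - mean(class 0)|."""
    scores = []
    for f in range(len(X[0])):
        pos = [x[f] for x, v in zip(X, y) if v == 1]
        neg = [x[f] for x, v in zip(X, y) if v == 0]
        scores.append((abs(sum(pos) / len(pos) - sum(neg) / len(neg)), f))
    return [f for _, f in sorted(scores, reverse=True)[:n_keep]]


def smote(X: List[Vector], y: List[int], k_neighbors: int = 3,
          seed: int = 0) -> Tuple[List[Vector], List[int]]:
    """Balance classes by interpolating each minority sample toward one of
    its k nearest minority neighbours (squared Euclidean distance)."""
    rng = random.Random(seed)
    minority = [x for x, v in zip(X, y) if v == 1]
    n_needed = max(y.count(0) - len(minority), 0)
    synthetic: List[Vector] = []
    for s in range(n_needed):
        b = s % len(minority)
        base = minority[b]
        dists = sorted(
            (sum((p - q) ** 2 for p, q in zip(base, m)), j)
            for j, m in enumerate(minority) if j != b
        )
        neighbour = minority[rng.choice([j for _, j in dists[:k_neighbors]])]
        gap = rng.random()
        synthetic.append([p + gap * (q - p) for p, q in zip(base, neighbour)])
    return X + synthetic, y + [1] * len(synthetic)


def leakage_controlled_cv(X: List[Vector], y: List[int], k: int = 5,
                          n_keep: int = 2, seed: int = 0) -> List[Dict]:
    """Per fold: fit selection and SMOTE on training data only, then apply
    the chosen columns (but no resampling) to the validation fold."""
    folds = stratified_folds(y, k, seed)
    results = []
    for v, val_idx in enumerate(folds):
        train_idx = [i for f, fold in enumerate(folds) if f != v for i in fold]
        X_tr = [X[i] for i in train_idx]
        y_tr = [y[i] for i in train_idx]
        keep = select_features(X_tr, y_tr, n_keep)       # training fold only
        X_tr = [[x[f] for f in keep] for x in X_tr]
        X_tr, y_tr = smote(X_tr, y_tr, seed=seed + v)    # training fold only
        X_val = [[X[i][f] for f in keep] for i in val_idx]
        y_val = [y[i] for i in val_idx]
        # A base learner would be fitted on (X_tr, y_tr) and scored on (X_val, y_val).
        results.append({"train_idx": train_idx, "val_idx": val_idx,
                        "features": keep, "X_train": X_tr, "y_train": y_tr,
                        "X_val": X_val, "y_val": y_val})
    return results
```

Usage on synthetic toy data (not the study's data): with 10 minority and 40 majority samples and k = 5, each training fold holds 8 minority and 32 majority samples, SMOTE adds 24 synthetic minority samples to it, and every validation fold keeps exactly its original 10 samples.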
