Purpose: To develop and compare machine learning models for distinguishing low-grade from high-grade meningiomas on multiparametric MRI.
Methods: Pre-operative T1-weighted (T1), contrast-enhanced T1-weighted (T1CE), T2-weighted, T2 FLAIR, and DWI/ADC MRI sequences of meningiomas acquired between 2000 and 2020 were retrospectively collected from the dedicated neurosurgical department of a single tertiary hospital. Tumours were manually segmented and handcrafted radiomic features were extracted. Deep learning features were extracted using a fine-tuned foundation model. Various oversampling techniques, feature selection algorithms and classifiers were trialled to build handcrafted radiomics only (HRO) and handcrafted with deep learning radiomics (HDLR) models. Bootstrapping was used for internal validation of model performance and to calculate confidence intervals for the metrics. Discrimination, calibration, feature importance and clinical utility of the models were assessed via ROC AUC, calibration curves, Shapley values and decision curve analysis, respectively.
Results: The analysis included 97 low-grade and 18 high-grade meningiomas. The HRO model (Random Forest) and the HDLR model (XGBoost) had comparable diagnostic performance. For HRO and HDLR respectively, mean [95% CI] values were: ROC AUC 0.825 [0.662, 0.952] and 0.794 [0.662, 0.948], specificity 0.913 [0.793, 0.952] and 0.892 [0.796, 0.983], sensitivity 0.499 [0.204, 1] and 0.509 [0.225, 0.851], NPV 0.909 [0.851, 0.971] and 0.909 [0.851, 0.972], and PPV 0.529 [0.238, 0.924] and 0.465 [0.263, 0.846]. The models selected 11–12 features, with features from T1 and T1CE consistently important.
Conclusion: HRO and HDLR models can effectively predict meningioma grade preoperatively, although achieving consistent sensitivity and PPV remains a challenge. Larger, multi-centre studies are warranted to confirm these findings, but the approach holds promise for improving personalized treatment strategies and patient outcomes in meningioma management.
Code is available on GitHub: https://github.com/stephano41/radiomics_ai.
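As an illustration of how bootstrap confidence intervals for sensitivity, specificity, PPV and NPV can be obtained, the following is a minimal sketch and not the study's implementation (which is in the repository linked above). It assumes a simplified scheme that resamples a fixed set of predictions with replacement, rather than refitting the model in each replicate. The function names `confusion_metrics` and `bootstrap_ci`, the choice of 1000 replicates, the percentile interval, and the toy predictions are all hypothetical; only the class counts (97 low grade, 18 high grade) come from the text.

```python
import math
import random
from statistics import mean


def confusion_metrics(y_true, y_pred):
    """Sensitivity, specificity, PPV and NPV, with high grade (1) as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)

    def ratio(num, den):
        # Undefined ratios (empty denominator) are reported as NaN.
        return num / den if den else float("nan")

    return {
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
        "ppv": ratio(tp, tp + fp),
        "npv": ratio(tn, tn + fn),
    }


def bootstrap_ci(y_true, y_pred, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap: returns {metric: (mean, lower, upper)}.

    Assumption: cases are resampled with replacement and the metrics are
    recomputed on fixed predictions; the model is not refitted.
    """
    rng = random.Random(seed)
    n = len(y_true)
    samples = {}
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        replicate = confusion_metrics(
            [y_true[i] for i in idx], [y_pred[i] for i in idx]
        )
        for name, value in replicate.items():
            if not math.isnan(value):  # skip replicates where the metric is undefined
                samples.setdefault(name, []).append(value)

    intervals = {}
    for name, values in samples.items():
        values.sort()
        lower = values[int((alpha / 2) * (len(values) - 1))]
        upper = values[int((1 - alpha / 2) * (len(values) - 1))]
        intervals[name] = (mean(values), lower, upper)
    return intervals


# Toy data: class counts match the cohort, but the predictions are made up
# for illustration and are NOT the study's model outputs.
y_true = [0] * 97 + [1] * 18
y_pred = [0] * 89 + [1] * 8 + [1] * 9 + [0] * 9

point = confusion_metrics(y_true, y_pred)
ci = bootstrap_ci(y_true, y_pred)
for name, (avg, lower, upper) in ci.items():
    print(f"{name}: {avg:.3f} [{lower:.3f}, {upper:.3f}]")
```

With only 18 high-grade cases, each replicate contains few positives, so intervals for sensitivity and PPV computed this way tend to be wide, in line with the wide intervals reported for those two metrics.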