To conduct a computational investigation to explore the influence of clinical reference uncertainty on magnetic resonance imaging (MRI) radiomics feature selection, modelling, and performance. This study used two sets of publicly available prostate cancer MRI = radiomics data (Dataset 1: n = 260; Dataset 2: n = 100) with Gleason score clinical references. Each dataset was divided into training and holdout testing datasets at a ratio of 7:3 and analysed independently. The clinical references of the training set were permuted at different levels (increments of 5%) and repeated 20 times. Four feature selection algorithms and two classifiers were used to construct the models. Cross-validation was employed for training, while a separate hold-out testing set was used for evaluation. The Jaccard similarity coefficient was used to evaluate feature selection, while the area under the curve (AUC) and accuracy were used to assess model performance. An analysis of variance test with Bonferroni correction was conducted to compare the metrics of each model. The consistency of the feature selection performance decreased substantially with the clinical reference permutation. AUCs of the trained models with permutation particularly after 20% were significantly lower (Dataset 1 (with ≥ 20% permutation): 0.67, and Dataset 2 (≥ 20% permutation): 0.74), compared to the AUC of models without permutation (Dataset 1: 0.94, Dataset 2: 0.97). The performances of the models were also associated with larger uncertainties and an increasing number of permuted clinical references. Clinical reference uncertainty can substantially influence MRI radiomic feature selection and modelling. The high accuracy of clinical references should be helpful in building reliable and robust radiomic models. Careful interpretation of the model performance is necessary, particularly for high-dimensional data.