Purpose: After the introduction of the Ovarian-Adnexal Reporting and Data System (O-RADS) for magnetic resonance imaging (MRI), several studies with diverse characteristics have been published to assess its diagnostic performance. This systematic review and meta-analysis aimed to assess the diagnostic performance of O-RADS MRI scoring for adnexal masses, accounting for the risk of selection bias.
Methods: The PubMed, Scopus, Web of Science, and Cochrane databases were searched for eligible studies. Borderline or malignant lesions were considered malignant. O-RADS MRI scores ≥4 were considered positive. The quality of the studies was assessed using the Quality Assessment of Diagnostic Accuracy Studies-2 (QUADAS-2) tool. Pooled sensitivity, specificity, and likelihood ratios (LRs) were calculated, with studies stratified by their risk of selection bias.
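The classification rules above (score ≥4 is test-positive; borderline or malignant pathology is disease-positive) can be illustrated with a minimal Python sketch. The lesion list, the `TwoByTwo` class, and the `tabulate` helper are hypothetical names and made-up data for illustration only; they are not taken from the included studies.

```python
from dataclasses import dataclass

# Reference-standard labels treated as disease-positive (per the Methods).
MALIGNANT_LABELS = {"borderline", "malignant"}
# O-RADS MRI scores at or above this value are treated as test-positive.
POSITIVE_THRESHOLD = 4


@dataclass
class TwoByTwo:
    """Counts for a diagnostic 2x2 table."""
    tp: int = 0
    fp: int = 0
    fn: int = 0
    tn: int = 0

    @property
    def sensitivity(self) -> float:
        return self.tp / (self.tp + self.fn)

    @property
    def specificity(self) -> float:
        return self.tn / (self.tn + self.fp)


def tabulate(lesions):
    """Build a 2x2 table from (o_rads_score, pathology) pairs."""
    table = TwoByTwo()
    for score, pathology in lesions:
        test_positive = score >= POSITIVE_THRESHOLD
        diseased = pathology in MALIGNANT_LABELS
        if test_positive and diseased:
            table.tp += 1
        elif test_positive and not diseased:
            table.fp += 1
        elif not test_positive and diseased:
            table.fn += 1
        else:
            table.tn += 1
    return table


# Hypothetical lesions, invented purely to exercise the rules.
example_lesions = [
    (5, "malignant"),
    (4, "borderline"),
    (3, "malignant"),
    (2, "benign"),
    (4, "benign"),
    (1, "benign"),
]
table = tabulate(example_lesions)
print(table, table.sensitivity, table.specificity)
```

Each included study would contribute one such table; how those tables are then pooled is a separate modelling step that this sketch does not attempt.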
Results: Fifteen eligible studies were identified, five of which had a high risk of selection bias. Between-study heterogeneity was low-to-moderate for sensitivity but substantial for specificity (I² values were 35.5% and 64.7%, respectively). The pooled sensitivity was significantly lower in the studies with a low risk of bias than in those with a high risk of bias (93.0% and 97.5%, respectively; P = 0.043), whereas the pooled specificity did not differ between the two groups (90.4% for the overall population). The negative and positive LRs were 0.08 [95% confidence interval (CI) 0.05–0.11] and 10.0 (95% CI 7.7–12.9), respectively, for the studies with low risk of bias and 0.03 (95% CI 0.01–0.10) and 10.3 (95% CI 3.8–28.3), respectively, for those with high risk of bias.
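For readers who want to see how likelihood ratios relate to sensitivity and specificity, the standard definitions are LR+ = sensitivity / (1 − specificity) and LR− = (1 − sensitivity) / specificity. The sketch below applies them to the pooled sensitivity of the low-risk-of-bias studies (93.0%) and the overall pooled specificity (90.4%). This is only an approximation under stated assumptions: the abstract does not report a specificity for the low-risk subgroup alone, and the reported LRs were pooled by the authors' own method, so the output is expected to be close to, not identical with, the reported 10.0 and 0.08. The function name `likelihood_ratios` is illustrative.

```python
def likelihood_ratios(sensitivity: float, specificity: float):
    """Return (LR+, LR-) from sensitivity and specificity given as proportions."""
    lr_positive = sensitivity / (1.0 - specificity)
    lr_negative = (1.0 - sensitivity) / specificity
    return lr_positive, lr_negative


# Sensitivity: low-risk-of-bias studies; specificity: overall population
# (assumption: the overall specificity stands in for the subgroup value).
lr_positive, lr_negative = likelihood_ratios(0.930, 0.904)
print(f"LR+ ~ {lr_positive:.1f}, LR- ~ {lr_negative:.2f}")
```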
Conclusion: The overall diagnostic performance of the O-RADS MRI system is very high, particularly for ruling out borderline or malignant lesions, although its ruling-in potential is moderate. Studies with a high risk of selection bias overestimate sensitivity.
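The contrast between ruling out and ruling in can be made concrete by converting a pre-test probability into a post-test probability via odds. The sketch below uses the LRs reported for the low-risk-of-bias studies (0.08 and 10.0); the 20% pre-test probability is a hypothetical value chosen for illustration and does not come from the review, and `post_test_probability` is an illustrative helper name.

```python
def post_test_probability(pre_test_probability: float, likelihood_ratio: float) -> float:
    """Apply a likelihood ratio to a pre-test probability using the odds form of Bayes' rule."""
    pre_odds = pre_test_probability / (1.0 - pre_test_probability)
    post_odds = pre_odds * likelihood_ratio
    return post_odds / (1.0 + post_odds)


PRE_TEST = 0.20  # hypothetical pre-test probability of a borderline/malignant lesion

after_negative = post_test_probability(PRE_TEST, 0.08)  # negative LR, low-risk-of-bias studies
after_positive = post_test_probability(PRE_TEST, 10.0)  # positive LR, low-risk-of-bias studies
print(f"after negative result: {after_negative:.1%}")
print(f"after positive result: {after_positive:.1%}")
```

Under this assumed pre-test probability, a negative result lowers the probability to roughly 2%, whereas a positive result raises it to roughly 71%, which leaves meaningful residual uncertainty after a positive score.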
Clinical significance: The O-RADS MRI system demonstrates high diagnostic performance, particularly in ruling out borderline or malignant lesions, and should routinely be used in practice. The substantial between-study heterogeneity observed for specificity suggests that benign lesions need to be characterized more consistently to reduce false-positive rates.