PM2.5 remains one of the major atmospheric pollutants worldwide. Isolating the contribution of anthropogenic emission control (PM2.5_anth) from observed PM2.5 variations, which are also strongly affected by meteorological changes, is critical for effective pollution control. Statistical and machine learning methods are commonly used for this purpose, but their effectiveness is hard to evaluate because no observations of the anthropogenic contribution are available. In this study, we use the standard simulation of the chemical transport model GEOS-Chem to mimic PM2.5 variability in the real atmosphere, and use a model simulation with fixed meteorological fields as the “true value” of PM2.5_anth. We evaluate the effectiveness of three methods for meteorological normalization of PM2.5 on decadal (2006–2017) and synoptic (one-month) scales: multiple linear regression (MLR), the generalized additive model (GAM), and the random forest (RF) algorithm. On the decadal scale, 67–72% of the MLR simulations show positive biases and 56–75% of the RF simulations show negative biases. The “true value” of PM2.5_anth falls within the range of the normalization results of the three methods in most cases, but consistent positive or negative biases occur in ∼30% of the cases, when meteorological changes dominate PM2.5 variability. In addition, the biases are correlated with the contribution of meteorological changes. As such, we recommend using multiple statistical or machine learning methods to quantify the uncertainties associated with method choice in cases where anthropogenic emission changes dominate PM2.5 variability. On the synoptic scale, RF reproduces the daily variations of the PM2.5_anth differences better than MLR in all cases and better than GAM in 83% of the cases, and is therefore recommended for short-term meteorological normalization of PM2.5 in eastern China.
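The idea of evaluating a normalization method against a known “true value” can be illustrated with a minimal sketch. It is not the implementation used in this study: it replaces GEOS-Chem output with synthetic data, and the function names (`mlr_normalize`, `synthetic_case`), the three generic meteorological predictors, and all coefficients are assumptions made only for illustration. The MLR step fits PM2.5 against meteorological anomalies and subtracts the fitted meteorological component, and the result is then compared with the prescribed anthropogenic series.

```python
import numpy as np

def mlr_normalize(pm25, met):
    """Remove the meteorology-driven part of a PM2.5 series with an MLR fit.

    pm25: array of shape (n_times,)
    met:  array of shape (n_times, n_vars), hypothetical meteorological predictors
    Returns the meteorology-normalized series (the series mean is retained).
    """
    met_anom = met - met.mean(axis=0)               # anomalies, so the mean is preserved
    X = np.column_stack([np.ones(len(pm25)), met_anom])
    coef, *_ = np.linalg.lstsq(X, pm25, rcond=None)  # ordinary least squares
    met_component = met_anom @ coef[1:]             # fitted meteorological contribution
    return pm25 - met_component

def synthetic_case(n=144, seed=0):
    """Synthetic stand-in for model output: a known anthropogenic trend plus
    a meteorological signal and noise (all values are illustrative assumptions)."""
    rng = np.random.default_rng(seed)
    t = np.arange(n)
    anth_true = 60.0 - 0.15 * t                     # prescribed "true" PM2.5_anth
    met = rng.normal(size=(n, 3))                   # three generic predictors
    met_effect = met @ np.array([4.0, -3.0, 2.0])
    pm25 = anth_true + met_effect + rng.normal(scale=0.5, size=n)
    return pm25, met, anth_true

pm25, met, anth_true = synthetic_case()
anth_est = mlr_normalize(pm25, met)

# Error relative to the known truth, before and after normalization
rmse_raw = float(np.sqrt(np.mean((pm25 - anth_true) ** 2)))
rmse_norm = float(np.sqrt(np.mean((anth_est - anth_true) ** 2)))
print(f"RMSE raw: {rmse_raw:.2f}, RMSE after MLR normalization: {rmse_norm:.2f}")
```

Because the regression has no explicit trend term, part of the anthropogenic signal can be misattributed to meteorology whenever the predictors happen to correlate with it, which is one way that method-dependent biases of the kind reported here can arise.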