Predicting spring phenology in temperate forests is critical for forecasting important processes such as carbon storage. One major forecasting method for phenology is the growing degree day (GDD) model, which tracks heat accumulation. Forecasts using GDD models typically assume that the GDD threshold for a species is constant across diverse landscapes, but increasing evidence suggests otherwise. Shifts in climate with anthropogenic warming may change the required GDD. Variation in climate across space may also lead to variation in GDD requirements, with recent studies suggesting that fine-scale spatial variation in climate may affect phenology. Here, we combine simulations, observations from an urban and a rural site, and Bayesian hierarchical models to assess how consistent GDD models of budburst are across species and space. We built GDD models using two methods of measuring climate: on-site weather stations and local dataloggers. We find that estimated GDD thresholds can vary by up to 20% across sites and methods. Our results suggest that the urban site we studied requires fewer GDDs to reach budburst and may have stronger microclimate effects than the rural site, though these effects depend on the method used to measure climate. Further, we find that GDD models are less accurate for early-active species and may become less accurate with warming. Our results suggest that local-scale forecasts of spring phenology based on GDD models should account for these inherent accuracy limitations, along with the variation we found across space, species and warming. Testing whether these issues persist at larger spatial scales could improve forecasts for temperate forests.
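
For readers unfamiliar with GDD models, the heat-accumulation idea can be illustrated with a minimal sketch. It is not the Bayesian hierarchical model used in this study. It assumes a simple daily-mean formulation, a hypothetical base temperature of 0 °C, and illustrative function names (`daily_gdd`, `predict_budburst_day`) and threshold values that do not come from our data.

```python
from typing import Optional, Sequence


def daily_gdd(tmin: float, tmax: float, base: float = 0.0) -> float:
    """Heat units for one day: daily mean temperature above an assumed base.

    The base temperature (0 degrees C here) is an illustrative assumption.
    Days with a mean at or below the base contribute nothing.
    """
    return max((tmin + tmax) / 2.0 - base, 0.0)


def predict_budburst_day(
    tmins: Sequence[float],
    tmaxs: Sequence[float],
    threshold: float,
    base: float = 0.0,
) -> Optional[int]:
    """Return the first day (1-indexed) on which accumulated GDD reaches
    the threshold, or None if the threshold is never reached."""
    total = 0.0
    for day, (lo, hi) in enumerate(zip(tmins, tmaxs), start=1):
        total += daily_gdd(lo, hi, base)
        if total >= threshold:
            return day
    return None


# Hypothetical daily minimum and maximum temperatures (degrees C).
tmins = [-2.0, 0.0, 2.0, 4.0, 6.0]
tmaxs = [2.0, 6.0, 10.0, 12.0, 14.0]
predicted_day = predict_budburst_day(tmins, tmaxs, threshold=15.0)
```

Under this formulation, the predicted day depends on both the threshold and the temperature series supplied, which is why the same budburst observations can yield different estimated thresholds when temperatures come from a weather station rather than a local datalogger.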