The field of geosciences is replete with problems in which the target variable is inherently class-imbalanced, meaning that the events of interest are rare. Examples include predicting landslides, ice jam breakups, preferential flow, and frozen ground. Such imbalance poses substantial challenges for modeling. Using frozen ground prediction as a case study, this research examines how the frequency of event occurrence influences prediction performance and proposes a data curation strategy to improve predictability. To this end, a data-driven approach based on a Long Short-Term Memory (LSTM) neural network is first implemented to predict soil temperature and determine frozen periods. Applying this approach at 25 gaging sites in Michigan reveals model underperformance, particularly at sites where the frozen data fraction (FDF), the ratio of the frozen period to the total observation period, is low. The study further demonstrates that under-sampling the more prevalent non-frozen periods in the training data improves the detection of frozen periods, with greater improvements at sites with lower FDFs. However, performance peaks at a threshold FDF and then plateaus or declines because of increased class imbalance and reduced training data length. The presented training data curation approach can be applied to predictions of other class-imbalanced time series.
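To make the FDF definition and the under-sampling idea concrete, the following is a minimal sketch, not the study's implementation. It assumes each training sample already carries a boolean frozen/non-frozen label, and it keeps every frozen sample while randomly dropping non-frozen samples until the retained set reaches a chosen target FDF. The function names `frozen_data_fraction` and `undersample_non_frozen` and the `target_fdf` and `seed` parameters are hypothetical. For an LSTM, the indices would ideally refer to pre-built input windows labeled by their target time step, so that each input sequence stays temporally intact.

```python
import random
from typing import List, Sequence


def frozen_data_fraction(frozen: Sequence[bool]) -> float:
    """FDF: number of frozen samples divided by the total number of samples."""
    if len(frozen) == 0:
        raise ValueError("empty label sequence")
    return sum(frozen) / len(frozen)


def undersample_non_frozen(
    frozen: Sequence[bool], target_fdf: float, seed: int = 0
) -> List[int]:
    """Hypothetical curation step: keep all frozen samples plus a random
    subset of non-frozen samples so the retained set's FDF is ~target_fdf.

    Returns the sorted indices of the retained samples.
    """
    if not 0.0 < target_fdf < 1.0:
        raise ValueError("target_fdf must lie strictly between 0 and 1")
    frozen_idx = [i for i, f in enumerate(frozen) if f]
    non_frozen_idx = [i for i, f in enumerate(frozen) if not f]
    if not frozen_idx:
        raise ValueError("no frozen samples to balance against")
    # Solve n_frozen / (n_frozen + n_keep) = target_fdf for n_keep.
    n_keep = round(len(frozen_idx) * (1.0 - target_fdf) / target_fdf)
    # Under-sampling can only raise the FDF; if the target is below the
    # current FDF, every non-frozen sample is kept.
    n_keep = min(n_keep, len(non_frozen_idx))
    rng = random.Random(seed)  # seeded for reproducible subsets
    kept = rng.sample(non_frozen_idx, n_keep)
    return sorted(frozen_idx + kept)


if __name__ == "__main__":
    labels = [True] * 10 + [False] * 90  # toy site with FDF = 0.1
    idx = undersample_non_frozen(labels, target_fdf=0.5, seed=1)
    print(len(idx), frozen_data_fraction([labels[i] for i in idx]))
```

With 10 frozen samples, the retained training set has 10 / `target_fdf` samples, so every increase in the target FDF shortens the training record. This arithmetic is consistent with the trade-off described above, in which gains from rebalancing are eventually offset by reduced training data length.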