Predicting streamflow in ungauged watersheds is a key hydrological challenge, commonly addressed through flow duration curve (FDC) regionalization. Although machine learning (ML) models are widely applied to this task, their accuracy depends critically on both the choice of algorithm and the selection of input variables. This research develops a systematic, quantile-aware ML framework to assess how input selection strategies affect FDC prediction. We evaluate three Gamma Test–based input selection approaches (full variable set, classified variables, and expert opinion) combined with five ML techniques: Adaptive Neuro-Fuzzy Inference System (ANFIS), Support Vector Regression (SVR), Multivariate Adaptive Regression Splines (MARS), Random Forest (RF), and Boosted Regression Trees (BRT). The analysis uses data from 130 hydrometric stations across the Caspian Sea watershed. Results show that predictive performance varies not only by model but also significantly with flow quantile and input strategy. The ANFIS model enhanced with Fuzzy C-Means clustering (ANFIS-FCM) consistently delivered the highest accuracy. Specifically, low flows (Q90) were best predicted with the full variable set (R² = 0.94, an improvement of 623%), medium flows (Q50) with the classified variable approach (R² = 0.86, an improvement of 207.14%), and high flows (Q2) with the expert opinion approach (R² = 0.86, an improvement of 207.14%). These results confirm that no single ML configuration is optimal for all conditions, underscoring the need for flow-regime-specific variable selection for robust FDC regionalization in data-scarce areas. Accordingly, for similar watersheds, we recommend the following ANFIS-FCM configurations: the full variable set for low-flow prediction, the classified variable approach for medium-flow prediction, and the expert opinion approach for high-flow prediction.
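The flow quantiles named above follow the usual FDC convention: Qp is the discharge equalled or exceeded p% of the time, so Q90 marks low flows and Q2 marks high flows. A minimal sketch of reading such quantiles from a gauged flow record is given below. It assumes Weibull plotting positions with linear interpolation; the function name `fdc_quantile` is hypothetical, and the abstract does not state which plotting position the study actually used.

```python
import numpy as np

def fdc_quantile(flows, p):
    """Return Q_p, the flow equalled or exceeded p percent of the time.

    Assumption: Weibull plotting positions, P = 100 * m / (n + 1), where m is
    the rank of each flow in descending order. Values between plotting
    positions are linearly interpolated; p outside the plotted range is
    clamped to the largest or smallest observed flow.
    """
    q = np.sort(np.asarray(flows, dtype=float))[::-1]  # flows, largest first
    n = q.size
    exceedance = 100.0 * np.arange(1, n + 1) / (n + 1)  # increasing P (%)
    # np.interp needs increasing x-values; exceedance increases as q decreases
    return float(np.interp(p, exceedance, q))

flows = list(range(1, 10))  # toy record: 1, 2, ..., 9
print(fdc_quantile(flows, 50))  # median flow, Q50
print(fdc_quantile(flows, 90))  # low flow, Q90
```

In a regionalization setting, quantiles like these, computed at gauged stations, would serve as the targets that the ML models learn to predict from catchment descriptors at ungauged sites.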
