Multivariate time-series (MTS) anomaly detection plays a crucial role in ensuring the reliable operation of industrial, transportation, and financial systems. However, the coexistence of long- and short-term temporal dynamics with complex inter-variable interactions poses significant challenges for existing methods, particularly in selecting appropriate feature scales and modeling spatio-temporal dependencies. To address these limitations, SAST is proposed: a scale-adaptive spatio-temporal modeling framework that unifies temporal evolution modeling, structured variable-dependency learning, and data-driven scale adaptation within a single representation. SAST incorporates a temporal-aware Mixture-of-Experts (MoE) architecture whose expert subnetworks have heterogeneous receptive fields; temporal priors are used to dynamically activate the most relevant combination of experts, enabling input-dependent multi-scale feature selection. Along the temporal dimension, SAST employs multi-scale patch partitioning and cross-patch attention to jointly capture short- and long-range temporal dependencies. Along the spatial dimension, a graph neural network guided by shared node and scale embeddings explicitly models multi-scale structural relations among variables. Extensive experiments on four public benchmark datasets show that SAST outperforms existing state-of-the-art methods in both accuracy and robustness for multivariate time-series anomaly detection.
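To make the interplay of scale-specific experts, patch partitioning, and input-dependent routing more concrete, the following is a minimal NumPy sketch under stated assumptions; it is not the SAST implementation. Each hypothetical expert differs only in its patch length (its receptive field) and applies plain scaled dot-product attention across the patches of each variable, with no learned projections. The gating matrix `W` is a random stand-in for a learned router, and `temporal_prior` (window standard deviation and mean absolute first difference) is a hypothetical choice of temporal prior. All function names are illustrative, and the graph-based spatial branch is omitted.

```python
import numpy as np

def cross_patch_attention(x, patch_len):
    """Split each variable of x (shape (T, N)) into patches of length patch_len and
    let every patch attend to all patches of the same variable (no learned weights)."""
    T, N = x.shape
    pad = (-T) % patch_len                             # pad time axis to a multiple of patch_len
    xp = np.pad(x, ((0, pad), (0, 0)), mode="edge")
    P = xp.shape[0] // patch_len
    p = xp.T.reshape(N, P, patch_len)                  # (N, P, L): patches per variable
    scores = np.einsum("npl,nql->npq", p, p) / np.sqrt(patch_len)
    scores -= scores.max(axis=-1, keepdims=True)       # numerical stability for softmax
    attn = np.exp(scores)
    attn /= attn.sum(axis=-1, keepdims=True)           # softmax over key patches
    out = np.einsum("npq,nql->npl", attn, p)           # mix patches per variable
    return out.reshape(N, P * patch_len).T[:T]         # back to (T, N), drop padding

def temporal_prior(x):
    # Hypothetical temporal prior: window variability, roughness, and a bias term.
    return np.array([x.std(), np.abs(np.diff(x, axis=0)).mean(), 1.0])

def top_k_gate(logits, k):
    # Keep the k largest logits, softmax over them, and zero out the rest.
    idx = np.argsort(logits)[-k:]
    w = np.zeros_like(logits)
    z = np.exp(logits[idx] - logits[idx].max())
    w[idx] = z / z.sum()
    return w

def scale_adaptive_moe(x, patch_lens=(4, 8, 16, 32), k=2, seed=0):
    rng = np.random.default_rng(seed)
    W = rng.normal(size=(len(patch_lens), 3))          # stand-in for a learned gating matrix
    w = top_k_gate(W @ temporal_prior(x), k)           # input-dependent expert weights
    out = np.zeros(x.shape, dtype=float)
    for weight, patch_len in zip(w, patch_lens):
        if weight > 0:                                 # only activated experts are evaluated
            out += weight * cross_patch_attention(x, patch_len)
    return out, w
```

For example, `out, w = scale_adaptive_moe(np.random.default_rng(1).normal(size=(96, 5)))` returns a reconstruction with the same `(T, N)` shape and a weight vector with exactly `k` nonzero entries. Sparse top-k routing is used here so that only the selected scales are computed; whether SAST routes sparsely or densely is not stated in the abstract.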