Accurate prediction of the welding state is essential for ensuring weld quality in pulsed gas tungsten arc welding (GTAW) of aluminum alloys. Although multimodal fusion approaches have advanced welding state prediction, complex environmental noise often introduces interference that reduces prediction accuracy. To address this, we propose a novel multimodal fusion network based on a multispectral channel attention mechanism (MFCA-Net). First, the model employs a parallel feature mapping strategy to capture both local and global dependencies within each modality, enhancing receptive field interaction and improving global modeling capability. Second, a multispectral channel attention mechanism emphasizes informative features across channels, refining the fusion of local high-frequency and global low-frequency features within each modality and reducing redundancy. Finally, the features from all modalities are fused to predict the welding state. Experimental results demonstrate that MFCA-Net identifies five typical welding states (lack of penetration, normal penetration, over penetration, misalignment, and burn through) with an accuracy of 98.8 %, and achieves 96.1 % on public datasets. Compared with state-of-the-art methods, MFCA-Net significantly enhances prediction performance, showing strong potential for real-world welding applications.
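The abstract does not spell out how the multispectral channel attention is computed, so the following is only a minimal sketch of one common formulation, assumed here for illustration: each group of channels is pooled with a different 2D discrete cosine transform (DCT) basis instead of plain global average pooling, and the pooled vector passes through a small bottleneck to produce per-channel weights. The function names, the choice of frequencies, and the weight matrices `w_reduce` and `w_expand` are hypothetical; in a trained network the weights would be learned, and MFCA-Net itself may differ in every detail.

```python
import math

def dct_basis(u, v, height, width):
    """2D DCT-II basis for frequency (u, v) on a height x width grid."""
    return [[math.cos(math.pi * u * (i + 0.5) / height) *
             math.cos(math.pi * v * (j + 0.5) / width)
             for j in range(width)] for i in range(height)]

def multispectral_pool(feature_maps, frequencies):
    """Pool every channel with one DCT basis.

    Channels are split evenly into len(frequencies) groups, and each
    group uses its own frequency. Frequency (0, 0) is an all-ones basis,
    so it reduces to sum pooling (global average pooling up to scale).
    """
    channels = len(feature_maps)
    if channels % len(frequencies) != 0:
        raise ValueError("channels must be divisible by the number of frequencies")
    height, width = len(feature_maps[0]), len(feature_maps[0][0])
    group_size = channels // len(frequencies)
    bases = [dct_basis(u, v, height, width) for u, v in frequencies]
    pooled = []
    for c, fmap in enumerate(feature_maps):
        basis = bases[c // group_size]
        pooled.append(sum(fmap[i][j] * basis[i][j]
                          for i in range(height) for j in range(width)))
    return pooled

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def channel_attention(feature_maps, frequencies, w_reduce, w_expand):
    """Pooled vector -> ReLU bottleneck -> sigmoid weights -> rescaled channels."""
    pooled = multispectral_pool(feature_maps, frequencies)
    hidden = [max(0.0, sum(w * p for w, p in zip(row, pooled))) for row in w_reduce]
    weights = [sigmoid(sum(w * h for w, h in zip(row, hidden))) for row in w_expand]
    scaled = [[[weights[c] * x for x in row] for row in fmap]
              for c, fmap in enumerate(feature_maps)]
    return weights, scaled

# Toy input (made up for illustration): 4 channels of 2x2 features.
features = [
    [[1.0, 1.0], [1.0, 1.0]],
    [[2.0, 0.0], [0.0, 2.0]],
    [[1.0, -1.0], [1.0, -1.0]],   # zero mean, but strong horizontal variation
    [[0.5, 0.5], [0.5, 0.5]],
]
frequencies = [(0, 0), (0, 1)]    # hypothetical choice: DC and lowest horizontal
w_reduce = [[0.25, 0.25, 0.25, 0.25], [0.5, -0.5, 0.5, -0.5]]   # placeholder weights
w_expand = [[1.0, 0.0], [0.0, 1.0], [1.0, -1.0], [-1.0, 0.0]]   # placeholder weights

weights, scaled = channel_attention(features, frequencies, w_reduce, w_expand)
print(multispectral_pool(features, frequencies))
print(weights)
```

In this toy input the third channel has zero mean, so plain average pooling would summarize it as 0, whereas the horizontal DCT component gives it a nonzero descriptor. That is the usual motivation for pooling with more than one frequency.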