Friction stir welding (FSW) is a solid-state joining process that operates below the material’s melting point and is commonly used to join aluminum parts, avoiding the drawbacks of fusion-based methods. The resulting advantages have accelerated its adoption and are increasing the number of applications across a range of industrial sectors, many of which are safety-critical. Along with the increase in applications and the rise in productivity, the need for reliable, cost-effective, non-destructive inline quality monitoring is growing rapidly. This publication builds on the research group’s ongoing efforts to develop a capable, generalized inline monitoring solution. To detect and classify FSW defects, convolutional neural networks (CNNs) based on the DenseNet architecture are used to evaluate recorded process data. The CNNs are modified to include weld- and workpiece-specific metadata in the classification. These networks are then trained to classify transient weld data over a wide range of welding parameters, three different Al alloys, and two sheet thicknesses. The hyperparameters are incrementally tuned to improve weld defect detection. The defect detection threshold is tuned to prevent false-negative classifications by adjusting the cost function to fit the needs of a force-based detection system. Classification accuracies above 99% are achieved with multiple neural network configurations. The system is validated on a newly recorded weld dataset from a different welding machine, covering previously used parameter/workpiece combinations as well as parameter combinations, alloys, and sheet thicknesses outside the training parameter range. The generalization capabilities are demonstrated by the detection of more than 99.9% of weld defects in the validation data.
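The idea of tuning the detection threshold against false negatives can be illustrated with a minimal sketch. It is not the implementation used in this work: the function names, the cost values (`fn_cost`, `fp_cost`), and the example labels and scores are hypothetical and chosen only for illustration. The sketch assumes a classifier that outputs a defect score per weld segment and picks the threshold that minimizes a total cost in which a missed defect is penalized more heavily than a false alarm.

```python
# Illustrative sketch only: cost-sensitive threshold selection for a
# binary defect detector. All names, costs, and data are assumptions.

def total_cost(labels, scores, threshold, fn_cost=10.0, fp_cost=1.0):
    """Sum the misclassification cost at a given decision threshold.

    labels: 1 = defective weld segment, 0 = sound weld segment
    scores: predicted defect score per segment (higher = more likely defect)
    """
    cost = 0.0
    for y, s in zip(labels, scores):
        predicted_defect = s >= threshold
        if y == 1 and not predicted_defect:
            cost += fn_cost  # missed defect (false negative)
        elif y == 0 and predicted_defect:
            cost += fp_cost  # false alarm (false positive)
    return cost


def pick_threshold(labels, scores, fn_cost=10.0, fp_cost=1.0):
    """Return the candidate threshold with the lowest total cost.

    Candidates are the observed scores plus +inf (flag nothing).
    Ties are resolved in favor of the lowest threshold, i.e. the
    setting that flags more segments as defective.
    """
    candidates = sorted(set(scores)) + [float("inf")]
    return min(
        candidates,
        key=lambda t: total_cost(labels, scores, t, fn_cost, fp_cost),
    )


# Made-up example data: three defective and four sound segments.
LABELS = [0, 0, 1, 0, 0, 1, 1]
SCORES = [0.05, 0.20, 0.35, 0.40, 0.50, 0.80, 0.95]

if __name__ == "__main__":
    print(pick_threshold(LABELS, SCORES))                            # asymmetric cost
    print(pick_threshold(LABELS, SCORES, fn_cost=1.0, fp_cost=1.0))  # equal cost
```

With these made-up scores, the asymmetric cost selects a threshold of 0.35, which catches all three defects at the price of two false alarms, whereas equal costs select 0.80 and miss one defect. This is the trade-off a safety-critical monitoring system would typically accept.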