Food safety sampling inspection is critical for risk prevention in complex supply chains. However, extreme class imbalance, in which qualified samples vastly outnumber unqualified ones, biases machine learning (ML) models toward the majority class and compromises the detection of unqualified samples. Conventional oversampling methods fail to handle the nonlinear features, complex distributions, and multiclass scenarios of food inspection data, and often generate low-quality synthetic samples and noisy decision boundaries. To address these challenges, we propose LOF-KNN-CSENN (Local Outlier Factor-K-Nearest Neighbors-Combined Synthetic Minority Over-sampling Technique and Edited Nearest Neighbors), a hybrid sampling algorithm that combines the Synthetic Minority Over-sampling Technique (SMOTE) with Edited Nearest Neighbors (ENN) and integrates Local Outlier Factor (LOF) for noise filtering and K-Nearest Neighbors (KNN) for boundary sample preservation. LOF-KNN-CSENN couples minority oversampling with majority undersampling to optimize the data distribution. A stacking ensemble learning framework is further introduced, combining six tree-based models with Logistic Regression (LR) as the meta-model to enhance classification robustness. Experiments on a real-world food safety sampling inspection dataset show that LOF-KNN-CSENN suppresses the synthesis of noisy samples and balances the data distribution. When integrated with stacking, the model achieves 0.4–5.6% higher precision and 0.8–30.7% higher F1-score than single models. Shapley Additive Explanations (SHAP) analysis identifies production address, sampling stage, and location as key risk factors, supporting targeted supervision. This study provides a novel framework for intelligent food safety regulation, leveraging hybrid sampling and ensemble learning to mitigate class imbalance and improve unqualified sample detection in multicategory food inspection.
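
To make the hybrid sampling idea concrete, the following is a minimal sketch of an LOF filter, SMOTE-style interpolation, and an ENN-style cleaning pass, assuming NumPy and scikit-learn. It is not the LOF-KNN-CSENN algorithm itself, whose details the abstract does not give. The function name `hybrid_resample`, the parameters `k` and `seed`, the binary-class restriction, the synthetic demo data, and the choice to exempt the minority class from ENN editing (a crude stand-in for KNN-based boundary preservation) are all assumptions made for illustration.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.neighbors import LocalOutlierFactor, NearestNeighbors


def hybrid_resample(X, y, minority=1, k=5, seed=0):
    """Illustrative LOF -> SMOTE-style -> ENN-style resampling (binary case only)."""
    rng = np.random.default_rng(seed)
    X_min, X_maj = X[y == minority], X[y != minority]
    y_maj = y[y != minority]

    # 1) Noise filtering (assumed form): only minority points that LOF labels
    #    as inliers (+1) are allowed to seed synthetic samples.
    inlier = LocalOutlierFactor(n_neighbors=k).fit_predict(X_min) == 1
    seeds = X_min[inlier]

    # 2) SMOTE-style oversampling: interpolate between a seed and one of its
    #    k nearest seed neighbours until the classes have equal counts.
    n_new = max(len(X_maj) - len(X_min), 0)
    nn = NearestNeighbors(n_neighbors=k).fit(seeds)
    neigh = nn.kneighbors(return_distance=False)  # no query given: self is excluded
    base = rng.integers(0, len(seeds), size=n_new)
    mate = neigh[base, rng.integers(0, k, size=n_new)]
    gap = rng.random((n_new, 1))
    X_syn = seeds[base] + gap * (seeds[mate] - seeds[base])

    X_all = np.vstack([X_maj, X_min, X_syn])
    y_all = np.concatenate([y_maj, np.full(len(X_min) + n_new, minority)])

    # 3) ENN-style undersampling: drop majority points whose k neighbours
    #    mostly carry a different label. Minority points are always kept
    #    (assumption standing in for boundary sample preservation).
    idx = NearestNeighbors(n_neighbors=k).fit(X_all).kneighbors(return_distance=False)
    agree = (y_all[idx] == y_all[:, None]).mean(axis=1)
    keep = (y_all == minority) | (agree >= 0.5)
    return X_all[keep], y_all[keep]


# Synthetic stand-in for an imbalanced inspection dataset (roughly 9:1).
X, y = make_classification(n_samples=600, n_features=8,
                           weights=[0.9, 0.1], random_state=0)
X_res, y_res = hybrid_resample(X, y)
print("before:", np.bincount(y), "after:", np.bincount(y_res))
```

In this sketch the minority class is raised to the original majority count, while the majority class can only shrink, which mirrors the combination of minority oversampling and majority undersampling described above. A multiclass version would need a per-class target count and is left out here.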
