This study leverages explainable machine learning, specifically XGBoost models with Shapley Additive Explanations (SHAP), to explore the chemical properties of atmospheric aerosols in Seoul, Korea, during the summer of 2019. The analysis focuses on non-refractory submicron particulate matter (NR-PM1) measured by high-resolution time-of-flight aerosol mass spectrometry (HR-ToF-AMS) and extends to organic aerosol (OA) sources identified via positive matrix factorization of the high-resolution MS data. The models achieved good predictive accuracy (R² > 0.90) for all species concentrations except hydrocarbon-like OA (HOA), whose concentration fluctuated frequently. The model outcomes aligned well with those previously obtained using conventional methods (chemical transport modeling and correlational analysis), confirming that, in the summertime in Seoul, relative humidity is associated with nocturnal nitrate concentration and photochemistry is associated with sulfate concentration. Importantly, the models revealed mostly nonlinear relationships between atmospheric factors, such as temperature, and particulate matter (PM) components, thereby deepening the understanding of formation processes. Notably, different potential formation mechanisms were discerned for more oxidized oxygenated OA (MO-OOA) and oxidized primary OA (OPOA). For MO-OOA, the SHAP values plateaued at an Ox concentration of 0.085 ppm, which suggests potential fragmentation upon further oxidation and agrees with previous chamber experiments. Conversely, the SHAP values for OPOA showed no plateau with increasing Ox, implying potential ongoing oxidation and suggesting a higher and longer atmospheric oxidation potential. This approach offers a rapid route to potential insights into complex atmospheric aerosol formation processes.
It is essential to acknowledge that SHAP values do not establish causality; knowledge of the underlying physical and chemical processes is required to draw valid and comprehensive interpretations from the ML results.
This study represents a pioneering effort in applying explainable machine learning techniques to HR-ToF-AMS data, delivering rapid results while providing potential insights into OA formation mechanisms, such as the oxidation turning points of MO-OOA and OPOA.
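To make the notion of a SHAP plateau concrete, the following is a minimal sketch in pure Python, not the study's XGBoost pipeline. It computes exact Shapley values for a hypothetical three-feature response function relative to a single background point. The function `toy_model`, its coefficients, the feature values, and the background point are all assumptions chosen for illustration; only the 0.085 ppm Ox threshold is taken from the text. In practice, SHAP attributions for tree ensembles are typically computed with a tree-specific algorithm over a background dataset rather than by brute-force enumeration.

```python
from itertools import combinations
from math import factorial


def toy_model(ox_ppm, temp_c, rh_pct):
    """Hypothetical response, NOT the paper's fitted model.

    The Ox contribution is assumed to saturate above 0.085 ppm.
    """
    ox_term = 40.0 * min(ox_ppm, 0.085)
    return 1.0 + ox_term + 0.05 * temp_c + 0.01 * rh_pct


def shapley_values(model, x, background):
    """Exact Shapley values of model(x) relative to one background point."""
    n = len(x)

    def value(subset):
        # Features in `subset` take their values from x; the rest come
        # from the background point.
        args = [x[i] if i in subset else background[i] for i in range(n)]
        return model(*args)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for k in range(n):
            for s in combinations(others, k):
                # Shapley weight for a coalition of size k.
                w = factorial(k) * factorial(n - k - 1) / factorial(n)
                phi[i] += w * (value(set(s) | {i}) - value(set(s)))
    return phi


# Assumed background conditions: (Ox in ppm, temperature in C, RH in %).
background = (0.03, 25.0, 60.0)

for ox in (0.05, 0.085, 0.12):
    phi = shapley_values(toy_model, (ox, 30.0, 70.0), background)
    print(f"Ox = {ox:.3f} ppm -> SHAP value for Ox = {phi[0]:.3f}")
```

In this sketch the attribution to Ox rises with Ox concentration and then stops increasing once the assumed threshold is reached, which is the pattern described for MO-OOA. The attributions always sum to the difference between the prediction and the background prediction. A flat attribution shows only that the model's output no longer responds to the feature; it does not by itself identify the chemical cause.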