Advanced analysis of soil pollution in southwestern Ghana using Variational Autoencoders (VAE) and positive matrix factorization (PMF)
Raymond Webrah Kazapoe, Daniel Kwayisi, Seidu Alidu, Samuel Dzidefo Sagoe, Aliyu Ohiani Umaru, Ebenezer Ebo Yahans Amuah, Millicent Obeng Addai, Obed Fiifi Fynn
Environmental and Sustainability Indicators, Volume 26, Article 100627. Published 2025-02-06. DOI: 10.1016/j.indic.2025.100627
https://www.sciencedirect.com/science/article/pii/S2665972725000480
Citations: 0
Abstract
The study combined the Positive Matrix Factorization (PMF) receptor model with the Variational Autoencoder (VAE) machine learning technique and ecological risk indices to examine the spatial distribution, sources and patterns of soil pollution in the study area. A total of 719 soil samples were analysed for concentrations of selected Potentially Toxic Elements (PTEs). As (9.68 mg/L) and Pb (7.43 mg/L) showed elevated levels across the area, linked to mining activities. PTE concentrations decreased in the order Ba > Cr > V > Zn > Cu > Ni > As > Pb > Co. The Pearson correlation matrix outlines two main groups of PTEs: (1) moderately correlated elements (Ba, Cr, Cu, Ni and V) and (2) weakly correlated elements (As, Pb and Zn). These relationships are corroborated by the VAE, which showed a low contribution by As and a high contribution by V to all the latent dimensions. The PMF revealed three factors: Factor 1 (geogenic): Ba (77.5%), Cu (54.4%), Ni (66.4%), V (54.0%) and Cr (46.8%); Factor 2 (mixed): Co (61.6%), Pb (64.8%) and Zn (71.0%); and Factor 3 (anthropogenic): As (86.7%). The degree of contamination analysis indicates that 69.03% of the samples are moderately polluted, while 15.14% and 0.28% show considerable and very high pollution, respectively. The pollution load index indicates pollution in 20% of the samples. The Potential Ecological Risk Index (RI) values indicated low pollution for most samples (97.08%) and moderate pollution for the remaining 2.92%. Integrating chemometric and machine learning techniques provides a dynamic system that can detect pollution shifts early and so aid remediation efforts in highly affected areas.
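The abstract does not state how the pollution indices were calculated. As a minimal sketch, the degree of contamination, pollution load index and RI can be computed for one sample using the formulations commonly attributed to Hakanson and Tomlinson, which are assumed here and may differ from the authors' method. The background concentrations and sample values are hypothetical, the toxic response factors are commonly cited values rather than ones confirmed by the source, and only five of the nine PTEs are included.

```python
import math

# Toxic response factors commonly cited for the Hakanson risk index.
# Assumption: the source abstract does not list the factors it used.
TOXIC_RESPONSE = {"As": 10, "Pb": 5, "Cu": 5, "Cr": 2, "Zn": 1}


def contamination_factors(sample, background):
    """Contamination factor per element: measured / background concentration."""
    return {el: sample[el] / background[el] for el in sample}


def degree_of_contamination(cf):
    """Degree of contamination: sum of the contamination factors."""
    return sum(cf.values())


def pollution_load_index(cf):
    """Pollution load index: geometric mean of the contamination factors.

    Requires every contamination factor to be strictly positive.
    """
    return math.exp(sum(math.log(v) for v in cf.values()) / len(cf))


def ecological_risk_index(cf, toxic_response=TOXIC_RESPONSE):
    """Potential ecological risk index: sum of toxic response factor * CF."""
    return sum(toxic_response[el] * cf[el] for el in cf)


# Hypothetical background and sample concentrations (same units for both,
# so the units cancel in the ratio). These are NOT values from the study.
background = {"As": 5.0, "Pb": 10.0, "Cu": 20.0, "Cr": 50.0, "Zn": 40.0}
sample = {"As": 10.0, "Pb": 10.0, "Cu": 10.0, "Cr": 50.0, "Zn": 80.0}

cf = contamination_factors(sample, background)
c_deg = degree_of_contamination(cf)
pli = pollution_load_index(cf)
ri = ecological_risk_index(cf)

print(f"Cdeg = {c_deg:.2f}, PLI = {pli:.3f}, RI = {ri:.1f}")
```

Under these formulations a pollution load index above 1 is conventionally read as indicating pollution. The class boundaries used to label samples as moderately, considerably or very highly polluted are not given in the abstract, so none are encoded here.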