Taskeen Hasrod, Yannick B. Nuapia, Hlanganani Tutu
The application of transfer machine learning to predict and impute missing sulphate levels in different Acid Mine Drainage treatment plants
Cleaner Water, Volume 2, Article 100029. Published 2024-08-08. DOI: 10.1016/j.clwat.2024.100029
https://www.sciencedirect.com/science/article/pii/S2950263224000279
Citations: 0
Abstract
An accurate, pre-trained stacking ensemble machine learning (ML) regressor was used to predict sulphate levels at two other Acid Mine Drainage (AMD) treatment plants by means of Transfer Learning (TL). The model was trained on the large Central Rand (CR) water quality dataset and then applied to predict and impute sulphate levels in the sparse East Rand (ER) and West Rand (WR) datasets, which were too small to train ML models from scratch. TL overcame this barrier: the pre-trained model rapidly predicted sulphate levels for both plants and, for the East Rand plant, achieved high accuracy when the predicted values were compared with the true sulphate values (Mean Squared Error: 0.00124, Mean Absolute Error: 0.0290, R²: 0.963). No true sulphate values existed for the West Rand plant; nevertheless, TL imputed these missing values and rapidly completed the West Rand dataset with historic sulphate levels. This was possible because of the high degree of similarity between the domains (treatment plants): they are in similar geographic locations, use the same treatment process, share the same important features and exhibit the same relationships between variables. TL thus provided three accurate datasets of AMD sulphate levels, an important step towards reliable data for the design of experiments aimed at recovering valuable resources such as elemental sulphur, gypsum and important metals from AMD.
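The workflow described in the abstract (train on a data-rich source plant, reuse the fitted model unchanged on a data-poor target plant, score it where true values exist, and impute where they do not) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the data are synthetic, the single input feature is unnamed, the function names are hypothetical, and a one-variable least-squares line stands in for the paper's stacking ensemble regressor, whose components the abstract does not specify. Only the three metrics (Mean Squared Error, Mean Absolute Error and R²) are taken from the abstract.

```python
from statistics import fmean


def fit_line(xs, ys):
    """Least-squares fit of y = a*x + b.

    Stand-in for the pre-trained stacking ensemble (assumption, not the
    authors' model).
    """
    mx, my = fmean(xs), fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    a = sxy / sxx
    return a, my - a * mx


def predict(model, xs):
    """Apply the fitted (slope, intercept) pair to new inputs."""
    a, b = model
    return [a * x + b for x in xs]


def regression_metrics(y_true, y_pred):
    """Return MSE, MAE and R^2 for predicted versus true values."""
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    ss_res = sum(e * e for e in errors)
    mean_true = fmean(y_true)
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return {
        "mse": ss_res / n,
        "mae": sum(abs(e) for e in errors) / n,
        "r2": 1.0 - ss_res / ss_tot,
    }


def impute_missing(model, xs, ys):
    """Fill None entries in ys with model predictions; keep measured values."""
    return [p if y is None else y for y, p in zip(ys, predict(model, xs))]


# Synthetic, scaled data (hypothetical values, not from the paper).
# Source domain: plenty of labelled samples (plays the role of Central Rand).
source_x = [0.1, 0.2, 0.4, 0.6, 0.8, 0.9]
source_y = [0.15, 0.25, 0.45, 0.65, 0.85, 0.95]
model = fit_line(source_x, source_y)

# Target domain with a few true values (plays the role of East Rand):
# reuse the source model without retraining and score its predictions.
labelled_x = [0.2, 0.5, 0.7]
labelled_y = [0.26, 0.54, 0.76]
metrics = regression_metrics(labelled_y, predict(model, labelled_x))

# Target domain with no true values (plays the role of West Rand):
# the predictions themselves become the imputed sulphate column.
unlabelled_x = [0.3, 0.6]
unlabelled_y = [None, None]
imputed = impute_missing(model, unlabelled_x, unlabelled_y)

print(metrics)
print(imputed)
```

The sketch works only because the synthetic source and target data follow the same input-output relationship, which mirrors the abstract's point that transfer succeeded because the plants share features and relationships between variables. Where no true values exist, as for the second target, the imputed values cannot be scored directly.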