Sanaa Rijab, Mohammadreza Khanmohammadi Khorrami, Mahsa Mohammadi
A novel robust principal component analysis-multivariate adaptive regression splines approach for BOD, COD, and NH3-N determination in wastewater
Journal of the Iranian Chemical Society, 22(3), 575–587 (published 2025-02-07). DOI: 10.1007/s13738-024-03170-z (https://link.springer.com/article/10.1007/s13738-024-03170-z)
Abstract
Wastewater (WW) is one of the largest sources of environmental contamination and can impede global sustainable development. Visible-near infrared (Vis-NIR) spectroscopy can support better management, efficiency, and prudent use of water resources. However, noise and the high dimensionality of spectral data often limit the accuracy of spectral models for water quality parameters. This work proposes the rPCA-MARS model, which combines robust principal component analysis (rPCA) with multivariate adaptive regression splines (MARS), as a novel analytical technique for estimating biological oxygen demand (BOD), chemical oxygen demand (COD), and NH₃-N in WW from Vis-NIR spectra. Before modeling, the spectra were preprocessed with moving-average smoothing and the standard normal variate (SNV) transformation. The rPCA algorithm was then applied to the spectral data to obtain principal component (PC) scores, and six PC scores served as the input variables of the MARS model. Piecewise-linear and piecewise-cubic MARS models were used to build a mathematical relationship between the spectral data matrix (X) and the content of each component (COD, BOD, or NH₃-N; Y). The duplex algorithm was used to divide the data matrix into calibration and prediction sets: the rPCA-MARS model was calibrated on 42 samples, and its performance was evaluated on an independent test set of 16 samples. Model performance was assessed with the coefficient of determination (R²), adjusted R² (R²adj), R² estimated by generalized cross-validation (R²GCV), and the mean square error (MSE). Both the piecewise-linear and the piecewise-cubic rPCA-MARS models performed very well for BOD, COD, and NH₃-N determination on the calibration and test sets. High R² values (> 0.93) in both sets indicate a strong correlation between predicted and observed values, the high adjusted R² (0.93) suggests that the model avoids overfitting, and the relatively high R²GCV (0.90) supports the model's accuracy and generalizability.
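The two preprocessing steps named in the abstract, moving-average smoothing followed by SNV, can be sketched in a few lines of plain Python. This is a minimal illustration, not the authors' code: the 5-point window and the truncated window at the spectrum edges are assumptions, since the abstract gives neither, and SNV here divides by the sample standard deviation of each spectrum.

```python
from statistics import mean, stdev


def moving_average(spectrum, window=5):
    """Centered moving-average smoothing.

    `window` must be a positive odd integer. Near the ends of the spectrum the
    window is truncated, so edge points average fewer neighbours (an assumed
    edge rule; other conventions are equally common).
    """
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd integer")
    half = window // 2
    n = len(spectrum)
    smoothed = []
    for i in range(n):
        lo, hi = max(0, i - half), min(n, i + half + 1)
        smoothed.append(sum(spectrum[lo:hi]) / (hi - lo))
    return smoothed


def snv(spectrum):
    """Standard normal variate: subtract the spectrum's own mean, divide by its own std."""
    mu = mean(spectrum)
    sd = stdev(spectrum)  # sample standard deviation (n - 1 denominator)
    if sd == 0:
        raise ValueError("a constant spectrum cannot be SNV-scaled")
    return [(x - mu) / sd for x in spectrum]


def preprocess(spectra, window=5):
    """Smooth each spectrum (one row per sample), then apply SNV row by row."""
    return [snv(moving_average(row, window)) for row in spectra]


# Hypothetical absorbance values for a single sample, for illustration only.
example = preprocess([[0.12, 0.15, 0.20, 0.18, 0.25, 0.30, 0.28]])
```

Because SNV works on each spectrum independently, it removes per-sample offset and scale differences without borrowing information from other samples, so it can be applied before the calibration/test split without leaking test information into calibration.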
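The duplex split can be sketched as follows, assuming one common variant of the algorithm: the two most distant samples seed the calibration set, the two most distant of the remaining samples seed the test set, and the sets then take turns adding the remaining sample farthest from their current members. The abstract does not say which distance or data representation the authors used, or how leftover samples were assigned; here Euclidean distance is used and leftovers go to the calibration set. The function name `duplex_split` is hypothetical.

```python
from itertools import combinations
from math import dist


def duplex_split(X, n_test):
    """Duplex-style split of the rows of X into calibration and test index lists.

    Assumed variant (details the abstract omits):
    1. the two most distant samples seed the calibration set;
    2. the two most distant of the remaining samples seed the test set;
    3. the sets alternate, each taking the remaining sample whose nearest
       member of that set is farthest away, until the test set has n_test samples;
    4. every leftover sample goes to the calibration set.
    Distances are Euclidean; ties go to the lowest sample index.
    """
    n = len(X)
    if n_test < 2 or n_test > n // 2:
        raise ValueError("n_test must be between 2 and len(X) // 2")
    remaining = set(range(n))

    def farthest_pair(indices):
        # Lexicographic order of combinations makes ties resolve to lowest indices.
        return max(combinations(sorted(indices), 2),
                   key=lambda pair: dist(X[pair[0]], X[pair[1]]))

    def farthest_from(selected):
        # Max-min criterion: the candidate whose closest selected sample is farthest.
        return max(sorted(remaining),
                   key=lambda j: min(dist(X[j], X[i]) for i in selected))

    cal = list(farthest_pair(remaining))
    remaining -= set(cal)
    test = list(farthest_pair(remaining))
    remaining -= set(test)

    # n_test <= n // 2 guarantees a sample is left for each pick below.
    while len(test) < n_test:
        j = farthest_from(cal)
        cal.append(j)
        remaining.remove(j)
        j = farthest_from(test)
        test.append(j)
        remaining.remove(j)

    cal.extend(sorted(remaining))
    return cal, test
```

With the 58 samples implied by the abstract (42 + 16), `duplex_split(X, 16)` returns 42 calibration and 16 test indices. Both sets are spread across the data space, which is the point of duplex compared with a random split.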
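In robust PCA methods such as ROBPCA, the score vector of a sample is obtained by centering its preprocessed spectrum on a robust location estimate and projecting it onto the first k robust loading vectors; with k = 6 this yields the six MARS inputs described above. The abstract does not name the robust estimator, so the notation is generic: x_i is the preprocessed spectrum of sample i (p wavelengths), μ̂ the robust center, and P_6 the p × 6 matrix of robust loadings.

```latex
\mathbf{t}_i = \mathbf{P}_6^{\top}\left(\mathbf{x}_i - \hat{\boldsymbol{\mu}}\right),
\qquad \mathbf{t}_i \in \mathbb{R}^{6}, \quad i = 1, \dots, n
```

The robustness lies entirely in how μ̂ and P_6 are estimated, since outlying spectra are down-weighted or excluded when fitting them. The projection step itself is the same as in classical PCA.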
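MARS expresses each response as an intercept plus a weighted sum of hinge functions max(0, x − t) and max(0, t − x), or products of them, on the input variables. In the piecewise-cubic variant from Friedman's original MARS formulation, each hinge is replaced by a truncated cubic with side knots t_lo < t < t_hi, giving a fitted surface with a continuous first derivative. The sketch below only evaluates such a model; the forward/backward knot search that fits it is omitted, and the knots and coefficients in `example_terms` are hypothetical, not values from the paper.

```python
def hinge(x, t, s=1):
    """Piecewise-linear MARS basis: max(0, x - t) for s = +1, max(0, t - x) for s = -1."""
    return max(0.0, s * (x - t))


def cubic_hinge(x, t_lo, t, t_hi, s=1):
    """Truncated-cubic counterpart of hinge(x, t, s) with side knots t_lo < t < t_hi.

    Equals the linear hinge outside (t_lo, t_hi) and blends inside, so the
    function and its first derivative are continuous.
    """
    if not t_lo < t < t_hi:
        raise ValueError("knots must satisfy t_lo < t < t_hi")
    if s not in (1, -1):
        raise ValueError("s must be +1 or -1")
    if s == 1:
        if x <= t_lo:
            return 0.0
        if x >= t_hi:
            return x - t
        d = t_hi - t_lo
        p = (2 * t_hi + t_lo - 3 * t) / d ** 2
        r = (2 * t - t_hi - t_lo) / d ** 3
        u = x - t_lo
    else:
        if x >= t_hi:
            return 0.0
        if x <= t_lo:
            return t - x
        d = t_lo - t_hi
        p = (3 * t - 2 * t_lo - t_hi) / d ** 2
        r = (t_lo + t_hi - 2 * t) / d ** 3
        u = x - t_hi
    return p * u ** 2 + r * u ** 3


def mars_predict(x, intercept, terms):
    """Evaluate a piecewise-linear MARS model at one input vector x.

    terms: list of (coefficient, factors); factors is a list of
    (variable_index, knot, direction) hinges multiplied together
    (one factor = additive term, two factors = interaction term).
    """
    total = intercept
    for coef, factors in terms:
        basis = 1.0
        for j, t, s in factors:
            basis *= hinge(x[j], t, s)
        total += coef * basis
    return total


# Hypothetical model on six PC scores; knots and coefficients are made up.
example_terms = [
    (2.0, [(0, 0.5, 1)]),               # 2.0 * max(0, PC1 - 0.5)
    (-1.5, [(1, -0.25, -1)]),           # -1.5 * max(0, -0.25 - PC2)
    (0.8, [(0, 0.5, 1), (2, 0.0, 1)]),  # PC1 x PC3 hinge interaction
]
example_scores = [1.5, -1.0, 2.0, 0.0, 0.0, 0.0]
prediction = mars_predict(example_scores, 50.0, example_terms)
```

Swapping `hinge` for `cubic_hinge` (with side knots, typically placed between neighbouring central knots) turns the same term structure into the piecewise-cubic model. The knot locations and the set of terms stay the same.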
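The four reported statistics can be computed as follows. R² and MSE are standard. Adjusted R² needs the number of predictors p, and how the authors counted p for a MARS model is an assumption. For GCV the sketch uses a convention found in MARS implementations such as the R earth package: effective parameters = n_terms + penalty × (n_terms − 1)/2, with the intercept counted in n_terms, and R²GCV = 1 − GCV/GCV_null, where GCV_null is the GCV of an intercept-only model. The default penalty of 3 is an assumption; values of 2 or 3 per knot are common.

```python
def mse(y, y_hat):
    """Mean square error."""
    return sum((a - b) ** 2 for a, b in zip(y, y_hat)) / len(y)


def r2(y, y_hat):
    """Coefficient of determination: 1 - RSS / TSS."""
    y_bar = sum(y) / len(y)
    rss = sum((a - b) ** 2 for a, b in zip(y, y_hat))
    tss = sum((a - y_bar) ** 2 for a in y)
    return 1.0 - rss / tss


def adjusted_r2(r2_value, n, p):
    """Adjusted R^2 for n samples and p predictors (terms excluding the intercept)."""
    if n - p - 1 <= 0:
        raise ValueError("need n > p + 1")
    return 1.0 - (1.0 - r2_value) * (n - 1) / (n - p - 1)


def gcv(y, y_hat, n_terms, penalty=3.0):
    """Generalized cross-validation criterion for a MARS fit.

    Effective parameters = n_terms + penalty * (n_terms - 1) / 2, where n_terms
    includes the intercept (assumed convention, as in the R earth package).
    """
    n = len(y)
    rss = sum((a - b) ** 2 for a, b in zip(y, y_hat))
    c = n_terms + penalty * (n_terms - 1) / 2
    if c >= n:
        raise ValueError("too many effective parameters for the sample size")
    return rss / (n * (1.0 - c / n) ** 2)


def r2_gcv(y, y_hat, n_terms, penalty=3.0):
    """R^2 estimated by GCV: 1 - GCV(model) / GCV(intercept-only model)."""
    y_bar = sum(y) / len(y)
    gcv_null = gcv(y, [y_bar] * len(y), 1, penalty)
    return 1.0 - gcv(y, y_hat, n_terms, penalty) / gcv_null
```

In this formulation R²GCV can never exceed the training R², because the GCV denominator shrinks as terms are added. The gap between the two therefore indicates how much of the fit may be due to model size rather than signal.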
About the journal:
JICS is an international journal covering general fields of chemistry. JICS welcomes high quality original papers in English dealing with experimental, theoretical and applied research related to all branches of chemistry. These include the fields of analytical, inorganic, organic and physical chemistry as well as the chemical biology area. Review articles discussing specific areas of chemistry of current chemical or biological importance are also published. JICS ensures visibility of your research results to a worldwide audience in science. You are kindly invited to submit your manuscript to the Editor-in-Chief or Regional Editor. All contributions in the form of original papers or short communications will be peer reviewed and published free of charge after acceptance.