ATR-FTIR Spectroscopy Preprocessing Technique Selection for Identification of Geographical Origins of Gastrodia elata Blume
Authors: Hong Liu, Honggao Liu, Jieqing Li, Yuanzhong Wang
Journal of Chemometrics, vol. 38, no. 10. Published 2024-07-03.
DOI: 10.1002/cem.3579
URL: https://onlinelibrary.wiley.com/doi/10.1002/cem.3579
Citations: 0
Abstract
Gastrodia elata Blume grown in different regions experiences different growth conditions, soil types, and climates, which directly affect the content and quality of its medicinal components. Accurately identifying the origin helps ensure the medicinal value of G. elata Bl., prevents the circulation of counterfeit products, and thus protects the interests and health of consumers. Attenuated total reflectance Fourier transform infrared (ATR-FTIR) spectroscopy is a rapid and effective method for verifying the authenticity of traditional Chinese medicines. However, scattering effects in the spectra make it difficult to establish reliable discrimination models, so appropriate scatter correction is crucial for improving both the quality of the spectral data and the accuracy of the models. This study uses two ensemble preprocessing approaches: series fusion of scatter correction technologies (SCSF) and sequential preprocessing through orthogonalization (SPORT). Four discriminant models were established using a single scattering correction technique and the two ensemble preprocessing approaches. The results show that the data-driven soft independent modeling of class analogy (DD-SIMCA) model built on multiplicative scatter correction (MSC) preprocessing achieves a sensitivity of 0.98 and a specificity of 0.91, and can effectively determine whether a sample of G. elata Bl. originates from Zhaotong. In addition, three types of discriminant models built with the ensemble preprocessing approaches, namely support vector machine (SVM), partial least squares discriminant analysis (PLS-DA), and three gradient boosting machine (GBM) algorithms, show good classification and generalization capabilities. Among them, the SCSF-PLS-DA model performs best, with accuracies of 99.68% and 98.08% on the training and test sets, respectively, and an F1 score of 0.97; the SPORT-SVM model ranks second. These results indicate that the ensemble preprocessing approaches can improve the success rate of geographical origin classification for G. elata Bl.
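The multiplicative scatter correction (MSC) step named in the abstract can be illustrated with a minimal sketch. The version below follows the textbook formulation (regress each spectrum on a reference spectrum, then remove the fitted offset and slope) and is not necessarily the authors' implementation. The function name `msc`, the choice of the mean spectrum as reference, and the synthetic spectra are all assumptions made for illustration.

```python
import numpy as np


def msc(spectra, reference=None):
    """Textbook multiplicative scatter correction (illustrative sketch).

    spectra:   array of shape (n_samples, n_wavenumbers)
    reference: optional reference spectrum; defaults to the mean spectrum
               (an assumption, not taken from the paper)
    """
    spectra = np.asarray(spectra, dtype=float)
    if reference is None:
        reference = spectra.mean(axis=0)
    corrected = np.empty_like(spectra)
    for i, x in enumerate(spectra):
        # Least-squares fit x ~ intercept + slope * reference
        slope, intercept = np.polyfit(reference, x, deg=1)
        # Remove the additive offset and the multiplicative scaling
        corrected[i] = (x - intercept) / slope
    return corrected, reference


# Synthetic example: one absorption band distorted by a per-sample
# offset and gain, mimicking additive and multiplicative scatter.
wavenumbers = np.linspace(400.0, 4000.0, 200)
band = np.exp(-((wavenumbers - 1650.0) / 120.0) ** 2)
offsets = np.array([0.05, 0.10, 0.20])
gains = np.array([0.8, 1.0, 1.3])
raw = offsets[:, None] + gains[:, None] * band

corrected, ref = msc(raw)
```

In this synthetic case the distortions are purely an offset and a gain, so every corrected spectrum coincides with the reference. Real ATR-FTIR spectra also carry chemical differences, which remain after correction and are what the downstream classifiers use.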
About the journal:
The Journal of Chemometrics is devoted to the rapid publication of original scientific papers, reviews and short communications on fundamental and applied aspects of chemometrics. It also provides a forum for the exchange of information on meetings and other news relevant to the growing community of scientists who are interested in chemometrics and its applications. Short, critical review papers are a particularly important feature of the journal, in view of the multidisciplinary readership at which it is aimed.