Near-infrared spectral interval screening based on hierarchical variables clustering and group SCAD in multivariate calibration
Author: Chen-Hao Huang
Journal: Vibrational Spectroscopy, Volume 131, Article 103664
Publication date: 2024-02-17
DOI: 10.1016/j.vibspec.2024.103664
URL: https://www.sciencedirect.com/science/article/pii/S0924203124000171
Spectral interval screening is a critical step in multivariate calibration: it can improve both the predictive performance of the model and the interpretability of the data. This study proposes a novel interval selection method, VCG-PLS, which combines hierarchical variable clustering and the group smoothly clipped absolute deviation (group SCAD) penalty with partial least squares (PLS). Hierarchical variable clustering partitions the variables into groups at each level, and the groups from the different clustering levels are then used as input for group SCAD. The method is designed to select informative wavelength intervals for near-infrared (NIR) spectroscopic data analysis and consists of three main steps. First, hierarchical clustering is applied to the wavelengths (variables), which generates a partition of the variables into groups at each hierarchy level and yields all possible wavelength intervals. Second, the series of variable groups obtained from the various hierarchy levels is given as input to group SCAD, which generates potential variable groups for each value of the regularization parameter. Finally, a collection of PLS models is constructed recursively, each using all wavelength intervals except one, until the optimal wavelength intervals are obtained. The optimal intervals are those that give the lowest root mean square error of prediction (RMSEP). VCG-PLS integrates the advantages of hierarchical variable clustering and group SCAD, and is an efficient technique for enhancing the performance of PLS in interval selection. Its performance was tested on three real NIR datasets. The results demonstrate that VCG-PLS can improve prediction performance with fewer variables and may be a good wavelength interval selection strategy.
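The abstract gives no formulas or implementation details, so the following is only a minimal sketch of two ingredients it names, written under stated assumptions. `scad_penalty` is the standard SCAD penalty of Fan and Li with the conventional default a = 3.7. `group_scad_penalty` assumes the penalty is applied to the unweighted Euclidean norm of each group's coefficients, which may differ from the weighting used in the paper. `backward_interval_elimination` is one possible reading of the final step: at each round, every "all intervals except one" candidate is scored and the removal with the lowest root mean square error of prediction (RMSEP) is kept. The callable `rmsep_of` is hypothetical; it stands in for fitting a PLS model on the chosen intervals and evaluating it, which is not shown here.

```python
import math
from typing import Callable, List, Sequence, Tuple

Interval = Tuple[int, int]  # (start, stop) variable indices; naming is illustrative


def scad_penalty(t: float, lam: float, a: float = 3.7) -> float:
    """SCAD penalty for a scalar t (requires a > 2, lam >= 0)."""
    t = abs(t)
    if t <= lam:                      # lasso-like linear region
        return lam * t
    if t <= a * lam:                  # quadratic transition region
        return (2 * a * lam * t - t * t - lam * lam) / (2 * (a - 1))
    return lam * lam * (a + 1) / 2    # constant region: large effects are not shrunk further


def group_scad_penalty(groups: Sequence[Sequence[float]], lam: float,
                       a: float = 3.7) -> float:
    """Assumed group form: SCAD applied to each group's Euclidean norm."""
    return sum(
        scad_penalty(math.sqrt(sum(b * b for b in coefs)), lam, a)
        for coefs in groups
    )


def backward_interval_elimination(
    intervals: Sequence[Interval],
    rmsep_of: Callable[[List[Interval]], float],
) -> Tuple[List[Interval], float]:
    """Drop one interval per round; return the subset with the lowest score seen."""
    current = list(intervals)
    best_subset, best_score = list(current), rmsep_of(current)
    while len(current) > 1:
        # Score every candidate that uses all current intervals except one.
        trials = [
            (rmsep_of(current[:i] + current[i + 1:]), i)
            for i in range(len(current))
        ]
        score, drop = min(trials)     # ties are broken by the lowest index
        del current[drop]
        if score < best_score:
            best_subset, best_score = list(current), score
    return best_subset, best_score


if __name__ == "__main__":
    # Toy stand-in for a PLS/RMSEP evaluation: purely illustrative, not real data.
    informative = {(10, 20), (40, 50)}

    def toy_rmsep(subset: List[Interval]) -> float:
        chosen = set(subset)
        return 1.0 + 0.5 * len(informative - chosen) + 0.1 * len(chosen - informative)

    candidates = [(0, 10), (10, 20), (20, 30), (40, 50)]
    print(backward_interval_elimination(candidates, toy_rmsep))
    print(group_scad_penalty([[3.0, 4.0], [0.0, 0.0]], lam=1.0))
```

In practice `rmsep_of` would wrap a PLS fit and a validation step. Passing it in as a callable keeps the elimination loop independent of any particular PLS library.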
About the journal:
Vibrational Spectroscopy provides a vehicle for the publication of original research that focuses on vibrational spectroscopy. The journal covers infrared, near-infrared and Raman spectroscopies and publishes papers dealing with developments in applications, theory, techniques and instrumentation.
The topics covered by the journal include:
Sampling techniques,
Vibrational spectroscopy coupled with separation techniques,
Instrumentation (Fourier transform, conventional and laser based),
Data manipulation,
Spectra-structure correlation and group frequencies.
The application areas covered include:
Analytical chemistry,
Bio-organic and bio-inorganic chemistry,
Organic chemistry,
Inorganic chemistry,
Catalysis,
Environmental science,
Industrial chemistry,
Materials science,
Physical chemistry,
Polymer science,
Process control,
Specialized problem solving.