Bin Li, Cheng-tao Su, Hai Yin, Ji-ping Zou, Yan-de Liu
Edamame is a nutritious and economically valuable soybean, and its moisture content is an important indicator of its quality. Traditional methods for detecting the moisture content of edamame suffer from large detection errors. In this research, the fusion of transmittance and reflectance spectra from hyperspectral imaging, combined with chemometrics, was proposed to predict the moisture content of edamame. The effect of different spectral preprocessing methods on predictive performance was also analyzed. Prediction models based on single spectra, primary fusion spectra, and intermediate fusion spectra were established using partial least squares regression (PLSR) and least squares support vector regression (LSSVR). For single spectra, spectral transform absorption (STA) combined with PLSR gave the best prediction performance, with a correlation coefficient of prediction (RP) of 0.7749 and a ratio of prediction to deviation (RPD) of 1.7. For primary fusion spectra, standard normal variate (SNV) combined with LSSVR performed best (RP = 0.8821, RPD = 1.9), and SNV combined with LSSVR also performed best for intermediate fusion spectra (RP = 0.9149, RPD = 2.4). Models based on fused spectra achieved significantly higher RP and RPD than those based on single spectra, and intermediate fusion proved to be a more suitable fusion strategy than primary fusion. This research provides an experimental basis for predicting the moisture content of edamame using spectral fusion combined with chemometrics.
Title: Detection of moisture content of edamame based on the fusion of reflectance and transmittance spectra of hyperspectral imaging. Journal of Chemometrics 38(9), published 2024-06-12. DOI: 10.1002/cem.3574
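SNV and RPD, both named in the abstract above, are short to state in code. The sketch below is a minimal, standard-library-only illustration assuming each spectrum is a plain list of floats; it also encodes one common reading of "primary" (low-level) fusion as preprocess-then-concatenate, which is an assumption and not necessarily the authors' exact procedure. All function names are hypothetical.

```python
import math
import statistics


def snv(spectrum):
    """Standard normal variate: centre one spectrum on its own mean and
    divide by its own (sample) standard deviation."""
    mean = statistics.fmean(spectrum)
    sd = statistics.stdev(spectrum)
    return [(x - mean) / sd for x in spectrum]


def low_level_fusion(reflectance, transmittance):
    """Assumed 'primary' fusion: SNV each spectral block, then concatenate."""
    return snv(reflectance) + snv(transmittance)


def rpd(y_ref, y_pred):
    """Ratio of prediction to deviation: standard deviation of the reference
    values divided by the root-mean-square error of prediction (RMSEP)."""
    rmsep = math.sqrt(statistics.fmean((r - p) ** 2 for r, p in zip(y_ref, y_pred)))
    return statistics.stdev(y_ref) / rmsep


if __name__ == "__main__":
    fused = low_level_fusion([0.2, 0.4, 0.6], [0.9, 0.7, 0.5])
    print(len(fused))  # reflectance block followed by transmittance block
    print(rpd([1, 2, 3, 4], [1.1, 1.9, 3.2, 3.8]))
```

An intermediate (mid-level) fusion would instead concatenate features extracted from each block, for example selected wavelengths or latent variables, rather than the full preprocessed spectra.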
Jokin Ezenarro, Daniel Schorn-García, Anna Palou, Montserrat Mestres, Laura Aceña, Maribel Abadias, Ingrid Aguiló-Aguayo, Olga Busto, Ricard Boqué
Nectarines, a popular stone fruit closely related to peaches, are renowned for their nutritional value and associated health benefits. However, maintaining optimal organoleptic properties during harvest and handling is challenging, which eventually leads to production waste and to heterogeneous quality in the fruit that reaches the consumer. This study investigates the impact of a nectarine's position on the tree throughout the ripening process using non-destructive near-infrared (NIR) spectroscopy. Nectarines exposed to more sunlight mature faster, which influences their sugar content and acidity; this emphasises the importance of considering height, prominence and orientation in the ripening dynamics of the fruit. Different data unfolding strategies were compared, and ANOVA-Simultaneous Component Analysis (ASCA) was used to reveal the significance of in-tree position factors at different ripening stages, with high significance observed at harvest. This underscores the need for growers and handlers to take these factors into account to reduce waste. NIR spectroscopy, combined with adequate data analysis, is a valuable tool for the holistic analysis of fruit ripening, providing crucial insights for maintaining optimal organoleptic properties from harvest to consumer.
Title: Characterisation of Position-Dependant Ripening Dynamics of Nectarines Using Near-Infrared Spectroscopy and ASCA. Journal of Chemometrics 38(9), published 2024-06-09. DOI: 10.1002/cem.3576. PDF: https://onlinelibrary.wiley.com/doi/epdf/10.1002/cem.3576
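ASCA starts from an ANOVA-style partition of the column-centred data into effect matrices, in which each row holds the mean of that sample's factor level, and the significance of an effect is commonly assessed by permutation. The sketch below covers only that partition and a permutation test for a single factor, in plain Python. The simultaneous component (PCA) step on the effect matrix, multi-factor designs, and the unfolding strategies compared in the paper are deliberately left out, and the factor labels in the usage example are hypothetical.

```python
import random


def effect_matrix(X, labels):
    """Effect matrix of one factor: each row is replaced by the mean of the
    column-centred rows that share its factor level."""
    n, p = len(X), len(X[0])
    col_means = [sum(row[j] for row in X) / n for j in range(p)]
    centred = [[row[j] - col_means[j] for j in range(p)] for row in X]
    groups = {}
    for row, level in zip(centred, labels):
        groups.setdefault(level, []).append(row)
    level_means = {
        level: [sum(r[j] for r in rows) / len(rows) for j in range(p)]
        for level, rows in groups.items()
    }
    return [level_means[level] for level in labels]


def sum_of_squares(M):
    return sum(v * v for row in M for v in row)


def permutation_p_value(X, labels, n_perm=999, seed=0):
    """Share of label permutations whose effect sum of squares is at least
    as large as the observed one (with the usual +1 correction)."""
    observed = sum_of_squares(effect_matrix(X, labels))
    rng = random.Random(seed)
    shuffled = list(labels)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if sum_of_squares(effect_matrix(X, shuffled)) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)


if __name__ == "__main__":
    X = [[0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [0.1, 0.1],
         [5.0, 5.0], [5.1, 5.0], [5.0, 5.1], [5.1, 5.1]]
    orientation = ["north"] * 4 + ["south"] * 4  # hypothetical factor
    print(permutation_p_value(X, orientation))
```

In a full ASCA, PCA of each effect matrix would then show which spectral regions drive a significant factor.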
László Győry, Szilveszter Gergely, Pál Péter Hanzelik
Artificial infrared spectra were generated to match the different acid solubility properties of rocks. The purpose of generating them was to increase the number of samples available for future data processing with a convolutional neural network. The samples were collected from different geological matrices during targeted rock tests to support industrial applications. They are inherently unevenly distributed in the feature parameter space and available only in limited numbers for data-intensive studies. Both characteristics constrain the performance of machine learning methods that estimate the unknown solubility of samples in the chosen acids. If sample multiplication techniques are applied without considering the relationship between the solubility of the samples and their infrared spectra, the synthetic samples degrade the performance of the prediction method. Using a dimensionality reduction technique (principal component analysis) and a neural network, we established a relationship between the solubility of the samples and their infrared spectra. The infrared spectra of the samples used to train the model could be reproduced efficiently, and infrared spectra of newly created samples could be generated. The reliability of the method was demonstrated by comparing the original and artificial spectra through the mean Pearson correlation coefficient and by comparing their closest neighbors. The method can be used to create new samples and their infrared spectra wherever several constraints must be satisfied and each sample must be linked to its infrared spectrum.
Title: Generating realistic infrared spectra using artificial neural networks. Journal of Chemometrics 38(9), published 2024-05-29. DOI: 10.1002/cem.3573
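The generation idea in this abstract (PCA compresses the spectra, a neural network links solubility to the compressed representation) implies a reconstruction step: a spectrum is rebuilt as the mean spectrum plus a score-weighted sum of PCA loadings. The sketch below shows only that step and the mean Pearson correlation used to compare original and artificial spectra. The regression model that maps solubility to scores is not shown, and the helper names and toy numbers are hypothetical.

```python
import math


def reconstruct(mean_spectrum, loadings, scores):
    """Rebuild a spectrum as mean + sum_k scores[k] * loadings[k], where
    `loadings` is a list of PCA loading vectors (one per component)."""
    spectrum = list(mean_spectrum)
    for t, p in zip(scores, loadings):
        for i, value in enumerate(p):
            spectrum[i] += t * value
    return spectrum


def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)


def mean_pearson(originals, generated):
    """Average correlation between each original spectrum and its
    artificial counterpart, paired by position."""
    pairs = list(zip(originals, generated))
    return sum(pearson(o, g) for o, g in pairs) / len(pairs)


if __name__ == "__main__":
    mean_spec = [1.0, 1.0, 1.0]
    loadings = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
    predicted_scores = [2.0, 3.0]  # would come from a (hypothetical) solubility-to-score model
    print(reconstruct(mean_spec, loadings, predicted_scores))
```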
Li Bin, Yang Jin-li, Sun Zhao-xiang, Yang Shi-min, Ouyang Aiguo, Liu Yan-de
Watermelon seed production is often affected by empty shells and internal defects, resulting in significant losses. To obtain high-quality seeds, terahertz (THz) imaging combined with image smoothing and enhancement algorithms was proposed to reduce the noise and indistinct features introduced during imaging, enabling non-destructive, efficient, and accurate detection of the internal quality of watermelon seeds. First, a terahertz imaging system with a spatial resolution of 0.4 mm was used to acquire images of watermelon seeds with varying levels of fullness. Next, several denoising techniques (Gaussian filtering, median filtering, bilateral filtering, discrete wavelet transformation denoising, wavelet denoising, and principal component analysis denoising) were applied separately to the terahertz spectral images of watermelon seeds in the 1–1.5 THz frequency range. Image enhancement operations, namely segmented linear gray-level transformation and fractional-order differentiation, were then performed on the denoised images. The optimal image processing approach was selected based on defect assessment through threshold segmentation, and validation was finally conducted at a spatial resolution of 0.2 mm. For images at 0.4 mm resolution processed with wavelet denoising and window slicing in segmented linear gray-level transformation (WS-SLT) enhancement, defect assessment improved compared with untreated THz images: accuracy increased by 7.74% for empty seeds, and the defect ratio for defective seeds 1 increased by 6.29%. The defect ratio for intact seeds was 0, and there was no significant difference in defect ratio accuracy for defective seeds 2. At a spatial resolution of 0.2 mm, the average defect ratio error of THz images processed with wavelet denoising and WS-SLT was approximately 5.04%. In conclusion, terahertz imaging coupled with wavelet denoising and WS-SLT can improve the accuracy of internal defect detection in watermelon seeds, and it provides a technical foundation and reference for assessing watermelon seed fullness.
Title: Detection the internal quality of watermelon seeds based on terahertz imaging technology combined with image smoothing and enhancement algorithm. Journal of Chemometrics 38(9), published 2024-05-28. DOI: 10.1002/cem.3557
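The abstract does not give the breakpoints of the WS-SLT enhancement, so the following is only a generic sketch of a windowed, piecewise-linear gray-level transformation followed by a threshold segmentation of the kind used for defect assessment. The window limits, the threshold, and the defect-ratio definition (share of seed pixels darker than a threshold) are all assumptions chosen for illustration, and the function names are hypothetical.

```python
def window_stretch(image, low, high, out_max=255.0):
    """Piecewise-linear gray-level transform: values at or below `low` map
    to 0, values at or above `high` map to `out_max`, and the window
    [low, high] is stretched linearly over the full output range."""
    if high <= low:
        raise ValueError("high must be greater than low")
    return [
        [0.0 if v <= low else out_max if v >= high else (v - low) * out_max / (high - low)
         for v in row]
        for row in image
    ]


def defect_ratio(image, seed_mask, threshold):
    """Hypothetical defect ratio: fraction of seed pixels (mask True)
    whose enhanced intensity falls below `threshold`."""
    seed = [v for row, mrow in zip(image, seed_mask) for v, m in zip(row, mrow) if m]
    if not seed:
        return 0.0
    return sum(1 for v in seed if v < threshold) / len(seed)


if __name__ == "__main__":
    enhanced = window_stretch([[20, 60, 120], [140, 90, 10]], low=50, high=150)
    mask = [[True, True, True], [True, True, False]]  # assumed seed region
    print(defect_ratio(enhanced, mask, threshold=100.0))
```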
Chi Yao, Cheng-tao Su, Ji-ping Zou, Shang-tao Ou-yang, Jian Wu, Nan Chen, Yan de Liu, Bin Li
To reduce the number of bruised mangoes at source, it is important to determine how long mangoes have been stored after mild bruising. To address this issue, an approach combining hyperspectral imaging with deep learning was proposed. First, the average spectrum of the bruised area of each sample was extracted as spectral features; then six feature values of the most representative image, the PC1 image, were calculated from the gray-level co-occurrence matrix (GLCM) as texture features. To find the optimal discriminant model, random forest (RF), partial least squares discriminant analysis (PLS-DA), extreme gradient boosting (XGBoost), and convolutional neural network (CNN) models were built on spectral features, texture features, and spectral features combined with texture features (Feature Fusion 1). The best discriminant model was the CNN based on Feature Fusion 1, with an overall accuracy of 90.22%. To reduce the redundant information and noise introduced by the full spectrum, uninformative variable elimination (UVE) and competitive adaptive reweighted sampling (CARS) algorithms were used to select spectral features. The selected spectral features were fused with texture features (Feature Fusion 2) and modeled again with RF, PLS-DA, XGBoost, and CNN. The optimal model for discriminating the storage times of bruised mangoes was the CNN based on Feature Fusion 2 (CARS), with an overall accuracy of 93.48%. In summary, combining spectral features with texture features effectively improves the discrimination of storage times of mangoes after mild bruising, and the CNN outperformed the other machine learning models tested. This study provides a theoretical basis for using hyperspectral imaging combined with deep learning to discriminate the storage times of mangoes after mild bruising.
Title: Detection storage time of mangoes after mild bruise based on hyperspectral imaging combined with deep learning. Journal of Chemometrics 38(9), published 2024-05-21. DOI: 10.1002/cem.3559
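The abstract says six GLCM feature values were computed from the PC1 image but does not list them, so the sketch below computes four common Haralick-style statistics (contrast, angular second moment, homogeneity, entropy) for a single pixel offset on an image already quantised to integer gray levels. The choice of statistics, the offset, and the quantisation are assumptions. In a feature-fusion step these values would simply be appended to the selected spectral variables.

```python
import math


def glcm(image, levels, dx=1, dy=0):
    """Symmetric, normalised gray-level co-occurrence matrix for one pixel
    offset (dx, dy). `image` holds integers in [0, levels)."""
    P = [[0.0] * levels for _ in range(levels)]
    rows, cols = len(image), len(image[0])
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dy, c + dx
            if 0 <= r2 < rows and 0 <= c2 < cols:
                i, j = image[r][c], image[r2][c2]
                P[i][j] += 1
                P[j][i] += 1  # count both directions -> symmetric matrix
    total = sum(sum(row) for row in P)
    if total == 0:
        raise ValueError("offset leaves no pixel pairs inside the image")
    return [[v / total for v in row] for row in P]


def texture_features(P):
    """Contrast, angular second moment (ASM), homogeneity and entropy of a GLCM."""
    feats = {"contrast": 0.0, "asm": 0.0, "homogeneity": 0.0, "entropy": 0.0}
    for i, row in enumerate(P):
        for j, p in enumerate(row):
            feats["contrast"] += (i - j) ** 2 * p
            feats["asm"] += p * p
            feats["homogeneity"] += p / (1 + (i - j) ** 2)
            if p > 0:
                feats["entropy"] -= p * math.log2(p)
    return feats


if __name__ == "__main__":
    pc1_quantised = [[0, 0, 1, 1], [0, 1, 1, 2], [1, 1, 2, 3], [2, 2, 3, 3]]  # toy image
    texture = texture_features(glcm(pc1_quantised, levels=4))
    spectral = [0.41, 0.37, 0.52]  # hypothetical selected wavelengths
    fused = spectral + list(texture.values())
    print(fused)
```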
Extended similarity indices (i.e., generalizations of pairwise similarity to more than two objects) have recently gained importance because of their simplicity, fast computation, and superior performance in tasks such as diversity picking. However, they depend on several meta-parameters that should be optimized. We previously extended the binary similarity indices to “discrete non-binary” and “continuous” data; here we introduce and compare multiple weighting functions. As a case study, the similarity of CYP enzyme inhibitors (4016 molecules after curation) was characterized by their extended similarities, based on 2D descriptors and MACCS and Morgan fingerprints. A statistical workflow based on sum of ranking differences (SRD) and analysis of variance (ANOVA) was used to find the optimal weighting function(s). Overall, the best weighting function is the fraction (“frac”), in line with the principle of parsimony. Optimal extended similarity indices were also identified, and the differences between them across data sets are revealed. We intend this work to serve as a guideline for users of extended similarity indices regarding the weighting options available. Source code for the calculations is available at https://github.com/mqcomplab/MultipleComparisons.
Title: Alternative weighting schemes for fine-tuned extended similarity indices. Authors: Kenneth López Pérez, Anita Rácz, Dávid Bajusz, Camila Gonzalez, Károly Héberger, Ramón Alain Miranda-Quintana. Journal of Chemometrics 38(9), published 2024-05-11. DOI: 10.1002/cem.3558. PDF: https://onlinelibrary.wiley.com/doi/epdf/10.1002/cem.3558
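Extended similarity indices compare n binary fingerprints at once by counting, column by column, how many objects have a 1. The sketch below assumes the formulation of the earlier extended-similarity papers as we understand it: Δ = |2·(number of ones) − n|, a column counts as similar when Δ exceeds a coincidence threshold γ (here n mod 2), and the fraction-type weights are Δ/n for similarity counters and 1 − (Δ − n mod 2)/n for dissimilarity counters. It computes a weighted extended Jaccard-Tanimoto index. Treat it as an illustration only; the authoritative definitions are in the authors' repository linked above, and the function names here are hypothetical.

```python
def column_counters(fingerprints, gamma=None):
    """Classify each bit position of n binary fingerprints.

    Returns a list of (kind, delta) pairs, where kind is '1s' (1-similarity),
    '0s' (0-similarity) or 'd' (dissimilarity) and delta = |2*ones - n|."""
    n = len(fingerprints)
    if gamma is None:
        gamma = n % 2  # assumed default coincidence threshold
    counters = []
    for column in zip(*fingerprints):
        ones = sum(column)
        delta = abs(2 * ones - n)
        if delta > gamma:
            counters.append(("1s" if 2 * ones > n else "0s", delta))
        else:
            counters.append(("d", delta))
    return counters


def frac_weights(delta, n):
    """Assumed fraction-type weights (similarity weight, dissimilarity weight)."""
    return delta / n, 1 - (delta - n % 2) / n


def extended_jt(fingerprints, gamma=None):
    """Weighted extended Jaccard-Tanimoto: weighted 1-similarity counts divided
    by weighted 1-similarity plus weighted dissimilarity counts."""
    n = len(fingerprints)
    s1 = d = 0.0
    for kind, delta in column_counters(fingerprints, gamma):
        w_sim, w_dis = frac_weights(delta, n)
        if kind == "1s":
            s1 += w_sim
        elif kind == "d":
            d += w_dis
    return s1 / (s1 + d) if s1 + d > 0 else 0.0


if __name__ == "__main__":
    fps = [[1, 1, 0, 1, 0], [1, 0, 0, 1, 1], [1, 1, 0, 0, 0]]  # toy fingerprints
    print(extended_jt(fps))
```

For n = 2 every weight equals 1 and the index reduces to the ordinary pairwise Tanimoto coefficient a/(a + b + c), which is a convenient sanity check.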
The aim of this paper is twofold. First, it serves as a comprehensive tutorial on the Data-Driven Soft Independent Modelling of Class Analogy (DD-SIMCA) method for one-class classification, covering all practical aspects of developing, validating, and applying DD-SIMCA models through a set of simple examples. Second, it introduces a web application that implements the main DD-SIMCA functionality. The application is freely available to everyone and requires neither registration nor installation. All calculations run locally in the browser and no information is sent to a server, which removes any obstacles related to sharing data and models.
Title: A comprehensive tutorial on Data-Driven SIMCA: Theory and implementation in web. Authors: Sergey Kucheryavskiy, Oxana Rodionova, Alexey Pomerantsev. Journal of Chemometrics 38(7), published 2024-05-10. DOI: 10.1002/cem.3556. PDF: https://onlinelibrary.wiley.com/doi/epdf/10.1002/cem.3556
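In DD-SIMCA each sample has a score distance h and an orthogonal distance q from a PCA model of the target class; their means h0, q0 and degrees of freedom Nh, Nq are estimated from the training set, and a sample is flagged as extreme when the full distance Nh·h/h0 + Nq·q/q0 exceeds a chi-square critical value with Nh + Nq degrees of freedom. The sketch below uses the classical method-of-moments estimates (N = 2·mean²/variance, rounded) and the Wilson-Hilferty approximation for the chi-square quantile, because the Python standard library has no chi-square distribution. The PCA step that produces h and q is assumed to be done elsewhere, robust estimators are omitted, and the class and function names are hypothetical.

```python
import statistics
from statistics import NormalDist


def moments_dof(distances):
    """Mean and method-of-moments degrees of freedom of a set of distances."""
    mean = statistics.fmean(distances)
    dof = max(1, round(2 * mean ** 2 / statistics.variance(distances)))
    return mean, dof


def chi2_quantile(p, k):
    """Wilson-Hilferty approximation of the chi-square quantile."""
    z = NormalDist().inv_cdf(p)
    a = 2 / (9 * k)
    return k * (1 - a + z * a ** 0.5) ** 3


class DDSimcaLimits:
    """Acceptance area estimated from training-set distances h and q."""

    def __init__(self, h_train, q_train, alpha=0.05):
        self.h0, self.nh = moments_dof(h_train)
        self.q0, self.nq = moments_dof(q_train)
        self.c_crit = chi2_quantile(1 - alpha, self.nh + self.nq)

    def full_distance(self, h, q):
        return self.nh * h / self.h0 + self.nq * q / self.q0

    def is_extreme(self, h, q):
        return self.full_distance(h, q) > self.c_crit


if __name__ == "__main__":
    limits = DDSimcaLimits([1, 2, 3, 4, 5], [0.5, 1.0, 1.5, 2.0, 2.5])
    print(limits.nh, limits.nq, limits.c_crit)
    print(limits.is_extreme(2.5, 1.2), limits.is_extreme(30.0, 20.0))
```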
Just-in-time learning-based partial least squares (JIT-PLS) has been extensively applied to adaptive soft sensor modeling of complex nonlinear processes. However, it still suffers from inappropriate selection of relevant samples and unsatisfactory local modeling. To address these problems, this paper proposes an improved just-in-time learning-based random mapping partial least squares (IJIT-RMPLS) method, comprising an improved relevant-sample selection strategy and a random mapping PLS (RMPLS) model. On the one hand, because the input variables are correlated with the output variable to different degrees, the method uses mutual information to evaluate the importance of each input variable and designs a variable-weighted Euclidean distance to select relevant samples for local modeling. On the other hand, to improve the prediction accuracy of the local soft sensor models, the method combines the idea of nonlinear random mapping in extreme learning machines with PLS and builds an RMPLS model with multiple activation functions. Applications to a numerical example and a real chemical process show that the proposed IJIT-RMPLS has a smaller prediction error than traditional JIT-PLS.
Title: Adaptive soft sensor modeling of chemical processes based on an improved just-in-time learning and random mapping partial least squares. Authors: Ke Zhang, Xiangrui Zhang. Journal of Chemometrics 38(9), published 2024-05-01. DOI: 10.1002/cem.3554
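The relevant-sample selection step can be illustrated with a small sketch: each input variable is weighted by its normalised mutual information with the output, and the k training samples closest to the query under the weighted Euclidean distance are returned for local modeling. Mutual information is estimated here with a simple equal-width histogram; the bin count, the normalisation, and k are assumptions, the function names are hypothetical, and the paper's estimator and RMPLS local model are not reproduced.

```python
import math
from collections import Counter


def _discretize(values, bins):
    """Equal-width binning; a constant variable falls into a single bin."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0] * len(values)
    width = (hi - lo) / bins
    return [min(int((v - lo) / width), bins - 1) for v in values]


def mutual_information(x, y, bins=5):
    """Histogram estimate of I(x; y) in nats."""
    bx, by = _discretize(x, bins), _discretize(y, bins)
    n = len(x)
    pxy, px, py = Counter(zip(bx, by)), Counter(bx), Counter(by)
    return sum((c / n) * math.log(c * n / (px[a] * py[b])) for (a, b), c in pxy.items())


def mi_weights(X, y, bins=5):
    """Normalised mutual-information weight for each input column."""
    mis = [mutual_information([row[j] for row in X], y, bins) for j in range(len(X[0]))]
    total = sum(mis)
    if total == 0:
        return [1 / len(mis)] * len(mis)
    return [m / total for m in mis]


def select_relevant(X, query, weights, k):
    """Indices of the k samples nearest to `query` under the weighted Euclidean distance."""
    def dist(row):
        return math.sqrt(sum(w * (a - b) ** 2 for w, a, b in zip(weights, row, query)))
    return sorted(range(len(X)), key=lambda i: dist(X[i]))[:k]


if __name__ == "__main__":
    X = [[float(i), 5.0] for i in range(10)]  # second input carries no information
    y = [float(i) for i in range(10)]
    w = mi_weights(X, y)
    print(w, select_relevant(X, [3.2, 100.0], w, k=2))
```

A local PLS (or, in the paper, RMPLS) model would then be fitted only on the selected samples for each new query.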
Kai Liu, Xiaoqiang Zhao, Yongyong Hui, Hongmei Jiang
Fault prediction helps ensure safe and stable production and reduces maintenance costs. Because changing operating conditions alter the characteristics of industrial processes, the fault state of batch processes must be monitored in real time and fault trends must be predicted accurately. An adaptive slow feature analysis-neighborhood preserving embedding-improved stochastic configuration network (SFA-NPE-ISCN) algorithm for batch process fault prediction is proposed. First, SFA is used to extract the time-varying features of the process data and to establish the update index of the NPE model. Then, the adaptively updated NPE model extracts local nearest-neighbor features and reconstructs them, and squared prediction error (SPE) statistics computed from the reconstruction error are used as fault state features. Next, the hunter-prey optimization (HPO) algorithm optimizes the weights and biases of the stochastic configuration network (SCN), and singular value decomposition (SVD) and QR decomposition with column pivoting are introduced to solve the ill-posed problem of the SCN, yielding the ISCN prediction model. Finally, the SPE statistics are arranged into a time series, and the ISCN model is used to predict the trend of the process state. The effectiveness of the proposed algorithm is verified by case studies of an industrial-scale penicillin fermentation process and a hot strip mill process.
{"title":"An adaptive strategy for time-varying batch process fault prediction based on stochastic configuration network","authors":"Kai Liu, Xiaoqiang Zhao, Yongyong Hui, Hongmei Jiang","doi":"10.1002/cem.3555","DOIUrl":"10.1002/cem.3555","url":null,"abstract":"<p>Fault prediction ensures safe and stable production and cuts maintenance costs. Because changing operating conditions alter the characteristics of industrial processes, the fault state of batch processes must be monitored in real time and fault trends must be predicted accurately. This paper proposes an adaptive slow feature analysis-neighborhood preserving embedding-improved stochastic configuration network (SFA-NPE-ISCN) algorithm for batch process fault prediction. First, SFA is used to extract the time-varying features of the process data and to establish the update index of the NPE model. Then, the NPE model, which has an adaptive update capability, extracts and reconstructs local nearest-neighbor features, and squared prediction error (SPE) statistics are constructed from the reconstruction error as fault state features. Next, the hunter-prey optimization (HPO) algorithm optimizes the weights and biases of the stochastic configuration network (SCN), and singular value decomposition (SVD) and QR decomposition with column pivoting are introduced to solve the ill-posed problem of the SCN, yielding the ISCN prediction model. Finally, the SPE statistics are arranged into a time series, and the ISCN model is used to predict the trend of the process state. 
The effectiveness of the proposed algorithm is verified by case studies of an industrial-scale penicillin fermentation process and a hot strip mill process.</p>","PeriodicalId":15274,"journal":{"name":"Journal of Chemometrics","volume":"38 9","pages":""},"PeriodicalIF":2.3,"publicationDate":"2024-04-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140831422","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"Chemistry","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
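In SCN-type models the output weights are obtained by least squares on the hidden-layer output matrix `H`, which can become ill-conditioned as hidden nodes accumulate. The abstract combines SVD with column-pivoted QR for this; the sketch below shows only the simpler truncated-SVD (minimum-norm) solution, swapped in to illustrate how near-zero singular values are discarded. The relative tolerance `rel_tol` and the name `truncated_svd_solve` are assumptions.

```python
import numpy as np


def truncated_svd_solve(H, T, rel_tol=1e-10):
    """Minimum-norm least-squares solution of H @ beta ~= T, ignoring tiny singular values.

    H: (n_samples, n_hidden) hidden-layer outputs.
    T: targets, shape (n_samples,) or (n_samples, n_out).
    """
    U, s, Vt = np.linalg.svd(H, full_matrices=False)  # s is sorted in descending order
    keep = s > rel_tol * s[0]  # drop directions that make the problem ill-posed
    UtT = U[:, keep].T @ T
    if UtT.ndim == 1:
        UtT = UtT / s[keep]
    else:
        UtT = UtT / s[keep][:, None]
    return Vt[keep].T @ UtT
```

With a duplicated hidden node, for example, ordinary normal equations become singular, whereas the truncated solution stays finite and splits the weight evenly between the two identical columns.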
Fan Zhang, Chaoyang Liu, Binjie Wang, Yiru He, Xinhong Zhang
Most current methods for predicting nonclassical secreted proteins rely on manual feature selection, such as constructing sample features from the physicochemical properties of proteins and the position-specific scoring matrix (PSSM). However, obtaining the physicochemical properties of proteins requires tedious search work by researchers. This paper proposes DeepNCSPP, an end-to-end deep learning model that takes protein sequence information and sequence statistics information as input to predict whether a protein is a nonclassical secreted protein. The protein sequence information and the sequence statistics information are extracted using a bidirectional long short-term memory (BiLSTM) network and a convolutional neural network (CNN), respectively. On the independent test dataset, DeepNCSPP achieved an accuracy of 88.24%, a Matthews correlation coefficient (MCC) of 77.01%, and an F1-score of 87.50%. Results on the independent test dataset and from 10-fold cross-validation show that DeepNCSPP performs competitively with state-of-the-art methods and can be used as a reliable nonclassical secreted protein prediction model. A web server has been built for the convenience of researchers at https://www.deepncspp.top/. The source code of DeepNCSPP is hosted on GitHub and available online (https://github.com/xiaoliu166370/DEEPNCSPP).
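The three scores quoted above follow the standard confusion-matrix definitions. A stdlib-only sketch, assuming binary labels with 1 marking a nonclassical secreted protein; `binary_metrics` is a hypothetical name, and the toy labels in the usage example are made up for illustration, not DeepNCSPP data. The abstract reports MCC as a percentage, i.e. the coefficient multiplied by 100.

```python
import math


def binary_metrics(y_true, y_pred):
    """Accuracy, Matthews correlation coefficient, and F1-score for 0/1 labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0  # 0 by convention if undefined
    f1 = 2 * tp / (2 * tp + fp + fn) if tp else 0.0
    return accuracy, mcc, f1


# Toy example: 8 TP, 4 FN, 2 FP, 6 TN.
y_true = [1] * 12 + [0] * 8
y_pred = [1] * 8 + [0] * 4 + [1] * 2 + [0] * 6
acc, mcc, f1 = binary_metrics(y_true, y_pred)
```

Here accuracy is 14/20 = 0.7, F1 is 16/22 (about 0.727), and MCC is 40/sqrt(9600) (about 0.408); MCC is lower than accuracy because it also penalizes the imbalance between false positives and false negatives.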
{"title":"A prediction model of nonclassical secreted protein based on deep learning","authors":"Fan Zhang, Chaoyang Liu, Binjie Wang, Yiru He, Xinhong Zhang","doi":"10.1002/cem.3553","DOIUrl":"10.1002/cem.3553","url":null,"abstract":"<p>Most current methods for predicting nonclassical secreted proteins rely on manual feature selection, such as constructing sample features from the physicochemical properties of proteins and the position-specific scoring matrix (PSSM). However, obtaining the physicochemical properties of proteins requires tedious search work by researchers. This paper proposes DeepNCSPP, an end-to-end deep learning model that takes protein sequence information and sequence statistics information as input to predict whether a protein is a nonclassical secreted protein. The protein sequence information and the sequence statistics information are extracted using a bidirectional long short-term memory (BiLSTM) network and a convolutional neural network (CNN), respectively. On the independent test dataset, DeepNCSPP achieved an accuracy of 88.24%, a Matthews correlation coefficient (MCC) of 77.01%, and an F1-score of 87.50%. Results on the independent test dataset and from 10-fold cross-validation show that DeepNCSPP performs competitively with state-of-the-art methods and can be used as a reliable nonclassical secreted protein prediction model. A web server has been built for the convenience of researchers at https://www.deepncspp.top/. 
The source code of DeepNCSPP is hosted on GitHub and available online (https://github.com/xiaoliu166370/DEEPNCSPP).</p>","PeriodicalId":15274,"journal":{"name":"Journal of Chemometrics","volume":"38 8","pages":""},"PeriodicalIF":2.3,"publicationDate":"2024-04-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140803214","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"Chemistry","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
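The 10-fold cross-validation reported for DeepNCSPP partitions the data into ten disjoint folds, each held out once for evaluation. A minimal stdlib sketch of the index split, assuming a plain shuffled (non-stratified) split with a fixed seed; the authors' exact protocol, for example stratification by class, is not stated in the abstract, and `k_fold_indices` is a hypothetical name.

```python
import random


def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train_idx, test_idx) pairs for shuffled k-fold cross-validation."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    # Spread the remainder over the first folds so fold sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test_idx = idx[start:start + size]
        train_idx = idx[:start] + idx[start + size:]
        yield train_idx, test_idx
        start += size
```

A model would be trained on each `train_idx`, scored on the matching `test_idx`, and the ten scores averaged; every sample is tested exactly once.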