Journal of Chemometrics: Latest Articles, Page 8

Detection of moisture content of edamame based on the fusion of reflectance and transmittance spectra of hyperspectral imaging

IF 2.3, Zone 4 (Chemistry), Q1 SOCIAL WORK

Journal of Chemometrics

Pub Date : 2024-06-12 DOI: 10.1002/cem.3574

Bin Li, Cheng-tao Su, Hai Yin, Ji-ping Zou, Yan-de Liu

Edamame is a nutritious and economically valuable soybean, and its moisture content is an important indicator of its quality. Traditional methods for measuring the moisture content of edamame suffer from large detection errors. In this research, the fusion of transmittance and reflectance spectra from hyperspectral imaging, combined with chemometrics, was proposed to predict the moisture content of edamame. The effect of different spectral preprocessing methods on predictive performance was also analyzed. Prediction models based on partial least squares regression (PLSR) and least squares support vector regression (LSSVR) were established for single spectra, primary fusion spectra, and intermediate fusion spectra. For single spectra, spectral transform absorption (STA) combined with PLSR gave the best prediction performance, with a predictive correlation (R_P) of 0.7749 and a ratio of prediction to deviation (RPD) of 1.7. For primary fusion spectra, standard normal variate (SNV) combined with LSSVR gave the best prediction performance, with R_P of 0.8821 and RPD of 1.9. For intermediate fusion spectra, SNV combined with LSSVR gave the best prediction performance, with R_P of 0.9149 and RPD of 2.4. The R_P and RPD of the moisture prediction models based on fusion spectra were significantly improved compared with those based on single spectra. Compared with primary fusion, intermediate fusion is the more suitable fusion strategy. This research provides an experimental basis for predicting the moisture content of edamame using spectral fusion combined with chemometrics.
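The preprocessing and evaluation terms in this abstract can be illustrated with a minimal Python sketch. It assumes the common definitions: SNV centres and scales each spectrum by its own mean and standard deviation, and RPD is the standard deviation of the reference values divided by the root mean square error of prediction. The abstract does not define primary fusion; the sketch assumes it means concatenating the preprocessed reflectance and transmittance spectra (low-level fusion). All function names are hypothetical and are not taken from the paper.

```python
from statistics import mean, stdev


def snv(spectrum):
    """Standard normal variate: centre and scale one spectrum by its own
    mean and sample standard deviation."""
    m = mean(spectrum)
    s = stdev(spectrum)
    return [(x - m) / s for x in spectrum]


def rpd(y_true, y_pred):
    """Ratio of prediction to deviation, assumed here to be
    SD(reference values) / RMSEP."""
    n = len(y_true)
    rmsep = (sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n) ** 0.5
    return stdev(y_true) / rmsep


def primary_fusion(reflectance, transmittance):
    """Assumed low-level fusion: concatenate the two SNV-treated spectra."""
    return snv(reflectance) + snv(transmittance)
```

A fused vector produced by `primary_fusion` would then be the input to a regression model such as PLSR or LSSVR; the regression step itself is left out of this sketch.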


Citations: 0

Characterisation of Position-Dependant Ripening Dynamics of Nectarines Using Near-Infrared Spectroscopy and ASCA


Pub Date : 2024-06-09 DOI: 10.1002/cem.3576

Jokin Ezenarro, Daniel Schorn-García, Anna Palou, Montserrat Mestres, Laura Aceña, Maribel Abadias, Ingrid Aguiló-Aguayo, Olga Busto, Ricard Boqué

Nectarines, a popular stone fruit closely related to peaches, are renowned for their nutritional value and associated health benefits. However, maintaining optimal organoleptic properties during harvest and handling is challenging, which eventually leads to production waste and heterogeneous quality in the fruit that reaches the consumer. This study investigates the impact of the position of nectarines on the tree throughout the ripening process using non-destructive near-infrared (NIR) spectroscopy. Nectarines exposed to more sunlight mature faster, and this influences sugar content and acidity, emphasising the importance of considering height, prominence and orientation in the ripening dynamics of the fruit. Different data unfolding strategies were compared, using ANOVA-Simultaneous Component Analysis (ASCA) to reveal the significance of in-tree position factors at different ripening stages, with high significance observed at harvest. This underscores the need for growers and handlers to consider these factors in order to reduce waste. NIR spectroscopy, with adequate data analysis, is a valuable tool for the holistic analysis of fruit ripening, providing crucial insights for maintaining optimal fruit organoleptic properties from harvest to consumer.


Citations: 0

Generating realistic infrared spectra using artificial neural networks


Pub Date : 2024-05-29 DOI: 10.1002/cem.3573

László Győry, Szilveszter Gergely, Pál Péter Hanzelik

Artificial spectra were generated to match the different acid solubility properties of the rocks. The purpose of generating artificial spectra was to increase the number of samples available for future data processing with a convolutional neural network. The samples were collected from different geological matrices during targeted rock tests to support industrial applications. The inherent characteristics of the samples are their uneven distribution in the parameter space of the features and their limited availability for data-intensive studies. Both data set characteristics constrain the prediction performance of the machine learning methods to estimate the unknown solubility of samples in the chosen acids. If the sample multiplication techniques are performed without considering the relationship between solubility of samples and their infrared spectra, the synthetic samples adversely impact the efficacy of the prediction method. By utilizing a dimensionality reduction technique (principal component analysis) and a neural network, we established a relationship between the solubility of the samples and their infrared spectra. Infrared spectra of the samples used for learning the model could be efficiently reproduced and infrared spectra of created samples could be generated. The reliability of the applied method has been shown by the comparison of the original and artificial spectra through a mean Pearson correlation coefficient and by comparing the closest neighbors to each other. This method can be used to create new samples and their infrared spectra, where different constraints must be met and the samples must be connected to the infrared spectrum.
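The comparison step mentioned in this abstract, a mean Pearson correlation between original and artificial spectra, can be sketched as follows. This is only an illustration of the metric: it assumes the spectra are paired one-to-one and sampled on the same wavenumber grid, which the abstract does not state. The function names are hypothetical.

```python
from math import sqrt


def pearson(a, b):
    """Pearson correlation coefficient between two equal-length spectra."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / sqrt(var_a * var_b)


def mean_pearson(originals, generated):
    """Average the correlation over paired (original, generated) spectra."""
    pairs = list(zip(originals, generated))
    return sum(pearson(o, g) for o, g in pairs) / len(pairs)
```

A value close to 1 would indicate that the generated spectra follow the shape of their originals closely; the metric is insensitive to a constant offset or positive scaling of a spectrum.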


Citations: 0

Detection the internal quality of watermelon seeds based on terahertz imaging technology combined with image smoothing and enhancement algorithm


Pub Date : 2024-05-28 DOI: 10.1002/cem.3557

Li Bin, Yang Jin-li, Sun Zhao-xiang, Yang Shi-min, Ouyang Aiguo, Liu Yan-de

The cultivation of watermelon seeds is often affected by issues such as empty shells and defects, resulting in significant losses. To obtain high-quality seeds, terahertz imaging combined with image smoothing and enhancement algorithms was proposed to reduce the noise and indistinct features introduced during imaging and to achieve non-destructive, efficient, and accurate detection of the internal quality of watermelon seeds. Initially, a terahertz imaging system with a spatial resolution of 0.4 mm was used to acquire images of watermelon seeds with varying levels of fullness. Subsequently, denoising techniques, including Gaussian filtering, median filtering, bilateral filtering, discrete wavelet transformation denoising, wavelet denoising, and principal component analysis denoising, were each applied to the terahertz spectral images of watermelon seeds in the frequency range of 1–1.5 THz. Image enhancement operations, involving segmented linear gray-level transformation and fractional-order differentiation, were then performed on the denoised terahertz images. The optimal image processing approach was determined based on defect assessment through threshold segmentation. Finally, validation was conducted at a spatial resolution of 0.2 mm. The images at a spatial resolution of 0.4 mm were subjected to wavelet denoising and window slicing in segmented linear gray-level transformation (WS-SLT) enhancement; compared with untreated THz images, the results showed the following improvements in defect accuracy. A 7.74% increase in accuracy was observed for empty seeds, along with a 6.29% increase in the defect ratio for defective seeds 1. The defect ratio for intact seeds was 0, and there was no significant difference in defect ratio accuracy for defective seeds 2. At a spatial resolution of 0.2 mm, the average defect ratio error of THz imaging processed by wavelet denoising and WS-SLT was approximately 5.04%. In conclusion, terahertz imaging coupled with wavelet denoising and WS-SLT can be used to enhance the accuracy of internal defect detection in watermelon seeds, and it provides a technical foundation and reference for assessing watermelon seed fullness.
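The enhancement step named in this abstract, segmented (piecewise) linear gray-level transformation, can be sketched in a few lines of Python. This is a generic three-segment contrast stretch, not the authors' WS-SLT variant, whose window-slicing details are not given here. The breakpoints `a`, `b`, `c`, `d` are hypothetical values chosen only for illustration.

```python
def piecewise_linear(pixel, a=80, b=160, c=40, d=220, max_gray=255):
    """Three-segment gray-level mapping:
    [0, a] -> [0, c], [a, b] -> [c, d], [b, max_gray] -> [d, max_gray].
    With the defaults, the middle band is stretched and the ends compressed."""
    if pixel < a:
        return c / a * pixel
    if pixel <= b:
        return c + (d - c) / (b - a) * (pixel - a)
    return d + (max_gray - d) / (max_gray - b) * (pixel - b)


def enhance(image, **breakpoints):
    """Apply the mapping to every pixel of a 2-D list of gray levels."""
    return [[round(piecewise_linear(p, **breakpoints)) for p in row]
            for row in image]
```

In practice the breakpoints would be chosen so that the gray-level band separating seed kernel from cavity receives the steepest slope, which is the design idea behind this family of transforms.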


Citations: 0

Detection storage time of mangoes after mild bruise based on hyperspectral imaging combined with deep learning


Pub Date : 2024-05-21 DOI: 10.1002/cem.3559

Chi Yao, Cheng-tao Su, Ji-ping Zou, Shang-tao Ou-yang, Jian Wu, Nan Chen, Yan-de Liu, Bin Li

To reduce the number of bruised mangoes at source, it is important to determine how long mangoes have been stored after a mild bruise. To address this issue, a model combining hyperspectral imaging with deep learning was proposed. First, the average spectrum of the bruised area of each sample was extracted as spectral features, and then the six eigenvalues of the most representative PC1 image were calculated as texture features based on the gray level co-occurrence matrix. To find the optimal discriminative model, random forest (RF), partial least squares discriminant analysis (PLS-DA), extreme gradient boosting (XGBoost), and convolutional neural network (CNN) models were built on spectral features, texture features, and spectral features combined with texture features (Feature Fusion 1), respectively. The results showed that the best discriminative model was the CNN under Feature Fusion 1, with an overall accuracy of 90.22%. To reduce the redundant information and noise introduced by the full spectrum, uninformative variable elimination (UVE) and competitive adaptive reweighted sampling (CARS) algorithms were used to filter the spectral features. The screened spectral features were fused with texture features (Feature Fusion 2) and modeled again with RF, PLS-DA, XGBoost, and CNN. The results showed that the optimal model for discriminating different storage times of mangoes after bruising was the CNN model based on Feature Fusion 2 (CARS), with an overall accuracy of 93.48%. In summary, this study shows that combining spectral features with texture features effectively improves the discrimination of different storage times of mangoes after mild bruise. Compared with the other machine learning models, the CNN model in this paper achieves better results. It provides a theoretical basis for using hyperspectral imaging combined with deep learning to discriminate different storage times of mangoes after mild bruise.
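The texture-feature step in this abstract relies on the gray level co-occurrence matrix (GLCM). The sketch below builds a normalised GLCM for one pixel offset and derives three commonly used descriptors. The abstract does not say which six texture values were used, so contrast, energy and homogeneity are assumptions chosen for illustration, and the function names are hypothetical.

```python
def glcm(image, levels, dx=1, dy=0):
    """Normalised co-occurrence matrix for the pixel offset (dx, dy).
    `image` is a 2-D list of integer gray levels in range(levels)."""
    counts = [[0] * levels for _ in range(levels)]
    rows, cols = len(image), len(image[0])
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dy, c + dx
            if 0 <= r2 < rows and 0 <= c2 < cols:
                counts[image[r][c]][image[r2][c2]] += 1
    total = sum(sum(row) for row in counts)
    return [[v / total for v in row] for row in counts]


def texture_features(p):
    """Contrast, energy and homogeneity of a normalised GLCM `p`."""
    idx = [(i, j) for i in range(len(p)) for j in range(len(p))]
    return {
        "contrast": sum(p[i][j] * (i - j) ** 2 for i, j in idx),
        "energy": sum(p[i][j] ** 2 for i, j in idx),
        "homogeneity": sum(p[i][j] / (1 + (i - j) ** 2) for i, j in idx),
    }
```

A feature-fusion vector in the sense of the abstract could then be formed by appending these texture values to the mean spectrum of the bruised region before passing it to a classifier.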


Citations: 0

Alternative weighting schemes for fine-tuned extended similarity indices


Pub Date : 2024-05-11 DOI: 10.1002/cem.3558

Kenneth López Pérez, Anita Rácz, Dávid Bajusz, Camila Gonzalez, Károly Héberger, Ramón Alain Miranda-Quintana

Extended similarity indices (i.e., generalization of pairwise similarity) have recently gained importance because of their simplicity, fast computation, and superiority in tasks like diversity picking. However, they operate with several meta parameters that should be optimized. Earlier, we extended the binary similarity indices to “discrete non-binary” and “continuous” data; now we continue with introducing and comparing multiple weighting functions. As a case study, the similarity of CYP enzyme inhibitors (4016 molecules after curation) was characterized by their extended similarities, based on 2D descriptors, MACCS and Morgan fingerprints. A statistical workflow based on sum of ranking differences (SRD) and analysis of variance (ANOVA) was used for finding the optimal weight function(s). Overall, the best weighting function is the fraction (“frac”), which corresponds to the principle of parsimony. Optimal extended similarity indices were also found, and their differences are revealed across different data sets. We intend this work to be a guideline for users of extended similarity indices regarding the various weighting options available. Source code for the calculations is available at https://github.com/mqcomplab/MultipleComparisons.


Citations: 0

A comprehensive tutorial on Data-Driven SIMCA: Theory and implementation in web


Pub Date : 2024-05-10 DOI: 10.1002/cem.3556

Sergey Kucheryavskiy, Oxana Rodionova, Alexey Pomerantsev

The aim of this paper is twofold. First, it serves as a comprehensive tutorial on the Data-Driven Soft Independent Modelling of Class Analogy (DD-SIMCA) method for one-class classification. It covers all practical aspects of the development, validation, and application of DD-SIMCA models, using a set of simple examples. Second, it introduces a web application that implements the main DD-SIMCA functionality. This application is freely available to everyone and does not require registration or installation. All calculations run locally in the browser without sending any information to a server, hence removing any obstacles to the dissemination of the data and models.


Citations: 0

Adaptive soft sensor modeling of chemical processes based on an improved just-in-time learning and random mapping partial least squares


Pub Date : 2024-05-01 DOI: 10.1002/cem.3554

Ke Zhang, Xiangrui Zhang

Just-in-time learning-based partial least squares (JIT-PLS) has been extensively applied to adaptive soft sensor modeling of complex nonlinear processes. However, it still suffers from unreasonable selection of relevant samples and unsatisfactory local modeling. To address these problems, this paper proposes an improved just-in-time learning-based random mapping partial least squares (IJIT-RMPLS) method, which comprises an improved relevant-sample selection strategy and a random mapping PLS (RMPLS) model. On the one hand, because the input variables differ in their degree of correlation with the output variable, the method uses mutual information to evaluate the importance of each input variable and designs a variable-weighted Euclidean distance to select relevant samples for local modeling. On the other hand, to improve the prediction precision of the local soft sensor models, the method combines the idea of nonlinear random mapping from extreme learning machines with PLS and builds an RMPLS with multiple activation functions. Applications to a numerical example and a real chemical process show that the proposed IJIT-RMPLS has a smaller prediction error than traditional JIT-PLS.
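
The relevant-sample selection step described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the weights are assumed to be precomputed importance scores (for example, normalized mutual information between each input variable and the output), and the function names, data, and weight values are all hypothetical.

```python
import math

def weighted_distance(x, q, weights):
    """Variable-weighted Euclidean distance between sample x and query q."""
    return math.sqrt(sum(w * (a - b) ** 2 for w, a, b in zip(weights, x, q)))

def select_relevant(samples, query, weights, k):
    """Return the indices of the k samples closest to the query."""
    order = sorted(
        range(len(samples)),
        key=lambda i: weighted_distance(samples[i], query, weights),
    )
    return order[:k]

# Hypothetical historical samples with two input variables each.
X = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [2.0, 2.0]]
query = [0.0, 0.0]

# Hypothetical importance weights (assumed to come from mutual information).
w = [0.9, 0.1]

# With the first variable weighted heavily, sample 2 (which differs only in
# the second variable) ranks closer to the query than sample 1.
print(select_relevant(X, query, w, k=3))  # → [0, 2, 1]
```

With equal weights, samples 1 and 2 would be equidistant from the query; the weighting is what makes a deviation in an important variable count for more when the local training set is chosen.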

Citations: 0

An adaptive strategy for time-varying batch process fault prediction based on stochastic configuration network

IF 2.3 Zone 4 (Chemistry) Q1 SOCIAL WORK

Journal of Chemometrics

Pub Date : 2024-04-28 DOI: 10.1002/cem.3555

Kai Liu, Xiaoqiang Zhao, Yongyong Hui, Hongmei Jiang

Fault prediction ensures safe and stable production and cuts maintenance costs. Because changing operating conditions alter the characteristics of industrial processes, the fault state of batch processes needs to be monitored in real time and fault trends need to be predicted accurately. An adaptive slow feature analysis-neighborhood preserving embedding-improved stochastic configuration network (SFA-NPE-ISCN) algorithm for batch process fault prediction is proposed. First, SFA is used to extract the time-varying features of the process data and to establish the update index of the NPE model. Then, local nearest-neighbor features are extracted and reconstructed by the NPE model with adaptive update capability, and squared prediction error (SPE) statistics are constructed from the reconstruction error as fault state features. Further, the hunter-prey optimization (HPO) algorithm optimizes the weights and biases in the stochastic configuration network (SCN), and singular value decomposition (SVD) and QR decomposition of column rotation are introduced to solve the ill-posed problem of the SCN and obtain the ISCN prediction model. Finally, the obtained SPE statistics are formed into a time series, and the ISCN model is used to predict the process state trend. The effectiveness of the proposed algorithm is verified by case studies of an industrial-scale penicillin fermentation process and a hot strip mill process.
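
The last two steps (building SPE statistics from the reconstruction error and arranging them as a time series for a predictor) can be sketched as below. This is a simplified illustration under stated assumptions: the reconstructions are supplied as hypothetical values rather than computed by an NPE model, the sliding-window layout is one common way to prepare a series for one-step-ahead prediction and is not confirmed by the abstract, and no SCN is trained here.

```python
def spe(x, x_hat):
    """Squared prediction error: squared Euclidean norm of the residual."""
    return sum((a - b) ** 2 for a, b in zip(x, x_hat))

def sliding_windows(series, width):
    """Split a series into (past window, next value) training pairs."""
    return [
        (series[i:i + width], series[i + width])
        for i in range(len(series) - width)
    ]

# Hypothetical measurements at four time steps and their reconstructions
# (in the paper the reconstructions would come from the NPE model).
X = [[1.0, 2.0], [1.0, 2.0], [2.0, 2.0], [3.0, 1.0]]
X_hat = [[1.0, 2.0], [1.0, 1.0], [1.0, 1.0], [1.0, 1.0]]

# One SPE value per time step forms the fault-state time series.
spe_series = [spe(x, xh) for x, xh in zip(X, X_hat)]
print(spe_series)  # → [0.0, 1.0, 2.0, 4.0]

# Each pair maps the last `width` SPE values to the value that follows,
# which is the input/target layout a trend predictor would be trained on.
pairs = sliding_windows(spe_series, width=2)
print(pairs)  # → [([0.0, 1.0], 2.0), ([1.0, 2.0], 4.0)]
```

A growing SPE series such as this one signals that the measurements are drifting away from what the model can reconstruct, which is why it is used as the quantity to forecast.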

Citations: 0

A prediction model of nonclassical secreted protein based on deep learning

IF 2.3 Zone 4 (Chemistry) Q1 SOCIAL WORK

Journal of Chemometrics

Pub Date : 2024-04-25 DOI: 10.1002/cem.3553

Fan Zhang, Chaoyang Liu, Binjie Wang, Yiru He, Xinhong Zhang

Most current nonclassical protein prediction methods involve manual feature selection, such as constructing sample features from the physicochemical properties of proteins and the position-specific scoring matrix (PSSM). However, these tasks require researchers to perform tedious searches to obtain the physicochemical properties of proteins. This paper proposes an end-to-end nonclassical secreted protein prediction model based on deep learning, named DeepNCSPP, which takes protein sequence information and sequence statistics information as input to predict whether a protein is a nonclassical secreted protein. The protein sequence information and the sequence statistics information are extracted using bidirectional long short-term memory and convolutional neural networks, respectively. In experiments on the independent test dataset, DeepNCSPP achieved an accuracy of 88.24%, a Matthews correlation coefficient (MCC) of 77.01%, and an F1-score of 87.50%. Independent test dataset testing and 10-fold cross-validation show that DeepNCSPP achieves performance competitive with state-of-the-art methods and can be used as a reliable nonclassical secreted protein prediction model. A web server has been constructed for the convenience of researchers. The web link is https://www.deepncspp.top/. The source code of DeepNCSPP is hosted on GitHub and is available online (https://github.com/xiaoliu166370/DEEPNCSPP).
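
For readers unfamiliar with the three reported metrics, the sketch below computes accuracy, F1-score, and MCC from a binary confusion matrix using their standard definitions. The confusion-matrix counts are hypothetical and are not taken from the paper; they were chosen only because, after rounding, they happen to give the same three figures as the abstract.

```python
import math

def classification_metrics(tp, fp, tn, fn):
    """Return (accuracy, F1-score, MCC) for a binary confusion matrix."""
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    # MCC uses all four cells, so it stays informative on imbalanced data.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return accuracy, f1, mcc

# Hypothetical counts for a 34-sample test set (an assumption, not reported
# in the abstract).
acc, f1, mcc = classification_metrics(tp=14, fp=1, tn=16, fn=3)
print(f"accuracy={acc:.2%}  F1={f1:.2%}  MCC={mcc:.2%}")
# → accuracy=88.24%  F1=87.50%  MCC=77.01%
```

MCC ranges from -1 to 1, so expressing it as a percentage, as the abstract does, is simply the coefficient multiplied by 100.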

Citations: 0