Pub Date: 2024-08-23 | DOI: 10.1016/j.chemolab.2024.105220
This paper proposes a discrete-time method for jointly estimating the states and unknown inputs (UIs) of industrial processes represented by a state-space model. To cope with outliers in process data, the measurement noise is modeled by the Student's t-distribution. The UIs are identified through the recursive expectation–maximization (REM) approach. Specifically, in the E-step, a recursively calculated Q-function is formulated using the maximum likelihood criterion, and the states and the variance scale factor are estimated iteratively. In the M-step, the UIs are updated analytically and the degrees of freedom are updated approximately. The effectiveness of the proposed algorithm is validated on a quadruple water tank process and a continuous stirred tank reactor. The results show that the proposed method significantly enhances the robustness and accuracy of state and UI estimation in industrial processes, handling outliers effectively while reducing the computational demands of real-time applications.
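To make the role of the variance scale factor concrete, here is a minimal sketch of how Student's t measurement noise down-weights outliers in a scalar filter update. It is not the paper's REM algorithm: it assumes a scalar state observed directly (y = x + noise), performs a single pass rather than iterating, and uses the standard Gaussian scale-mixture result that the expected precision scale is (ν + 1) / (ν + e²/R) for a scalar residual e. The function names `t_weight` and `robust_scalar_update` are hypothetical.

```python
def t_weight(residual, variance, dof):
    """Expected precision scale for a scalar residual under Student's t noise.

    Uses the Gaussian scale-mixture view: large residuals get weights well
    below 1, which inflates the effective measurement variance.
    """
    mahalanobis_sq = residual * residual / variance
    return (dof + 1.0) / (dof + mahalanobis_sq)


def robust_scalar_update(x_pred, p_pred, y, r, dof):
    """One Kalman-style measurement update for y = x + noise (illustrative).

    The nominal variance r is replaced by r / weight, so an outlying
    measurement pulls the estimate far less than a Gaussian update would.
    """
    residual = y - x_pred
    weight = t_weight(residual, r, dof)
    r_eff = r / weight                    # inflated variance for outliers
    gain = p_pred / (p_pred + r_eff)      # scalar Kalman gain
    x_new = x_pred + gain * residual
    p_new = (1.0 - gain) * p_pred
    return x_new, p_new, weight
```

With a predicted state of 0, unit variances, and ν = 3, a measurement of 10 receives a weight far below 1, so the updated state stays close to the prediction instead of jumping halfway to the outlier as a Gaussian update would.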
Title: Joint state and process inputs estimation for state-space models with Student's t-distribution | Journal: Chemometrics and Intelligent Laboratory Systems
Pub Date: 2024-08-22 | DOI: 10.1016/j.chemolab.2024.105217
This work shows the potential of advanced linear and nonlinear learning algorithms for predicting the texture sensory attributes of apples: "hardness", "crunchiness", "flouriness", "fibrousness", and "graininess". Starting from the information contained in the entire mechanical and acoustic curves acquired during sample compression tests, the prediction performance of five different statistical tools, including Partial Least Squares regression (PLS), Multilayer Perceptron (MLP), Support Vector Regression (SVR), and Gaussian Process Regression (GPR), is shown and discussed.
Validation of all predictive models shows the best accuracy for the texture sensory attributes "hardness" and "crunchiness" and, in general, for the GPR learning algorithm. When mechanical and acoustic profiles are combined, 5-fold cross-validation yields coefficients of determination (R²) of up to 0.885 (GPR) for "hardness" and 0.840 (GPR) for "crunchiness". These results are comparable to those obtained by using a large number of mechanical and acoustic parameters extracted from the acquired profiles as predictive factors, and they point to a new and reliable way of predicting the texture sensory attributes of apples. The proposed approach removes the need to define in advance the number and type of features to be calculated from instrumental texture profiles, and it can easily be implemented in an automated process.
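The 5-fold cross-validated R² quoted above can be illustrated with a small, dependency-free sketch. It is only an illustration of the evaluation protocol: a straight-line least-squares fit stands in for the GPR model, the folds are contiguous rather than shuffled, and all function names are hypothetical.

```python
def k_fold_indices(n_samples, k=5):
    """Split range(n_samples) into k contiguous (train, test) index pairs."""
    sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    folds, start = [], 0
    for size in sizes:
        stop = start + size
        test = list(range(start, stop))
        train = [i for i in range(n_samples) if i < start or i >= stop]
        folds.append((train, test))
        start = stop
    return folds


def fit_line(x, y):
    """Ordinary least squares for y = a * x + b (stand-in for a real model)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def cross_validated_r2(x, y, k=5):
    """Pool the out-of-fold predictions and score them with R^2."""
    preds = [0.0] * len(x)
    for train, test in k_fold_indices(len(x), k):
        slope, intercept = fit_line([x[i] for i in train], [y[i] for i in train])
        for i in test:
            preds[i] = slope * x[i] + intercept
    return r_squared(y, preds)
```

Pooling out-of-fold predictions before scoring is one common choice; averaging per-fold R² values is another, and the abstract does not say which was used.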
Title: Combining algorithm techniques with mechanical and acoustic profiles for the prediction of apples sensory attributes | Journal: Chemometrics and Intelligent Laboratory Systems
Pub Date: 2024-08-22 | DOI: 10.1016/j.chemolab.2024.105219
In this study, we develop predictive models for three target variables, denoted as δ_d, δ_p, and δ_h, using a dataset with 86 features and 181 samples. The response parameters, which are Hansen solubility parameters, were correlated with the input parameters via several machine learning techniques. The input features are molecular descriptors of coformers, calculated with the COSMO-RS thermodynamic model and a group contribution approach. The analysis includes outlier detection via Cook's distance, normalization with a min-max scaler, and feature selection through L1-based methods. Three regression models are employed: Gaussian Process Regression (GPR), Passive Aggressive Regression (PAR), and Polynomial Regression (PR), with hyperparameters optimized using Transient Search Optimization (TSO). For δ_d, the PAR model outperforms the others with an R² score of 0.885, RMSE of 0.607, MAE of 0.524, and a maximum error of 1.294. The GPR model performs slightly worse on δ_d, with an R² of 0.872, RMSE of 0.816, MAE of 0.579, and a maximum error of 2.755. The PR model reaches an R² of 0.814, RMSE of 0.923, MAE of 0.597, and a maximum error of 2.814 on δ_d. For δ_p, the GPR model provides the best performance, achieving an R² score of 0.821, RMSE of 1.693, MAE of 1.391, and a maximum error of 3.457. The PAR model reaches an R² of 0.740, RMSE of 2.025, MAE of 1.980, and a maximum error of 6.609 on δ_p. The PR model predicts δ_p with an R² of 0.7, RMSE of 2.329, MAE of 2.02, and a maximum error of 6.366. Similarly, for δ_h, the GPR model again shows superior performance with an R² score of 0.983, RMSE of 1.243, MAE of 1.005, and a maximum error of 2.577. The PAR model predicts δ_h with an R² of 0.924, RMSE of 2.713, MAE of 2.416, and a maximum error of 6.307. Finally, the PR model predicts δ_h with an R² of 0.927, RMSE of 2.757, MAE of 2.334, and a maximum error of 8.064.
These results highlight the efficacy of the chosen models and optimization techniques in accurately p
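For readers unfamiliar with the preprocessing and error measures named in this abstract, the following sketch shows min-max scaling and three of the reported metrics (RMSE, MAE, maximum error) in plain Python. The function names are hypothetical and the numbers in the usage note are toy values, not data from the study.

```python
import math


def min_max_scale(values):
    """Rescale a feature column to the range [0, 1].

    A constant column has no spread, so it is mapped to all zeros.
    """
    low, high = min(values), max(values)
    if high == low:
        return [0.0 for _ in values]
    return [(v - low) / (high - low) for v in values]


def rmse(y_true, y_pred):
    """Root mean squared error."""
    n = len(y_true)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)


def mae(y_true, y_pred):
    """Mean absolute error."""
    n = len(y_true)
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / n


def max_error(y_true, y_pred):
    """Largest absolute residual over all samples."""
    return max(abs(t - p) for t, p in zip(y_true, y_pred))
```

For true values [1, 2, 3] and predictions [1, 2, 5], the single residual of 2 gives a maximum error of 2, an MAE of 2/3, and an RMSE of sqrt(4/3). This shows why the maximum error can be much larger than the averaged metrics, as in the figures quoted above.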
Title: Combination of machine learning and COSMO-RS thermodynamic model in predicting solubility parameters of coformers in production of cocrystals for enhanced drug solubility | Journal: Chemometrics and Intelligent Laboratory Systems
Pub Date: 2024-08-22 | DOI: 10.1016/j.chemolab.2024.105216
A comprehensive multi-scale computational strategy based on mass transfer and machine learning was developed in this study to simulate the drug concentration distribution in a biomaterial matrix. Controlled release was modeled and validated with the hybrid model. Mass transfer equations and kinetic models were solved numerically, and the results were then used as data for the machine learning models. We investigated the performance of three regression models, namely Decision Tree (DT), Random Forest (RF), and Extra Tree (ET), in predicting drug concentration (C) from r and z data. Hyperparameters were optimized using Glowworm Swarm Optimization (GSO). All models achieved high predictive accuracy, with ET performing best: a coefficient of determination (R²) of 0.99854, an RMSE of 1.1446E-05, and a maximum error of 6.49087E-05. DT and RF also performed well, with coefficients of determination of 0.99571 and 0.99655, respectively. These results highlight the effectiveness of tree-based methods in accurately predicting chemical concentrations, with Extra Tree (ET) regression emerging as the most promising model for this specific dataset.
Title: Model development using hybrid method for prediction of drug release from biomaterial matrix | Journal: Chemometrics and Intelligent Laboratory Systems
Pub Date: 2024-08-22 | DOI: 10.1016/j.chemolab.2024.105205
Accurate baseline correction is a fundamental requirement for extracting meaningful spectral information and enabling precise quantitative analysis using Raman spectroscopy. Although numerous baseline correction techniques have been developed, they often require meticulous parameter adjustments and yield inconsistent results. To address these challenges, we have introduced a novel approach, namely constrained Gaussian radial basis function fitting (CGF). Our method involves solving a curve-fitting problem using Gaussian radial basis functions under specific constraints. To ensure stability and efficiency, we developed a linear programming algorithm for the proposed approach. We evaluated the performance of CGF using simulated Raman spectra and demonstrated its robustness across various scenarios, including changes in data length and noise levels. In contrast to standard methods, which frequently require complicated parameter adjustments and may exhibit varying errors, our approach provides a simple parameter search and consistently achieves low errors. We further assessed CGF using real Raman spectra, leading to enhanced accuracy in the quantitative analysis of the Raman spectra of chemical warfare agents. Our results emphasize the potential of CGF as a valuable tool for Raman spectroscopy data analysis, significantly advancing sophisticated analytical techniques.
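The abstract does not state the constraints or the objective, so the following is only one plausible way to pose a constrained Gaussian radial basis function baseline fit as a linear program. It assumes, hypothetically, that the baseline must lie on or below the measured spectrum and that the weights are non-negative. The symbols $c_j$ (centers), $\sigma$ (width), and $w_j$ (weights) are illustrative and not taken from the paper.

```latex
\begin{aligned}
b(x_i) &= \sum_{j=1}^{M} w_j \exp\!\left(-\frac{(x_i - c_j)^2}{2\sigma^2}\right), \\
\min_{w \ge 0} \quad & \sum_{i=1}^{N} \bigl(y_i - b(x_i)\bigr)
\quad \text{subject to} \quad b(x_i) \le y_i, \quad i = 1, \dots, N.
\end{aligned}
```

Because $b(x_i)$ is linear in the weights, both the objective and the constraints are linear in $w$, which is what makes a linear programming solver applicable under these assumptions.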
Title: Robust baseline correction for Raman spectra by constrained Gaussian radial basis function fitting | Journal: Chemometrics and Intelligent Laboratory Systems
Pub Date: 2024-08-20 | DOI: 10.1016/j.chemolab.2024.105200
Spectroscopic measurements can show distorted spectral shapes arising from a mixture of absorbing and scattering contributions. These distortions (or baselines) often manifest themselves as non-constant offsets or low-frequency oscillations, and they can adversely affect analytical and quantitative results. Baseline correction is an umbrella term for pre-processing methods that estimate the baseline spectra (the unwanted distortions) and then remove them by differencing. However, current state-of-the-art baseline correction methods do not use analyte concentrations even when they are available, or even when they contribute significantly to the observed spectral variability. We modify a class of state-of-the-art methods (penalized baseline correction) that easily admit the incorporation of a priori analyte concentrations, so that predictions can be enhanced. We call this modified approach supervised and penalized baseline correction (SPBC). Performance is assessed on two near-infrared data sets for both classical penalized baseline correction methods (without analyte information) and modified penalized baseline correction methods (leveraging analyte information). In some cases, SPBC provides baseline-corrected signals that outperform state-of-the-art penalized baseline correction algorithms such as AIRPLS. In particular, we observe that performance depends on the correlation between two separate analytes, the one used for baseline correction and the one used for prediction: the greater the correlation between them, the better the prediction performance.
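As background, penalized baseline correction methods of the Whittaker or asymmetric least squares family are commonly written as a weighted fit plus a roughness penalty. The form below is that general template, not the SPBC objective itself, which the abstract does not give. The weights $w_i$, smoothing parameter $\lambda$, and second-order difference matrix $D$ follow the usual convention and are assumptions here.

```latex
\hat{b} \;=\; \arg\min_{b} \; \sum_{i=1}^{N} w_i \,(y_i - b_i)^2
\;+\; \lambda \sum_{i=1}^{N-2} \bigl(b_i - 2\,b_{i+1} + b_{i+2}\bigr)^2,
\qquad
\bigl(W + \lambda D^{\top} D\bigr)\,\hat{b} \;=\; W y .
```

Here $W = \operatorname{diag}(w_1, \dots, w_N)$, and the corrected spectrum is $y - \hat{b}$. Methods in this family differ mainly in how the weights are updated between iterations. A supervised variant would need an additional term or constraint that involves the known analyte concentrations.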
Title: Supervised and penalized baseline correction | Journal: Chemometrics and Intelligent Laboratory Systems
Pub Date: 2024-08-17 | DOI: 10.1016/j.chemolab.2024.105206
This study illustrates the effective control of COVID-19 infection through the adsorption of safranal (SAF) on B₁₆N₁₆ and Al₁₆N₁₆ fullerene-like cages. SAF adsorption onto the B₁₆N₁₆ and Al₁₆N₁₆ surfaces in gas, water (H₂O), and chloroform (CHCl₃) environments was assessed using density functional theory (DFT) and time-dependent density functional theory (TD-DFT) methods, analyzing the substrates and their complexes. At the PBE0-D3 level, the Al₁₆N₁₆/SAF complex exhibited a more negative binding energy and greater structural stability in the water phase than the B₁₆N₁₆/SAF complex. The thermodynamic parameters indicated that the adsorption of SAF onto the fullerene-like cages is exothermic, particularly for the Al₁₆N₁₆/SAF complex. Additionally, the interaction of SAF with the fullerene-like cages is more pronounced in the water phase than in the gas and chloroform environments. The energy gap (Eg) of the complexes decreases in all three environments relative to the pristine cages, with a reduction of over 21 % in all phases. This substantial decrease in the energy gap suggests that the complexes have increased reactivity and sensitivity to SAF, likely due to a significant change in electronic conductivity. The molecular docking results indicate that the Al₁₆N₁₆/SAF complex in the water phase exhibited a stronger binding affinity than the other compounds studied. These findings suggest that the Al₁₆N₁₆/SAF complex holds promise as a potential inhibitor for COVID-19 and as a valuable material for biomedical applications and drug delivery systems.
Title: Novel investigation on adsorption analysis of safranal interacting with boron nitride and aluminum nitride fullerene-like cages: Drug delivery system | Journal: Chemometrics and Intelligent Laboratory Systems
Pub Date: 2024-08-14 | DOI: 10.1016/j.chemolab.2024.105204
The first stage in the industrial production of Styrene-Butadiene Rubber (SBR) typically consists of obtaining a latex from a train of continuous stirred tank reactors. Accurate real-time estimation of some key process variables is of paramount importance to ensure the production of high-quality rubber. Monitoring the mass conversion of monomers in the last reactor of the train is particularly important. To this end, various soft sensors (SS) have been proposed; however, they have not addressed the complex dynamic relationships that exist among the process variables. In this work, an SS based on recurrent neural networks (RNN) is developed to estimate the mass conversion in the last reactor of the train. The main challenge is to obtain an adequate estimate of the conversion both in its usual steady-state operation and during its frequent transient operating phases. Three RNN architectures, Elman, GRU (Gated Recurrent Unit), and LSTM (Long Short-Term Memory), are compared to critically evaluate their performance. Moreover, a comprehensive analysis is conducted to assess the ability of these models to represent different operational modes of the train. The results reveal that the GRU network performs best at estimating the mass conversion of monomers. The performance of the proposed model is then compared with a previously developed SS based on a linear estimation model with a Bayesian bias adaptation mechanism and the use of control charts for decision-making. The model proposed here proved to be more effective at estimating the mass conversion of monomers, particularly during transient operating phases. Finally, to evaluate the methodology used for designing the SS, the same RNN architectures were trained to estimate online another quality variable: the mass fraction of styrene bound to the copolymer. The results obtained were also acceptable.
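For reference, a GRU cell of the kind compared above is usually defined by the update equations below. This is the textbook formulation, written in one common convention for the update gate; it is not taken from the paper, whose network sizes and inputs are not given in the abstract. Here $x_t$ would hold the measured process variables at time $t$ and $h_t$ the hidden state from which the conversion estimate is read out.

```latex
\begin{aligned}
z_t &= \sigma\!\left(W_z x_t + U_z h_{t-1} + b_z\right) && \text{(update gate)} \\
r_t &= \sigma\!\left(W_r x_t + U_r h_{t-1} + b_r\right) && \text{(reset gate)} \\
\tilde{h}_t &= \tanh\!\left(W_h x_t + U_h \,(r_t \odot h_{t-1}) + b_h\right) && \text{(candidate state)} \\
h_t &= (1 - z_t) \odot h_{t-1} + z_t \odot \tilde{h}_t && \text{(new hidden state)}
\end{aligned}
```

The gates let the cell decide how much past information to carry forward. This is one plausible reason a GRU could track both steady-state and transient phases, although the abstract itself does not offer an explanation.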
{"title":"Estimation of quality variables in a continuous train of reactors using recurrent neural networks-based soft sensors","doi":"10.1016/j.chemolab.2024.105204","journal":{"name":"Chemometrics and Intelligent Laboratory Systems"},"PeriodicalIF":3.7,"publicationDate":"2024-08-14","publicationTypes":"Journal Article"}
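As a rough illustration of the gated recurrent unit (GRU) named in the abstract above, the following pure-Python sketch runs one GRU layer over a short input sequence and maps the final hidden state to a scalar between 0 and 1. Everything specific here is an assumption made for illustration: the function names `gru_step`, `init_params` and `estimate_conversion`, the random placeholder weights, the omission of bias terms, and the sigmoid output layer. The network, inputs and training procedure used in the paper are not described in this listing.

```python
import math
import random


def sigmoid(x):
    """Logistic function, maps any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def matvec(W, v):
    """Multiply a matrix (list of rows) by a vector (list of floats)."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def init_params(n_in, n_hidden, seed=0):
    """Random placeholder weights; a real soft sensor would learn these from data."""
    rng = random.Random(seed)

    def mat(rows, cols):
        return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

    return {
        "Wz": mat(n_hidden, n_in), "Uz": mat(n_hidden, n_hidden),
        "Wr": mat(n_hidden, n_in), "Ur": mat(n_hidden, n_hidden),
        "Wh": mat(n_hidden, n_in), "Uh": mat(n_hidden, n_hidden),
        "w_out": [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)],
    }


def gru_step(x, h, p):
    """One GRU update (biases omitted; one common gate convention)."""
    # Update gate z and reset gate r, both element-wise in (0, 1).
    z = [sigmoid(a + b) for a, b in zip(matvec(p["Wz"], x), matvec(p["Uz"], h))]
    r = [sigmoid(a + b) for a, b in zip(matvec(p["Wr"], x), matvec(p["Ur"], h))]
    # Candidate state is computed from the reset-gated previous state.
    rh = [ri * hi for ri, hi in zip(r, h)]
    h_new = [math.tanh(a + b) for a, b in zip(matvec(p["Wh"], x), matvec(p["Uh"], rh))]
    # Blend the previous state and the candidate state with the update gate.
    return [(1.0 - zi) * hi + zi * ci for zi, hi, ci in zip(z, h, h_new)]


def estimate_conversion(sequence, p):
    """Run the GRU over a sequence of input vectors and return a value in (0, 1)."""
    h = [0.0] * len(p["w_out"])
    for x in sequence:
        h = gru_step(x, h, p)
    return sigmoid(sum(w * v for w, v in zip(p["w_out"], h)))


if __name__ == "__main__":
    params = init_params(n_in=3, n_hidden=4)
    window = [[0.1, 0.2, 0.3], [0.0, -0.1, 0.5]]  # hypothetical process measurements
    print(estimate_conversion(window, params))
```

Because the hidden state is carried from one time step to the next, the estimate depends on the recent history of the inputs rather than on the latest sample alone, which is what makes recurrent models a natural fit for transient operating phases.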
Pub Date : 2024-08-14 DOI: 10.1016/j.chemolab.2024.105202
Citrus Huanglongbing (HLB) is among the most frequently diagnosed citrus diseases and has caused severe economic losses to the citrus industry worldwide, because it has no cure and spreads quickly. As callose accumulation in phloem is one of the early response events to infection by the Asian species Candidatus Liberibacter asiaticus (CLas), dynamic monitoring of the sieve plate region can be used as an indicator for the early diagnosis of citrus HLB disease. In this study, one-dimensional convolutional neural network (1D-CNN) models were established to achieve early detection of HLB disease based on spectral information from the sieve plate region acquired with a Fourier transform infrared microscopy (micro-FTIR) spectrometer. Partial least squares regression (PLSR) and least squares support vector machine regression (LS-SVR) models were used to predict callose from the micro-FTIR information in the sieve plate region of the citrus midrib. Furthermore, an improved data augmentation method that superimposes Gaussian noise was proposed to expand the spectral amplitude. The proposed method achieved 98.65 % classification accuracy, higher than that of traditional algorithms such as the logistic model tree (LMT), linear discriminant analysis (LDA), Bayes (BS), support vector machine (SVM) and k-nearest neighbors (kNN), and also higher than that of the molecular detection method qPCR (quantitative real-time polymerase chain reaction). Finally, the early detection model established with laboratory samples can also be used to detect citrus HLB in complex field samples by means of model updating methods, with an overall detection accuracy of 91.21 %. This approach has potential for the early diagnosis of citrus HLB disease at the microscopic scale, which would provide useful and precise guidelines for preventing and controlling the disease.
{"title":"A 1D-CNN model for the early detection of citrus Huanglongbing disease in the sieve plate of phloem tissue using micro-FTIR","doi":"10.1016/j.chemolab.2024.105202","journal":{"name":"Chemometrics and Intelligent Laboratory Systems"},"PeriodicalIF":3.7,"publicationDate":"2024-08-14","publicationTypes":"Journal Article"}
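The data augmentation idea mentioned in the abstract above, superimposing Gaussian noise on measured spectra, can be sketched as follows. This is only a minimal sketch under stated assumptions: the function names, the number of noisy copies, the noise level `sigma_frac`, and the choice to scale the noise by each spectrum's amplitude range are all hypothetical, and they do not reproduce the settings or the "improved" scheme of the paper.

```python
import random


def augment_spectrum(spectrum, rng, n_copies=5, sigma_frac=0.01):
    """Return n_copies noisy versions of one spectrum (a list of absorbance values).

    The noise standard deviation is sigma_frac times the spectrum's amplitude
    range, an assumed scaling chosen only for this illustration.
    """
    sigma = sigma_frac * (max(spectrum) - min(spectrum))
    return [[a + rng.gauss(0.0, sigma) for a in spectrum] for _ in range(n_copies)]


def augment_dataset(spectra, labels, n_copies=5, sigma_frac=0.01, seed=0):
    """Keep every original spectrum and append its noisy copies with the same label."""
    rng = random.Random(seed)  # one generator, so each copy gets different noise
    X, y = [], []
    for spectrum, label in zip(spectra, labels):
        X.append(list(spectrum))
        y.append(label)
        for noisy in augment_spectrum(spectrum, rng, n_copies, sigma_frac):
            X.append(noisy)
            y.append(label)
    return X, y


if __name__ == "__main__":
    # Two made-up three-point "spectra" with class labels 0 and 1.
    spectra = [[0.0, 1.0, 2.0], [1.0, 1.0, 3.0]]
    X, y = augment_dataset(spectra, [0, 1], n_copies=3)
    print(len(X), y)
```

Augmentation of this kind should be applied to the training set only, so that the reported test accuracy is measured on unmodified spectra.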
Pub Date : 2024-08-10 DOI: 10.1016/j.chemolab.2024.105201
In the era of chemical big data, the high complexity and strong interdependencies present in datasets pose considerable challenges for constructing accurate parametric models. The Gaussian process model, owing to its non-parametric nature, adapts better to complex and interdependent data. However, the standard Gaussian process has two significant limitations. First, the time complexity of inverting its kernel matrix during inference is O(n^3) in the number of data points n. Second, all data share common kernel function parameters, which mixes different data types and reduces model accuracy in mixed-category data identification problems. To address these limitations, this paper proposes a mixture Gaussian process model that reduces time complexity and distinguishes data based on their features. It incorporates a Gaussian mixture distribution for the inducing variables to approximate the original data distribution. Stochastic variational inference is used to reduce the computational time required for parameter inference. The inducing variables have distinct kernel function parameters for each data category, which improves analytical accuracy and reduces the time complexity of the Gaussian process model. Numerical experiments analyze and compare the performance of the proposed model on datasets of different sizes and with various data categories.
{"title":"Mixture Gaussian process model with Gaussian mixture distribution for big data","doi":"10.1016/j.chemolab.2024.105201","journal":{"name":"Chemometrics and Intelligent Laboratory Systems"},"PeriodicalIF":3.7,"publicationDate":"2024-08-10","publicationTypes":"Journal Article"}
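To make the cubic cost mentioned in the abstract above concrete, here is a minimal pure-Python sketch of exact Gaussian process regression on one-dimensional inputs. The Cholesky factorization of the n × n kernel matrix is the O(n^3) step that inducing-variable approximations are designed to avoid. The RBF kernel, the hyperparameter values and all function names are illustrative assumptions; the sketch does not implement the proposed mixture model or its stochastic variational inference.

```python
import math


def rbf(a, b, length=1.0, variance=1.0):
    """Squared-exponential (RBF) kernel for scalar inputs."""
    return variance * math.exp(-0.5 * ((a - b) / length) ** 2)


def cholesky(A):
    """Lower-triangular L with A = L L^T; the loops over i, j and k cost O(n^3)."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = math.sqrt(A[i][i] - s)
            else:
                L[i][j] = (A[i][j] - s) / L[j][j]
    return L


def solve_cholesky(L, b):
    """Solve (L L^T) x = b by forward then backward substitution, O(n^2)."""
    n = len(b)
    y = [0.0] * n
    for i in range(n):
        y[i] = (b[i] - sum(L[i][k] * y[k] for k in range(i))) / L[i][i]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (y[i] - sum(L[k][i] * x[k] for k in range(i + 1, n))) / L[i][i]
    return x


def gp_predict_mean(x_train, y_train, x_test, noise=1e-2):
    """Posterior mean of a zero-mean GP: k(x*, X) (K + noise * I)^{-1} y."""
    n = len(x_train)
    K = [[rbf(x_train[i], x_train[j]) + (noise if i == j else 0.0)
          for j in range(n)] for i in range(n)]
    alpha = solve_cholesky(cholesky(K), y_train)  # dominant cost: the factorization
    return [sum(rbf(xs, xt) * a for xt, a in zip(x_train, alpha)) for xs in x_test]


if __name__ == "__main__":
    xs = [0.0, 1.0, 2.0, 3.0, 4.0]
    ys = [math.sin(v) for v in xs]
    print(gp_predict_mean(xs, ys, [0.5, 2.5]))
```

Doubling the number of training points multiplies the factorization work by roughly eight, which is why exact inference becomes impractical for large datasets. A sparse approximation with m inducing points works with an m × m matrix instead, with m chosen much smaller than n.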