
Chemometrics and Intelligent Laboratory Systems: Latest Articles

Robust baseline correction for Raman spectra by constrained Gaussian radial basis function fitting
IF 3.7; Zone 2 (Chemistry); Q2 AUTOMATION & CONTROL SYSTEMS; Pub Date: 2024-08-22; DOI: 10.1016/j.chemolab.2024.105205
Sungwon Park, Hongjoong Kim

Accurate baseline correction is a fundamental requirement for extracting meaningful spectral information and enabling precise quantitative analysis using Raman spectroscopy. Although numerous baseline correction techniques have been developed, they often require meticulous parameter adjustments and yield inconsistent results. To address these challenges, we have introduced a novel approach, namely constrained Gaussian radial basis function fitting (CGF). Our method involves solving a curve-fitting problem using Gaussian radial basis functions under specific constraints. To ensure stability and efficiency, we developed a linear programming algorithm for the proposed approach. We evaluated the performance of CGF using simulated Raman spectra and demonstrated its robustness across various scenarios, including changes in data length and noise levels. In contrast to standard methods, which frequently require complicated parameter adjustments and may exhibit varying errors, our approach provides a simple parameter search and consistently achieves low errors. We further assessed CGF using real Raman spectra, leading to enhanced accuracy in the quantitative analysis of the Raman spectra of chemical warfare agents. Our results emphasize the potential of CGF as a valuable tool for Raman spectroscopy data analysis, significantly advancing sophisticated analytical techniques.
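
The abstract does not state the constraints or the objective used by CGF, so the following is only one plausible reading, written out to show why a constrained fit with Gaussian radial basis functions can be solved by linear programming. The symbols $c_j$, $\mu_j$, $\sigma$, $m$ and the choice of a "baseline stays below the spectrum" constraint are assumptions for illustration, not the authors' formulation.

```latex
% Assumed model: baseline as a weighted sum of m Gaussian radial basis functions
b(x) = \sum_{j=1}^{m} c_j \exp\!\left( -\frac{(x - \mu_j)^2}{2\sigma^2} \right)

% Assumed fitting problem for a spectrum (x_i, y_i), i = 1, ..., n
\min_{c_1, \dots, c_m} \; \sum_{i=1}^{n} \bigl( y_i - b(x_i) \bigr)
\quad \text{subject to} \quad b(x_i) \le y_i , \qquad i = 1, \dots, n
```

Under this assumed constraint every residual $y_i - b(x_i)$ is non-negative, so the objective equals the sum of absolute residuals. Because $b(x_i)$ is linear in the coefficients $c_j$, both the objective and the constraints are linear, which is what makes a linear programming solver applicable.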

Chemometrics and Intelligent Laboratory Systems, vol. 253, Article 105205.
Citations: 0
Supervised and penalized baseline correction
IF 3.7; Zone 2 (Chemistry); Q2 AUTOMATION & CONTROL SYSTEMS; Pub Date: 2024-08-20; DOI: 10.1016/j.chemolab.2024.105200
Erik Andries, Ramin Nikzad-Langerodi

Spectroscopic measurements can show distorted spectral shapes arising from a mixture of absorbing and scattering contributions. These distortions (or baselines) often manifest themselves as non-constant offsets or low-frequency oscillations. As a result, these baselines can adversely affect analytical and quantitative results. Baseline correction is an umbrella term for pre-processing methods that estimate baseline spectra (the unwanted distortions) and then remove them by differencing. However, current state-of-the-art baseline correction methods do not utilize analyte concentrations even if they are available, or even if they contribute significantly to the observed spectral variability. We modify a class of state-of-the-art methods (penalized baseline correction) that easily admit the incorporation of a priori analyte concentrations such that predictions can be enhanced. This modified approach will be deemed supervised and penalized baseline correction (SPBC). Performance will be assessed on two near infrared data sets across both classical penalized baseline correction methods (without analyte information) and modified penalized baseline correction methods (leveraging analyte information). There are cases of SPBC that provide useful baseline-corrected signals such that they outperform state-of-the-art penalized baseline correction algorithms such as AIRPLS. In particular, we observe that performance is conditional on the correlation between two separate analytes, the one used for baseline correction and the one used for prediction: the greater this correlation, the better the prediction performance.
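
As a rough illustration of what the classical (unsupervised) penalized baseline correction family does, here is a minimal sketch of asymmetric least squares smoothing. It is not SPBC and not AIRPLS: it uses no analyte concentrations, and it swaps in a first-order difference penalty so that the linear system is tridiagonal and can be solved in plain Python. The function names and the parameters `lam`, `p`, and `n_iter` are assumptions chosen for this example.

```python
def solve_tridiagonal(lower, diag, upper, rhs):
    """Thomas algorithm. Row i reads:
    lower[i]*x[i-1] + diag[i]*x[i] + upper[i]*x[i+1] = rhs[i]
    (lower[0] and upper[-1] are unused and should be 0)."""
    n = len(diag)
    c = [0.0] * n
    d = [0.0] * n
    c[0] = upper[0] / diag[0]
    d[0] = rhs[0] / diag[0]
    for i in range(1, n):
        denom = diag[i] - lower[i] * c[i - 1]
        c[i] = upper[i] / denom
        d[i] = (rhs[i] - lower[i] * d[i - 1]) / denom
    x = [0.0] * n
    x[-1] = d[-1]
    for i in range(n - 2, -1, -1):
        x[i] = d[i] - c[i] * x[i + 1]
    return x


def asls_baseline(y, lam=100.0, p=0.01, n_iter=10):
    """Estimate a baseline z by minimising
    sum_i w_i (y_i - z_i)^2 + lam * sum_i (z_{i+1} - z_i)^2,
    re-weighting so that points above the baseline (peaks) count little."""
    n = len(y)
    w = [1.0] * n
    z = list(y)
    lower = [0.0] + [-lam] * (n - 1)
    upper = [-lam] * (n - 1) + [0.0]
    for _ in range(n_iter):
        # Diagonal of W + lam * D^T D for the first-difference operator D
        diag = [w[i] + lam * (1.0 if i in (0, n - 1) else 2.0) for i in range(n)]
        rhs = [w[i] * y[i] for i in range(n)]
        z = solve_tridiagonal(lower, diag, upper, rhs)
        # Asymmetric weights: small weight p where the signal lies above z
        w = [p if y[i] > z[i] else 1.0 - p for i in range(n)]
    return z


def correct(y, **kwargs):
    """Baseline-corrected signal obtained by differencing."""
    z = asls_baseline(y, **kwargs)
    return [yi - zi for yi, zi in zip(y, z)]
```

The penalty weight `lam` controls how stiff the baseline is and `p` how strongly peaks are ignored; tuning these two values is the kind of parameter adjustment that penalized methods typically require.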

Chemometrics and Intelligent Laboratory Systems, vol. 253, Article 105200.
Citations: 0
Novel investigation on adsorption analysis of safranal interacting with boron nitride and aluminum nitride fullerene-like cages: Drug delivery system
IF 3.7; Zone 2 (Chemistry); Q2 AUTOMATION & CONTROL SYSTEMS; Pub Date: 2024-08-17; DOI: 10.1016/j.chemolab.2024.105206
Saad M Alshahrani

This study illustrates the effective control of COVID-19 infection through the adsorption of safranal (SAF) on B16N16 and Al16N16 fullerene-like cages. SAF adsorption onto the B16N16 and Al16N16 surfaces in gas, water (H2O), and chloroform (CHCl3) environments was assessed using density functional theory (DFT) and time-dependent (TD) density functional theory methods, analyzing the substrates and their complexes. The Al16N16/SAF complex exhibited the most negative binding energy and structural stability in the water phase compared to the B16N16/SAF complex at the PBE0-D3 level. The thermodynamic parameters indicated that the adsorption of SAF onto the fullerene-like cages is exothermic, particularly for the Al16N16/SAF complex. Additionally, the interaction of SAF with the fullerene-like cages in the water phase is more pronounced than in gas and chloroform environments. The energy gap (Eg) of the complexes decreases in all three environments compared to the perfect systems, with a significant reduction of over 21% in all phases. This substantial decrease in the energy gap suggests that the complexes have increased reactivity and sensitivity to SAF, likely due to a significant change in electronic conductivity. The results of molecular docking indicate that the Al16N16/SAF complex in the water phase exhibited a strong binding affinity compared to the other compounds studied. These findings suggest that the Al16N16/SAF complex holds promise as a potential inhibitor for COVID-19 and as a valuable material for biomedical applications and drug delivery systems.
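
The quantities the abstract compares (binding energy, energy gap, and the percentage change of the gap on adsorption) follow from simple arithmetic on DFT outputs. The sketch below uses the conventional definitions, which the abstract does not spell out and which are therefore an assumption here; the function names are hypothetical and no values from the paper are reproduced.

```python
def adsorption_energy(e_complex, e_cage, e_adsorbate):
    """Binding (adsorption) energy: E_complex - (E_cage + E_adsorbate).
    A negative value means the complex is more stable than the separated parts."""
    return e_complex - (e_cage + e_adsorbate)


def energy_gap(e_homo, e_lumo):
    """HOMO-LUMO energy gap Eg = E_LUMO - E_HOMO."""
    return e_lumo - e_homo


def percent_gap_change(gap_bare, gap_complex):
    """Relative change of the gap on adsorption, in percent.
    A negative value is a reduction relative to the bare cage."""
    return 100.0 * (gap_complex - gap_bare) / gap_bare
```

With made-up numbers, a bare-cage gap of 4.0 eV and a complex gap of 3.0 eV give `percent_gap_change(4.0, 3.0) == -25.0`, that is, a 25% reduction.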

Chemometrics and Intelligent Laboratory Systems, vol. 254, Article 105206.
Citations: 0
Estimation of quality variables in a continuous train of reactors using recurrent neural networks-based soft sensors
IF 3.7; Zone 2 (Chemistry); Q2 AUTOMATION & CONTROL SYSTEMS; Pub Date: 2024-08-14; DOI: 10.1016/j.chemolab.2024.105204
Mariano M. Perdomo, Luis A. Clementi, Jorge R. Vega

The first stage in the industrial production of Styrene-Butadiene Rubber (SBR) typically consists in obtaining a latex from a train of continuous stirred tank reactors. Accurate real-time estimation of some key process variables is of paramount importance to ensure the production of high-quality rubber. Monitoring the mass conversion of monomers in the last reactor of the train is particularly important. To this effect, various soft sensors (SS) have been proposed; however, they have not addressed the underlying complex dynamic relationships existing among the process variables. In this work, an SS based on recurrent neural networks (RNN) is developed to estimate the mass conversion in the last reactor of the train. The main challenge is to obtain an adequate estimate of the conversion both in its usual steady-state operation and during its frequent transient operating phases. Three RNN architectures, Elman, GRU (Gated Recurrent Unit), and LSTM (Long Short-Term Memory), are compared to critically evaluate their performances. Moreover, a comprehensive analysis is conducted to assess the ability of these models to represent different operational modes of the train. The results reveal that the GRU network exhibits the best performance for estimating the mass conversion of monomers. Then, the performance of the proposed model is compared with a previously developed SS, which was based on a linear estimation model with a Bayesian bias adaptation mechanism and the use of Control Charts for decision-making. The model proposed here proved to be more efficient for estimating the mass conversion of monomers, particularly during transient operating phases. Finally, to evaluate the methodology utilized for designing the SS, the same RNN architectures were trained to estimate online another quality variable: the mass fraction of Styrene bound to the copolymer. The obtained results were also acceptable.
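
To make the GRU-based soft sensor idea concrete, here is a minimal sketch of a single GRU cell with a scalar input and a scalar hidden state, unrolled over a sequence of process measurements. The parameter names in `p`, the linear read-out (`w_out`, `b_out`), and the scalar sizes are assumptions for illustration; the paper's network sizes, inputs, and training procedure are not given in the abstract. Note that libraries differ on which side of the interpolation the update gate multiplies; this sketch uses h = (1 - z) * h_prev + z * h_cand.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def gru_step(x, h_prev, p):
    """One GRU update for scalar input x and scalar hidden state h_prev."""
    z = sigmoid(p["wz"] * x + p["uz"] * h_prev + p["bz"])  # update gate
    r = sigmoid(p["wr"] * x + p["ur"] * h_prev + p["br"])  # reset gate
    # Candidate state sees the previous state only through the reset gate
    h_cand = math.tanh(p["wh"] * x + p["uh"] * (r * h_prev) + p["bh"])
    return (1.0 - z) * h_prev + z * h_cand


def run_soft_sensor(inputs, p, w_out=1.0, b_out=0.0, h0=0.0):
    """Unroll the cell over a measurement sequence and map each hidden
    state to an estimate of the quality variable with a linear read-out."""
    h = h0
    estimates = []
    for x in inputs:
        h = gru_step(x, h, p)
        estimates.append(w_out * h + b_out)
    return estimates
```

Because the hidden state carries information from earlier samples, the estimate at each time step can depend on the recent trajectory of the process, which is the property that matters during transient operating phases.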

Chemometrics and Intelligent Laboratory Systems, vol. 253, Article 105204.
Citations: 0
A 1D-CNN model for the early detection of citrus Huanglongbing disease in the sieve plate of phloem tissue using micro-FTIR
IF 3.7; Zone 2 (Chemistry); Q2 AUTOMATION & CONTROL SYSTEMS; Pub Date: 2024-08-14; DOI: 10.1016/j.chemolab.2024.105202
Biyun Yang, Zhiling Yang, Yong Xu, Wei Cheng, Fenglin Zhong, Dapeng Ye, Haiyong Weng

Among the most frequently diagnosed diseases in citrus, citrus Huanglongbing (HLB) disease has caused severe economic losses to the citrus industry worldwide because there is no cure and it spreads quickly. As callose accumulation in phloem is one of the early response events to Asian species Candidatus Liberibacter asiaticus (CLas) infection, the dynamic perception of the sieve plate region can be used as an indicator for the early diagnosis of citrus HLB disease. In this study, one-dimensional convolutional neural network (1D-CNN) models were established to achieve early detection of HLB disease based on spectral information in the sieve plate region using a Fourier transform infrared microscopy (micro-FTIR) spectrometer. Partial least squares regression (PLSR) and least squares support vector machine regression (LS-SVR) models are used for the prediction of callose based on the micro-FTIR information in the sieve plate region of the citrus midrib. Furthermore, an improved data augmentation method by superimposing Gaussian noise was proposed to expand the spectral amplitude. The proposed method achieved 98.65% classification accuracy, which was higher than that of other traditional algorithms such as the logistic model tree (LMT), linear discriminant analysis (LDA), Bayes (BS), support vector machine (SVM) and k-nearest neighbors (kNN), and also higher than that of the molecular detection method qPCR (quantitative real-time polymerase chain reaction). Finally, the early detection model established with laboratory samples can also be used to detect citrus HLB in complex field samples by using model updating methods, and the overall detection accuracy of the model reached 91.21%. Our approach has potential for the early diagnosis of citrus HLB disease at the microscopic scale, which would provide useful and precise guidelines to prevent and control citrus HLB disease.
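
The abstract mentions data augmentation by superimposing Gaussian noise on spectra but gives no details of the improved scheme. The sketch below shows only the plain version of that idea: each copy of a spectrum gets independent zero-mean Gaussian noise added to every point. The function name and the parameters `n_copies`, `noise_std`, and `seed` are assumptions for illustration.

```python
import random


def augment_with_gaussian_noise(spectrum, n_copies=5, noise_std=0.01, seed=0):
    """Return n_copies noisy versions of one spectrum (a list of intensities).
    A seeded generator keeps the augmentation reproducible."""
    rng = random.Random(seed)
    return [
        [value + rng.gauss(0.0, noise_std) for value in spectrum]
        for _ in range(n_copies)
    ]
```

The augmented copies keep the class label of the original spectrum, so a small labelled set can be expanded before training a classifier such as a 1D-CNN. The noise level should stay well below the intensity differences that separate the classes.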

Chemometrics and Intelligent Laboratory Systems, vol. 252, Article 105202.
Citations: 0
Mixture Gaussian process model with Gaussian mixture distribution for big data
IF 3.7; Zone 2 (Chemistry); Q2 AUTOMATION & CONTROL SYSTEMS; Pub Date: 2024-08-10; DOI: 10.1016/j.chemolab.2024.105201
Yaonan Guan, Shaoying He, Shuangshuang Ren, Shuren Liu, Dewei Li

In the era of chemical big data, the high complexity and strong interdependencies present in the datasets pose considerable challenges when constructing accurate parametric models. The Gaussian process model, owing to its non-parametric nature, demonstrates better adaptability when confronted with complex and interdependent data. However, the standard Gaussian process has two significant limitations. Firstly, the time complexity of inverting its kernel matrix during the inference process is O(n^3). Secondly, all data share a common kernel function parameter, which mixes different data types and reduces the model accuracy in mixed-category data identification problems. In light of this, this paper proposes a mixture Gaussian process model that addresses these limitations. This model reduces time complexity and distinguishes data based on different data features. It incorporates a Gaussian mixture distribution for the inducing variables to approximate the original data distribution. Stochastic Variational Inference is utilized to reduce the computational time required for parameter inference. The inducing variables have distinct parameters for the kernel function based on the data category, leading to improved analytical accuracy and reduced time complexity of the Gaussian process model. Numerical experiments are conducted to analyze and compare the performance of the proposed model on different-sized datasets and various data category cases.
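
To show where the cubic cost of the standard Gaussian process comes from, here is a minimal sketch of exact GP regression in plain Python. It is the baseline model the abstract criticises, not the proposed mixture model: there are no inducing variables and no variational inference. The RBF kernel, its `length_scale` and `variance`, the `noise` term, and all function names are assumptions for this example.

```python
import math


def rbf_kernel(a, b, length_scale=1.0, variance=1.0):
    """Squared-exponential kernel with one shared set of parameters."""
    return variance * math.exp(-0.5 * ((a - b) / length_scale) ** 2)


def cholesky(a):
    """Lower-triangular L with L L^T = a. The triple loop is the O(n^3) step."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                low[i][j] = math.sqrt(a[i][i] - s)
            else:
                low[i][j] = (a[i][j] - s) / low[j][j]
    return low


def solve_cholesky(low, b):
    """Solve (L L^T) x = b by forward then backward substitution."""
    n = len(b)
    y = [0.0] * n
    for i in range(n):
        y[i] = (b[i] - sum(low[i][k] * y[k] for k in range(i))) / low[i][i]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (y[i] - sum(low[k][i] * x[k] for k in range(i + 1, n))) / low[i][i]
    return x


def gp_posterior_mean(x_train, y_train, x_test, noise=1e-6):
    """Posterior mean k(x*, X) (K + noise * I)^{-1} y of a zero-mean GP."""
    n = len(x_train)
    k = [
        [rbf_kernel(x_train[i], x_train[j]) + (noise if i == j else 0.0)
         for j in range(n)]
        for i in range(n)
    ]
    alpha = solve_cholesky(cholesky(k), y_train)
    return [
        sum(rbf_kernel(xs, x_train[i]) * alpha[i] for i in range(n))
        for xs in x_test
    ]
```

Factorising the n-by-n kernel matrix dominates the cost, and every data point is forced through the same kernel parameters. These are the two limitations the abstract sets out to address.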

Chemometrics and Intelligent Laboratory Systems, vol. 253, Article 105201.
Citations: 0
Online nonlinear data reconciliation to enhance nonlinear dynamic process monitoring using conditional dynamic variational autoencoder networks with particle filters
IF 3.7; Zone 2 (Chemistry); Q2 AUTOMATION & CONTROL SYSTEMS; Pub Date: 2024-08-10; DOI: 10.1016/j.chemolab.2024.105198
Kuanhsuan Chiu, Junghui Chen, Zhengjiang Zhang

In chemical plants, data-driven process monitoring serves as a vital tool to ensure product quality and maintain production line safety. However, the accuracy of monitoring hinges directly upon the quality of process data. Given the inherently slow and complex nature of chemical processes, coupled with the potential for gross errors in process data leading to inaccuracies in model predictions, this paper proposes a method called Conditional Dynamic Variational Autoencoder combined with a Particle Filter (CDVAE-PF) for data reconciliation and subsequent process monitoring. CDVAE-PF leverages the capabilities of the Conditional Dynamic Variational Autoencoder (CDVAE) to effectively model chemical process data in the presence of noise. This probabilistic model serves as the foundation for the Particle Filter (PF), which is employed for data reconciliation. Moreover, CDVAE-PF incorporates mechanisms to detect and rectify gross errors in process data, further enhancing its efficacy in data reconciliation. Subsequently, monitoring indices based on CDVAE are established to facilitate process monitoring. Through numerical simulations of a two-to-one variables Continuous Stirred Tank Reactor (CSTR) example and a fifteen-to-one variables dichloroethane distillation process from an actual chemical plant, CDVAE-PF demonstrates its effectiveness by reducing mean absolute error to 7.8% and 12.8%, respectively, in gross error data reconciliation. Moreover, in terms of monitoring performance, CDVAE-PF successfully mitigates misjudgments caused by gross errors, thereby significantly enhancing the reliability of process monitoring in chemical plants.
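
The particle filter half of CDVAE-PF can be illustrated with a minimal bootstrap particle filter for a single process variable. In the paper the probabilistic model comes from the CDVAE; here a random-walk state model and a Gaussian measurement model stand in for it, purely as assumptions. Gross error detection is not included, and `process_std`, `meas_std`, and the function name are hypothetical.

```python
import math
import random


def particle_filter_step(particles, measurement, rng,
                         process_std=0.1, meas_std=0.5):
    """One predict-update-resample cycle.
    Returns (resampled_particles, reconciled_estimate)."""
    # Predict: propagate each particle with the assumed random-walk model
    predicted = [p + rng.gauss(0.0, process_std) for p in particles]
    # Update: weight each particle by the Gaussian likelihood of the measurement
    weights = [
        math.exp(-0.5 * ((measurement - p) / meas_std) ** 2) for p in predicted
    ]
    total = sum(weights)
    if total == 0.0:
        # Measurement far from every particle: fall back to uniform weights
        weights = [1.0 / len(predicted)] * len(predicted)
    else:
        weights = [w / total for w in weights]
    # Reconciled value: posterior mean of the state given the measurement
    estimate = sum(w * p for w, p in zip(weights, predicted))
    # Resample with replacement to avoid weight degeneracy
    resampled = rng.choices(predicted, weights=weights, k=len(predicted))
    return resampled, estimate
```

Calling this function once per incoming sample yields a running reconciled estimate; replacing the random-walk prediction with samples from a learned dynamic model is the point at which a model such as CDVAE would plug in.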

Chemometrics and Intelligent Laboratory Systems, vol. 253, Article 105198.
Citations: 0
GA-XGBoost, an explainable AI technique, for analysis of thrombin inhibitory activity of diverse pool of molecules and supported by X-ray
IF 3.7 Zone 2 (Chemistry) Q2 AUTOMATION & CONTROL SYSTEMS Pub Date: 2024-08-08 DOI: 10.1016/j.chemolab.2024.105197
Vijay H. Masand, Sami Al-Hussain, Abdullah Y. Alzahrani, Aamal A. Al-Mutairi, Arwa sultan Alqahtani, Abdul Samad, Gaurav S. Masand, Magdi E.A. Zaki

The present work combines extreme gradient boosting with Shapley values, a thriving pairing within explainable artificial intelligence, and a genetic algorithm to analyse the thrombin inhibitory activity of a diverse pool of 2803 molecules. The methodology uses a genetic algorithm for feature selection, followed by extreme gradient boosting analysis. The eight-parameter GA-XGBoost model has high statistical acceptance, with $R^2_{tr} = 0.895$, $R^2_{L10\%O} = 0.900$, and $Q^2_{F3} = 0.873$. Shapley additive explanations, which assign an importance value to each variable in a model, served as the foundation for the interpretation. A ceteris paribus approach, comparing counterfactual examples, was then used to understand the influence of individual structural features on the activity profile. The analysis indicates that aromatic carbon and ring/non-ring nitrogen, in combination with other structural features, govern the inhibitory profile. The simplicity and predictions of the GA-XGBoost model suggest that "Explainable AI" will be useful for identifying and using structural features in drug discovery.
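
To make the idea of assigning each variable an importance value concrete, the sketch below computes exact Shapley values for a tiny hand-written model by averaging marginal contributions over all feature orderings. It is an illustration only: the study applies Shapley additive explanations to a gradient-boosted model, whereas the brute-force enumeration, the toy model, and the all-zero baseline here are assumptions chosen for readability.

```python
from itertools import permutations


def shapley_values(model, x, baseline):
    """Exact Shapley values of `model` at `x`, relative to `baseline`.

    Features absent from a coalition take their baseline value. The cost is
    factorial in the number of features, so this only suits tiny examples.
    """
    n = len(x)
    phi = [0.0] * n
    orderings = list(permutations(range(n)))
    for order in orderings:
        current = list(baseline)
        previous = model(current)
        for i in order:
            # Switch feature i from its baseline value to its actual value
            # and credit it with the resulting change in the prediction.
            current[i] = x[i]
            updated = model(current)
            phi[i] += updated - previous
            previous = updated
    return [value / len(orderings) for value in phi]


def toy_model(z):
    """Hypothetical model: one additive term and one interaction term."""
    return 2.0 * z[0] + z[1] * z[2]
```

For `x = [1.0, 2.0, 3.0]` and a zero baseline, the additive feature receives its full effect, the two interacting features split their joint effect equally, and the values sum to the difference between the prediction at `x` and at the baseline.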

Chemometrics and Intelligent Laboratory Systems, vol. 253, Article 105197.
Citations: 0
A novel feature selection framework for incomplete data
IF 3.7 Zone 2 (Chemistry) Q2 AUTOMATION & CONTROL SYSTEMS Pub Date: 2024-08-06 DOI: 10.1016/j.chemolab.2024.105193
Cong Guo, Wei Yang, Zheng Li, Chun Liu

Feature selection on incomplete datasets is a challenging task. Existing methods first complete the dataset with an imputation method and then perform feature selection on the imputed dataset. Because missing value imputation and feature selection are carried out entirely independently, the importance of features cannot be considered during imputation. However, in real-world scenarios or datasets, different features have varying degrees of importance. To this end, we propose a novel incomplete data feature selection framework that considers feature importance. The framework mainly consists of two alternating iterative stages: the M-stage and the W-stage. In the M-stage, missing values are imputed based on a given feature importance vector and multiple initial imputation results. In the W-stage, an improved ReliefF algorithm is employed to learn the feature importance vector from the imputed data. The feature importance output by the W-stage in the current iteration is used as the input of the M-stage in the next iteration. Experimental results on artificial and real missing datasets demonstrate that the proposed method significantly outperforms other approaches.
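
The alternation between the two stages can be sketched as follows. This is a simplified illustration, not the authors' algorithm: the M-stage is replaced by a weighted k-nearest-neighbour imputation (the multiple initial imputations are omitted), the W-stage by basic Relief with one nearest hit and one nearest miss instead of the improved ReliefF, and all function names and parameters are hypothetical. Missing values are encoded as `None`, and every feature is assumed to be observed in at least two rows.

```python
def impute(X, w, k=2):
    """M-stage (simplified): fill each gap with the mean of its k nearest donors."""
    filled = [row[:] for row in X]
    for i, row in enumerate(X):
        for j, value in enumerate(row):
            if value is not None:
                continue
            donors = []
            for m, other in enumerate(X):
                if m == i or other[j] is None:
                    continue
                # Compare rows only on features observed in both of them,
                # scaling each difference by the current feature weight.
                shared = [f for f in range(len(row))
                          if row[f] is not None and other[f] is not None]
                dist = (sum(w[f] * abs(row[f] - other[f]) for f in shared) / len(shared)
                        if shared else float("inf"))
                donors.append((dist, other[j]))
            donors.sort(key=lambda d: d[0])
            nearest = [v for _, v in donors[:k]]
            filled[i][j] = sum(nearest) / len(nearest)
    return filled


def relief_weights(X, y):
    """W-stage (simplified): Relief scores, clipped at zero and normalised."""
    n_features = len(X[0])
    w = [0.0] * n_features
    for i, xi in enumerate(X):
        def dist(m):
            return sum(abs(a - b) for a, b in zip(xi, X[m]))
        hits = [m for m in range(len(X)) if m != i and y[m] == y[i]]
        misses = [m for m in range(len(X)) if y[m] != y[i]]
        hit, miss = X[min(hits, key=dist)], X[min(misses, key=dist)]
        for f in range(n_features):
            # Reward features that separate classes, penalise within-class spread.
            w[f] += abs(xi[f] - miss[f]) - abs(xi[f] - hit[f])
    w = [max(v, 0.0) for v in w]
    total = sum(w)
    return [v / total for v in w] if total > 0 else [1.0 / n_features] * n_features


def select_features(X, y, n_iter=5):
    """Alternate the two stages, feeding each weight vector into the next imputation."""
    n_features = len(X[0])
    w = [1.0 / n_features] * n_features
    filled = X
    for _ in range(n_iter):
        filled = impute(X, w)
        w = relief_weights(filled, y)
    return w, filled
```

In this sketch the weights returned by `select_features` sum to one, so they can be ranked or thresholded directly to pick features.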

Chemometrics and Intelligent Laboratory Systems, vol. 252, Article 105193.
Citations: 0
Structural attributes driving λmax towards NIR region: A QSPR approach
IF 3.7 Zone 2 (Chemistry) Q2 AUTOMATION & CONTROL SYSTEMS Pub Date: 2024-08-06 DOI: 10.1016/j.chemolab.2024.105199
Payal Rani, Sandhya Chahal, Priyanka, Parvin Kumar, Devender Singh, Jayant Sindhu

Near-infrared materials find extensive applications in biosensing, photodynamic treatment, anti-counterfeiting and optoelectronics. Their progress has notably expanded possibilities in optical communication systems, non-invasive imaging and targeted therapy, benefiting fields such as material science, medicine, telecommunication and biology. In light of these advancements, the development of near-infrared (NIR) probes is highly desirable. Moreover, predicting the optical properties of a compound prior to its synthesis can reduce the need for expensive experimental testing. Considering the importance of prior prediction, we herein present QSPR models for the prediction of absorption maxima using a dataset of 384 compounds. The aim of the present study is to identify molecular features that could shift λmax towards the near-infrared region. The Monte Carlo optimization approach, along with the index of ideality of correlation (TF2), was applied using CORAL 2019 software to develop models for ten splits. The predictability of the resulting ten models was assessed using various validation metrics. The model derived from the tenth split proved to be efficient, exhibiting $R^2_{\mathrm{Validation}} = 0.8561$, $IIC = 0.7849$ and $Q^2 = 0.8512$. Good and bad fragments responsible for the change in absorption maxima (λmax) were also identified. The identified fragments were used to design ten new molecules in order to evaluate their reliability. Molecules designed using positive attributes shifted the absorption maxima towards the near-infrared region, specifically between 711 and 893 nm. This study opens up new possibilities for the advancement of NIR-based chromophores and will contribute significantly by reducing the overall cost of chromophore development.
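
Validation statistics such as $R^2$ and $Q^2$ can be illustrated on a one-descriptor linear model. The sketch below uses the common definitions $R^2 = 1 - SS_{res}/SS_{tot}$ and a leave-one-out $Q^2 = 1 - PRESS/SS_{tot}$. This is an assumption for illustration: the exact formulas implemented in CORAL may differ, the index of ideality of correlation is not reproduced, and the function names are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def total_sum_of_squares(ys):
    mean_y = sum(ys) / len(ys)
    return sum((y - mean_y) ** 2 for y in ys)


def r_squared(xs, ys):
    """Coefficient of determination of the model fitted on all points."""
    slope, intercept = fit_line(xs, ys)
    ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    return 1.0 - ss_res / total_sum_of_squares(ys)


def q_squared_loo(xs, ys):
    """Leave-one-out cross-validated Q^2: each point is predicted by a model
    fitted without it, and the squared errors are summed into PRESS."""
    press = 0.0
    for i in range(len(xs)):
        slope, intercept = fit_line(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:])
        press += (ys[i] - (slope * xs[i] + intercept)) ** 2
    return 1.0 - press / total_sum_of_squares(ys)
```

Because every left-out prediction comes from a model that has not seen that point, $Q^2$ computed this way never exceeds $R^2$ for a least-squares fit, which is why a small gap between the two is read as a sign of a stable model.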

Chemometrics and Intelligent Laboratory Systems, vol. 252, Article 105199.
Citations: 0