Chemometrics and Intelligent Laboratory Systems: Latest Articles

Analyzing topological descriptors of guar gum and its derivatives for predicting physical properties in carbohydrates
IF 3.7 | Zone 2 (Chemistry) | Q2 AUTOMATION & CONTROL SYSTEMS | Pub Date: 2024-08-24 | DOI: 10.1016/j.chemolab.2024.105203
Xiujun Zhang, Shamaila Yousaf, Anisa Naeem, Ferdous M. Tawfiq, Adnan Aslam

Guar gum is a non-ionic polysaccharide that is abundant in nature. It can be used as a thickening agent, stabilizer, or emulsifier in pharmaceutical formulations, food products, and cosmetics. Its ability to form viscous solutions makes it useful in drug delivery systems, in controlled-release formulations, and as a matrix for oral drug delivery. The investigation of chemical structures through graph invariants is of great interest. Topological descriptors are numerical values associated with a molecular structure that can predict certain physical and chemical properties of that structure. In this paper, we calculate the harmonic index, the inverse sum indeg index, the third Zagreb index, the hyper Zagreb index, the sigma index, the reformulated first Zagreb index, the reformulated multiplicative first Zagreb index, the harmonic-arithmetic index, and the atom-bond sum-connectivity indices of guar gum and its chemical derivatives. Finally, the chemical applicability of these topological descriptors is tested on different carbohydrates (monosaccharides, disaccharides, and polysaccharides) using linear, parabolic, and logarithmic regression models. The descriptors are found to be useful for predicting two physical properties: density and molecular weight.
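As an illustration of what a degree-based topological descriptor is, the following minimal sketch computes four of the indices named above from an edge list, using their standard textbook definitions (each index is a sum over edges uv of a function of the vertex degrees d_u and d_v). The function name `degree_indices` and the toy graph are assumptions made for this example; they are not taken from the paper, and the guar gum graphs themselves are not reproduced here.

```python
from collections import Counter


def degree_indices(edges):
    """Return four degree-based indices of a simple graph given as an edge list."""
    deg = Counter()
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    # One (d_u, d_v) pair per edge.
    pairs = [(deg[u], deg[v]) for u, v in edges]
    return {
        "harmonic": sum(2 / (du + dv) for du, dv in pairs),
        "inverse_sum_indeg": sum(du * dv / (du + dv) for du, dv in pairs),
        "hyper_zagreb": sum((du + dv) ** 2 for du, dv in pairs),
        "sigma": sum((du - dv) ** 2 for du, dv in pairs),
    }


# Toy example (hypothetical): a star with one central vertex and three leaves,
# i.e. the carbon skeleton of 2-methylpropane.
star = [(0, 1), (0, 2), (0, 3)]
print(degree_indices(star))
# {'harmonic': 1.5, 'inverse_sum_indeg': 2.25, 'hyper_zagreb': 48, 'sigma': 12}
```

Values computed this way for a set of molecules would then serve as the predictor in a linear, parabolic, or logarithmic regression against a property such as density.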

Chemometrics and Intelligent Laboratory Systems, Volume 253, Article 105203
Citations: 0
Interpretation of high dimensional definitive screening designs assisted by bootstrapped partial least squares regression
IF 3.7 | Zone 2 (Chemistry) | Q2 AUTOMATION & CONTROL SYSTEMS | Pub Date: 2024-08-24 | DOI: 10.1016/j.chemolab.2024.105218
Knut Dyrstad, Frank Westad

Definitive screening design (DSD) has become a widely used type of design of experiments (DOE) for chemical, pharmaceutical, and biopharmaceutical process and product development, owing to its optimization properties and its ability to estimate main, interaction, and squared variable effects with a minimum number of experiments. These high-dimensional DOEs, with more variables than samples and with partly correlated variables, frequently make statistical interpretation challenging. The purpose of this study was to test bootstrap partial least squares regression (PLSR) with a heredity procedure for selecting the variable subset that is finally evaluated by multiple linear regression (MLR). The heredity selection was applied to bootstrap T values, defined as the original PLSR coefficients (B) divided by the bootstrap-estimated standard deviation. The fractional weighted and non-parametric bootstrap PLSR variants investigated gave the same variable selection and the same final models in this study.

A simulation study with 7 main variables and 12 real-data DSDs from the literature with 4, 5, 7, and 8 main variables showed that the bootstrap PLSR MLR methods improved model performance for small and, in particular, large DSDs compared with two common DSD reference methods: DSD fit definitive screening and AICc forward stepwise regression (AICc FSR). The investigated method significantly improved variable selection accuracy and predictive ability in 6 of the 13 DSDs compared with the best model from either of the two reference methods. The remaining 7 DSDs gave the same model as the best reference model. Strong heredity was found to provide the best models for all real data in this study. Applying the heredity procedure to the percent non-zero SVEM FSR variable effects, followed by MLR, showed promising results. AICc Lasso regression was among the other methods partially tested and was found to set almost all variable effects to zero when tested on three large minimum DSDs. While the DSD fit definitive screening method may often be the first choice for DSDs, heredity bootstrap PLSR MLR and heredity SVEM FSR MLR may be alternative methods for improving variable selection and model precision.
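The two ingredients described above, a bootstrap T value (coefficient divided by its bootstrap standard deviation) and heredity-constrained selection, can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' procedure: the coefficients and standard deviations are made-up numbers rather than PLSR output, the cut-off of 2.0 is arbitrary, and the term-naming convention (`A*B` for an interaction, `A^2` for a squared effect) and all function names are hypothetical.

```python
def bootstrap_t(coefs, boot_sd):
    """T value per term: original coefficient / bootstrap standard deviation."""
    return {term: coefs[term] / boot_sd[term] for term in coefs}


def parents(term):
    """Main effects a second-order term depends on; empty set for a main effect."""
    if "*" in term:
        return set(term.split("*"))
    if term.endswith("^2"):
        return {term[:-2]}
    return set()


def heredity_select(t_values, threshold=2.0, strong=True):
    """Keep large main effects, then second-order terms that obey heredity.

    Strong heredity: every parent main effect must be selected.
    Weak heredity: at least one parent main effect must be selected.
    """
    mains = {t for t, v in t_values.items()
             if not parents(t) and abs(v) >= threshold}
    selected = sorted(mains)
    for term, v in t_values.items():
        p = parents(term)
        if not p or abs(v) < threshold:
            continue
        if (p <= mains) if strong else bool(p & mains):
            selected.append(term)
    return selected


# Made-up coefficients and bootstrap standard deviations (illustration only).
coefs = {"A": 2.0, "B": 1.5, "C": 0.1, "A*B": 1.2, "A*C": 1.0, "B^2": 0.9, "C^2": 0.8}
boot_sd = {term: 0.3 for term in coefs}
t_vals = bootstrap_t(coefs, boot_sd)

print(heredity_select(t_vals, strong=True))   # ['A', 'B', 'A*B', 'B^2']
print(heredity_select(t_vals, strong=False))  # ['A', 'B', 'A*B', 'A*C', 'B^2']
```

In this toy case the `A*C` interaction has a large T value but is dropped under strong heredity because the main effect `C` was not selected. The terms that survive would then be passed to an ordinary MLR fit.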

Chemometrics and Intelligent Laboratory Systems, Volume 253, Article 105218
Citations: 0
NIR and MIR spectral feature information fusion strategy for multivariate quantitative analysis of tobacco components
IF 3.7 | Zone 2 (Chemistry) | Q2 AUTOMATION & CONTROL SYSTEMS | Pub Date: 2024-08-23 | DOI: 10.1016/j.chemolab.2024.105222
Honghong Wang, Qiong Wu, Wuye Yang, Jie Yu, Ting Wu, Zhixin Xiong, Yiping Du

The determination of total nicotine, total sugar, reducing sugar, and total nitrogen contents in tobacco is of great significance for tobacco quality evaluation and formulation design. To determine these four components rapidly, two NIR-MIR spectral fusion techniques were studied using near-infrared (NIR) and mid-infrared (MIR) spectral data from 129 solid tobacco powder samples provided by Shanghai Tobacco Group Co., Ltd. Fusion technology 1 selects variables from each spectrum separately and then fuses the selected feature variables to build a model. Fusion technology 2 first fuses the NIR and MIR spectral data and then selects variables to build the model. Both fusion technologies use the successive projections algorithm (SPA), competitive adaptive reweighted sampling (CARS), backward interval PLS (biPLS), forward interval PLS (fiPLS), synergy interval PLS (siPLS), and interval interaction moving window partial least squares (iMWPLS) algorithms to select wavelength variables. For total nicotine and total sugar, the PLSR model built with fusion technology 2 combined with the iMWPLS algorithm performed best: compared with the full-spectrum fusion method, its RMSEP decreased from 0.2314 and 1.3225 to 0.0821 and 0.8079, respectively, outperforming the single NIR and MIR models and NIR-MIR fusion technology 1. For reducing sugar, the simple full-spectrum fusion model had the best analytical ability and the lowest RMSEP, outperforming the single NIR and MIR models and all models built with the two spectral fusion techniques combined with the six wavelength selection algorithms. For total nitrogen, the model built with fusion technology 1 combined with the iMWPLS algorithm predicted significantly better than the single NIR and MIR models and NIR-MIR fusion technology 2, with an RMSEP of 0.0634. The results show that the two NIR-MIR spectral fusion techniques make full use of the complementary information provided by NIR and MIR spectroscopy and can be applied successfully to the rapid determination of total nicotine, total sugar, reducing sugar, and total nitrogen content in tobacco, providing a new method and approach for the rapid analysis of tobacco components.
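The difference between the two fusion orders can be sketched in a few lines. This is only an illustration: the selector below ranks columns by absolute Pearson correlation with the response, which is a simple stand-in and not SPA, CARS, or any of the interval PLS methods used in the paper, and the toy matrices and all function names are hypothetical. The RMSEP helper uses the usual definition, the root mean squared error on the prediction set.

```python
from math import sqrt


def pearson(a, b):
    """Pearson correlation; returns 0.0 for a constant column."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / sqrt(va * vb) if va > 0 and vb > 0 else 0.0


def select_top_k(X, y, k):
    """Stand-in selector: indices of the k columns most correlated with y."""
    n_cols = len(X[0])
    scores = [abs(pearson([row[j] for row in X], y)) for j in range(n_cols)]
    best = sorted(range(n_cols), key=lambda j: -scores[j])[:k]
    return sorted(best)


def take(X, cols):
    return [[row[j] for j in cols] for row in X]


def hconcat(A, B):
    """Join two blocks sample by sample (column-wise concatenation)."""
    return [a + b for a, b in zip(A, B)]


def fusion_1(nir, mir, y, k):
    """Select variables in each block first, then fuse the selected features."""
    return hconcat(take(nir, select_top_k(nir, y, k)),
                   take(mir, select_top_k(mir, y, k)))


def fusion_2(nir, mir, y, k):
    """Fuse the full spectra first, then select variables on the fused matrix."""
    fused = hconcat(nir, mir)
    return take(fused, select_top_k(fused, y, k))


def rmsep(y_true, y_pred):
    """Root mean squared error of prediction."""
    return sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


# Toy data (hypothetical): 4 samples, 3 "NIR" and 2 "MIR" variables.
nir = [[1, 5, 0], [2, 5, 1], [3, 5, 0], [4, 5, 1]]
mir = [[2, 9], [4, 1], [6, 8], [8, 2]]
y = [1, 2, 3, 4]

print(fusion_1(nir, mir, y, k=1))  # [[1, 2], [2, 4], [3, 6], [4, 8]]
print(fusion_2(nir, mir, y, k=2))  # [[1, 2], [2, 4], [3, 6], [4, 8]]
```

On this toy data both orders happen to pick the same columns. With real spectra they generally differ, because selection on the fused matrix lets variables from the two blocks compete with each other.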

Chemometrics and Intelligent Laboratory Systems, Volume 253, Article 105222
Citations: 0
Joint state and process inputs estimation for state-space models with Student’s t-distribution
IF 3.7 | Zone 2 (Chemistry) | Q2 AUTOMATION & CONTROL SYSTEMS | Pub Date: 2024-08-23 | DOI: 10.1016/j.chemolab.2024.105220
Hang Ci, Chengxi Zhang, Shunyi Zhao

This paper proposes a discrete-time method for jointly estimating the state and unknown inputs (UIs) of industrial processes represented by a state-space model. To cope with outliers in process data, the measurement noise is modeled with the Student’s t-distribution. The UIs are identified through the recursive expectation–maximization (REM) approach. Specifically, in the E-step, a recursively calculated Q-function is formulated under the maximum likelihood criterion, and the states and the variance scale factor are estimated iteratively. In the M-step, the UIs are updated analytically, while the degrees of freedom are updated approximately. The effectiveness of the proposed algorithm is validated on a quadruple water tank process and a continuous stirred tank reactor. The results show that the proposed method significantly enhances the robustness and accuracy of state and UI estimation in industrial processes, handling outliers effectively and reducing computational demands for real-time applications.
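Why a Student's t noise model copes with outliers can be seen in a much simpler setting than the paper's state-space problem. The sketch below is a plain batch EM for the location of scalar data with the scale `sigma` and degrees of freedom `nu` held fixed; it is not the recursive algorithm proposed in the paper, and the function names and data are assumptions made for illustration. The E-step weight is the standard expected scale factor of the Gaussian scale-mixture form of the t-distribution, (nu + 1) / (nu + r²/sigma²) for a scalar residual r.

```python
def t_weight(residual, sigma, nu):
    """E-step: expected precision scale factor for one scalar residual."""
    return (nu + 1.0) / (nu + (residual / sigma) ** 2)


def robust_location(data, sigma=1.0, nu=3.0, n_iter=50):
    """EM estimate of the location under Student's t noise (sigma, nu fixed)."""
    mu = sum(data) / len(data)  # start from the ordinary mean
    for _ in range(n_iter):
        # E-step: large residuals receive small weights.
        w = [t_weight(x - mu, sigma, nu) for x in data]
        # M-step: weighted mean with the current weights.
        mu = sum(wi * x for wi, x in zip(w, data)) / sum(w)
    return mu


measurements = [1.0, 1.1, 0.9, 1.0, 50.0]  # last value is an outlier
print(sum(measurements) / len(measurements))  # ordinary mean, pulled towards 50
print(robust_location(measurements))          # stays close to 1
```

The outlier ends up with a weight near zero, so it barely moves the estimate. In a filtering context the same weight would inflate the effective measurement noise variance for the offending sample.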

Chemometrics and Intelligent Laboratory Systems, Volume 253, Article 105220
Citations: 0
Combining algorithm techniques with mechanical and acoustic profiles for the prediction of apples sensory attributes
IF 3.7 | Zone 2 (Chemistry) | Q2 AUTOMATION & CONTROL SYSTEMS | Pub Date: 2024-08-22 | DOI: 10.1016/j.chemolab.2024.105217
Riccardo Ricci, Annachiara Berardinelli, Flavia Gasperi, Isabella Endrizzi, Farid Melgani, Eugenio Aprea

This work demonstrates the potential of advanced linear and nonlinear learning algorithms for predicting apple texture sensory attributes such as “hardness”, “crunchiness”, “flouriness”, “fibrousness”, and “graininess”. Starting from the information contained in the entire mechanical and acoustic curves acquired during sample compression tests, the prediction performance of five different statistical tools, including Partial Least Squares regression (PLS), Multilayer Perceptron (MLP), Support Vector Regression (SVR), and Gaussian Process Regression (GPR), is presented and discussed.

Validation of all predictive models shows the best accuracies for the texture sensory attributes “hardness” and “crunchiness” and, in general, for the GPR learning algorithm. By combining mechanical and acoustic profiles, 5-fold cross-validation yields coefficients of determination (R²) of up to 0.885 (GPR) for “hardness” and 0.840 (GPR) for “crunchiness”. These results are comparable to those obtained by using a large number of mechanical and acoustic parameters extracted from the acquired profiles as predictors, and they point to a new and reliable way of predicting the texture sensory attributes of apples. The proposed approach removes the need to define in advance the number and type of features to be calculated from instrumental texture profiles and can be easily implemented in an automatic process.
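For readers unfamiliar with the validation scheme, the sketch below shows how a cross-validated R² is obtained: every sample is predicted by a model that never saw it during fitting, and R² is then computed on the pooled out-of-fold predictions. A one-variable least-squares line stands in for PLS, MLP, SVR, or GPR, the folds are interleaved without shuffling, and the data and function names are hypothetical; none of this is taken from the paper.

```python
def k_fold_indices(n, k=5):
    """Yield (train, test) index lists; fold sizes differ by at most one."""
    folds = [list(range(i, n, k)) for i in range(k)]
    for i in range(k):
        train = [j for f in folds[:i] + folds[i + 1:] for j in f]
        yield train, folds[i]


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def fit_line(x, y):
    """Ordinary least squares for y = a + b * x (stand-in model)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    b = (sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
         / sum((xi - mx) ** 2 for xi in x))
    return my - b * mx, b


def cross_validated_r2(x, y, k=5):
    """Predict each sample from a model fitted on the other folds, then score."""
    preds = [0.0] * len(y)
    for train, test in k_fold_indices(len(y), k):
        a, b = fit_line([x[i] for i in train], [y[i] for i in train])
        for i in test:
            preds[i] = a + b * x[i]
    return r2_score(y, preds)


# Hypothetical instrumental reading vs. sensory score for ten samples.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0]
y = [1.2, 1.9, 3.2, 3.8, 5.1, 6.1, 6.8, 8.3, 8.9, 10.2]
print(round(cross_validated_r2(x, y), 3))
```

Because the score is computed on held-out predictions, it is usually lower than the R² of a model fitted and evaluated on the same samples.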

Chemometrics and Intelligent Laboratory Systems, Volume 253, Article 105217
Citations: 0
Combination of machine learning and COSMO-RS thermodynamic model in predicting solubility parameters of coformers in production of cocrystals for enhanced drug solubility
IF 3.7 | Zone 2 (Chemistry) | Q2 AUTOMATION & CONTROL SYSTEMS | Pub Date: 2024-08-22 | DOI: 10.1016/j.chemolab.2024.105219
Wael A. Mahdi, Ahmad J. Obaidullah

In this study, we develop predictive models for three target variables, denoted δ_d, δ_p, and δ_h, using a dataset with 86 features and 181 samples. The response parameters, which are Hansen solubility parameters, were correlated with the input parameters using several machine learning techniques. The input features are molecular descriptors of coformers calculated with the COSMO-RS thermodynamic model and a group contribution approach. The analysis includes outlier detection via Cook's distance, normalization with a min-max scaler, and feature selection through L1-based methods. Three regression models are employed: Gaussian Process Regression (GPR), Passive Aggressive Regression (PAR), and Polynomial Regression (PR), with hyperparameters optimized using Transient Search Optimization (TSO). For δ_d, the PAR model outperforms the others, with an R² of 0.885, RMSE of 0.607, MAE of 0.524, and maximum error of 1.294. The GPR model performs slightly worse for δ_d (R² of 0.872, RMSE of 0.816, MAE of 0.579, maximum error of 2.755), followed by the PR model (R² of 0.814, RMSE of 0.923, MAE of 0.597, maximum error of 2.814). For δ_p, the GPR model performs best, with an R² of 0.821, RMSE of 1.693, MAE of 1.391, and maximum error of 3.457; the PAR model gives an R² of 0.740, RMSE of 2.025, MAE of 1.980, and maximum error of 6.609, and the PR model an R² of 0.7, RMSE of 2.329, MAE of 2.02, and maximum error of 6.366. For δ_h, the GPR model again performs best, with an R² of 0.983, RMSE of 1.243, MAE of 1.005, and maximum error of 2.577. The PAR model also predicts δ_h accurately (R² of 0.924, RMSE of 2.713, MAE of 2.416, maximum error of 6.307), as does the PR model (R² of 0.927, RMSE of 2.757, MAE of 2.334, maximum error of 8.064). These results highlight the efficacy of the chosen models and optimization techniques in accurately predicting the specified outputs and show considerable potential for application in related predictive modeling tasks.
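Two of the preprocessing steps named above, min-max normalization and outlier screening with Cook's distance, can be illustrated on a single predictor. The sketch assumes a simple linear regression (p = 2 fitted parameters), for which the leverage is h_i = 1/n + (x_i − x̄)²/S_xx and Cook's distance is D_i = e_i² / (p · MSE) · h_i / (1 − h_i)². The data and the 4/n flagging threshold, a common rule of thumb, are assumptions for this example; the abstract does not state which regression model or threshold was used for the 86-feature dataset.

```python
def min_max_scale(values):
    """Rescale a list of numbers linearly onto [0, 1]."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]


def cooks_distance(x, y):
    """Cook's distance of each point for a simple linear regression of y on x."""
    n, p = len(x), 2  # p = number of fitted parameters (intercept, slope)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    slope = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    intercept = my - slope * mx
    resid = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    mse = sum(e * e for e in resid) / (n - p)
    leverage = [1.0 / n + (xi - mx) ** 2 / sxx for xi in x]
    return [e * e / (p * mse) * h / (1.0 - h) ** 2
            for e, h in zip(resid, leverage)]


# Hypothetical data: the last response value is far off the trend.
x_raw = [1, 2, 3, 4, 5, 6]
y_obs = [1, 2, 3, 4, 5, 20]

x_scaled = min_max_scale(x_raw)
distances = cooks_distance(x_scaled, y_obs)
flagged = [i for i, d in enumerate(distances) if d > 4 / len(distances)]
print(flagged)  # [5]
```

Rescaling the predictor does not change the distances, since leverage and residuals are invariant under a linear change of units. Scaling matters later, for the L1-based feature selection and for distance- or kernel-based models.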
Citations: 0
Model development using hybrid method for prediction of drug release from biomaterial matrix
IF 3.7 Zone 2 Chemistry Q2 AUTOMATION & CONTROL SYSTEMS Pub Date: 2024-08-22 DOI: 10.1016/j.chemolab.2024.105216
Mohammed Alqarni , Shaimaa Mohammed Al Harthi , Mohammed Abdullah Alzubaidi , Ali Abdullah Alqarni , Bandar Saud Shukr , Hassan Talat Shawli

A comprehensive multi-scale computational strategy was developed in this study based on mass transfer and machine learning for simulation of drug concentration distribution in a biomaterial matrix. The controlled release was modeled and validated via the hybrid model. Mass transfer equations along with kinetics models were solved numerically, and the results were then used to train machine learning models. We investigated the performance of three regression models, namely Decision Tree (DT), Random Forest (RF), and Extra Tree (ET), in predicting drug concentration (C) based on r and z data. Hyper-parameter optimization was conducted using Glowworm Swarm Optimization (GSO). Results revealed high predictive accuracy across all models, with ET demonstrating superior performance, achieving a coefficient of determination (R²) of 0.99854, an RMSE of 1.1446E-05, and a maximum error of 6.49087E-05. DT and RF also performed well, with coefficients of determination of 0.99571 and 0.99655, respectively. These results highlight the effectiveness of ensemble tree-based methods in accurately predicting chemical concentrations, with Extra Tree (ET) Regression emerging as the most promising model for this specific dataset.
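
The three figures of merit quoted in this abstract (coefficient of determination, RMSE, and maximum error) follow standard definitions. The sketch below shows one way to compute them with the Python standard library only. The function name `regression_metrics` and the sample values are hypothetical and are not taken from the paper.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return R^2, RMSE and maximum absolute error for paired values.

    Assumes y_true is not constant; otherwise R^2 is undefined
    (the total sum of squares would be zero).
    """
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(y_true)
    residuals = [t - p for t, p in zip(y_true, y_pred)]
    mean_true = sum(y_true) / n
    ss_res = sum(r * r for r in residuals)              # residual sum of squares
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)  # total sum of squares
    return {
        "r2": 1.0 - ss_res / ss_tot,
        "rmse": math.sqrt(ss_res / n),
        "max_error": max(abs(r) for r in residuals),
    }


# Made-up concentrations, for illustration only (not data from the paper).
observed = [1.0, 2.0, 3.0, 4.0]
predicted = [1.1, 1.9, 3.2, 3.8]
metrics = regression_metrics(observed, predicted)
```

Reporting the maximum error next to RMSE is useful because a model can have a small average error while still missing badly at a few points.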

Citations: 0
Robust baseline correction for Raman spectra by constrained Gaussian radial basis function fitting
IF 3.7 Zone 2 Chemistry Q2 AUTOMATION & CONTROL SYSTEMS Pub Date: 2024-08-22 DOI: 10.1016/j.chemolab.2024.105205
Sungwon Park, Hongjoong Kim

Accurate baseline correction is a fundamental requirement for extracting meaningful spectral information and enabling precise quantitative analysis using Raman spectroscopy. Although numerous baseline correction techniques have been developed, they often require meticulous parameter adjustments and yield inconsistent results. To address these challenges, we have introduced a novel approach, namely constrained Gaussian radial basis function fitting (CGF). Our method involves solving a curve-fitting problem using Gaussian radial basis functions under specific constraints. To ensure stability and efficiency, we developed a linear programming algorithm for the proposed approach. We evaluated the performance of CGF using simulated Raman spectra and demonstrated its robustness across various scenarios, including changes in data length and noise levels. In contrast to standard methods, which frequently require complicated parameter adjustments and may exhibit varying errors, our approach provides a simple parameter search and consistently achieves low errors. We further assessed CGF using real Raman spectra, leading to enhanced accuracy in the quantitative analysis of the Raman spectra of chemical warfare agents. Our results emphasize the potential of CGF as a valuable tool for Raman spectroscopy data analysis, significantly advancing sophisticated analytical techniques.

Citations: 0
Supervised and penalized baseline correction
IF 3.7 Zone 2 Chemistry Q2 AUTOMATION & CONTROL SYSTEMS Pub Date: 2024-08-20 DOI: 10.1016/j.chemolab.2024.105200
Erik Andries , Ramin Nikzad-Langerodi

Spectroscopic measurements can show distorted spectral shapes arising from a mixture of absorbing and scattering contributions. These distortions (or baselines) often manifest themselves as non-constant offsets or low-frequency oscillations, and they can adversely affect analytical and quantitative results. Baseline correction is an umbrella term for pre-processing methods that estimate the baseline spectra (the unwanted distortions) and then remove them by differencing. However, current state-of-the-art baseline correction methods do not utilize analyte concentrations, even when they are available and even when they contribute significantly to the observed spectral variability. We modify a class of state-of-the-art methods (penalized baseline correction) that easily admit the incorporation of a priori analyte concentrations such that predictions can be enhanced. This modified approach is termed supervised and penalized baseline correction (SPBC). Performance is assessed on two near-infrared data sets across both classical penalized baseline correction methods (without analyte information) and modified penalized baseline correction methods (leveraging analyte information). In some cases SPBC provides baseline-corrected signals that outperform state-of-the-art penalized baseline correction algorithms such as AIRPLS. In particular, we observe that performance is conditional on the correlation between two separate analytes, the one used for baseline correction and the one used for prediction: the greater the correlation between them, the better the prediction performance.
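
To make "penalized baseline correction" concrete, the sketch below implements asymmetric least squares (AsLS), a classical unsupervised member of this family. It is not the SPBC method of the paper and uses no analyte concentrations. The function names, the parameter values (`lam`, `p`, `n_iter`), and the synthetic spectrum are all assumptions chosen for illustration. The baseline z minimizes a weighted squared misfit plus a roughness penalty on its second differences, and the weights are updated so that points above the current baseline (likely peaks) count far less than points below it.

```python
import math


def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            if f != 0.0:
                for c in range(col, n + 1):
                    m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def asls_baseline(y, lam=1e5, p=0.01, n_iter=10):
    """Estimate a smooth baseline under y (needs at least 3 points)."""
    n = len(y)
    # Roughness penalty lam * D^T D, where D takes second differences.
    penalty = [[0.0] * n for _ in range(n)]
    stencil = (1.0, -2.0, 1.0)
    for i in range(n - 2):
        for a_off, a_val in enumerate(stencil):
            for b_off, b_val in enumerate(stencil):
                penalty[i + a_off][i + b_off] += lam * a_val * b_val
    w = [1.0] * n
    z = list(y)
    for _ in range(n_iter):
        # Solve (W + lam * D^T D) z = W y for the current weights.
        system = [row[:] for row in penalty]
        for i in range(n):
            system[i][i] += w[i]
        z = solve_linear(system, [w[i] * y[i] for i in range(n)])
        # Asymmetric update: small weight p above the baseline, 1 - p below.
        w = [p if y[i] > z[i] else 1.0 - p for i in range(n)]
    return z


# Synthetic spectrum (illustrative): sloping baseline plus one Gaussian peak.
spectrum = [1.0 + 0.02 * i + 5.0 * math.exp(-0.5 * ((i - 30) / 3.0) ** 2)
            for i in range(60)]
baseline = asls_baseline(spectrum)
corrected = [s - b for s, b in zip(spectrum, baseline)]
```

The dense solver keeps the example dependency-free; for real spectra with thousands of channels a sparse banded solver would be the usual choice.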

Citations: 0
Novel investigation on adsorption analysis of safranal interacting with boron nitride and aluminum nitride fullerene-like cages: Drug delivery system
IF 3.7 Zone 2 Chemistry Q2 AUTOMATION & CONTROL SYSTEMS Pub Date: 2024-08-17 DOI: 10.1016/j.chemolab.2024.105206
Saad M Alshahrani

This study illustrates the effective control of COVID-19 infection through the adsorption of safranal (SAF) on B16N16 and Al16N16 fullerene-like cages. SAF adsorption onto the B16N16 and Al16N16 surfaces in gas, water (H2O), and chloroform (CHCl3) environments was assessed using density functional theory (DFT) and time-dependent density functional theory (TD-DFT) methods, analyzing the substrates and their complexes. At the PBE0-D3 level, the Al16N16/SAF complex exhibited a more negative binding energy and greater structural stability in the water phase than the B16N16/SAF complex. The thermodynamic parameters indicated that the adsorption of SAF onto the fullerene-like cages is exothermic, particularly for the Al16N16/SAF complex. Additionally, the interaction of SAF with the fullerene-like cages is more pronounced in the water phase than in gas and chloroform environments. The energy gap (Eg) of the complexes decreases in all three environments compared to the perfect systems, with a reduction of over 21% in all phases. This substantial decrease in the energy gap suggests that the complexes have increased reactivity and sensitivity to SAF, likely due to a significant change in electronic conductivity. The results of molecular docking indicate that the Al16N16/SAF complex in the water phase exhibited a strong binding affinity compared to the other compounds studied. These findings suggest that the Al16N16/SAF complex holds promise as a potential inhibitor for COVID-19 and as a valuable material for biomedical applications and drug delivery systems.

Citations: 0