Journal of Chemometrics: Latest Articles

A short note on deep contextual spatial and spectral information fusion for hyperspectral image processing: Case of pork belly properties prediction
IF 2.3 Zone 4 (Chemistry) Q1 SOCIAL WORK Pub Date: 2024-04-18 DOI: 10.1002/cem.3552
Puneet Mishra, Michela Albano-Gaglio, Maria Font-i-Furnols

This study demonstrates a new approach to processing hyperspectral images in which both contextual spatial information and spectral information are used to predict sample properties. The deep contextual spatial information is extracted as deep features from a pretrained ResNet-18 architecture, while the spectral information is readily available as the average pixel values. To fuse the information in a complementary way, a multiblock modeling approach called sequential orthogonalized partial least squares is used. The sequential model guarantees that the information learned from the spatial and spectral domains is complementary. The potential of the approach is demonstrated by predicting several physical and chemical properties of pork bellies.
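The sequential orthogonalization idea can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the block order (spatial features first, spectra second), the component counts, the helper names `pls1_scores` and `so_pls_fit`, and the random arrays standing in for ResNet-18 features and mean spectra are all assumptions made for illustration.

```python
import numpy as np

def pls1_scores(X, y, n_comp):
    """Return NIPALS PLS1 score vectors for a single response y."""
    X = X - X.mean(axis=0)               # column-center a copy of the block
    y = y - y.mean()
    scores = []
    for _ in range(n_comp):
        w = X.T @ y                      # weights: covariance of each column with y
        w /= np.linalg.norm(w)
        t = X @ w                        # score vector
        p = X.T @ t / (t @ t)            # loading vector
        X = X - np.outer(t, p)           # deflate the block
        y = y - t * (t @ y) / (t @ t)    # deflate the response
        scores.append(t)
    return np.column_stack(scores)

def so_pls_fit(X1, X2, y, n_comp1, n_comp2):
    """Fit block 1, orthogonalize block 2 against the block-1 scores, fit the rest."""
    yc = y - y.mean()
    T1 = pls1_scores(X1, y, n_comp1)
    P1 = T1 @ np.linalg.solve(T1.T @ T1, T1.T)   # projector onto the block-1 scores
    X2c = X2 - X2.mean(axis=0)
    X2_orth = X2c - P1 @ X2c             # block 2 with block-1 information removed
    y_res = yc - P1 @ yc                 # part of y that block 1 left unexplained
    T2 = pls1_scores(X2_orth, y_res, n_comp2)
    T = np.hstack([T1, T2])
    coef, *_ = np.linalg.lstsq(T, yc, rcond=None)
    fitted = y.mean() + T @ coef
    return T1, X2_orth, T2, fitted

# Synthetic stand-ins (hypothetical data, not from the paper).
rng = np.random.default_rng(0)
n = 40
spatial = rng.normal(size=(n, 12))       # stand-in for deep image features
spectral = rng.normal(size=(n, 30))      # stand-in for mean spectra
y = spatial[:, 0] + 0.5 * spectral[:, 3] + 0.1 * rng.normal(size=n)

T1, X2_orth, T2, fitted = so_pls_fit(spatial, spectral, y, n_comp1=2, n_comp2=2)
```

Because the second block is projected away from the first block's scores before it is modeled, the second block's scores cannot repeat information already captured by the first block; this is the sense in which the learned information is complementary.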

Citations: 0
Detection the quality of pumpkin seeds based on terahertz coupled with convolutional neural network
IF 2.3 Zone 4 (Chemistry) Q1 SOCIAL WORK Pub Date: 2024-04-18 DOI: 10.1002/cem.3547
Zhaoxiang Sun, Bin Li, Akun Yang, Yande Liu

Pumpkin seeds are nutritious and have some medicinal value. However, mold and sprouting can develop during storage, and food safety and quality problems may result if affected seeds are not removed in time before processing. Traditional testing methods are cumbersome and complex to operate and require destructive sample preparation. Therefore, terahertz time-domain spectroscopy (THz-TDS) was proposed for detecting the internal quality of pumpkin seeds. First, pumpkin seed samples of five quality classes were prepared: moldy for 3 days, moldy for 6 days, sprouted and moldy, sprouted, and normal. The seeds were then dried, ground, and pressed, and their spectral data were collected. The terahertz spectra of the five sample types differed significantly. Support vector machine (SVM), random forest (RF), and convolutional neural network (CNN) qualitative discriminant models were built on the raw absorbance spectra, the preprocessed absorbance spectra, and the preprocessed and band-screened absorbance spectra; the CNN model based on the raw spectral data achieved the highest classification accuracy, 96%. The CNN model requires no prior spectral preprocessing, which simplifies the spectral analysis, and it achieved higher detection accuracy than the traditional chemometric models. CNN combined with THz-TDS has great potential for the detection of agricultural products and provides a new method for agricultural product quality detection.

Citations: 0
Response surface experimental design for simultaneous chromatographic determination of two antiviral agents “Favipiravir and Remdesivir” in pharmaceuticals and spiked plasma samples
IF 2.3 Zone 4 (Chemistry) Q1 SOCIAL WORK Pub Date: 2024-04-18 DOI: 10.1002/cem.3548
Ahmed Faried Abdel Hakiem, John M. Boushra, Deena A. M. Noureldeen, Adel S. Lashien, Tamer Z. Attia

The antiviral agents Favipiravir (FAV) and Remdesivir (REM) were introduced in the last few years, alone or as a combination regimen, for management of the rapidly spreading coronavirus pandemic. A rapid and sensitive high performance liquid chromatographic (HPLC) method has been developed for the simultaneous determination of their mixture. First, one-factor-at-a-time (OFAT) optimization was applied. Afterwards, a quality by design (QbD) approach was used, with a Box-Behnken design (BBD) covering four independent and nine dependent variables, to refine the optimized parameters further. The established model gave optimum resolution at an acetonitrile percentage of 52.66, a mobile phase pH of 2.91, a triethylamine percentage of 0.15, and a flow rate of 1.30 mL/min. The proposed method has been validated according to the USP 31 NF 26 guidelines. Good linearity was obtained over the ranges 5.00 to 50.00 μg/mL and 2.00 to 60.00 μg/mL for FAV and REM, respectively. Excellent relative standard deviation values (not more than 1.40) were obtained on investigation of accuracy, precision, and robustness. The developed method succeeded in analyzing the investigated drugs in their pharmaceutical formulations and in spiked plasma samples, with good recoveries ranging from 99.00% to 106.00%. The proposed method is considered suitable for quality control laboratories as well as for in vivo determination of both analytes.
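A Box-Behnken design for four factors can be generated with a few lines of standard-library Python. The sketch below is illustrative only: the factor ranges in `levels` are hypothetical placeholders (the abstract reports the optimum values, not the ranges that were explored), the number of center points is an assumption, and the pairwise construction used here matches the standard design only for three to five factors.

```python
from itertools import combinations, product

def box_behnken(n_factors, n_center=3):
    """Coded (-1, 0, +1) runs: a 2x2 factorial on every pair of factors with the
    remaining factors at their mid level, followed by center points."""
    runs = []
    for i, j in combinations(range(n_factors), 2):
        for a, b in product((-1, 1), repeat=2):
            run = [0] * n_factors
            run[i], run[j] = a, b
            runs.append(run)
    runs.extend([0] * n_factors for _ in range(n_center))
    return runs

def decode(run, levels):
    """Map coded levels to real settings: midpoint + code * half-range."""
    settings = {}
    for code, (name, (low, high)) in zip(run, levels.items()):
        settings[name] = (low + high) / 2 + code * (high - low) / 2
    return settings

# Hypothetical low/high ranges, chosen only so that the reported optimum lies inside them.
levels = {
    "acetonitrile_pct": (45.0, 60.0),
    "mobile_phase_ph": (2.5, 3.5),
    "triethylamine_pct": (0.10, 0.20),
    "flow_ml_min": (1.0, 1.5),
}

design = box_behnken(len(levels))
first_run = decode(design[0], levels)
```

Each non-center run varies exactly two factors at a time, so the design never places all factors at their extremes simultaneously, which is one common reason for choosing it in chromatographic method development.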

Citations: 0
Chemometrics as a tool for monitoring corrosion degradation of the selected alloys in real conditions
IF 2.3 Zone 4 (Chemistry) Q1 SOCIAL WORK Pub Date: 2024-04-16 DOI: 10.1002/cem.3551
Gyöngyi Vastag, Suzana Apostolov, Špiro Ivošević, Rebeka Rudolf

Monitoring the corrosion of alloys under real conditions often yields extensive data characterized by complex interdependence but also by a large degree of mutual deviation. First, the large dispersion of the results makes it very difficult to draw accurate conclusions about the real influence of the tested parameters on the corrosion behavior of alloys. Second, in many cases the high interdependence between corrosion factors burdens the analyzed system and makes it significantly harder to recognize the main influence. Multivariate analysis, especially principal component analysis, is becoming increasingly popular for processing this type of data because it can recognize and eliminate redundant data. The aim of this study was to examine whether multivariate analysis methods can be used to process corrosion test results obtained under real conditions. Based on the results, it can be concluded that the multivariate method, combined with energy dispersive spectrometer analysis, can be used successfully to identify the most important corrosion factors (type of corrosion environment, exposure time, and technological production processes), as well as their influence on the degradation of the tested TiNi alloys under the given conditions.

Citations: 0
Identification of potential vascular endothelial growth factor receptor inhibitors via tree-based learning modeling and molecular docking simulation
IF 2.3 Zone 4 (Chemistry) Q1 SOCIAL WORK Pub Date: 2024-04-01 DOI: 10.1002/cem.3545
Nooshin Arabi, Mohammad Reza Torabi, Afshin Fassihi, Fahimeh Ghasemi

Angiogenesis, a crucial process in tumor growth, is widely recognized as a key factor in cancer progression. The vascular endothelial growth factor (VEGF) signaling pathway plays a pivotal role in promoting angiogenesis. The primary objective of this study was to identify a powerful classifier for distinguishing compounds as active or inactive inhibitors of VEGF receptors. To build the machine learning model, compounds were sourced from the BindingDB database. A variety of common feature selection techniques, including both filter-based and wrapper-based methods, were applied to reduce dimensionality and, in turn, the risk of overfitting. Robust and accurate tree-based classifiers were employed in the classification procedure. The extra-tree classifier combined with the MultiSURF* feature selection method provided a model with superior accuracy (83.7%) compared with the other feature selection techniques. High-throughput molecular docking, followed by accurate docking and comprehensive analysis of the results, was performed to identify the best possible inhibitors of these receptors. Comprehensive analysis of the docking results revealed successful prediction of molecules with VEGFR1 and VEGFR2 inhibitory activity. These results emphasize that the extra-tree model, coupled with MultiSURF* feature selection, surpassed the other methods in identifying chemical compounds targeting specific VEGF receptors.

Citations: 0
Selective protein quantification on continuous chromatography equipment with limited absorbance sensing: A partial least squares and statistical wavelength selection solution
IF 2.3 Zone 4 (Chemistry) Q1 SOCIAL WORK Pub Date: 2024-03-28 DOI: 10.1002/cem.3541
Ian A. Gough, Sarah Rassenberg, Claire Velikonja, Brandon Corbett, David R. Latulippe, Prashant Mhaskar

Real-time selective protein quantification is an integral component of operating continuous chromatography processes. Partial least squares models fit with spectroscopic UV-Vis absorbance data have demonstrated the ability to selectively quantify proteins. With standard continuous chromatography equipment that is only capable of measuring absorbance at a few user-defined wavelengths, the problem of selecting appropriate wavelengths that maximize the measurement capability of the instrument remains unaddressed. Therefore, we propose a method for selecting wavelengths for continuous chromatography equipment. We illustrate our method using sets of protein mixtures composed of bovine serum albumin and lysozyme. The first step is to refine the raw wavelength set with a statistical t-test and an absorbance magnitude test. Then, the wavelengths within the refined spectroscopic range are ranked. Three existing techniques are evaluated – sequential forward search, variable importance to projection scores, and the least absolute shrinkage and selection operator. The best technique (in this case, sequential forward search) determines a subset of three wavelengths for further evaluation on the BioSMB PD. We use an exhaustive approach to determine the final wavelength set. We show that soft sensor models trained from the method's wavelength selections can quantify the two proteins more accurately than from the wavelength set of 230, 260 and 280 nm, by a factor of four. The method is shown to determine appropriate wavelengths for different path lengths and protein concentration ranges. Overall, we provide a tool that alleviates the analytical bottleneck for practitioners seeking to develop advanced monitoring and control methods on standard equipment.
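Sequential forward search, the ranking technique the abstract reports as performing best, can be sketched as a greedy loop. The example below is a simplified illustration rather than the authors' procedure: it scores candidate subsets with ordinary least squares and in-sample residual sum of squares instead of a cross-validated partial least squares model, and the data are synthetic columns standing in for absorbance at candidate wavelengths.

```python
import numpy as np

def rss_with_intercept(X, y):
    """Residual sum of squares of a least-squares fit of y on X plus an intercept."""
    A = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    return float(resid @ resid)

def sequential_forward_search(X, y, n_select):
    """Greedily add the column that most reduces the residual sum of squares."""
    selected = []
    remaining = list(range(X.shape[1]))
    while len(selected) < n_select:
        best = min(remaining,
                   key=lambda j: rss_with_intercept(X[:, selected + [j]], y))
        selected.append(best)
        remaining.remove(best)
    return selected

# Synthetic example: 8 candidate "wavelengths", of which columns 1, 5 and 6 carry signal.
rng = np.random.default_rng(1)
absorbance = rng.normal(size=(200, 8))
concentration = (2.0 * absorbance[:, 1] - 3.0 * absorbance[:, 5]
                 + 1.5 * absorbance[:, 6] + 0.01 * rng.normal(size=200))

chosen = sequential_forward_search(absorbance, concentration, n_select=3)
```

A greedy search evaluates far fewer subsets than an exhaustive one, which is why it is practical as a ranking step before a final exhaustive comparison over a small shortlist, as the abstract describes.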

Citations: 0
Use of t-distributed stochastic neighbour embedding in vibrational spectroscopy
IF 2.4 Zone 4 (Chemistry) Q1 SOCIAL WORK Pub Date: 2024-03-23 DOI: 10.1002/cem.3544
François Stevens, Beatriz Carrasco, Vincent Baeten, Juan A. Fernández Pierna

The t-distributed stochastic neighbour embedding algorithm, or t-SNE, is a non-linear dimension reduction method used to visualise multivariate data. It enables a high-dimensional dataset, such as a set of infrared spectra, to be represented on a single, typically two-dimensional graph, revealing its global and local structure. t-SNE is very popular in the machine learning community and has been applied in many fields, generally with the aim of visualising large datasets. In vibrational spectroscopy, t-SNE is gaining recognition, but principal component analysis (PCA) remains by far the reference method for exploratory analysis and dimension reduction. However, t-SNE may be a real aid in the analysis of vibrational spectroscopic datasets. It provides an at-a-glance global view of the dataset, making it possible to distinguish the main factors influencing the spectral signal and the hierarchy between these factors, and it gives an indication of whether predictive modelling is feasible. It can also provide great support in the choice of pre-processing, by rapidly comparing different general pre-processing approaches according to their effect on the variable of interest. Here we illustrate these advantages using different datasets. We also propose an approach based on a synergy between the t-SNE and PCA methods, allowing the respective advantages of each to be exploited.

Citations: 0
Toward more efficient and effective color quality control for the large-scale offset printing process
IF 2.4 Zone 4 (Chemistry) Q1 SOCIAL WORK Pub Date: 2024-03-15 DOI: 10.1002/cem.3543
Pawel Dziki, Lukasz Pieszczek, Michal Daszykowski

This study illustrates the at-line application of hyperspectral imaging in the visible range for quality control of large-scale offset printing. In particular, the measurement stability of a competitive device is assessed and compared to that of traditional handheld and desktop spectrophotometers. The performance of the commercially available instruments was assessed based on collected spectra and their corresponding L*, a*, and b* values. The printing process was described by hyperspectral images (in the visible range) of selected regions from template color fields acquired on 17 sampling occasions. The spectra constituting the hyperspectral images were visualized and evaluated in the space of significant principal components obtained from principal component analysis. Furthermore, confidence ellipses were constructed for each set of spectra characterizing a specific moment of the printing process. Comparing their mutual locations, shapes, orientations, and sizes enabled effective visualization of process variability and was more comprehensive than the classic approach based on information provided by desktop and handheld spectrometers.
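
The confidence ellipses mentioned in this abstract can be illustrated with a minimal sketch. It assumes the inputs are two-dimensional principal component scores that are roughly bivariate normal, and it uses a chi-square quantile with two degrees of freedom; the function name `confidence_ellipse` and the default level of 0.95 are hypothetical choices for illustration, not details taken from the paper.

```python
import math

def confidence_ellipse(points, level=0.95):
    """Return (center, semi_axes, angle_rad) of a 2-D confidence ellipse.

    Assumption: points are (pc1, pc2) score pairs, roughly bivariate normal.
    """
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    # Sample covariance matrix [[sxx, sxy], [sxy, syy]]
    sxx = sum((p[0] - mx) ** 2 for p in points) / (n - 1)
    syy = sum((p[1] - my) ** 2 for p in points) / (n - 1)
    sxy = sum((p[0] - mx) * (p[1] - my) for p in points) / (n - 1)
    # Closed-form eigenvalues of a symmetric 2x2 matrix
    half_trace = (sxx + syy) / 2.0
    root = math.sqrt(((sxx - syy) / 2.0) ** 2 + sxy ** 2)
    lam_major = half_trace + root
    lam_minor = max(half_trace - root, 0.0)  # guard against rounding below zero
    # Chi-square quantile with 2 degrees of freedom: -2 * ln(1 - level)
    scale = -2.0 * math.log(1.0 - level)
    semi_axes = (math.sqrt(scale * lam_major), math.sqrt(scale * lam_minor))
    # Orientation of the major axis relative to the first coordinate
    angle = 0.5 * math.atan2(2.0 * sxy, sxx - syy)
    return (mx, my), semi_axes, angle

# Hypothetical scores from one sampling occasion
center, axes, angle = confidence_ellipse([(0.1, 0.2), (0.4, 0.1), (0.3, 0.5), (0.0, 0.3)])
```

Computing one ellipse per sampling occasion and comparing centers, axis lengths, and angles mirrors the kind of comparison of locations, sizes, and orientations that the abstract describes.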

Citations: 0
Classification of colorectal primer carcinoma from normal colon with mid-infrared spectra
IF 2.3 Zone 4 (Chemistry) Q1 SOCIAL WORK Pub Date: 2024-03-13 DOI: 10.1002/cem.3542
B. Borkovits, E. Kontsek, A. Pesti, P. Gordon, S. Gergely, I. Csabai, A. Kiss, P. Pollner

In this project, we used formalin-fixed paraffin-embedded (FFPE) tissue samples to measure thousands of spectra per tissue core with Fourier transform mid-infrared spectroscopy using an FT-IR imaging system. The cores comprised both normal colon (NC) and colorectal primer carcinoma (CRC) tissues. We created a database to manage all the multivariate data obtained from the measurements. Then, we applied classifier algorithms to identify the tissue type from its spectra. For classification, we used random forest, support vector machine, XGBoost, and linear discriminant analysis methods, as well as three deep neural networks. We compared two data manipulation techniques using these models and then applied filtering. Finally, we compared model performances via the sum of ranking differences (SRD).
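
To make the final comparison step more concrete, here is a minimal sketch of the sum of ranking differences. It assumes each model has one score per row (for example per metric or per fold), ranks ties with the mean rank, and uses the row-wise mean as the reference when none is given; the abstract does not state which reference the authors used, so that default, like the names `average_ranks` and `srd`, is an assumption.

```python
def average_ranks(values):
    """Rank values in ascending order (1 = smallest); ties share the mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def srd(scores, reference=None):
    """Sum of absolute rank differences between each model and a reference.

    scores: dict mapping model name -> list of values over the same rows.
    reference: row-wise reference values; defaults to the row mean (assumption).
    """
    names = list(scores)
    n_rows = len(scores[names[0]])
    if reference is None:
        reference = [sum(scores[m][r] for m in names) / len(names) for r in range(n_rows)]
    ref_ranks = average_ranks(reference)
    return {
        m: sum(abs(a - b) for a, b in zip(average_ranks(scores[m]), ref_ranks))
        for m in names
    }

# Hypothetical accuracy values over three rows for two models
result = srd({"model_a": [0.9, 0.8, 0.7], "model_b": [0.7, 0.8, 0.9]},
             reference=[0.9, 0.8, 0.7])
```

A smaller SRD value means a model's ranking is closer to the reference ranking. This sketch omits the permutation-based validation that is normally run alongside SRD.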

Citations: 0
Developing multifruit global near-infrared model to predict dry matter based on just-in-time modeling
IF 2.4 Zone 4 (Chemistry) Q1 SOCIAL WORK Pub Date: 2024-03-05 DOI: 10.1002/cem.3540
Puneet Mishra

Modeling near-infrared (NIR) spectral data to predict fresh fruit properties is a challenging task. The difficulty lies in creating generalized models that can work on fruits of different cultivars, seasons, and even multiple fruit commodities. Due to intrinsic differences in spectral properties, NIR models often fail in testing, resulting in high bias and prediction errors. One current solution for achieving generalized models is to use large calibration sets measured over multiple cultivars and harvest seasons. However, current practice primarily focuses on calibration sets for single fruit commodities, disregarding the rich information available from other fruit commodities. This study aims to demonstrate the potential of locally weighted partial least squares, an example of just-in-time (JIT) modeling, to develop real-time models based on calibration sets consisting of multiple fruit commodities. The study also explores JIT modeling for leveraging relevant information from other fruit commodities or adapting the model based on new samples. The application demonstrated here predicts the dry matter in fresh fruit using portable NIR spectroscopy. The results show that JIT modeling is particularly effective for multiple fruit commodities in a single calibration set. The JIT models achieved a root mean squared error of prediction (RMSEP) of 0.69% fresh weight (FW), while the RMSEP of traditional partial least squares (PLS) modeling was 0.93% FW. JIT modeling can be particularly beneficial when the user has multiple fruit datasets and wants to combine them into a single dataset to utilize all the relevant information available.
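
The just-in-time idea in this abstract, building a local model for each new sample, can be sketched as follows. This is a simplified illustration rather than the author's implementation: it uses a single latent variable, a Gaussian similarity kernel on Euclidean distance, and a hypothetical `bandwidth` parameter, none of which are specified in the abstract.

```python
import math

def lwpls1_predict(X, y, x_query, bandwidth=1.0):
    """Predict y at x_query with a locally weighted PLS model (one latent variable).

    Assumptions: Gaussian kernel weights and a single component, for illustration.
    """
    n, p = len(X), len(X[0])
    # Similarity of each calibration sample to the query sample
    dists = [math.sqrt(sum((X[i][j] - x_query[j]) ** 2 for j in range(p))) for i in range(n)]
    weights = [math.exp(-(d / bandwidth) ** 2) for d in dists]
    wsum = sum(weights)
    # Weighted centering of predictors and response
    x_mean = [sum(weights[i] * X[i][j] for i in range(n)) / wsum for j in range(p)]
    y_mean = sum(weights[i] * y[i] for i in range(n)) / wsum
    Xc = [[X[i][j] - x_mean[j] for j in range(p)] for i in range(n)]
    yc = [y[i] - y_mean for i in range(n)]
    # PLS weight vector: weighted covariance of each predictor with y
    w = [sum(weights[i] * Xc[i][j] * yc[i] for i in range(n)) for j in range(p)]
    norm = math.sqrt(sum(v * v for v in w))
    if norm == 0.0:
        return y_mean  # no local covariance: fall back to the weighted mean
    w = [v / norm for v in w]
    # Scores, then weighted regression of y on the single score
    t = [sum(Xc[i][j] * w[j] for j in range(p)) for i in range(n)]
    q = (sum(weights[i] * t[i] * yc[i] for i in range(n))
         / sum(weights[i] * t[i] ** 2 for i in range(n)))
    t_query = sum((x_query[j] - x_mean[j]) * w[j] for j in range(p))
    return y_mean + q * t_query

# Hypothetical one-variable calibration set following y = 2x + 1
prediction = lwpls1_predict([[0.0], [1.0], [2.0], [3.0]], [1.0, 3.0, 5.0, 7.0], [1.5])
```

Because the weights are recomputed for every query, calibration samples that are spectrally close to the new sample dominate the local fit, which is how a mixed calibration set can serve several commodities. A very small `bandwidth` can drive all weights to zero, so it needs to be tuned to the scale of the data.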

Citations: 0