Data pre-processing for paper-based colorimetric sensor arrays

IF 3.7; Zone 2 (Chemistry); JCR Q2 (Automation & Control Systems); Chemometrics and Intelligent Laboratory Systems; Pub Date: 2024-09-21; DOI: 10.1016/j.chemolab.2024.105237

Bahram Hemmateenejad, Knut Baumann

Chemometrics and Intelligent Laboratory Systems, volume 254, Article 105237
https://www.sciencedirect.com/science/article/pii/S0169743924001771

Citations: 0



Data pre-processing for paper-based colorimetric sensor arrays

The responses of paper-based colorimetric sensor arrays (CSAs) are typically recorded by an imaging device. The color values of the images are then subjected to chemometric data analysis to extract the relevant information. As with data from other analytical instruments, these data must be pre-processed before further analysis. This study is the first comprehensive and systematic investigation of how data pre-processing techniques affect the quality of subsequent data analysis of imaging data collected from paper-based CSAs. For color difference data (calculated by subtracting the images of the sensors before exposure from those after exposure), pre-processing was not a critical factor, although it could reduce the complexity of the model. For example, the principal component-linear discriminant analysis model needed eight principal components for data that had not been pre-processed but only three for pre-processed data to reach the same accuracy (92 %). The pivotal role of pre-processing became clear, however, in data sets collected immediately after exposure to the samples’ vapor. An appropriate pre-processing method eliminates or significantly reduces between-sensor variations, which removes the need to include data from images taken before exposure. For classification, the most promising object pre-processing methods were mean (or median) centering, Pareto scaling, and standard normal variate.
To illustrate, in the analysis of volatile organic compounds by an array of metallic nanoparticles, the cross-validation classification accuracy rose from 70 % for the unprocessed data to 95 % when unit variance scaling was applied to the objects and range scaling to the variables. In calibration, most pre-processing methods improved the quality of the regression models. Suitable pre-processing of both objects and variables eliminated the need for the pre-exposure image of the CSAs.
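
The pre-processing operations named in the abstract can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' implementation: the abstract gives no formulas, so the definitions below are common textbook forms (range scaling in particular has more than one definition; the min-max form is assumed here). The data layout (rows as objects, columns as color variables), the synthetic data, the additive per-object offset, and all function names are assumptions made for the example.

```python
import numpy as np

# Assumed layout: rows = objects (one array image each), columns = color variables.
# The constants name the axis along which each statistic is computed.
OBJECTS = 1    # one statistic per row
VARIABLES = 0  # one statistic per column


def color_difference(after, before):
    """Color difference data: after-exposure minus before-exposure values."""
    return np.asarray(after, dtype=float) - np.asarray(before, dtype=float)


def mean_center(X, axis):
    """Subtract the mean of each row (OBJECTS) or column (VARIABLES)."""
    return X - X.mean(axis=axis, keepdims=True)


def unit_variance_scale(X, axis):
    """Divide by the sample standard deviation (no centering)."""
    return X / X.std(axis=axis, ddof=1, keepdims=True)


def pareto_scale(X, axis):
    """Center, then divide by the square root of the standard deviation."""
    return mean_center(X, axis) / np.sqrt(X.std(axis=axis, ddof=1, keepdims=True))


def range_scale(X, axis):
    """Min-max form (assumed): map each row or column onto [0, 1]."""
    lo = X.min(axis=axis, keepdims=True)
    hi = X.max(axis=axis, keepdims=True)
    return (X - lo) / (hi - lo)


def snv(X):
    """Standard normal variate: center and scale each object by its own mean and std."""
    return unit_variance_scale(mean_center(X, OBJECTS), OBJECTS)


# Synthetic demo: 6 objects x 12 color variables, plus a hypothetical additive
# offset per object standing in for between-array variation.
rng = np.random.default_rng(0)
after = rng.uniform(50.0, 200.0, size=(6, 12))
offset = rng.uniform(-20.0, 20.0, size=(6, 1))
shifted = after + offset

# Row-wise centering cancels a purely additive per-object offset.
same = np.allclose(mean_center(shifted, OBJECTS), mean_center(after, OBJECTS))
print("row centering removes the additive offset:", same)

# The combination mentioned in the abstract: unit variance scaling on objects,
# range scaling on variables (the order of the two steps is an assumption).
processed = range_scale(unit_variance_scale(shifted, OBJECTS), VARIABLES)
print(processed.shape)
```

Under the additive-offset assumption, row-wise centering gives the same result with or without the offset, which is one simple way to see how object pre-processing could stand in for subtracting a before-exposure image. Real between-sensor variation need not be purely additive, and rows or columns with zero variance or zero range would need special handling before the divisions above.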


Source journal: Chemometrics and Intelligent Laboratory Systems (Engineering & Technology, Analytical Chemistry)

CiteScore: 7.50

Self-citation rate: 7.70%

Articles published: 169

Review time: 3.4 months

Journal description: Chemometrics and Intelligent Laboratory Systems publishes original research papers, short communications, reviews, tutorials and Original Software Publications reporting on development of novel statistical, mathematical, or computer techniques in Chemistry and related disciplines. Chemometrics is the chemical discipline that uses mathematical and statistical methods to design or select optimal procedures and experiments, and to provide maximum chemical information by analysing chemical data. The journal deals with the following topics:

1) Development of new statistical, mathematical and chemometrical methods for Chemistry and related fields (Environmental Chemistry, Biochemistry, Toxicology, System Biology, -Omics, etc.).
2) Novel applications of chemometrics to all branches of Chemistry and related fields (typical domains of interest are: process data analysis, experimental design, data mining, signal processing, supervised modelling, decision making, robust statistics, mixture analysis, multivariate calibration, etc.). Routine applications of established chemometrical techniques will not be considered.
3) Development of new software that provides novel tools or truly advances the use of chemometrical methods.
4) Well characterized data sets to test performance for the new methods and software.

The journal complies with the International Committee of Medical Journal Editors' Uniform Requirements for Manuscripts.