Hyperspectral imaging combines two-dimensional imaging with spectroscopy, so it captures spatial and spectral information about the measured object simultaneously. Tea inspection methods have likewise been moving toward nondestructive, fast, real-time, and accurate detection. This paper introduces the principle of hyperspectral imaging and, drawing on research from the past 5 years on nondestructive rapid detection of tea, reviews its application to the detection of tea biochemical components, accurate classification, determination of mildew degree, and stress monitoring, as well as progress in planting and production management. The main challenges in current research are also analyzed, and future application prospects are proposed to provide a reference for applying and promoting hyperspectral imaging in actual tea production.
{"title":"Research progress on the application of hyperspectral imaging techniques in tea science","authors":"Dongxia Liang, Qiaoyi Zhou, Caijin Ling, Liyang Gao, Xiaoting Mu, Zhencheng Liao","doi":"10.1002/cem.3481","DOIUrl":"10.1002/cem.3481","url":null,"abstract":"<p>Hyperspectral imaging technology combines two-dimensional imaging and spectral technology, which can simultaneously obtain spatial and spectral information of the object to be measured and is an advanced technical method. With the development of science and technology, the detection of tea has also been continuously improved, and it has developed in the direction of being nondestructive, fast, real-time, and accurate. In this paper, the principle of hyperspectral imaging technology is introduced, and according to research on hyperspectral imaging technology in the nondestructive rapid detection of tea in the past 5 years, the application of hyperspectral imaging technology in the detection of tea biochemical components, accurate classification, determination of mildew degree, and stress monitoring and the application progress in planting production management are analyzed. Additionally, the main challenges existing in the current research are analyzed, and future application prospects are proposed to provide a reference for the application and promotion of hyperspectral imaging technology in the actual production of tea.</p>","PeriodicalId":15274,"journal":{"name":"Journal of Chemometrics","volume":null,"pages":null},"PeriodicalIF":2.4,"publicationDate":"2023-04-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"47785080","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"化学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A new algorithm for robust multiblock (data fusion) modelling in the presence of outlying observations is presented. The method combines a robust modelling technique, iterative reweighted partial least squares, with block-order- and scale-independent component-wise multiblock partial least squares modelling. It automatically down-weights outlying observations so that their contribution is minimal during the estimation of the block-wise partial least squares models, which yields models that are minimally affected by outliers. The algorithm is described and tested on simulated and real multiblock data sets that contain outlying observations.
{"title":"An algorithm for robust multiblock partial least squares predictive modelling","authors":"Puneet Mishra, Kristian Hovde Liland","doi":"10.1002/cem.3480","DOIUrl":"10.1002/cem.3480","url":null,"abstract":"<p>A new algorithm for robust multiblock (data fusion) modelling in the presence of outlying observations is presented. The method is a combination of a robust modelling technique called <i>iterative reweighted partial least squares</i> and the block order and scale-independent component-wise multiblock partial least squares modelling. The method is based on automatic down-weighting of outlying observations such that their contribution is minimal during the estimation of block-wise partial least squares models, thus leading to robust modelling minimally affected by outliers. The algorithm and test of the methods for modelling multiblock data sets (simulated and real) in the presence of outlying observation are demonstrated.</p>","PeriodicalId":15274,"journal":{"name":"Journal of Chemometrics","volume":null,"pages":null},"PeriodicalIF":2.4,"publicationDate":"2023-04-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1002/cem.3480","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"43070986","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"化学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
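The down-weighting idea in the abstract above can be illustrated with a deliberately reduced sketch. It is not the authors' multiblock algorithm: it replaces PLS with a one-predictor weighted line fit, and it assumes Tukey bisquare weights with a MAD-based residual scale, neither of which is stated in the abstract. The function names are hypothetical.

```python
from statistics import median


def weighted_line_fit(x, y, w):
    """Weighted least-squares fit of y = slope * x + intercept."""
    sw = sum(w)
    mx = sum(wi * xi for wi, xi in zip(w, x)) / sw
    my = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y))
    slope = sxy / sxx
    return slope, my - slope * mx


def bisquare_weights(residuals, c=4.685):
    """Tukey bisquare weights (an assumed choice of weight function)."""
    # Robust residual scale from the median absolute residual;
    # the floor avoids division by zero when the fit is exact.
    scale = max(1.4826 * median([abs(r) for r in residuals]), 1e-12)
    weights = []
    for r in residuals:
        u = r / (c * scale)
        weights.append((1 - u * u) ** 2 if abs(u) < 1 else 0.0)
    return weights


def robust_line_fit(x, y, n_iter=10):
    """Alternate between fitting and re-weighting the observations."""
    w = [1.0] * len(x)
    for _ in range(n_iter):
        slope, intercept = weighted_line_fit(x, y, w)
        residuals = [yi - (slope * xi + intercept) for xi, yi in zip(x, y)]
        w = bisquare_weights(residuals)
    return slope, intercept, w


# Synthetic data on the line y = 2x + 1 with one gross outlier at x = 5.
x = [float(i) for i in range(10)]
y = [2.0 * xi + 1.0 for xi in x]
y[5] = 100.0
slope, intercept, weights = robust_line_fit(x, y)
```

In this toy case the outlier receives zero weight after the first pass, so the later fits recover the underlying line. In a PLS setting the same weights would be applied to the samples before each model estimation.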
Caelin P. Celani, Rachel A. McCormick, Amelia M. Speed, William Johnston, James A. Jordan, Tyler B. Coplen, Karl S. Booksh
The illegal timber trade has a significant impact on the survival of endangered tropical hardwood species like Dalbergia spp. (rosewood), a genus protected worldwide under the Convention on International Trade in Endangered Species of Wild Fauna and Flora (CITES). Because of the increased threat to Dalbergia spp. and the lack of action to reduce it, port-of-entry analysis methods are required to identify Dalbergia spp. Handheld laser-induced breakdown spectroscopy (LIBS) has been shown to be capable of identifying species and establishing provenance of Dalbergia spp. and other tropical hardwoods, but analysis methods for this work have yet to be investigated in detail. The present work investigates five well-known algorithms: partial least squares discriminant analysis (PLS-DA), classification and regression trees (CART), k-nearest neighbor (k-NN), random forest (RF), and support vector machine (SVM). It also compares two training/test set sampling regimes and data collection at two signal-to-noise (S/N) ratios to assess the potential for handheld LIBS analyses. Additionally, imbalanced classes are addressed. For this application, SVM and RF yield nearly identical results (though RF takes nearly 100 times longer to compute), while the S/N ratio has a significant effect on model success, all else being equal. A training set formed from replicate low-S/N analyses was found to perform as well as higher-precision training sets for true prediction, even when the predicted samples have low S/N. This work confirms that handheld LIBS analyzers can provide a viable method for classification of hardwood species, even within the same genus.
{"title":"Evaluation of spectral collection strategies for identification of Dalbergia spp. using handheld laser-induced breakdown spectroscopy","authors":"Caelin P. Celani, Rachel A. McCormick, Amelia M. Speed, William Johnston, James A. Jordan, Tyler B. Coplen, Karl S. Booksh","doi":"10.1002/cem.3479","DOIUrl":"10.1002/cem.3479","url":null,"abstract":"<p>The illegal timber trade has significant impact on the survival of endangered tropical hardwood species like <i>Dalbergia</i> spp. (rosewood), a world-wide protected genus from the Convention on International Trade in Endangered Species of Wild Fauna and Flora (CITES). Due to increased threat to <i>Dalbergia</i> spp., and lack of action to reduce threats, port of entry analysis methods are required to identify <i>Dalbergia</i> spp. Handheld laser-induced breakdown spectroscopy (LIBS) has been shown to be capable of identifying species and establishing provenance of <i>Dalbergia</i> spp. and other tropical hardwoods, but analysis methods for this work have yet to be investigated in detail. The present work investigates five well-known algorithms—partial least squares discriminant analysis (PLS-DA), classification and regression trees (CART), <i>k</i>-nearest neighbor (<i>k</i>-NN), random forest (RF), and support vector machine (SVM)—two training/test set sampling regimes, and data collection at two signal-to-noise (S/N) ratios to assess the potential for handheld LIBS analyses. Additionally, imbalanced classes are addressed. For this application, SVM and RF yield near identical results (though RF takes nearly 100 longer to compute), while the S/N ratio has a significant effect on model success assuming all else is equal. It was found that forming a training set with replicate low S/N analyses can perform as well as higher precision training sets for true prediction, even if the predicted samples have low signal to noise! 
This work confirms handheld LIBS analyzers can provide a viable method for classification of hardwood species, even within the same genus.</p>","PeriodicalId":15274,"journal":{"name":"Journal of Chemometrics","volume":null,"pages":null},"PeriodicalIF":2.4,"publicationDate":"2023-03-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"45393298","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"化学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Alternating least squares within the multivariate curve resolution framework has seen many practical applications and is distinguished by its relatively simple and flexible implementation. However, the limitations of least squares should be carefully considered when the data deviate from the standard assumed structure. In this work, we highlight the effects of noise in the presence of minor components, and we propose a novel weighting scheme within the weighted multivariate curve resolution-alternating least squares framework to address them. Two simulated cases and one Raman imaging case are investigated by comparing the novel methodology against standard multivariate curve resolution-alternating least squares and essential spectral pixel selection. A trade-off is observed between the current methods, whereas the novel weighting scheme strikes a balance in which the benefits of both are retained.
{"title":"Weighted multivariate curve resolution—Alternating least squares based on sample relevance","authors":"Mohamad Ahmad, Raffaele Vitale, Marina Cocchi, Cyril Ruckebusch","doi":"10.1002/cem.3478","DOIUrl":"10.1002/cem.3478","url":null,"abstract":"<p>Alternating least squares within the multivariate curve resolution framework has seen a lot of practical applications and shows their distinction with their relatively simple and flexible implementation. However, the limitations of least squares should be carefully considered when deviating from the standard assumed data structure. Within this work, we highlight the effects of noise in the presence of minor components, and we propose a novel weighting scheme within the weighted multivariate curve-resolution-alternating least squares framework to resolve it. Two simulated and one Raman imaging case are investigated by comparing the novel methodology against standard multivariate curve resolution-alternating least squares and essential spectral pixel selection. A trade-off is observed between current methods, whereas the novel weighting scheme demonstrates a balance where the benefits of the previous two methods are retained.</p>","PeriodicalId":15274,"journal":{"name":"Journal of Chemometrics","volume":null,"pages":null},"PeriodicalIF":2.4,"publicationDate":"2023-03-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1002/cem.3478","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"43409795","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"化学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Bianca Mikulasek, Valeria Fonseca Diaz, David Gabauer, Christoph Herwig, Ramin Nikzad-Langerodi
This paper introduces the multiple domain-invariant partial least squares (mdi-PLS) method, which generalizes the recently introduced domain-invariant partial least squares (di-PLS) method. In contrast to di-PLS, which only allows knowledge transfer from a single source domain to a single target domain, the proposed approach can incorporate data from an arbitrary number of domains. Additionally, mdi-PLS offers a high level of flexibility by accepting labeled (supervised) and unlabeled (unsupervised) data to cope with dataset shifts. We demonstrate the application of the mdi-PLS method on one simulated and one real-world dataset. Our results show that mdi-PLS clearly outperforms both PLS and di-PLS when data from multiple related domains are available for training multivariate calibration models, underpinning the benefit of mdi-PLS.
{"title":"Partial least squares regression with multiple domains","authors":"Bianca Mikulasek, Valeria Fonseca Diaz, David Gabauer, Christoph Herwig, Ramin Nikzad-Langerodi","doi":"10.1002/cem.3477","DOIUrl":"10.1002/cem.3477","url":null,"abstract":"<p>This paper introduces the multiple domain-invariant partial least squares (mdi-PLS) method, which generalizes the recently introduced domain-invariant partial least squares method (di-PLS). In contrast to di-PLS which solely allows transferring of knowledge from a single source to a single target domain, the proposed approach enables the incorporation of data from an arbitrary number of domains. Additionally, mdi-PLS offers a high level of flexibility by accepting labeled (supervised) and unlabeled (unsupervised) data to cope with dataset shifts. We demonstrate the application of the mdi-PLS method on a simulated and one real-world dataset. Our results show a clear outperformance of both PLS and di-PLS when data from multiple related domains are available for training multivariate calibration models underpinning the benefit of mdi-PLS.</p>","PeriodicalId":15274,"journal":{"name":"Journal of Chemometrics","volume":null,"pages":null},"PeriodicalIF":2.4,"publicationDate":"2023-03-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"44547298","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"化学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Zhide Zhao, Laijun Sun, Hongyi Bai, Hong Zhang, Yujie Tian
In this study, near-infrared spectroscopy (NIRS) was used for the quantitative detection of quaternary blended oil. After a series of preprocessing steps, the prediction performance of three models and their preprocessing combinations was compared. Taking soybean oil content prediction as an example, the random forest (RF) model performed better after second derivative (D2) preprocessing. A two-step feature selection method was adopted to extract the feature wavelengths. First, the elastic net (EN) was used for an initial screening that eliminated most irrelevant features and reduced the number of feature wavelengths from 1048 to 134. Then, the competitive adaptive re-weighted sampling (CARS) method was used to screen the remaining wavelengths more carefully, and 20 effective characteristic wavelengths were selected. Finally, a quantitative detection model was established based on the 20 effective characteristic wavelengths selected by EN + CARS. On the test set, the coefficient of determination (R²), root-mean-square error of prediction (RMSEP), and Relative Percent Difference (RPD) values of the 2D + EN + CARS + RF model were 0.97953, 1.34306, and 7.08875, respectively. The results showed that the two-step feature selection method can effectively extract the feature wavelengths and that NIRS can realize the intelligent detection of blended oil components.
{"title":"Intelligent component detection of quaternary blended oil based on near infrared spectroscopy technology","authors":"Zhide Zhao, Laijun Sun, Hongyi Bai, Hong Zhang, Yujie Tian","doi":"10.1002/cem.3476","DOIUrl":"10.1002/cem.3476","url":null,"abstract":"<p>In this study, near infrared spectroscopy (NIRS) technique was used for quantitative detection of quaternary blended oil. After a series of preprocessing, the prediction effects of the three models and their preprocessing combinations were compared. Taking soybean oil content prediction as an example, random forest (RF) model had better performance after second derivative (D2) optimization. In feature selection, a two-step feature selection method was adopted to extract the feature wavelength. First, the elastic net (EN) was used for the initial screening of feature wavelengths, and most irrelevant features were eliminated. The number of feature wavelengths was reduced from 1048 to 134. After that, the competitive adaptive re-weighted sampling (CARS) method was used to screen the remaining characteristic wavelengths more carefully, and 20 effective characteristic wavelengths were selected. Finally, a quantitative detection model was established based on 20 effective characteristic wavelengths selected by EN + CARS. Evaluated by the test set, The correlation coefficient of determination (<i>R</i><sup>2</sup>), root-mean-square error of prediction (RMSEP) and Relative Percent Difference (RPD) values of 2D + EN + CARS + RF model were 0.97953, 1.34306 and 7.08875, respectively. 
The results showed that the two-step feature selection method can effectively extract the feature wavelength, and the NIRS technology can realize the intelligent detection of blended oil components.</p>","PeriodicalId":15274,"journal":{"name":"Journal of Chemometrics","volume":null,"pages":null},"PeriodicalIF":2.4,"publicationDate":"2023-03-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"46026258","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"化学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
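The three figures of merit quoted in the abstract above can be computed as in the following sketch. It assumes RPD is taken as the standard deviation of the reference values divided by RMSEP, a common chemometrics convention; the abstract does not give the formula, and the data below are made up purely for illustration.

```python
from math import sqrt
from statistics import mean, stdev


def prediction_metrics(y_true, y_pred):
    """Return (R^2, RMSEP, RPD) for a set of test-set predictions."""
    residuals = [t - p for t, p in zip(y_true, y_pred)]
    ss_res = sum(r * r for r in residuals)
    y_bar = mean(y_true)
    ss_tot = sum((t - y_bar) ** 2 for t in y_true)
    r2 = 1.0 - ss_res / ss_tot
    rmsep = sqrt(ss_res / len(y_true))
    # Assumed definition: sample SD of the reference values over RMSEP.
    rpd = stdev(y_true) / rmsep
    return r2, rmsep, rpd


# Hypothetical reference and predicted oil contents (percent).
y_true = [10.0, 20.0, 30.0, 40.0, 50.0]
y_pred = [12.0, 18.0, 31.0, 39.0, 50.0]
r2, rmsep, rpd = prediction_metrics(y_true, y_pred)
```

Under this definition RPD is close to 1 / sqrt(1 - R²) for large test sets, which is consistent with the reported pair of R² = 0.97953 and RPD = 7.08875.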
A three-layer artificial neural network (ANN) model was developed to predict the efficiency of Cu(II) and Pb(II) ion removal from aqueous solution by cobalt hydroxide nano-flakes. The model is based on experimental data sets obtained from a D-optimal design. The input variables to the neural network were the initial concentrations of Pb(II) and Cu(II) ions (mg L⁻¹), the initial pH, and the sorbent mass (g). The backpropagation neural network for both Cu(II) and Pb(II) ions used a tangent sigmoid transfer function (tansig) at the hidden layer, a linear transfer function (purelin) at the output layer, and the Levenberg–Marquardt training algorithm (LMA). ANN-predicted results were very close to the experimental results, with a coefficient of determination (R²) of 0.9970 and a mean square error (MSE) of 0.000376. Analysis based on the ANN model indicated that sorbent mass was the most influential factor in the adsorption of Cu(II) and Pb(II). The cobalt hydroxide nano-flakes and the possible metal ion-adsorbent interactions were characterized by Fourier transform infrared spectroscopy (FT-IR), X-ray diffraction (XRD), and scanning electron microscopy (SEM).
{"title":"Artificial neural network (ANN) modeling for simultaneous removal of a binary mixture of Pb(II) and Cu(II) by cobalt hydroxide nano-flakes","authors":"Javad Zolgharnein, Tahere Shariatmanesh, Saeideh Dermanaki Farahani","doi":"10.1002/cem.3475","DOIUrl":"10.1002/cem.3475","url":null,"abstract":"<p>A three-layer artificial neural network (ANN) model was developed to predict the efficiency of Cu(II) and Pb(II) ion removal from aqueous solution by cobalt hydroxide nano-flakes. It is based on experimental sets obtained from a D-optimal design. The input variables to the neural network were as follows: the initial concentration of Pb(II) and Cu (II) ions (mg L<sup>−1</sup>), initial pH, and sorbent mass (g). The configuration of the backpropagation neural network for both Cu(II) and Pb (II) ions was a tangent sigmoid transfer function (tansig) at the hidden layer, linear transfer function (purelin) at the output layer, and Levenberg–Marquardt training algorithm (LMA). ANN-predicted results were very close to the experimental results with a coefficient of determination (<i>R</i><sup>2</sup>) of 0.9970 and mean square error (MSE) 0.000376. Analysis based on the ANN model indicated that sorbent mass appeared to be the most influential factor in the adsorption process of Cu(II) and Pb(II). 
Characterization of the cobalt hydroxide nano-flakes and possible metal ions-adsorbent interactions were confirmed by Fourier transform infrared spectroscopy (FT-IR), X-ray diffraction (XRD), and scanning electron microscopy (SEM).</p>","PeriodicalId":15274,"journal":{"name":"Journal of Chemometrics","volume":null,"pages":null},"PeriodicalIF":2.4,"publicationDate":"2023-03-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"42119878","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"化学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
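The network layout described in the abstract above (tansig hidden layer, purelin output layer) corresponds to the forward pass sketched here. The hidden-layer size, the weights, and the input scaling are all hypothetical, since the abstract does not report them, and Levenberg–Marquardt training is not shown.

```python
from math import tanh


def forward(x, W1, b1, W2, b2):
    """Forward pass: tansig hidden layer followed by a purelin output layer.

    tansig(n) = 2 / (1 + exp(-2n)) - 1 is mathematically equal to tanh(n);
    purelin is the identity, so the output layer is a plain affine map.
    """
    hidden = [tanh(sum(w * xi for w, xi in zip(row, x)) + b)
              for row, b in zip(W1, b1)]
    return [sum(w * h for w, h in zip(row, hidden)) + b
            for row, b in zip(W2, b2)]


# Four scaled inputs: Pb(II) conc., Cu(II) conc., pH, sorbent mass.
x = [0.5, 0.2, 0.1, 0.9]
# Hypothetical weights for two hidden neurons and two outputs.
W1 = [[1.0, 0.0, 0.0, 0.0],
      [0.0, 0.0, 0.0, 0.0]]
b1 = [0.0, 0.0]
W2 = [[2.0, 5.0],
      [-1.0, 3.0]]
b2 = [0.0, 1.0]
outputs = forward(x, W1, b1, W2, b2)
```

The two outputs stand for the predicted removal efficiencies of the two ions; in practice the weights would come from training on the D-optimal design data.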
This study reports the simultaneous determination of aspirin, clopidogrel, and either atorvastatin or rosuvastatin in their fixed-dose combination (FDC) formulations. As a straightforward substitute for employing a distinct model for each component, UV spectrophotometry was combined with chemometric approaches and artificial neural networks. Three chemometric techniques, principal component regression (PCR), partial least-squares (PLS), and classical least-squares (CLS), were applied in addition to the radial basis function-artificial neural network (RBF-ANN). The methods were validated on a set of laboratory-prepared combinations of aspirin, clopidogrel, and atorvastatin in one ternary mixture and aspirin, clopidogrel, and rosuvastatin in a second ternary mixture, and the results from the different approaches were recorded and compared. For CLS, PCR, and PLS, the absorbance data matrix corresponding to the concentration data matrix was built from absorbances measured in the range of 250–280 nm at intervals of 0.2 nm in the zero-order spectra. A calibration (regression) was then constructed from the concentration and absorbance data matrices to predict the unknown concentrations. The RBF-ANN used for the simultaneous determination of aspirin, clopidogrel, and atorvastatin or rosuvastatin in their formulations had an input layer with 151 neurons, 2 hidden layers, and 3 output neurons. The green profile of the developed methods has been assessed and compared with previously reported spectrophotometric methods. The suggested techniques were effectively applied to FDC dosage forms that contained the cited medications.
{"title":"A novel eco-friendly methods for simultaneous determination of aspirin, clopidogrel, and atorvastatin or rosuvastatin in their fixed-dose combination using chemometric techniques and artificial neural networks","authors":"Norhan S. AlSawy, Ehab F. ElKady, Eman A. Mostafa","doi":"10.1002/cem.3474","DOIUrl":"10.1002/cem.3474","url":null,"abstract":"<p>In this study, the simultaneous determination of aspirin, clopidogrel, and either atorvastatin or rosuvastatin in their fixed-dose combination (FDC) formulations has been reported. As a straightforward substitute for employing distinct models for each component, UV spectrophotometry was applied with chemometric approaches and artificial neural networks to achieve this. Three chemometric techniques, including principal component regression (PCR), partial least-squares (PLS), and classical least-squares (CLS), were applied in addition to the radial basis function-artificial neural network (RBF-ANN). The validation of a set of laboratory-prepared combinations of aspirin, clopidogrel, and atorvastatin in one ternary mixture and aspirin, clopidogrel, and rosuvastatin in a second ternary mixture was assessed, and the results from the use of these approaches were recorded and compared. The absorbance data matrix matching the concentration data matrix in CLS, PCR, and PLS was created using measurements of absorbances in the range of 250–280 nm at intervals of 0.2 nm in their zero-order spectra. Then, in order to forecast the unknown concentrations, calibration or regression was created utilizing the concentration and absorbance data matrices. Using RBF-ANN for the simultaneous determination of aspirin, clopidogrel, and atorvastatin or rosuvastatin in their formulations was achieved by providing the input layer with 151 neurons; there are 2 hidden layers and 3 output neurons were obtained. The green profile of the developed methods has been assessed and compared with previously reported spectrophotometric methods. 
The suggested techniques were effectively applied to FDC dosage forms that contained the cited medications.</p>","PeriodicalId":15274,"journal":{"name":"Journal of Chemometrics","volume":null,"pages":null},"PeriodicalIF":2.4,"publicationDate":"2023-03-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"45648486","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"化学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
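The wavelength grid and the classical least-squares step mentioned in the abstract above can be sketched as follows. The 250–280 nm range at 0.2 nm steps gives 151 points, which matches the 151 input neurons. Everything else is an assumption for illustration: the pure-component spectra are synthetic Gaussian bands, and they are treated as known rather than estimated from calibration mixtures.

```python
from math import exp


def wavelength_grid(start=250.0, stop=280.0, step=0.2):
    """Wavelengths from start to stop inclusive, in nm."""
    n = round((stop - start) / step) + 1
    return [round(start + i * step, 1) for i in range(n)]


def solve(A, b):
    """Solve a small square system by Gaussian elimination with pivoting."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - tail) / M[i][i]
    return x


def cls_predict(K, a):
    """Least-squares concentrations c such that a is approximately c @ K.

    K holds one pure-component spectrum per row; a is a mixture spectrum.
    The normal equations (K K^T) c = K a are solved directly.
    """
    KKt = [[sum(p * q for p, q in zip(Ki, Kj)) for Kj in K] for Ki in K]
    Ka = [sum(p * q for p, q in zip(Ki, a)) for Ki in K]
    return solve(KKt, Ka)


grid = wavelength_grid()
# Synthetic Gaussian bands standing in for three pure-component spectra.
centers = [255.0, 265.0, 275.0]
K = [[exp(-((w - c) / 5.0) ** 2) for w in grid] for c in centers]
true_conc = [2.0, 3.0, 1.0]
mixture = [sum(ci * Ki[j] for ci, Ki in zip(true_conc, K))
           for j in range(len(grid))]
estimated = cls_predict(K, mixture)
```

With noise-free synthetic data the estimated concentrations reproduce the true ones; real spectra would add noise and baseline effects that this sketch ignores.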
Ink classification is the ability to sort unknown inks into different groups, and ink source prediction is the ability to predict the manufacturer or model of an unknown ink. Both are regular tasks in forensic analysis. The latter is more challenging than the former, as ink source prediction goes beyond ink classification. In this work, we report an approach that predicts the source of black inks based on direct analysis in real time mass spectrometry and assesses the strength of the source prediction conclusion via the likelihood ratio, using a dataset of 39 inks from three manufacturers with a high market share. Most of these inks contain similar or identical chemical components. Dimensionality reduction based on the principal component analysis and uniform manifold approximation and projection algorithms was implemented, and the resulting distribution plots illustrated the variations between and within the inks. Uniform manifold approximation and projection showed a clear advantage over principal component analysis in avoiding overcrowding of the cluster representation, with results as high as 99.83% for the prediction of the ink source using 41,432 spectra (70% of the data for training and 30% for testing) after dimensionality reduction. A likelihood ratio was used to evaluate the strength of ink evidence, and the pool-adjacent-violators algorithm and logistic algorithms were used to calibrate the likelihood ratio. The results showed that the pool-adjacent-violators algorithm and logistic algorithms both had an excellent equal error rate of 0.004 but slightly different results in the rates of misleading evidence in favor of the prosecutor's hypothesis, rates of misleading evidence in favor of the defense's hypothesis, log likelihood ratio costs after calibration (