A retrospective view on non-linear methods in chemometrics, and future directions
Frank Westad, G. R. Flåten
Frontiers in analytical science, published 2024-05-24
DOI: 10.3389/frans.2024.1393222 (https://doi.org/10.3389/frans.2024.1393222)
Abstract
This perspective article reviews how the chemometrics community approached non-linear methods in the early years. In addition to the basic chemometric methods, some methods that fall under the term “machine learning” are also mentioned. Types of non-linearity are then briefly presented, followed by a discussion of important aspects of modeling non-linear data. Lastly, a simulated data set with non-linear properties is analyzed for quantitative prediction and batch monitoring. The conclusion is that latent variable methods handle non-linearities to a large extent by adding more linear combinations of the original variables. Nevertheless, when the relationship between the X and Y spaces is strongly non-linear, non-linear methods such as Support Vector Machines might improve prediction performance, at the cost of interpretability in both the sample and variable spaces. Applying multiple local models, whether linear or non-linear, can improve performance compared to a single global model. When non-linear methods are applied, conservative model validation becomes even more important. Another approach is to pre-process the data, which can make them more linear before the actual modeling and prediction phase.
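The last point, that pre-processing can make data more linear, can be illustrated with a small sketch. It is not taken from the article: it assumes a Beer-Lambert-type signal, in which transmittance decays exponentially with concentration and a log transform to absorbance restores linearity. The constant `EPSILON_L`, the concentration grid, and the helper `pearson_r` are hypothetical and chosen only for illustration.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical values, assumed for illustration only.
EPSILON_L = 1.5  # absorptivity multiplied by path length
concentrations = [0.1 * i for i in range(1, 11)]  # 0.1, 0.2, ..., 1.0

# Raw signal: transmittance is a non-linear (exponential) function of concentration.
transmittance = [10 ** (-EPSILON_L * c) for c in concentrations]

# Pre-processing step: convert to absorbance, which is linear in concentration.
absorbance = [-math.log10(t) for t in transmittance]

r_raw = pearson_r(concentrations, transmittance)
r_pre = pearson_r(concentrations, absorbance)

print(f"|r| for raw transmittance:   {abs(r_raw):.4f}")
print(f"|r| after log pre-processing: {abs(r_pre):.4f}")
```

In this noise-free toy case the transformed signal is exactly proportional to concentration, so a single linear term would suffice after pre-processing, whereas the raw signal would need additional terms to approximate the curvature. Real data with noise, scatter, or detector saturation would not linearize this cleanly.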