Review of Variable Selection Methods for Discriminant-Type Problems in Chemometrics

Frontiers in Analytical Science (impact factor 1.9). Published: 2022-05-19. DOI: 10.3389/frans.2022.867938

Michael D. Sorochan Armstrong, A. P. de la Mata, J. Harynuk


Citations: 7

Abstract

Discriminant-type analyses arise from the need to classify samples based on their measured characteristics (variables), usually with respect to some observable property. When samples are difficult to obtain, or when advanced instrumentation is used, it is very common to encounter situations with many more measured characteristics than samples. Partial Least Squares Regression (PLS-R) and its variant for discriminant-type analyses (PLS-DA) are among the most ubiquitous tools for such problems. PLS uses a rank-deficient method to solve the inverse least-squares problem in a way that maximises the covariance between the known properties of the samples (commonly referred to as the Y-block) and their measured characteristics (the X-block). A relatively small subset of variables that covary strongly with the Y-block is weighted more heavily than poorly covarying variables, in such a way that an ill-posed matrix inverse problem is circumvented. Feature selection is another common way of reducing the dimensionality of the data to a relatively small, robust subset of variables for use in subsequent modelling. The utility of these features can be inferred and tested in any number of ways; these are the subject of this review.
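
To make the two ideas above concrete (PLS weighting variables by their covariance with the Y-block, and reducing the X-block to a small subset of variables), here is a minimal sketch in Python with NumPy. It assumes a two-class problem coded as y = ±1 and fitted as a single-response PLS (PLS1) with the NIPALS algorithm, and it uses Variable Importance in Projection (VIP) scores with the common VIP > 1 cut-off as one example of a selection criterion. The function names `pls1_nipals` and `vip_scores`, the synthetic data, and the choice of VIP are illustrative assumptions, not the authors' method. Note that only the small A × A matrix PᵀW is inverted, which is how PLS sidesteps inverting XᵀX, a matrix that is singular whenever variables outnumber samples.

```python
import numpy as np


def pls1_nipals(X, y, n_components):
    """Two-class PLS-DA as PLS1 regression on a +/-1 coded y (NIPALS sketch)."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xk = X - x_mean                  # centred copies, deflated each iteration
    yk = y - y_mean
    n, p = X.shape
    W = np.zeros((p, n_components))  # X-weights (unit norm)
    P = np.zeros((p, n_components))  # X-loadings
    T = np.zeros((n, n_components))  # X-scores
    q = np.zeros(n_components)       # y-loadings
    for a in range(n_components):
        w = Xk.T @ yk                # direction of maximal covariance with y
        w /= np.linalg.norm(w)
        t = Xk @ w
        tt = t @ t
        P[:, a] = Xk.T @ t / tt
        q[a] = yk @ t / tt
        W[:, a], T[:, a] = w, t
        Xk = Xk - np.outer(t, P[:, a])  # remove what component a explains in X
        yk = yk - q[a] * t              # ... and in y
    # Regression vector for centred X: B = W (P^T W)^-1 q (only A x A is inverted)
    B = W @ np.linalg.solve(P.T @ W, q)
    return W, T, q, B, x_mean, y_mean


def vip_scores(W, T, q):
    """Variable Importance in Projection for a PLS1 model."""
    p = W.shape[0]
    ss = q ** 2 * np.sum(T ** 2, axis=0)   # y sum of squares explained per component
    Wn = W / np.linalg.norm(W, axis=0)     # column-normalised weights
    return np.sqrt(p * (Wn ** 2 @ ss) / ss.sum())


# Hypothetical synthetic data: many more variables than samples
rng = np.random.default_rng(0)
n, p = 40, 200
labels = np.repeat([0, 1], n // 2)
X = rng.normal(size=(n, p))
X[:, :5] += 3.0 * labels[:, None]          # only variables 0-4 carry class information
y = np.where(labels == 1, 1.0, -1.0)

W, T, q, B, x_mean, y_mean = pls1_nipals(X, y, n_components=2)
vip = vip_scores(W, T, q)
selected = np.flatnonzero(vip > 1.0)       # common "VIP > 1" rule of thumb
y_hat = (X - x_mean) @ B + y_mean
pred = np.where(y_hat > 0, 1, 0)           # threshold halfway between the class codes
print("selected variables:", selected)
print("training accuracy:", np.mean(pred == labels))
```

The accuracy printed here is a resubstitution estimate on the training data and is therefore optimistic. In practice the number of components and any selected subset would need to be validated (for example by cross-validation or permutation testing), and a fixed threshold such as VIP > 1 can also admit uninformative variables.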
