化学计学中判别型问题的变量选择方法综述

Michael D. Sorochan Armstrong, A. P. de la Mata, J. Harynuk
{"title":"化学计学中判别型问题的变量选择方法综述","authors":"Michael D. Sorochan Armstrong, A. P. de la Mata, J. Harynuk","doi":"10.3389/frans.2022.867938","DOIUrl":null,"url":null,"abstract":"Discriminant-type analyses arise from the need to classify samples based on their measured characteristics (variables), usually with respect to some observable property. In the case of samples that are difficult to obtain, or using advanced instrumentation, it is very common to encounter situations with many more measured characteristics than samples. The method of Partial Least Squares Regression (PLS-R), and its variant for discriminant-type analyses (PLS-DA) are among the most ubiquitous of these tools. PLS utilises a rank-deficient method to solve the inverse least-squares problem in a way that maximises the co-variance between the known properties of the samples (commonly referred to as the Y-Block), and their measured characteristics (the X-block). A relatively small subset of highly co-variate variables are weighted more strongly than those that are poorly co-variate, in such a way that an ill-posed matrix inverse problem is circumvented. Feature selection is another common way of reducing the dimensionality of the data to a relatively small, robust subset of variables for use in subsequent modelling. The utility of these features can be inferred and tested any number of ways, this are the subject of this review.","PeriodicalId":73063,"journal":{"name":"Frontiers in analytical science","volume":null,"pages":null},"PeriodicalIF":0.0000,"publicationDate":"2022-05-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"7","resultStr":"{\"title\":\"Review of Variable Selection Methods for Discriminant-Type Problems in Chemometrics\",\"authors\":\"Michael D. Sorochan Armstrong, A. P. de la Mata, J. Harynuk\",\"doi\":\"10.3389/frans.2022.867938\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Discriminant-type analyses arise from the need to classify samples based on their measured characteristics (variables), usually with respect to some observable property. In the case of samples that are difficult to obtain, or using advanced instrumentation, it is very common to encounter situations with many more measured characteristics than samples. The method of Partial Least Squares Regression (PLS-R), and its variant for discriminant-type analyses (PLS-DA) are among the most ubiquitous of these tools. PLS utilises a rank-deficient method to solve the inverse least-squares problem in a way that maximises the co-variance between the known properties of the samples (commonly referred to as the Y-Block), and their measured characteristics (the X-block). A relatively small subset of highly co-variate variables are weighted more strongly than those that are poorly co-variate, in such a way that an ill-posed matrix inverse problem is circumvented. Feature selection is another common way of reducing the dimensionality of the data to a relatively small, robust subset of variables for use in subsequent modelling. The utility of these features can be inferred and tested any number of ways, this are the subject of this review.\",\"PeriodicalId\":73063,\"journal\":{\"name\":\"Frontiers in analytical science\",\"volume\":null,\"pages\":null},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2022-05-19\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"7\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Frontiers in analytical science\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.3389/frans.2022.867938\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Frontiers in analytical science","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.3389/frans.2022.867938","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 7

摘要

判别型分析产生于根据测量的特征(变量)对样本进行分类的需要,通常与一些可观察的属性有关。在样品难以获得的情况下,或使用先进的仪器,它是非常常见的遇到比样品更多的测量特性的情况。偏最小二乘回归(PLS-R)方法及其变体判别型分析(PLS-DA)是这些工具中最普遍的方法之一。PLS利用秩缺陷方法来解决逆最小二乘问题,以最大化样本的已知属性(通常称为y块)与其测量特征(x块)之间的协方差。相对较小的高协变量子集的权重比那些协变量较差的权重更强,这样就可以避免不适定矩阵逆问题。特征选择是另一种常见的方法,可以将数据的维数降低到一个相对较小的、健壮的变量子集,以便在随后的建模中使用。这些功能的效用可以通过多种方式推断和测试,这是本文的主题。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
查看原文
分享 分享
微信好友 朋友圈 QQ好友 复制链接
本刊更多论文
Review of Variable Selection Methods for Discriminant-Type Problems in Chemometrics
Discriminant-type analyses arise from the need to classify samples based on their measured characteristics (variables), usually with respect to some observable property. In the case of samples that are difficult to obtain, or using advanced instrumentation, it is very common to encounter situations with many more measured characteristics than samples. The method of Partial Least Squares Regression (PLS-R), and its variant for discriminant-type analyses (PLS-DA) are among the most ubiquitous of these tools. PLS utilises a rank-deficient method to solve the inverse least-squares problem in a way that maximises the co-variance between the known properties of the samples (commonly referred to as the Y-Block), and their measured characteristics (the X-block). A relatively small subset of highly co-variate variables are weighted more strongly than those that are poorly co-variate, in such a way that an ill-posed matrix inverse problem is circumvented. Feature selection is another common way of reducing the dimensionality of the data to a relatively small, robust subset of variables for use in subsequent modelling. The utility of these features can be inferred and tested any number of ways, this are the subject of this review.
求助全文
通过发布文献求助,成功后即可免费获取论文全文。 去求助
来源期刊
自引率
0.00%
发文量
0
期刊最新文献
Separation of isobaric phosphorothioate oligonucleotides in capillary electrophoresis: study of the influence of cationic cyclodextrins on chemo and stereoselectivity Simultaneous determination of small molecules and proteins in wastewater-based epidemiology A retrospective view on non-linear methods in chemometrics, and future directions A Bayesian approach for constituent estimation in nucleic acid mixture models Editorial: Plant-microbe omics
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
现在去查看 取消
×
提示
确定
0
微信
客服QQ
Book学术公众号 扫码关注我们
反馈
×
意见反馈
请填写您的意见或建议
请填写您的手机或邮箱
已复制链接
已复制链接
快去分享给好友吧!
我知道了
×
扫码分享
扫码分享
Book学术官方微信
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术
文献互助 智能选刊 最新文献 互助须知 联系我们:info@booksci.cn
Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。
Copyright © 2023 Book学术 All rights reserved.
ghs 京公网安备 11010802042870号 京ICP备2023020795号-1