François Stevens, Beatriz Carrasco, Vincent Baeten, Juan A. Fernández Pierna
Journal of Chemometrics, volume 38, issue 4. Published 2024-03-23. DOI: 10.1002/cem.3544
https://onlinelibrary.wiley.com/doi/10.1002/cem.3544
Use of t-distributed stochastic neighbour embedding in vibrational spectroscopy
The t-distributed stochastic neighbour embedding algorithm, or t-SNE, is a non-linear dimension reduction method used to visualise multivariate data. It enables a high-dimensional dataset, such as a set of infrared spectra, to be represented on a single, typically two-dimensional graph, revealing its global and local structure. t-SNE is very popular in the machine learning community and has been applied in many fields, generally with the aim of visualising large datasets. In vibrational spectroscopy, t-SNE is gaining recognition, but principal component analysis (PCA) remains by far the reference method for exploratory analysis and dimension reduction. However, t-SNE may be a real aid in the analysis of vibrational spectroscopic datasets. It provides an at-a-glance global view of the dataset, making it possible to distinguish the main factors influencing the spectral signal and the hierarchy between these factors, and gives an indication of whether predictive modelling is feasible. It can also support the choice of pre-processing by rapidly comparing different general pre-processing approaches according to their effect on the variable of interest. Here we illustrate these advantages using different datasets. We also propose an approach based on a synergy between the t-SNE and PCA methods, allowing the respective advantages of each to be exploited.
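As a rough illustration of how t-SNE and PCA can be chained on spectral data, the sketch below compresses synthetic spectra with PCA and then embeds the PCA scores in two dimensions with t-SNE, using scikit-learn. This is one common way of combining the two methods and is not necessarily the synergy approach proposed in the paper. The data, the number of components and the perplexity are all assumptions chosen for the example.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.manifold import TSNE

rng = np.random.default_rng(0)

# Made-up "spectra": 2 groups x 30 samples, 200 channels each.
# Each group has one Gaussian band at a different position, plus noise.
channels = np.arange(200)


def band(center, width=8.0):
    """Gaussian absorption band centred on a given channel."""
    return np.exp(-0.5 * ((channels - center) / width) ** 2)


group_a = band(60) + 0.05 * rng.standard_normal((30, 200))
group_b = band(140) + 0.05 * rng.standard_normal((30, 200))
spectra = np.vstack([group_a, group_b])
labels = np.array([0] * 30 + [1] * 30)

# Step 1 (assumed setting): PCA reduces the 200 channels to 10 scores.
scores = PCA(n_components=10, random_state=0).fit_transform(spectra)

# Step 2 (assumed setting): t-SNE maps the PCA scores to a 2-D embedding.
# Perplexity must be smaller than the number of samples (60 here).
embedding = TSNE(
    n_components=2, perplexity=15, init="pca", random_state=0
).fit_transform(scores)

print(embedding.shape)
```

Plotting `embedding[:, 0]` against `embedding[:, 1]`, coloured by `labels`, gives the kind of at-a-glance map the abstract describes. Re-running the same steps on differently pre-processed spectra is one way to compare pre-processing options visually.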
About the journal:
The Journal of Chemometrics is devoted to the rapid publication of original scientific papers, reviews and short communications on fundamental and applied aspects of chemometrics. It also provides a forum for the exchange of information on meetings and other news relevant to the growing community of scientists who are interested in chemometrics and its applications. Short, critical review papers are a particularly important feature of the journal, in view of the multidisciplinary readership at which it is aimed.