Nicole M. North, Jessica B. Clark, Abigail A. A. Enders, Alex J. Grooms, Salmika G. Wairegi, Kezia A. Duah, Efthimia I. Palassis-Naziri, Abraham Badu-Tawiah and Heather C. Allen*
ACS Earth and Space Chemistry (IF 2.9; JCR Q2, Chemistry, Multidisciplinary). Published 2024-07-31. DOI: 10.1021/acsearthspacechem.4c00018. https://pubs.acs.org/doi/10.1021/acsearthspacechem.4c00018
Multi-Analyte Concentration Analysis of Marine Samples through Regression-Based Machine Learning
Marine systems are chemically complex, and identifying the compounds behind that chemical diversity is critical to understanding ecological and ocean health metrics. Pairing Raman spectroscopy with machine learning combines a low-cost, highly transportable analytical technique with a powerful and rapid computational approach that can aid marine analysis. Here, we use Raman spectroscopy and machine learning to determine millimolar (mM) concentrations of three chemically relevant compounds, one from each of three distinct classes, in a complex aqueous matrix. Saccharides are represented by glucose, fatty acids by butyric acid, and proteins by glycine as an amino acid proxy. Eight classical machine learning models (gradient boosted regressors, random forests, histogram gradient boosted regressors, decision trees, k-nearest neighbors, support vector regression, multi-layer perceptrons, and multivariate linear regression) were tested for their accuracy in predicting the concentrations of glycine, glucose, and butyric acid in marine samples, benchmarked against a mass spectrometric method. Support vector regression best predicted the concentrations of all three analytes. Butyric acid was described similarly well by gradient boosted regression and histogram gradient boosted regression. The described spectroscopy and machine learning methodology has the potential to significantly advance rapid field analysis of marine samples.
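As a rough illustration of what regression-based multi-analyte prediction from spectra involves, the sketch below fits one of the eight model families named in the abstract (k-nearest neighbors) to synthetic spectra and reports a per-analyte root-mean-square error. It is not the authors' pipeline or data: the band positions, channel count, noise level, concentration range, and all function names are assumptions made up for the example, and the "spectra" are simple sums of Gaussian bands rather than real Raman measurements.

```python
import math
import random

# Hypothetical band centers (arbitrary channel indices, NOT real Raman shifts).
BANDS = {"glycine": 10, "glucose": 25, "butyric_acid": 40}
ANALYTES = list(BANDS)
N_CHANNELS = 50   # assumed spectrum length
BAND_WIDTH = 2.0  # assumed Gaussian band width, in channels


def synthetic_spectrum(conc, rng, noise=0.5):
    """Sum of Gaussian bands whose heights scale linearly with concentration."""
    spectrum = []
    for ch in range(N_CHANNELS):
        signal = sum(
            conc[name] * math.exp(-((ch - center) ** 2) / (2 * BAND_WIDTH ** 2))
            for name, center in BANDS.items()
        )
        spectrum.append(signal + rng.gauss(0.0, noise))
    return spectrum


def make_dataset(n, rng):
    """Return n (spectrum, concentrations) pairs with illustrative 0-100 mM values."""
    data = []
    for _ in range(n):
        conc = {name: rng.uniform(0.0, 100.0) for name in ANALYTES}
        data.append((synthetic_spectrum(conc, rng), conc))
    return data


def knn_predict(train, spectrum, k=3):
    """Average the concentrations of the k training spectra closest to `spectrum`."""
    nearest = sorted(train, key=lambda item: math.dist(item[0], spectrum))[:k]
    return {name: sum(conc[name] for _, conc in nearest) / k for name in ANALYTES}


def rmse(test, predictions, name):
    """Root-mean-square error for one analyte over the held-out set."""
    errors = [
        (pred[name] - conc[name]) ** 2
        for (_, conc), pred in zip(test, predictions)
    ]
    return math.sqrt(sum(errors) / len(errors))


rng = random.Random(0)           # fixed seed so the run is repeatable
train = make_dataset(300, rng)   # "calibration" spectra with known concentrations
test = make_dataset(50, rng)     # held-out spectra standing in for reference-measured samples
predictions = [knn_predict(train, spec) for spec, _ in test]
scores = {name: rmse(test, predictions, name) for name in ANALYTES}

for name, score in scores.items():
    print(f"{name}: RMSE = {score:.2f} mM")
```

In a comparison like the one the abstract describes, the same held-out scoring would be repeated for each model family, with the reference concentrations coming from an independent method rather than from the simulation used here.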
About the journal:
The scope of ACS Earth and Space Chemistry includes the application of analytical, experimental and theoretical chemistry to investigate research questions relevant to the Earth and Space. The journal encompasses the highly interdisciplinary nature of research in this area, while emphasizing chemistry and chemical research tools as the unifying theme. The journal publishes broadly in the domains of high- and low-temperature geochemistry, atmospheric chemistry, marine chemistry, planetary chemistry, astrochemistry, and analytical geochemistry. ACS Earth and Space Chemistry publishes Articles, Letters, Reviews, and Features to provide flexible formats to readily communicate all aspects of research in these fields.