Machine Learning Correlation of Electron Micrographs and ToF-SIMS for the Analysis of Organic Biomarkers in Mudstone.

Journal of the American Society for Mass Spectrometry, pp. 58-71. Published 2025-01-01 (Epub 2024-12-19). DOI: 10.1021/jasms.4c00300. Impact factor 3.1; CAS Zone 2 (Chemistry); JCR Q2 (Biochemical Research Methods).
Michael J Pasterski, Matthias Lorenz, Anton V Ievlev, Raveendra C Wickramasinghe, Luke Hanley, Fabien Kenig
Citations: 0

Abstract

The spatial distribution of organics in geological samples can be used to determine when and how these organics were incorporated into the host rock. Mass spectrometry (MS) imaging can rapidly collect large amounts of data, but the ions produced from all components of the sample are detected together without discrimination. This results in complex mass spectra that can be difficult to interpret. Here, we apply unsupervised and supervised machine learning (ML) to help interpret time-of-flight secondary ion mass spectrometry (ToF-SIMS) spectra of an organic-carbon-rich Middle Jurassic mudstone from England (UK). It was previously shown that sterane molecular biomarkers in this sample can be detected by ToF-SIMS (Pasterski, M. J. et al., Astrobiology 2023, 23, 936). We apply unsupervised ML to scanning electron microscopy with energy-dispersive X-ray spectroscopy (SEM-EDS) measurements to define compositional categories based on differences in elemental abundances. We then test the ability of four ML algorithms to classify the ToF-SIMS spectra: k-nearest neighbors (KNN), recursive partitioning and regression trees (RPART), eXtreme Gradient Boosting (XGBoost), and random forest (RF). The spectra are classified using (1) the categories assigned via SEM-EDS, (2) organic and inorganic labels assigned via SEM-EDS, and (3) the presence or absence of detectable steranes in the ToF-SIMS spectra. In terms of both predictive accuracy and balanced accuracy, KNN was the best-performing model and RPART the worst. Feature importance, that is, the specific features of the ToF-SIMS spectra that a model uses to make classifications, cannot be determined for KNN, which prevents post hoc interpretation of that model. Nevertheless, the feature importance extracted from the other models was useful for interpreting the spectra. We determined that some of the organic ions used to classify biomarker-containing spectra may be fragment ions derived from kerogen, which is abundant in this mudstone sample.
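The abstract ranks KNN first on both predictive accuracy and balanced accuracy. As a rough illustration of those two ideas, here is a minimal plain-Python sketch (not the authors' implementation) of a k-nearest-neighbour vote over spectra. It also shows why balanced accuracy, the mean of per-class recall, matters when one class, such as sterane-bearing spectra, is rare. Everything specific here is an assumption for illustration: the four-channel toy spectra, the total-ion-count normalization, Euclidean distance, k = 3, and the class counts. The study's RPART, XGBoost, and RF models are not reproduced.

```python
import math
from collections import Counter

# Hypothetical toy data: each row is a ToF-SIMS spectrum reduced to four
# peak intensities (invented m/z channels). Labels mimic the organic/inorganic
# categories that the study derived from co-registered SEM-EDS.
RAW_TRAIN = [
    [10, 1, 0, 5],
    [9, 2, 1, 4],
    [1, 8, 6, 1],
    [0, 9, 7, 2],
]
TRAIN_Y = ["inorganic", "inorganic", "organic", "organic"]


def normalize(spectrum):
    """Scale intensities to unit total ion count (assumed preprocessing step)."""
    total = sum(spectrum)
    return [x / total for x in spectrum] if total else list(spectrum)


TRAIN_X = [normalize(s) for s in RAW_TRAIN]


def euclidean(a, b):
    """Euclidean distance between two equal-length intensity vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_predict(train_x, train_y, query, k=3):
    """Label `query` by majority vote of its k nearest training spectra."""
    ranked = sorted(
        zip(train_x, train_y), key=lambda pair: euclidean(pair[0], query)
    )
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recall, so a rare class weighs as much as a common one."""
    recalls = []
    for cls in sorted(set(y_true)):
        idx = [i for i, t in enumerate(y_true) if t == cls]
        recalls.append(sum(y_pred[i] == cls for i in idx) / len(idx))
    return sum(recalls) / len(recalls)


if __name__ == "__main__":
    for raw in ([8, 1, 1, 5], [1, 9, 6, 2]):
        print(raw, "->", knn_predict(TRAIN_X, TRAIN_Y, normalize(raw), k=3))

    # A classifier that always predicts the majority class on an imbalanced
    # set (8 spectra without detectable steranes, 2 with) scores well on
    # plain accuracy but only 0.5 on balanced accuracy.
    y_true = ["no_sterane"] * 8 + ["sterane"] * 2
    y_pred = ["no_sterane"] * 10
    print(f"accuracy {accuracy(y_true, y_pred):.2f}, "
          f"balanced accuracy {balanced_accuracy(y_true, y_pred):.2f}")
    # prints: accuracy 0.80, balanced accuracy 0.50
```

Because KNN stores the training spectra and classifies by distance, it does not learn split rules or weights over individual peaks. It therefore has no built-in per-feature importance scores of the kind tree-based models such as RPART, XGBoost, and RF expose. This is in line with the abstract's remark that feature importance could not be extracted for KNN.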

Source journal metrics
CiteScore: 5.50
Self-citation rate: 9.40%
Articles published: 257
Review time: 1 month
About the journal: The Journal of the American Society for Mass Spectrometry presents research papers covering all aspects of mass spectrometry, including fields of scientific inquiry in which mass spectrometry can play a role. Comprehensive in scope, the journal publishes papers on both fundamentals and applications of mass spectrometry. Fundamental subjects include instrumentation principles, design, and demonstration; structures and chemical properties of gas-phase ions; studies of thermodynamic properties; ion spectroscopy; chemical kinetics; mechanisms of ionization; theories of ion fragmentation; cluster ions; and potential energy surfaces. In addition to full papers, the journal offers Communications, Application Notes, and Accounts and Perspectives.