Integration of sparse and continuous data sets using machine learning for core mineralogy interpretation

Leading Edge (Q2, Earth and Planetary Sciences) | Pub Date: 2023-06-01 | DOI: 10.1190/tle42060421.1
M. Nawal, B. Shekar, P. Jaiswal

Abstract

In earth science, integrating noninvasive continuous data streams with discrete invasive measurements remains an open challenge. We address one such problem: predicting whole-core mineralogy from discrete measurements with the help of machine learning. Our targets are sparsely sampled mineralogy from X-ray diffraction, and our features are continuously sampled elemental oxides from X-ray fluorescence. Both data sets were acquired on a core cut from a Mississippian-age mixed siliciclastic-carbonate formation in the U.S. midcontinent. The novelty lies in predicting multiple classes of output targets from input features in a small, multidimensional data setting. Our workflow has three salient aspects. First, it shows that single-output models, each relating a selected target-feature subset, are more effective than a single multi-output model that relates the entire target-feature set simultaneously. Specifically, we adopt a competitive ensemble strategy comprising three classes of regression algorithms: elastic net (linear regression), XGBoost (tree-based), and feedforward neural networks (nonlinear regression). Second, it shows that feature selection and engineering, when guided by statistical relationships within the data set and by domain knowledge, can significantly improve target predictability. Third, it incorporates k-fold cross-validation and grid-search-based parameter tuning to predict targets to within 4%–6% using 40% of the data for training. The results open the door to generating a wealth of information in energy, environmental, and climate sciences, where remotely sensed data are inexpensive and abundant but physical sampling may be limited by analytic, logistic, or economic constraints.
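The abstract gives no implementation details, so the following is only a minimal sketch of one plausible reading of the workflow: each target gets its own single-output model, several candidate model families are grid-searched under k-fold cross-validation, and the family with the lowest validation error is kept for that target. Everything specific here is an assumption for illustration: the synthetic data, the target names `mineral_a` and `mineral_b`, the parameter grids, and the helper names. To stay dependency-free, a small coordinate-descent elastic net is written out by hand, and a k-nearest-neighbour regressor stands in for the nonlinear families (XGBoost and feedforward networks) named in the abstract; it is not the authors' method.

```python
import itertools
import math
import random

def fit_elastic_net(X, y, alpha, l1_ratio, n_iter=100):
    """Elastic net by coordinate descent on centered data; returns a predictor."""
    n, p = len(X), len(X[0])
    x_mean = [sum(row[j] for row in X) / n for j in range(p)]
    y_mean = sum(y) / n
    cols = [[row[j] - x_mean[j] for row in X] for j in range(p)]
    w = [0.0] * p
    resid = [yi - y_mean for yi in y]  # residual for the current weights
    for _ in range(n_iter):
        for j, col in enumerate(cols):
            norm = sum(c * c for c in col) / n
            rho = sum(c * r for c, r in zip(col, resid)) / n + norm * w[j]
            t = alpha * l1_ratio  # L1 part: soft-threshold rho by t
            new = (max(rho - t, 0.0) - max(-rho - t, 0.0)) / (norm + alpha * (1 - l1_ratio))
            resid = [r - c * (new - w[j]) for c, r in zip(col, resid)]
            w[j] = new
    b = y_mean - sum(wj * mj for wj, mj in zip(w, x_mean))
    return lambda row: b + sum(wj * xj for wj, xj in zip(w, row))

def fit_knn(X, y, k):
    """k-nearest-neighbour mean: a simple nonlinear stand-in (assumption)."""
    def predict(row):
        order = sorted(range(len(X)), key=lambda i: math.dist(X[i], row))
        return sum(y[i] for i in order[:k]) / k
    return predict

# Candidate families and hypothetical parameter grids.
CANDIDATES = {
    "elastic_net": (fit_elastic_net, {"alpha": [0.001, 0.1, 1.0], "l1_ratio": [0.2, 0.8]}),
    "knn": (fit_knn, {"k": [1, 3, 5]}),
}

def cv_rmse(fit, params, X, y, n_folds=5):
    """Mean RMSE over k folds; fold f holds out rows with index % n_folds == f."""
    scores = []
    for f in range(n_folds):
        train = [i for i in range(len(X)) if i % n_folds != f]
        test = [i for i in range(len(X)) if i % n_folds == f]
        predict = fit([X[i] for i in train], [y[i] for i in train], **params)
        sq = [(predict(X[i]) - y[i]) ** 2 for i in test]
        scores.append(math.sqrt(sum(sq) / len(sq)))
    return sum(scores) / n_folds

def select_model(X, y):
    """Grid-search every candidate family; the lowest CV error wins."""
    best = None
    for name, (fit, grid) in CANDIDATES.items():
        keys = list(grid)
        for values in itertools.product(*(grid[k] for k in keys)):
            params = dict(zip(keys, values))
            score = cv_rmse(fit, params, X, y)
            if best is None or score < best[0]:
                best = (score, name, params)
    return best

# Synthetic stand-in for oxide features and mineral targets (not real data).
random.seed(0)
X = [[random.random() for _ in range(3)] for _ in range(40)]
targets = {
    "mineral_a": [2.0 * r[0] - 1.0 * r[1] + 0.5 for r in X],  # linear in the features
    "mineral_b": [math.sin(6.0 * r[2]) for r in X],           # nonlinear in one feature
}
# One independent single-output model per target.
results = {name: select_model(X, y) for name, y in targets.items()}
for name, (score, family, params) in results.items():
    print(name, family, params, round(score, 4))
```

Because selection runs separately for each target, different targets are free to end up with different model families and hyperparameters, which is the practical appeal of the single-output approach. The sketch does not reproduce the 40% training split or the feature engineering described in the abstract.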
Source journal
Leading Edge (Earth and Planetary Sciences, Geology)
CiteScore: 3.10
Self-citation rate: 0.00%
Articles published: 180
Journal description: THE LEADING EDGE complements GEOPHYSICS, SEG's peer-reviewed publication long unrivalled as the world's most respected vehicle for dissemination of developments in exploration and development geophysics. TLE is a gateway publication, introducing new geophysical theory, instrumentation, and established practices to scientists in a wide range of geoscience disciplines. Most material is presented in a semitechnical manner that minimizes mathematical theory and emphasizes practical applications. TLE also serves as SEG's publication venue for official society business.