Optimizing oil-source correlation analysis using support vector machines and sensory attention networks

Computers & Geosciences · IF 4.2 · CAS Zone 2 (Earth Sciences) · JCR Q1 (Computer Science, Interdisciplinary Applications) · Pub Date: 2024-06-01 · DOI: 10.1016/j.cageo.2024.105641
Yifeng Xiao, Tongxi Wang, Hua Xiang
Volume 189, Article 105641 · https://www.sciencedirect.com/science/article/pii/S0098300424001249
Citations: 0

Abstract

Oil-source correlation identifies the origin of crude oil by linking it to its source rocks. Manual correlation, however, is subject to considerable uncertainty because it is constrained by the number of samples or parameters available and by imbalanced datasets. Existing multivariate statistical techniques alleviate part of this problem but still struggle with imbalanced datasets and with quantifying the contributions of individual source beds. This paper therefore proposes SVM-SelectKBest, a novel oil-source correlation model that combines a support vector machine (SVM) with a feature selection algorithm to mitigate dataset imbalance, a common issue in oil-source correlation. Compared with a standard SVM, SVM-SelectKBest dynamically selects the most relevant features and fine-tunes model parameters, achieving higher accuracy and better generalizability on complex datasets: the SVM compensates for class imbalance by heavily penalizing misclassification of the minority class, and SelectKBest streamlines the feature set so that the SVM concentrates on the critical variables. In addition, a shallow neural network (SensoryAttentionNet) is incorporated into the model to quantify the proportions of the source beds contributing to a crude oil. The results show that SVM-SelectKBest performs better at identifying key geochemical parameters and discriminating oil-source correlations, improving accuracy on imbalanced datasets by nearly 40% relative to a standard SVM. The model identifies 25 key geochemical parameters, including n-heptadecane (C17), pristane (Pr), and n-octadecane (C18), and achieves F1 scores of 1.0 on the training, validation, and test sets. SensoryAttentionNet also performs robustly, with a variance of only 0.05 between its predicted and actual values. Together, these results demonstrate the effectiveness of the proposed method in handling class imbalance in oil-source correlation datasets and in determining the proportional contributions of source beds to crude oil.
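The two ingredients the abstract attributes to SVM-SelectKBest, univariate feature ranking and a heavier misclassification penalty for the minority class, can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' code: it assumes the one-way ANOVA F statistic as the ranking score (the default `score_func` of scikit-learn's `SelectKBest` is `f_classif`) and the "balanced" weighting rule w_c = n / (K · n_c), where n is the number of samples, K the number of classes, and n_c the size of class c. The function names and the toy data are hypothetical.

```python
from collections import Counter
from statistics import fmean


def anova_f(values, labels):
    """One-way ANOVA F statistic of one feature across >= 2 classes."""
    groups = {}
    for v, c in zip(values, labels):
        groups.setdefault(c, []).append(v)
    k, n = len(groups), len(values)
    grand_mean = fmean(values)
    # Between-class sum of squares: spread of class means around the grand mean
    ss_between = sum(len(g) * (fmean(g) - grand_mean) ** 2 for g in groups.values())
    # Within-class sum of squares: spread of samples around their own class mean
    ss_within = sum(sum((v - fmean(g)) ** 2 for v in g) for g in groups.values())
    if ss_within == 0:
        # Perfect separation scores as infinite; a constant feature scores 0
        return float("inf") if ss_between > 0 else 0.0
    return (ss_between / (k - 1)) / (ss_within / (n - k))


def select_k_best(X, y, feature_names, k):
    """Names of the k features with the highest F statistic."""
    scores = {name: anova_f([row[j] for row in X], y)
              for j, name in enumerate(feature_names)}
    return sorted(scores, key=scores.get, reverse=True)[:k]


def balanced_class_weights(y):
    """w_c = n_samples / (n_classes * n_c): rarer classes get larger weights."""
    counts = Counter(y)
    n, n_classes = len(y), len(counts)
    return {c: n / (n_classes * cnt) for c, cnt in counts.items()}


# Toy, made-up data: 3 samples of oil family "A", 2 of family "B"
X = [[1.0, 5.0, 0.3], [1.2, 5.1, 0.9], [0.9, 4.8, 0.5], [3.0, 5.0, 0.4], [3.2, 4.9, 0.8]]
y = ["A", "A", "A", "B", "B"]
print(select_k_best(X, y, ["f1", "f2", "f3"], k=1))  # ['f1']
print(balanced_class_weights(y))  # minority class "B" gets the larger weight
```

In scikit-learn, the corresponding pieces would be `SelectKBest(f_classif, k=...)` followed by `SVC(class_weight="balanced")`, which multiplies the penalty `C` of each class by that class's weight. Chaining them in a `Pipeline` and searching over `k` and `C` with `GridSearchCV` is one plausible way to realize the dynamic feature and parameter selection described above, although the paper's exact configuration may differ.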

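The abstract does not describe SensoryAttentionNet's architecture. The sketch below is a hypothetical shallow model that shows one common way to make a network output source-bed proportions: a softmax over one score per candidate source bed guarantees values that are non-negative and sum to 1. The feature-attention step, the function names, and all weights are illustrative assumptions; real weights would have to be learned from samples with known mixing ratios.

```python
import math


def softmax(z):
    """Numerically stable softmax: non-negative outputs that sum to 1."""
    m = max(z)  # subtracting the max avoids overflow in math.exp
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


def predict_proportions(x, attn_scores, W, b):
    """Hypothetical shallow model: feature attention, then one linear layer.

    x           -- geochemical feature vector of one oil sample
    attn_scores -- one learned relevance score per feature (hypothetical)
    W, b        -- weights/biases mapping features to one logit per source bed
    """
    # Feature attention: weights are positive and sum to 1 across features
    a = softmax(attn_scores)
    gated = [ai * xi for ai, xi in zip(a, x)]
    # One logit per candidate source bed
    logits = [sum(w * g for w, g in zip(row, gated)) + bias for row, bias in zip(W, b)]
    # Softmax turns the logits into mixing proportions that sum to 1
    return softmax(logits)


# Placeholder inputs: 3 features, 2 candidate source beds
x = [0.2, 0.5, 0.3]
attn_scores = [0.0, 0.0, 0.0]
W = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
b = [0.0, 0.0]
p = predict_proportions(x, attn_scores, W, b)
print(p)  # two proportions summing to 1; the second bed gets the larger share
```

Because the softmax constraint is built into the output layer, no post-hoc renormalization is needed to interpret the predictions as proportional contributions of source beds.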
Source journal: Computers & Geosciences (Geosciences, Multidisciplinary)
CiteScore: 9.30
Self-citation rate: 6.80%
Articles published: 164
Review time: 3.4 months
About the journal: Computers & Geosciences publishes high-impact, original research at the interface between computer science and the geosciences. Publications should apply modern computer science paradigms, whether computational or informatics-based, to address problems in the geosciences.