Machine learning approaches for the prediction of serious fluid leakage from hydrocarbon wells

Data-Centric Engineering (IF 2.8, JCR Q3, Computer Science, Artificial Intelligence) | Pub Date: 2023-05-19 | DOI: 10.1017/dce.2023.9

Mehdi Rezvandehy, B. Mayer



Abstract

The exploitation of hydrocarbon reservoirs may lead to contamination of soils and shallow water resources and to greenhouse gas emissions. Fluids such as methane or CO2 may in some cases migrate toward the groundwater zone and the atmosphere through and along imperfectly sealed hydrocarbon wells. Field tests are routinely conducted in hydrocarbon-producing regions to detect serious leakage and prevent environmental pollution. The challenge is that testing is costly, time-consuming, and sometimes labor-intensive. In this study, machine learning approaches were applied to predict serious leakage, with uncertainty quantification, for wells in Alberta, Canada, that have not been field tested. An improved imputation technique was developed based on Cholesky factorization of the covariance matrix between features, in which missing data are imputed by conditioning on the available values. The uncertainty in imputed values was quantified and incorporated into the final prediction to improve decision-making. Next, a wide range of predictive algorithms and performance metrics were considered to find the most reliable classifier. However, the distribution of field tests is highly skewed toward the negative class (nonserious leakage), which leads predictive models to unrealistically underestimate the minority class (serious leakage). To address this issue, a combination of oversampling, undersampling, and ensemble learning was applied. By evaluating all the models on never-before-seen data, an optimum classifier with minimal false negative predictions was determined. The developed methodology can be applied to identify the wells with the highest likelihood of serious fluid leakage within producing fields. This information is of key importance for optimizing field test operations to achieve economic and environmental benefits.
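The abstract does not spell out the imputation algorithm, so the following is only a minimal sketch of one standard way to impute by conditioning with a Cholesky factor. It assumes the features are jointly Gaussian with a known mean and covariance, which is an illustrative assumption and not necessarily the authors' formulation. The function name `conditional_gaussian_impute` and its parameters are hypothetical. Drawing several realizations per missing value gives a simple handle on imputation uncertainty.

```python
import numpy as np


def conditional_gaussian_impute(x, mean, cov, n_draws=100, rng=None):
    """Draw completed copies of one feature vector whose gaps are NaN.

    Assumes jointly Gaussian features (illustrative assumption only).
    Returns an array of shape (n_draws, len(x)).
    """
    rng = np.random.default_rng(rng)
    x = np.asarray(x, dtype=float)
    mean = np.asarray(mean, dtype=float)
    cov = np.asarray(cov, dtype=float)

    miss = np.isnan(x)
    obs = ~miss
    completed = np.tile(x, (n_draws, 1))
    if not miss.any():
        return completed  # nothing to impute

    s_mm = cov[np.ix_(miss, miss)]
    if obs.any():
        s_oo = cov[np.ix_(obs, obs)]
        s_mo = cov[np.ix_(miss, obs)]
        # Factor the observed block once: S_oo = L L^T.
        l_oo = np.linalg.cholesky(s_oo)
        w = np.linalg.solve(l_oo, x[obs] - mean[obs])  # L^{-1} (x_o - mu_o)
        v = np.linalg.solve(l_oo, s_mo.T)              # L^{-1} S_om
        cond_mean = mean[miss] + v.T @ w               # mu_m + S_mo S_oo^{-1} (x_o - mu_o)
        cond_cov = s_mm - v.T @ v                      # S_mm - S_mo S_oo^{-1} S_om
    else:
        # No observed values: fall back to the marginal distribution.
        cond_mean, cond_cov = mean[miss], s_mm

    # Small diagonal jitter keeps the second factorization numerically stable.
    n_miss = int(miss.sum())
    l_cond = np.linalg.cholesky(cond_cov + 1e-10 * np.eye(n_miss))
    z = rng.standard_normal((n_draws, n_miss))
    completed[:, miss] = cond_mean + z @ l_cond.T
    return completed


# Toy usage with made-up numbers: two features with correlation 0.8.
mean = np.zeros(2)
cov = np.array([[1.0, 0.8], [0.8, 1.0]])
draws = conditional_gaussian_impute([1.0, np.nan], mean, cov, n_draws=1000, rng=0)
imputed_mean = draws[:, 1].mean()   # estimate of the conditional mean
imputed_std = draws[:, 1].std()     # spread, i.e. the imputation uncertainty
```

In this toy case the analytic conditional mean is 0.8 and the conditional standard deviation is 0.6, so the sample statistics should fall close to those values. Passing each realization through a classifier and summarizing the resulting predictions is one way such uncertainty could be carried into a final prediction.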


