Application of Data Mining to Small Data Sets: Identification of Key Production Drivers in Heterogeneous Unconventional Resources

IF 2.1 | Zone 4, Engineering & Technology | JCR Q3, Energy & Fuels | SPE Reservoir Evaluation & Engineering | Pub Date: 2022-10-01 | DOI: 10.2118/212301-pa

Yanrui Ning, H. Schumann, G. Jin


Citations: 0

Abstract




In this study, we developed a data-mining-based multivariate analysis (MVA) workflow to identify correlations in small, complex, high-dimensional data sets. The research was motivated by the integrated analysis of geologic, geophysical, completion, and production data from a 4-square-mile study field in the northern Denver-Julesburg (DJ) Basin, Colorado, USA. The goal is to establish a workflow that can extract insights from a small data set to guide the future development of surrounding acreage. The proposed MVA workflow is a significantly modified procedure built on the random forest algorithm and is assessed using the R² score from K-fold cross-validation (CV). It performs significantly better on small data sets than traditional feature selection methods because it (1) selects the top-performing feature combinations at each step, (2) embeds iterations, (3) avoids random correlation, and (4) summarizes each feature's occurrence at the end. When first applied to a complex synthetic small data set that included numerical and categorical variables, linear and nonlinear relationships, relationships among independent variables, and high dimensionality, the workflow correctly identified all correlating variables and outperformed traditional feature selection methods. A field data set covering 23 wells was then investigated with the MVA workflow to identify the key factors that affect production performance in the study area. The workflow reveals the weak correlation between production and the legacy well effect. The results show that the key factors affecting production in this study area are total organic carbon (TOC) percentage, open fracture densities, clay content, and the legacy well effect, which should receive significant attention when developing neighboring acreage in the DJ Basin. More importantly, this MVA method can be implemented in other basins. Given the heterogeneity of unconventional resources, it is worthwhile to identify the key production drivers on a small scale. The strong performance of this MVA method on small data sets makes it possible to provide valuable insights for each specific acreage.
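The abstract names the ingredients of the workflow (a random forest, the R² score from K-fold CV, keeping the top-performing feature combinations at each step, repeated iterations, and a final tally of how often each feature occurs) but not the exact procedure. The Python sketch below is one possible reading of those ingredients, not the authors' implementation. It assumes scikit-learn is available; the function names (`cv_r2`, `select_features`), the parameters (`max_size`, `top_k`, `n_iter`), and the synthetic data are all hypothetical. How the paper avoids random correlation is not described in the abstract, so that step is left out.

```python
from collections import Counter

import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import KFold, cross_val_score


def cv_r2(X, y, cols, seed, n_splits=3):
    """Mean K-fold CV R^2 of a random forest that sees only columns `cols`."""
    model = RandomForestRegressor(n_estimators=20, random_state=seed)
    folds = KFold(n_splits=n_splits, shuffle=True, random_state=seed)
    scores = cross_val_score(model, X[:, list(cols)], y, cv=folds, scoring="r2")
    return scores.mean()


def select_features(X, y, max_size=2, top_k=2, n_iter=3):
    """Grow feature combinations step by step and tally feature occurrences.

    Assumed procedure (not taken from the paper): at each step keep the
    `top_k` best-scoring combinations, extend each by one unused feature,
    and repeat the whole search `n_iter` times with different random seeds.
    """
    n_features = X.shape[1]
    counts = Counter()
    for seed in range(n_iter):
        # Step 1: score every single feature on its own.
        scored = {(j,): cv_r2(X, y, (j,), seed) for j in range(n_features)}
        best = sorted(scored, key=scored.get, reverse=True)[:top_k]
        for _ in range(max_size - 1):
            # Extend each surviving combination by one feature it lacks.
            candidates = sorted({tuple(sorted(c + (j,)))
                                 for c in best
                                 for j in range(n_features) if j not in c})
            scored = {c: cv_r2(X, y, c, seed) for c in candidates}
            best = sorted(scored, key=scored.get, reverse=True)[:top_k]
        # Summarize: count each feature in the surviving combinations.
        for combo in best:
            counts.update(combo)
    return counts


# Hypothetical small data set: 40 samples, 6 features, of which only
# feature 0 (linear) and feature 1 (nonlinear) drive the response.
rng = np.random.default_rng(0)
X = rng.uniform(-1.0, 1.0, size=(40, 6))
y = 4.0 * X[:, 0] + np.sin(3.0 * X[:, 1]) + 0.1 * rng.normal(size=40)

counts = select_features(X, y)
print(counts.most_common())
```

Features that appear in many surviving combinations across iterations are the candidates for key drivers. On a real data set of a few dozen wells, the number of folds, the combination size, and the number of repeats would all need tuning, and categorical variables would need encoding first.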


Source journal

SPE Reservoir Evaluation & Engineering (Engineering & Technology: Geology)

CiteScore: 5.30
Self-citation rate: 0.00%
Articles published: not listed
Review time: 12 months

Journal description: Covers a wide range of topics, including reservoir characterization, geology and geophysics, core analysis, well logging, well testing, reservoir management, enhanced oil recovery, fluid mechanics, performance prediction, reservoir simulation, digital energy, uncertainty/risk assessment, information management, resource and reserve evaluation, portfolio/asset management, project valuation, and petroleum economics.