Application of Data Mining to Small Data Sets: Identification of Key Production Drivers in Heterogeneous Unconventional Resources

ACS Applied Bio Materials (IF 4.6, Q2, Materials Science, Biomaterials) | Pub Date: 2022-10-01 | DOI: 10.2118/212301-pa
Yanrui Ning, H. Schumann, G. Jin
Citations: 0

Abstract

In this study, we developed a data mining-based multivariate analysis (MVA) workflow to identify correlations in small, complex, high-dimensional data sets. The research was motivated by the integrated analysis of geologic, geophysical, completion, and production data from a 4-square-mile study field in the northern Denver-Julesburg (DJ) Basin, Colorado, USA. The goal is to establish a workflow that can extract insights from a small data set to guide the future development of surrounding acreage. We propose an MVA workflow that is based on the random forest algorithm with significant modifications and is assessed using the R² score from K-fold cross-validation (CV). The MVA workflow performs significantly better on small data sets than traditional feature selection methods because it includes (1) selection of the top-performing feature combinations at each step, (2) embedded iterations, (3) avoidance of random correlation, and (4) a final summary of how often each feature occurs. The workflow was first applied to a complex synthetic small data set that included numerical and categorical variables, linear and nonlinear relationships, relationships among the independent variables, and high dimensionality; it correctly identified all correlating variables and outperformed traditional feature selection methods. A field data set comprising information from 23 wells was then investigated with the MVA workflow to identify the key factors that affect production performance in the study area. The MVA workflow reveals the weak correlation between production and the legacy well effect. The results show that the key factors affecting production in this study area are total organic carbon (TOC) percentage, open fracture density, clay content, and the legacy well effect, which should receive significant attention when developing neighboring acreage in the DJ Basin. More importantly, this MVA method can be implemented in other basins. Considering the heterogeneity of unconventional resources, it is worthwhile to identify the key production drivers on a small scale. The strong performance of this MVA method on small data sets makes it possible to provide valuable insights for each specific acreage.
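The abstract lists the workflow's ingredients but gives no implementation, so the following is only a rough sketch of how the four listed elements could fit together, not the authors' method. It assumes a beam-style forward selection (keep the top few feature combinations at each step), repeats it over reshuffled K-fold splits, scores each combination by an R² pooled over out-of-fold predictions, and finally counts how often each feature lands in the best subset. To stay dependency-free it swaps in a small k-nearest-neighbour regressor where the paper uses a modified random forest. All function names, parameters, and the synthetic data are hypothetical.

```python
import random
from collections import Counter


def r2_score(y_true, y_pred):
    """Coefficient of determination; negative when worse than predicting the mean."""
    mean_y = sum(y_true) / len(y_true)
    ss_tot = sum((y - mean_y) ** 2 for y in y_true)
    ss_res = sum((y - p) ** 2 for y, p in zip(y_true, y_pred))
    return 1.0 - ss_res / ss_tot if ss_tot > 0 else 0.0


def knn_predict(train_X, train_y, row, k=3):
    """Stand-in regressor (NOT a random forest): mean target of the k nearest rows."""
    ranked = sorted(
        (sum((a - b) ** 2 for a, b in zip(x, row)), y)
        for x, y in zip(train_X, train_y)
    )
    nearest = ranked[:k]
    return sum(y for _, y in nearest) / len(nearest)


def make_folds(n_rows, n_folds, rng):
    """Shuffle the row indices and deal them into n_folds disjoint folds."""
    idx = list(range(n_rows))
    rng.shuffle(idx)
    return [idx[i::n_folds] for i in range(n_folds)]


def cv_r2(X, y, features, folds):
    """R2 over pooled out-of-fold predictions for one feature subset (assumed scoring)."""
    preds = [0.0] * len(y)
    for fold in folds:
        held_out = set(fold)
        train = [i for i in range(len(y)) if i not in held_out]
        train_X = [[X[i][f] for f in features] for i in train]
        train_y = [y[i] for i in train]
        for i in fold:
            preds[i] = knn_predict(train_X, train_y, [X[i][f] for f in features])
    return r2_score(y, preds)


def select_features(X, y, n_iter=10, beam=3, max_size=3, n_folds=5, seed=0):
    """Count how often each feature is in the best-scoring subset across iterations."""
    rng = random.Random(seed)
    n_features = len(X[0])
    counts = Counter()
    for _ in range(n_iter):
        # One fresh split per iteration; every candidate is scored on the same split.
        folds = make_folds(len(y), n_folds, rng)
        best_score, best_subset = float("-inf"), ()
        frontier = [()]
        for _ in range(max_size):
            # Grow each surviving combination by one feature it does not yet contain.
            candidates = {
                tuple(sorted(subset + (f,)))
                for subset in frontier
                for f in range(n_features)
                if f not in subset
            }
            scored = sorted(
                ((cv_r2(X, y, c, folds), c) for c in candidates), reverse=True
            )
            frontier = [c for _, c in scored[:beam]]
            # Larger subsets replace the best one only if they score strictly higher.
            if scored and scored[0][0] > best_score:
                best_score, best_subset = scored[0]
        counts.update(best_subset)
    return counts


if __name__ == "__main__":
    rng = random.Random(42)
    # Hypothetical data: 23 rows, 5 features; only features 0 and 1 drive y.
    X = [[rng.random() for _ in range(5)] for _ in range(23)]
    y = [3.0 * row[0] + 2.0 * row[1] ** 2 + rng.gauss(0.0, 0.05) for row in X]
    for feature, hits in select_features(X, y).most_common():
        print(f"feature {feature}: in the best subset of {hits} of 10 iterations")
```

Counting occurrences across reshuffled splits is one plausible way to damp chance correlations in a very small sample: a feature that wins on a single lucky split will show a low count, while a genuine driver should recur. Whether this matches the paper's own safeguard against random correlation is not stated in the abstract.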