The GAIA Explorer, a Powerful Search Platform

Xavier Du Bernard, Jonathan Gallon, J. Massot
DOI: 10.2118/207837-ms (https://doi.org/10.2118/207837-ms)
Source: Day 3 Wed, November 17, 2021
Publication date: 2021-12-09

Abstract

The Gaia Explorer, a Powerful Search Platform
After two years of development, the GAIA Explorer is now ready to assist geoscientists at Total. This knowledge platform works like a small-scale Google, but with a focus solely on geosciences, for the time being. Its main goal is to save time in finding the right information. It is therefore particularly useful in datarooms or after business acquisitions, where knowledge must be digested quickly, but also for feeding databases, exploration syntheses and reservoir studies, or even for staff onboarding, especially when working remotely. With the time saved, geoscientists can focus on tasks with added value, such as synthesizing, finding analogies or proposing alternative scenarios.

This new companion automatically organizes and extracts knowledge from a large number of unstructured technical documents by using Machine Learning (ML). All the models rely on Google Cloud Platform (GCP) and have been trained on our own datasets, which cover the main petroleum domains such as geosciences and operations. First, the layout of more than 75,000 document pages was analyzed to train a segmentation model, which extracts three types of content (text, images and tables). Second, the text content extracted from about 6,500 documents labelled among 30 classes was used to train a document classification model. Third, more than 55,000 images were categorized among 45 classes to customize an image classification model covering a wide range of figures such as maps, logs, seismic sections and core pictures. Finally, all the terms (n-grams) extracted from these objects are compared with an in-house thesaurus to automatically tag related topics such as basin, field, geological formation, acquisition and measurement. All these elementary building blocks are connected and feed a knowledge database that can be searched quickly and exhaustively.
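The thesaurus tagging step described above can be illustrated with a minimal sketch. It is not the authors' implementation: the tokenization, the `tag_topics` and `ngrams` helpers, and the thesaurus entries are all hypothetical, and it simply assumes that the thesaurus maps a lowercase term to a topic category.

```python
from typing import Dict, List, Set


def ngrams(tokens: List[str], max_n: int) -> List[str]:
    """Return all contiguous n-grams (1 to max_n words), joined by spaces."""
    grams = []
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            grams.append(" ".join(tokens[i:i + n]))
    return grams


def tag_topics(text: str, thesaurus: Dict[str, str], max_n: int = 3) -> Dict[str, Set[str]]:
    """Map each topic category to the thesaurus terms found in the text."""
    # Naive tokenization (an assumption): split on whitespace, lowercase,
    # and strip surrounding punctuation.
    tokens = [t.strip(".,;:()").lower() for t in text.split()]
    tokens = [t for t in tokens if t]
    tags: Dict[str, Set[str]] = {}
    for gram in ngrams(tokens, max_n):
        category = thesaurus.get(gram)
        if category is not None:
            tags.setdefault(category, set()).add(gram)
    return tags


# Hypothetical thesaurus entries, for illustration only.
THESAURUS = {
    "north sea": "basin",
    "brent": "geological formation",
    "3d seismic": "acquisition",
}

tags = tag_topics(
    "Interpretation of 3D seismic over the Brent reservoir, North Sea.",
    THESAURUS,
)
print(tags)
```

An exact dictionary lookup keeps the sketch short. A production thesaurus would presumably also need to handle synonyms, spelling variants and ambiguous terms, which this example does not attempt.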
Today, the GAIA Explorer searches within texts, images and tables from a corpus (document collection), which can be made up of technical and operational reports, meeting presentations and academic publications. Combining queries (keywords or natural language) with a large array of filters (by classes and topics) makes the results easy to refine and exploit. Since the release of a production version at Total in February 2021, about 180 users across 30 projects have regularly used the tool for exploration and development purposes. This first version follows a continuous training cycle that includes active learning. Preliminary user feedback is good, and users acknowledge that some information would have been difficult to locate without the GAIA Explorer.

In the future, the GAIA Explorer could be significantly improved by implementing a knowledge graph based on an ontology dedicated to petroleum domains. With the help of specialists in related activities such as drilling, projects or contracts, the tool could cover the complete range of upstream topics and, with time, be useful for other businesses.
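The combination of a query with class and topic filters, as described in the abstract, can be sketched as follows. This is only an illustration under stated assumptions: the `Item` record, its field names, the `search` function and the sample corpus are hypothetical, and plain keyword matching stands in for whatever retrieval method the platform actually uses.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set


@dataclass
class Item:
    """One indexed object (hypothetical schema)."""
    item_id: str
    content_type: str  # "text", "image" or "table"
    doc_class: str     # class label assigned by a classifier
    topics: Set[str] = field(default_factory=set)
    text: str = ""     # searchable text (body, caption or cell contents)


def search(
    items: List[Item],
    query: str,
    content_type: Optional[str] = None,
    doc_class: Optional[str] = None,
    topics: Optional[Set[str]] = None,
) -> List[str]:
    """Return ids of items that pass every filter and contain all query terms."""
    terms = query.lower().split()
    required_topics = topics or set()
    hits = []
    for item in items:
        if content_type is not None and item.content_type != content_type:
            continue
        if doc_class is not None and item.doc_class != doc_class:
            continue
        if not required_topics <= item.topics:
            continue
        haystack = item.text.lower()
        if all(term in haystack for term in terms):
            hits.append(item.item_id)
    return hits


# Hypothetical corpus, for illustration only.
CORPUS = [
    Item("p1", "text", "well report", {"brent"},
         "Porosity and permeability of the Brent sandstone"),
    Item("p2", "table", "well report", {"brent"},
         "Core porosity measurements per depth"),
    Item("p3", "image", "seismic section", {"north sea"},
         "Seismic line across the graben"),
]

print(search(CORPUS, "porosity"))
print(search(CORPUS, "porosity", content_type="table"))
```

Filters are applied before the text match so that each added filter can only narrow the result set, which mirrors the refinement workflow the abstract describes.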