The Gaia Explorer, a Powerful Search Platform

Day 3 Wed, November 17, 2021 Pub Date : 2021-12-09 DOI:10.2118/207837-ms

Xavier Du Bernard, Jonathan Gallon, J. Massot


Citations: 0

Abstract


The Gaia Explorer, a Powerful Search Platform

After two years of development, the GAIA Explorer is now ready to assist geoscientists at Total. This knowledge platform works like a small-scale Google, but with a focus solely on geosciences, for the time being. The main goal of the GAIA Explorer is to save time in finding the right information. It is therefore particularly useful in datarooms or after business acquisitions, when knowledge must be digested quickly, but also for feeding databases, exploration syntheses, reservoir studies, or even staff onboarding, especially when working remotely. With the time saved, geoscientists can focus on tasks with added value, such as synthesizing, finding analogies, or proposing alternative scenarios.

This new companion automatically organizes and extracts knowledge from a large number of unstructured technical documents by using Machine Learning (ML). All the models rely on Google Cloud Platform (GCP) and have been trained on our own datasets, which cover the main petroleum domains such as geosciences and operations. First, the layout of more than 75,000 document pages was analyzed to train a segmentation model, which extracts three types of content (text, images and tables). Secondly, the text content extracted from about 6,500 documents labelled among 30 classes was used to train a document classification model. Thirdly, more than 55,000 images were categorized among 45 classes to customize an image classification model covering a wide range of figures such as maps, logs, seismic sections, or core pictures. Finally, all the terms (n-grams) extracted from these objects are compared with an in-house thesaurus to automatically tag related topics such as basin, field, geological formation, acquisition, or measurement. All these elementary bricks are connected and used to feed a knowledge database that can be searched quickly and exhaustively.
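The last step of the pipeline, comparing extracted n-grams with a thesaurus to tag topics, can be illustrated with a minimal sketch. This is not the authors' implementation: the thesaurus entries, the function names (`tokenize`, `ngrams`, `tag_topics`) and the exact-match lookup are all assumptions made for illustration, and the real in-house thesaurus and matching rules are not described in the abstract.

```python
import re
from typing import Dict, List, Set

# Hypothetical mini-thesaurus (lower-cased term -> topic label).
# The real in-house thesaurus is not public; these entries are illustrative only.
THESAURUS: Dict[str, str] = {
    "north sea": "basin",
    "ekofisk": "field",
    "kimmeridge clay": "geological formation",
    "3d seismic": "acquisition",
    "porosity": "measure",
}


def tokenize(text: str) -> List[str]:
    """Lower-case the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def ngrams(tokens: List[str], max_n: int = 3) -> List[str]:
    """Return every n-gram of 1 to max_n consecutive tokens, joined by spaces."""
    return [
        " ".join(tokens[i:i + n])
        for n in range(1, max_n + 1)
        for i in range(len(tokens) - n + 1)
    ]


def tag_topics(text: str,
               thesaurus: Dict[str, str] = THESAURUS,
               max_n: int = 3) -> Dict[str, Set[str]]:
    """Map each topic to the thesaurus terms found in the text (exact match)."""
    tags: Dict[str, Set[str]] = {}
    for gram in ngrams(tokenize(text), max_n):
        topic = thesaurus.get(gram)
        if topic is not None:
            tags.setdefault(topic, set()).add(gram)
    return tags


if __name__ == "__main__":
    sample = "Porosity logs from the Ekofisk field in the North Sea"
    print(tag_topics(sample))
```

Exact dictionary lookup keeps the sketch short; a production system would likely also need to handle spelling variants, synonyms and ambiguous terms, none of which is attempted here.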
Today, the GAIA Explorer searches within texts, images and tables from a corpus (document collection), which can be made up of technical and operational reports, meeting presentations and academic publications. By combining queries (keywords or natural language) with a large array of filters (by classes and topics), results can easily be refined and exploited. Since the release of a production version at Total in February 2021, about 180 users across 30 projects have regularly used the tool for exploration and development purposes. This first version follows a continuous training cycle that includes active learning. Preliminary user feedback is good, and users acknowledge that some information would have been difficult to locate without the GAIA Explorer.

In the future, the GAIA Explorer could be significantly improved by implementing a knowledge graph based on an ontology dedicated to petroleum domains. With the help of specialists in related activities such as drilling, projects or contracts, the tool could eventually cover the complete range of upstream topics and become useful for other businesses.
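The combination of keyword queries with class and topic filters described in the abstract can be sketched as a small in-memory search. Everything here is assumed for illustration: the `Item` record, its fields, the `search` function and the sample corpus are hypothetical, and the substring matching stands in for whatever retrieval engine the platform actually uses (natural-language queries are not covered).

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, List, Optional


@dataclass(frozen=True)
class Item:
    """One searchable object (text, image caption or table), hypothetical schema."""
    item_id: str
    text: str
    doc_class: str                       # label from a document/image classifier
    topics: FrozenSet[str] = frozenset() # tags from the thesaurus step


def search(items: Iterable[Item],
           query: str,
           doc_class: Optional[str] = None,
           topics: Iterable[str] = ()) -> List[Item]:
    """Return items containing every query keyword and passing both filters."""
    terms = query.lower().split()
    wanted = set(topics)
    hits = []
    for item in items:
        haystack = item.text.lower()
        # Keyword step: every term must occur in the text (substring match).
        if not all(term in haystack for term in terms):
            continue
        # Class filter: skipped when no class is requested.
        if doc_class is not None and item.doc_class != doc_class:
            continue
        # Topic filter: the item must carry all requested topics.
        if not wanted <= item.topics:
            continue
        hits.append(item)
    return hits


# Hypothetical three-item corpus.
CORPUS = [
    Item("a", "Seismic section across the northern basin",
         "seismic report", frozenset({"basin"})),
    Item("b", "Well log summary with porosity measurements",
         "well report", frozenset({"measure"})),
    Item("c", "Seismic acquisition plan for the basin survey",
         "meeting presentation", frozenset({"basin", "acquisition"})),
]

if __name__ == "__main__":
    for hit in search(CORPUS, "seismic basin", topics=["acquisition"]):
        print(hit.item_id, hit.doc_class)
```

Applying the filters after the keyword match mirrors the refinement workflow the abstract describes: a broad query first, then narrowing by class and topic.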
