Jintao Tao, Nannan Zhang, Jinyu Chang, Li Chen, Hao Zhang, Shibin Liao, Siyuan Li
Deep learning-based mineral exploration named entity recognition: A case study of granitic pegmatite-type lithium deposits
Ore Geology Reviews, volume 175, Article 106367. Published 2024-11-24. Impact factor: 3.2. JCR: Q1 (Geology).
DOI: 10.1016/j.oregeorev.2024.106367
https://www.sciencedirect.com/science/article/pii/S0169136824005006
Citations: 0
Abstract
Geological text data are crucial sources of geological information and knowledge for mineral exploration. Mineral exploration involves predicting and detecting mineral resources using geological, geochemical, geophysical, and remote sensing data. However, existing named entity recognition studies on mineral deposits have mainly focused on geological environments and mineral deposit models, which are insufficient for capturing the extensive knowledge essential for mineral exploration and for supporting subsequent exploration efforts. This paper presents an efficient workflow for automatically extracting mineral exploration information from unstructured geological text data using a deep learning method. First, 21 entity types were identified based on a conceptual prospecting model of granitic pegmatite-type lithium deposits. A mineral exploration corpus comprising 3,386 sentences and 13,167 entities was constructed from Chinese geological literature and reports. A Mineral Exploration Named Entity Recognition (MENER) model is then proposed to extract mineral exploration information. This model integrates entity-type-enhanced character features, word features, and contextual features to improve performance. A Bidirectional Encoder Representations from Transformers (BERT) model was employed to obtain character embeddings of the input text. Mineral exploration entity types provide external knowledge, aiding the understanding of entity semantics within sentences through multi-head attention. Convolutional neural networks and bidirectional long short-term memory networks were employed to extract word and contextual features and capture additional structural information. Because geological entity nomenclature and expressions follow certain default conventions and paradigms, a boundary prediction classifier was introduced to identify the head and tail characteristics of geological entities. A conditional random field was then used to classify the entities.
The MENER model achieved an average F1-score of 79.69% on the constructed dataset. Finally, a geological document was selected as a case study to demonstrate the effectiveness of the proposed model. The workflow outlined in this study enables the rapid and robust extraction of specific information and knowledge mining from geological text data, with potential applications across various domains.
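The abstract names two ideas that a small example can make concrete: head and tail boundary targets for entities, and an entity-level F1-score. The sketch below is illustrative only and is not the authors' implementation. It assumes a character-level BIO tagging scheme, which the abstract does not state, and a micro-averaged, exact-match F1, which may differ from the "average F1-score" the paper reports. The entity type labels `MIN` and `ROCK` and all function names are hypothetical.

```python
from typing import List, Set, Tuple

# (start index, end index inclusive, entity type); types here are hypothetical
Span = Tuple[int, int, str]


def bio_to_spans(tags: List[str]) -> Set[Span]:
    """Collect entity spans from a BIO tag sequence (one tag per character)."""
    spans: Set[Span] = set()
    start, etype = None, None
    for i, tag in enumerate(tags + ["O"]):  # sentinel closes a trailing entity
        inside = tag.startswith("I-") and etype == tag[2:]
        if start is not None and not inside:
            spans.add((start, i - 1, etype))
            start, etype = None, None
        if tag.startswith("B-"):
            start, etype = i, tag[2:]
    return spans


def boundary_labels(tags: List[str]) -> Tuple[List[int], List[int]]:
    """Derive binary head/tail targets, as a boundary classifier might use."""
    head = [0] * len(tags)
    tail = [0] * len(tags)
    for start, end, _ in bio_to_spans(tags):
        head[start] = 1  # first character of an entity
        tail[end] = 1    # last character of an entity
    return head, tail


def entity_f1(gold: List[List[str]], pred: List[List[str]]) -> float:
    """Micro-averaged F1 over exact span-and-type matches (an assumption)."""
    tp = fp = fn = 0
    for g, p in zip(gold, pred):
        gs, ps = bio_to_spans(g), bio_to_spans(p)
        tp += len(gs & ps)
        fp += len(ps - gs)
        fn += len(gs - ps)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0.0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Toy six-character sentence with two gold entities
gold_tags = ["B-MIN", "I-MIN", "O", "B-ROCK", "I-ROCK", "I-ROCK"]
pred_tags = ["B-MIN", "I-MIN", "O", "B-ROCK", "I-ROCK", "O"]

head, tail = boundary_labels(gold_tags)
score = entity_f1([gold_tags], [pred_tags])
```

In this toy case the prediction truncates the second entity by one character, so only one of the two gold spans matches exactly. Exact-match scoring of this kind penalizes boundary errors heavily, which is one plausible motivation for adding a dedicated boundary prediction step.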
About the journal:
Ore Geology Reviews aims to familiarize all earth scientists with recent advances in a number of interconnected disciplines related to the study of, and search for, ore deposits. The reviews range from brief to longer contributions, but the journal preferentially publishes manuscripts that fill the niche between the commonly shorter journal articles and the comprehensive book coverages, and thus has a special appeal to many authors and readers.