GeoNER: Geological Named Entity Recognition with Enriched Domain Pre-Training Model and Adversarial Training

IF 3.5; Zone 3, Earth Sciences; JCR Q1, GEOSCIENCES, MULTIDISCIPLINARY; Acta Geologica Sinica ‐ English Edition; Pub Date: 2024-10-28; DOI: 10.1111/1755-6724.15213
Kai MA, Xinxin HU, Miao TIAN, Yongjian TAN, Shuai ZHENG, Liufeng TAO, Qinjun QIU
Acta Geologica Sinica ‐ English Edition, 98(5): 1404-1417. Publication type: Journal Article. Full record: https://onlinelibrary.wiley.com/doi/10.1111/1755-6724.15213
Citations: 0

Abstract

GeoNER: Geological Named Entity Recognition with Enriched Domain Pre-Training Model and Adversarial Training

Geological reports are an important source of geological data and contain rich expert and domain knowledge, but a central challenge for geological knowledge extraction and mining is how to understand these reports accurately under the guidance of domain knowledge. Generic named entity recognition models and tools can be applied to geoscience reports and documents, but they lack domain-specific knowledge, which leads to a pronounced decline in recognition accuracy. This study summarizes six types of typical geological entities with reference to the ontological system of the geological domain and builds a high-quality corpus for the task of geological named entity recognition (GNER). In addition, GeoWoBERT-advBGP (Geological Word-base BERT-adversarial training Bi-directional Long Short-Term Memory Global Pointer) is proposed to address the ambiguity, diversity and nesting of geological entities. The model first encodes the text with GeoWoBERT (Geological Word-base BERT), a fine-tuned pre-trained model that operates at word granularity, and combines this representation with text features extracted by a BiLSTM (Bi-directional Long Short-Term Memory). An adversarial training algorithm is then applied to improve the model's robustness and its resistance to interference, and decoding is finally performed with a global association pointer algorithm. Experimental results show that the proposed model achieves high performance on the constructed dataset and is capable of mining rich geological information.
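
The last two stages of the pipeline described in the abstract can be illustrated with a small, dependency-free sketch. It is not the authors' implementation. `decode_spans` shows the general idea behind global-pointer decoding: every candidate span (start, end) is scored per entity type and kept if its score exceeds a threshold, which is why nested entities can be returned together. `fgm_perturbation` assumes an FGM-style perturbation (a step of size epsilon along the normalized gradient); the abstract does not say which adversarial training algorithm is used, so this choice is an assumption. The function names, the `ROCK` and `PLACE` labels, the tokens and all scores are hypothetical.

```python
import math
from typing import Dict, List, Tuple

# (entity_type, start_token, end_token), with the end index inclusive.
Span = Tuple[str, int, int]


def decode_spans(scores: Dict[str, List[List[float]]],
                 threshold: float = 0.0) -> List[Span]:
    """Return every (type, start, end) whose score exceeds the threshold.

    scores[t][i][j] is the score that tokens i..j form an entity of type t.
    Each span is judged on its own, so overlapping and nested spans can all
    be returned, unlike in a one-tag-per-token sequence labeller.
    """
    spans: List[Span] = []
    for entity_type, matrix in scores.items():
        for start, row in enumerate(matrix):
            # Only the upper triangle (end >= start) is a valid span.
            for end in range(start, len(row)):
                if row[end] > threshold:
                    spans.append((entity_type, start, end))
    return sorted(spans, key=lambda s: (s[1], s[2], s[0]))


def fgm_perturbation(grad: List[float], epsilon: float = 1.0) -> List[float]:
    """FGM-style perturbation (assumed): epsilon * g / ||g||_2.

    Returns a zero vector when the gradient is zero to avoid dividing by 0.
    """
    norm = math.sqrt(sum(g * g for g in grad))
    if norm == 0.0:
        return [0.0] * len(grad)
    return [epsilon * g / norm for g in grad]


if __name__ == "__main__":
    # Made-up tokens and scores, purely for illustration.
    tokens = ["Ordos", "Basin", "sandstone", "reservoir"]
    n = len(tokens)
    low = -5.0
    rock = [[low] * n for _ in range(n)]
    place = [[low] * n for _ in range(n)]
    place[0][1] = 3.2  # "Ordos Basin"
    rock[2][2] = 2.1   # "sandstone"
    rock[0][2] = 1.4   # "Ordos Basin sandstone", which contains both spans above

    for entity_type, start, end in decode_spans({"ROCK": rock, "PLACE": place}):
        print(entity_type, " ".join(tokens[start:end + 1]))

    # Perturbation that would be added to the embeddings for the adversarial pass.
    print(fgm_perturbation([3.0, 4.0], epsilon=0.5))
```

In a real model the score matrices would come from the encoder (here, GeoWoBERT plus BiLSTM features) and the perturbation would be added to the embedding weights for a second forward and backward pass before being removed again; both steps are omitted here.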

Source journal
Acta Geologica Sinica ‐ English Edition (Geosciences, Multidisciplinary)
CiteScore: 3.00
Self-citation rate: 12.10%
Articles published: 3039
Review time: 6 months
About the journal: Acta Geologica Sinica mainly reports the latest and most important achievements in theoretical and basic research in the geological sciences in China, together with new technologies. Published papers cover many areas of the geosciences and related disciplines, such as stratigraphy, palaeontology, the origin and history of the Earth, structural geology, tectonics, mineralogy, petrology, geochemistry, geophysics, geology of mineral deposits, hydrogeology, engineering geology, environmental geology, regional geology, and new theories and technologies of geological exploration.