CEDG-GeoQA: Knowledge base question answering for the geoscience domain via Chinese entity description graph

IF 2.7 | Zone 4, Earth Sciences | JCR Q2, Computer Science, Interdisciplinary Applications | Earth Science Informatics | Pub Date: 2024-04-09 | DOI: 10.1007/s12145-024-01304-8
Lai Wei, Qinghua Lu, Yilin Duan, Hong Yao, Xiaojun Kang
Citation count: 0

Abstract

Acquiring geoscience knowledge is crucial for advancing earth science research. Currently, geoscience knowledge can be obtained through search engines or specialized databases. However, the quality of search engine results varies, and geoscience databases do not support natural language queries. To address these challenges, Geoscience Question Answering (GeoQA) systems have been developed to provide answers to natural language queries. Much of the existing research in geoscience QA primarily focuses on geography, with other domains remaining relatively unexplored. To bridge this gap, our study introduces a Chinese geoscience QA dataset that covers a wide range of topics, including geography, climate, and culture. Additionally, we propose the CEDG-GeoQA framework for Chinese geoscience QA. The model begins by utilizing syntactic parsing to convert unstructured queries into an entity description graph (EDG). Subsequently, it aligns the EDG with a comprehensive geoscience knowledge base, extracting a subgraph centered around the subject entity. This subgraph is used to assess candidate answers and determine the most likely response. Our comprehensive experiments, conducted using a Chinese geo-knowledge base, demonstrate the superior performance of our method, achieving a 5% improvement in the F1 measure compared to existing baselines, including WDAqua, gAnswer, and NSQA.
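
The three stages named in the abstract (question to entity description graph, alignment with a knowledge base subgraph, candidate ranking) can be illustrated with a toy sketch. This is not the authors' implementation: the `EDG` class, the keyword-overlap scoring, the function names, and the sample triples are all assumptions made for illustration, and simple token matching stands in for the syntactic parsing the paper describes.

```python
from dataclasses import dataclass

# Toy knowledge base of (subject, predicate, object) triples.
# Illustrative data only; not taken from the paper's knowledge base.
TRIPLES = [
    ("Yangtze River", "length_km", "6300"),
    ("Yangtze River", "flows_into", "East China Sea"),
    ("Yangtze River", "type", "river"),
    ("Yellow River", "flows_into", "Bohai Sea"),
    ("Yellow River", "type", "river"),
]


@dataclass
class EDG:
    """Hypothetical minimal entity description graph: a subject entity
    plus the words that describe the unknown answer node."""
    subject: str
    relation_words: frozenset


def build_edg(question, known_entities):
    """Stand-in for syntactic parsing: locate a known entity mention and
    treat the remaining tokens as the relation description."""
    lowered = question.lower()
    subject = next((e for e in known_entities if e.lower() in lowered), None)
    if subject is None:
        raise ValueError("no known entity found in question")
    rest = lowered.replace(subject.lower(), " ")
    words = frozenset(w.strip("?.,") for w in rest.split())
    return EDG(subject, words - {""})


def extract_subgraph(subject, triples):
    """Keep only the triples centered on the subject entity."""
    return [t for t in triples if t[0] == subject]


def rank_candidates(edg, subgraph):
    """Score each candidate object by word overlap between its predicate
    and the question's relation words (an assumed scoring rule)."""
    scored = []
    for _, predicate, obj in subgraph:
        overlap = len(set(predicate.split("_")) & edg.relation_words)
        scored.append((overlap, obj))
    # Stable sort: ties keep knowledge-base order.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored


def answer(question):
    """Run the three toy stages and return the top-ranked candidate."""
    entities = {s for s, _, _ in TRIPLES}
    edg = build_edg(question, entities)
    ranked = rank_candidates(edg, extract_subgraph(edg.subject, TRIPLES))
    return ranked[0][1]


if __name__ == "__main__":
    print(answer("What does the Yangtze River flow into?"))
```

Restricting the candidates to the subgraph around the subject entity keeps the ranking step small; a real system would replace the overlap score with a learned or rule-based matcher over the full graph structure.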

Source journal: Earth Science Informatics
Category: Computer Science, Interdisciplinary Applications; Geosciences, Multidisciplinary
CiteScore: 4.60
Self-citation rate: 3.60%
Articles published: 157
Review time: 4.3 months
About the journal: The Earth Science Informatics [ESIN] journal aims at rapid publication of high-quality, current, cutting-edge, and provocative scientific work in the area of Earth Science Informatics as it relates to Earth systems science and space science. This includes articles on the application of formal and computational methods, computational Earth science, spatial and temporal analyses, and all aspects of computer applications to the acquisition, storage, processing, interchange, and visualization of data and information about the materials, properties, processes, features, and phenomena that occur at all scales and locations in the Earth system’s five components (atmosphere, hydrosphere, geosphere, biosphere, cryosphere) and in space (see "About this journal" for more detail). The quarterly journal publishes research, methodology, and software articles, as well as editorials, comments, and book and software reviews. Review articles of relevant findings, topics, and methodologies are also considered.