CEDG-GeoQA: Knowledge base question answering for the geoscience domain via Chinese entity description graph
Lai Wei, Qinghua Lu, Yilin Duan, Hong Yao, Xiaojun Kang
Earth Science Informatics, published 2024-04-09. DOI: 10.1007/s12145-024-01304-8
Abstract
Acquiring geoscience knowledge is crucial for advancing earth science research. Currently, geoscience knowledge can be obtained through search engines or specialized databases. However, the quality of search engine results varies, and geoscience databases do not support natural language queries. To address these challenges, Geoscience Question Answering (GeoQA) systems have been developed to answer natural language queries. Existing geoscience QA research focuses primarily on geography, leaving other domains relatively unexplored. To bridge this gap, we introduce a Chinese geoscience QA dataset that covers a wide range of topics, including geography, climate, and culture. We also propose CEDG-GeoQA, a framework for Chinese geoscience QA. The framework first uses syntactic parsing to convert an unstructured query into an entity description graph (EDG). It then aligns the EDG with a comprehensive geoscience knowledge base and extracts a subgraph centered on the subject entity. This subgraph is used to evaluate candidate answers and select the most likely one. Experiments on a Chinese geoscience knowledge base show that our method outperforms existing baselines, including WDAqua, gAnswer, and NSQA, improving the F1 score by 5%.
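The three-stage pipeline summarized above (build an EDG, extract a subgraph around the subject entity, rank candidate answers) can be illustrated with a minimal Python sketch. This is not the authors' implementation. The `EntityDescriptionGraph` class, the toy triples, the exact-string entity alignment, the 2-hop breadth-first subgraph extraction, and the relation-overlap scoring rule are all hypothetical simplifications, and the EDG is built by hand rather than by syntactic parsing. The set-based `answer_f1` function assumes the per-question F1 commonly used in KBQA evaluation; the abstract does not say exactly how its F1 is computed.

```python
from collections import deque
from dataclasses import dataclass, field

# Hypothetical toy knowledge base of (head, relation, tail) triples.
# English labels are used for readability; the paper's knowledge base is Chinese.
Triple = tuple[str, str, str]

TOY_KB: list[Triple] = [
    ("Hubei", "capital", "Wuhan"),
    ("Hubei", "climate", "subtropical monsoon"),
    ("Wuhan", "located_on", "Yangtze River"),
    ("Yangtze River", "flows_into", "East China Sea"),
]


@dataclass
class EntityDescriptionGraph:
    """Hypothetical, simplified EDG: a subject entity plus relation phrases
    describing the unknown answer. Built by hand here, not by parsing."""
    subject: str
    descriptions: list[str] = field(default_factory=list)


def extract_subgraph(kb: list[Triple], subject: str, hops: int = 2) -> list[Triple]:
    """Return every triple on a path of at most `hops` edges from `subject`,
    ignoring edge direction. Alignment is assumed to be an exact string match
    between the EDG subject and a KB node (a stand-in for entity linking)."""
    adjacency: dict[str, list[Triple]] = {}
    for triple in kb:
        head, _, tail = triple
        adjacency.setdefault(head, []).append(triple)
        adjacency.setdefault(tail, []).append(triple)

    visited = {subject}
    collected: set[Triple] = set()
    frontier = deque([(subject, 0)])
    while frontier:
        node, depth = frontier.popleft()
        if depth == hops:
            continue  # do not expand beyond the hop limit
        for triple in adjacency.get(node, []):
            collected.add(triple)
            head, _, tail = triple
            neighbor = tail if head == node else head
            if neighbor not in visited:
                visited.add(neighbor)
                frontier.append((neighbor, depth + 1))
    return sorted(collected)


def rank_candidates(edg: EntityDescriptionGraph, subgraph: list[Triple]) -> list[tuple[str, int]]:
    """Score each non-subject node by how many EDG description phrases match
    the relations on its incident subgraph edges (hypothetical scoring rule)."""
    nodes = {h for h, _, _ in subgraph} | {t for _, _, t in subgraph}
    scores: dict[str, int] = {}
    for node in nodes - {edg.subject}:
        relations = {r for h, r, t in subgraph if node in (h, t)}
        scores[node] = sum(1 for phrase in edg.descriptions if phrase in relations)
    # Highest score first; ties broken alphabetically for determinism.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


def answer_f1(predicted: set[str], gold: set[str]) -> float:
    """Per-question F1 between predicted and gold answer sets (assumed metric)."""
    if not predicted or not gold:
        return float(predicted == gold)
    overlap = len(predicted & gold)
    if overlap == 0:
        return 0.0
    precision = overlap / len(predicted)
    recall = overlap / len(gold)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # "What is the capital of Hubei?" -> subject "Hubei", description "capital"
    edg = EntityDescriptionGraph("Hubei", ["capital"])
    subgraph = extract_subgraph(TOY_KB, edg.subject, hops=2)
    ranking = rank_candidates(edg, subgraph)
    best = ranking[0][0]
    print(best)  # → Wuhan
    print(answer_f1({best}, {"Wuhan"}))  # → 1.0
```

Restricting the search to a subgraph around the subject entity keeps the candidate set small; in this sketch the 2-hop limit excludes the `flows_into` triple entirely. A real system would replace the exact-match alignment with an entity linker and the overlap count with a learned or similarity-based score, while keeping the same overall structure.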
About the journal:
The Earth Science Informatics [ESIN] journal aims at rapid publication of high-quality, current, cutting-edge, and provocative scientific work in the area of Earth Science Informatics as it relates to Earth systems science and space science. This includes articles on the application of formal and computational methods, computational Earth science, spatial and temporal analyses, and all aspects of computer applications to the acquisition, storage, processing, interchange, and visualization of data and information about the materials, properties, processes, features, and phenomena that occur at all scales and locations in the Earth system’s five components (atmosphere, hydrosphere, geosphere, biosphere, cryosphere) and in space. The quarterly journal publishes research, methodology, and software articles, as well as editorials, comments, and book and software reviews. Review articles of relevant findings, topics, and methodologies are also considered.