GeoDeepShovel: A platform for building scientific database from geoscience literature with AI assistance

Geoscience Data Journal (IF 3.3; Zone 3, Earth Sciences; JCR Q2, Geosciences, Multidisciplinary) Pub Date: 2023-02-28 DOI: 10.1002/gdj3.186
Shao Zhang, Hui Xu, Yuting Jia, Ying Wen, Dakuo Wang, Luoyi Fu, Xinbing Wang, Chenghu Zhou
Citations: 0

Abstract



With the rapid development of big data science, the research paradigm in the geosciences has begun to shift toward big data-driven scientific discovery. To build the scientific databases that support such discovery, researchers must read a huge amount of literature to locate, extract and aggregate relevant results and data that are published and stored in PDF format. In this paper, based on the findings of a study of how geoscientists annotate literature and extract and aggregate data, we propose GeoDeepShovel, a publicly available AI-assisted data extraction system that supports their needs. GeoDeepShovel leverages state-of-the-art neural network models to help researchers easily and accurately annotate papers (in PDF format) and extract data from tables, figures, maps, etc., in a human–AI collaborative manner. As part of the Deep-Time Digital Earth (DDE) program, GeoDeepShovel has been deployed for 8 months. Already 400 users from 44 geoscience research teams within the DDE program use it daily to construct scientific databases, and more than 240 projects and 50,000 documents have been processed.

Source journal: Geoscience Data Journal
Categories: Geosciences, Multidisciplinary; Meteorology & Atmospheric Sciences
CiteScore: 5.90
Self-citation rate: 9.40%
Articles published: 35
Review time: 4 weeks
About the journal: Geoscience Data Journal provides an Open Access platform where scientific data can be formally published, in a way that includes scientific peer-review. Thus the dataset creator attains full credit for their efforts, while also improving the scientific record, providing version control for the community and allowing major datasets to be fully described, cited and discovered. An online-only journal, GDJ publishes short data papers cross-linked to, and citing, datasets that have been deposited in approved data centres and awarded DOIs. The journal will also accept articles on data services, and articles which support and inform data publishing best practices. Data is at the heart of science and scientific endeavour. The curation of data and the science associated with it is as important as ever in our understanding of the changing earth system and thereby enabling us to make future predictions. Geoscience Data Journal is working with recognised Data Centres across the globe to develop the future strategy for data publication, the recognition of the value of data and the communication and exploitation of data to the wider science and stakeholder communities.