Xiaogang Ma, Jolyon Ralph, Jiyin Zhang, Xiang Que, Anirudh Prabhu, Shaunna M. Morrison, Robert M. Hazen, Lesley Wyborn, Kerstin Lehnert
The open data movement has brought revolutionary changes to the field of mineralogy. With a growing number of datasets made available through community efforts, researchers are now able to explore new scientific topics such as mineral ecology, mineral evolution and new classification systems. Recent results have shown that open data, coupled with data science skills and expertise in mineralogy, lead to impressive new scientific discoveries. Yet feedback from researchers also reflects the need for better FAIRness of open data, that is, data that are findable, accessible, interoperable and reusable for both humans and machines. In this paper, we present our recent work on building the open data service of Mindat, one of the largest mineral databases in the world. In past years, Mindat has supported numerous scientific studies, but a machine interface for data access had never been established. Through the OpenMindat project we have achieved solid progress on two activities: (1) cleansing data and improving data quality, and (2) building a data sharing platform and establishing a machine interface for data query and access. We hope OpenMindat will help address the increasing need among researchers in mineralogy for an internationally recognized, authoritative database that is fully compliant with the FAIR guiding principles and helps accelerate scientific discoveries.
{"title":"OpenMindat: Open and FAIR mineralogy data from the Mindat database","authors":"Xiaogang Ma, Jolyon Ralph, Jiyin Zhang, Xiang Que, Anirudh Prabhu, Shaunna M. Morrison, Robert M. Hazen, Lesley Wyborn, Kerstin Lehnert","doi":"10.1002/gdj3.204","journal":{"name":"Geoscience Data Journal","volume":"11 1","pages":"94-104"},"PeriodicalIF":3.2,"publicationDate":"2023-05-29","publicationTypes":"Journal Article","openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1002/gdj3.204"}
David Alumbaugh, Erika Gasperikova, Dustin Crandall, Michael Commer, Shihang Feng, William Harbert, Yaoguo Li, Youzuo Lin, Savini Samarasinghe
We present a synthetic multi-scale, multi-physics dataset constructed from the Kimberlina 1.2 CO2 reservoir model, which is based on a potential CO2 storage site in the Southern San Joaquin Basin of California. From a set of 300 models, one reservoir-simulation scenario was selected to produce hydrologic-state models at the onset of CO2 injection and after 20 years of injection. These models were then transformed into geophysical properties using established empirical petrophysical relationships: P- and S-wave seismic velocities, saturated density (where the saturating fluid can be a combination of brine and supercritical CO2) and electrical resistivity. From these 3D distributions of geophysical properties, we have generated synthetic time-lapse seismic, gravity and electromagnetic responses with acquisition geometries that mimic realistic monitoring surveys and are achievable in actual field situations. We have also created a series of synthetic well logs of CO2 saturation, acoustic velocity, density and induction resistivity in the injection well and three monitoring wells. These were constructed by combining the low-frequency trend of the geophysical models with the high-frequency variations of actual well logs collected at the potential storage site. In addition, to better calibrate our datasets, measurements of permeability and pore connectivity have been made on cores of Vedder Sandstone, which forms the primary reservoir unit. These measurements extend the range of scales covered and bring the otherwise synthetic dataset as close to a real-world situation as possible. This dataset, consisting of the reservoir models, geophysical models, simulated time-lapse geophysical responses and well logs, forms a multi-scale, multi-physics testbed for designing and testing geophysical CO2 monitoring systems as well as imaging and characterization algorithms. The suite of numerical models and data has been made publicly available for download on the National Energy Technology Laboratory's (NETL) Energy Data Exchange (EDX) website.
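To make the idea of a petrophysical transform concrete, the sketch below computes the bulk density of a rock whose pores hold a mixture of brine and CO2, using simple volume-weighted mixing. This is a generic mass-balance relation and is not necessarily the relationship the authors used. The function name and all numerical values (porosity, grain, brine and CO2 densities) are hypothetical placeholders chosen for illustration.

```python
def saturated_density(porosity, co2_saturation, rho_grain, rho_brine, rho_co2):
    """Bulk density (kg/m^3) of a porous rock saturated with brine and CO2.

    Assumption: simple volume-weighted mixing of grain and pore-fluid densities.
    """
    if not 0.0 <= porosity <= 1.0 or not 0.0 <= co2_saturation <= 1.0:
        raise ValueError("porosity and co2_saturation must lie in [0, 1]")
    # Density of the pore fluid: CO2 fills a fraction co2_saturation of the pores.
    rho_fluid = co2_saturation * rho_co2 + (1.0 - co2_saturation) * rho_brine
    # Bulk density: grains occupy (1 - porosity), fluid occupies porosity.
    return (1.0 - porosity) * rho_grain + porosity * rho_fluid


# Hypothetical inputs, not values from the Kimberlina model.
baseline = saturated_density(0.25, 0.0, 2650.0, 1050.0, 700.0)
monitor = saturated_density(0.25, 0.4, 2650.0, 1050.0, 700.0)
print(f"baseline: {baseline:.1f} kg/m^3")
print(f"after injection: {monitor:.1f} kg/m^3")
print(f"time-lapse change: {monitor - baseline:.1f} kg/m^3")
```

Because CO2 is less dense than brine in this example, replacing part of the brine lowers the bulk density; a time-lapse gravity survey responds to density changes of this kind.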
{"title":"The Kimberlina synthetic multiphysics dataset for CO2 monitoring investigations","authors":"David Alumbaugh, Erika Gasperikova, Dustin Crandall, Michael Commer, Shihang Feng, William Harbert, Yaoguo Li, Youzuo Lin, Savini Samarasinghe","doi":"10.1002/gdj3.191","journal":{"name":"Geoscience Data Journal","volume":"11 2","pages":"216-234"},"PeriodicalIF":3.2,"publicationDate":"2023-05-19","publicationTypes":"Journal Article","openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1002/gdj3.191"}
Abraham Lauer, Jesse Devaney, Chanh Kieu, Ben Kravitz, Travis A. O'Brien, Scott M. Robeson, Paul W. Staten, The Anh Vu
Climate change is expected to have far-reaching effects at both the global and regional scale, but local effects are difficult to determine from coarse-resolution climate studies. Dynamical downscaling can provide insight into future climate projections on local scales. Here, we present a new dynamically downscaled dataset for Indiana and the surrounding regions. Output from the Community Earth System Model (CESM) version 1 is downscaled using the Weather Research and Forecasting model (WRF). Simulations are run with a 24-hr reinitialization strategy and a 12-hr spin-up window. WRF output is bias corrected to the National Centers for Environmental Prediction/National Center for Atmospheric Research 40-year Reanalysis project (NCEP) using a modified quantile mapping method. Bias-corrected 2-m air temperature and accumulated precipitation are the initial focus, with additional variables planned for future releases. Regional climate change signals agree well with larger global studies, and local fine-scale features such as urban heat islands, frontal passages and orographic temperature gradients are visible in the resulting dataset. This high-resolution climate dataset could be used for downstream applications focused on impacts across the domain, such as urban planning, energy usage, water resources, agriculture and public health.
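The abstract mentions bias correction by a modified quantile mapping method. As a rough illustration of the underlying idea only, the sketch below implements plain empirical quantile mapping: a model value is located within the historical model distribution, and the observed value at the same quantile is returned. It is not the authors' modified method; the function names and the toy temperature values are hypothetical.

```python
from bisect import bisect_right


def _cdf_position(sorted_vals, value):
    """Fractional rank in [0, 1] of value within sorted_vals (clamped at the ends)."""
    n = len(sorted_vals)
    if value <= sorted_vals[0]:
        return 0.0
    if value >= sorted_vals[-1]:
        return 1.0
    hi = bisect_right(sorted_vals, value)  # first index whose entry exceeds value
    lo = hi - 1
    frac = (value - sorted_vals[lo]) / (sorted_vals[hi] - sorted_vals[lo])
    return (lo + frac) / (n - 1)


def _quantile(sorted_vals, q):
    """Linearly interpolated quantile q in [0, 1] of a sorted list."""
    pos = q * (len(sorted_vals) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(sorted_vals) - 1)
    return sorted_vals[lo] + (pos - lo) * (sorted_vals[hi] - sorted_vals[lo])


def quantile_map(value, model_hist, obs_hist):
    """Bias-correct one model value by empirical quantile mapping.

    Values outside the historical model range are clamped to the observed
    extremes; practical implementations usually extrapolate instead.
    """
    q = _cdf_position(sorted(model_hist), value)
    return _quantile(sorted(obs_hist), q)


# Toy data: the "model" runs 2 degrees warmer than the "observations".
obs_hist = [10.0, 12.0, 15.0, 18.0, 21.0]
model_hist = [t + 2.0 for t in obs_hist]
for raw in (15.5, 17.0, 30.0):
    print(raw, "->", quantile_map(raw, model_hist, obs_hist))
```

In this toy case the mapping removes the constant 2-degree offset for values inside the historical range. With real data, the correction differs from quantile to quantile, which is what distinguishes quantile mapping from a simple mean shift.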
{"title":"A convection-permitting dynamically downscaled dataset over the Midwestern United States","authors":"Abraham Lauer, Jesse Devaney, Chanh Kieu, Ben Kravitz, Travis A. O'Brien, Scott M. Robeson, Paul W. Staten, The Anh Vu","doi":"10.1002/gdj3.188","journal":{"name":"Geoscience Data Journal","volume":"10 4","pages":"429-446"},"PeriodicalIF":3.2,"publicationDate":"2023-05-11","publicationTypes":"Journal Article","openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1002/gdj3.188"}
Developments in detrital zircon geochronology have resulted in an explosion of publications that report or discuss detrital zircon data. Combined detrital zircon U–Pb age and Hf isotope analyses of modern and Quaternary sediments have been widely carried out with the aim of characterizing continental crustal evolution and tracing sediment provenance. Although several detrital zircon databases have compiled U–Pb and Hf data on a global scale, detrital zircon data with a special focus on modern sediment have rarely been compiled. Here, we publish a new detrital zircon dataset of modern and Quaternary sediments in China with 59,535 U–Pb ages and 4,971 Hf isotope data, with detailed information on isotope ratios, ages, Th/U, etc. The sediments are classified into four types according to sedimentary environment: fluvial, marine, aeolian and alluvial. Preliminary analysis is carried out on these compiled data to provide new insights into sedimentary provenance and crustal evolution in China. Eight age populations are identified corresponding to tectonic–thermal or magmatic events: 2,300–2,700 Ma, 1,800–2,000 Ma, 700–1,000 Ma, 400–500 Ma, 200–300 Ma, 120–200 Ma, 80–120 Ma and < 60 Ma. Together with a quantitative comparison between sediments from various sedimentary environments, these U–Pb age distributions reveal the provenance link between “source” and “sink” in both exorheic and endorheic drainages. The compiled εHf(t) data indicate that the crustal history of China is episodic, with a pattern similar to that of the global record. Further work on database construction will include the integration of the latest literature, AI-based data extraction and data aggregation.
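One way to work with the eight age populations listed above is to assign each U–Pb age to a population and count the members of each. The sketch below does this with the age ranges from the abstract. The treatment of boundary values (lower bound inclusive, upper bound exclusive), the "unassigned" label and the sample ages are assumptions made for illustration and are not taken from the dataset.

```python
from collections import Counter

# (label, lower bound in Ma, upper bound in Ma), as listed in the abstract.
AGE_POPULATIONS = [
    ("2300-2700 Ma", 2300.0, 2700.0),
    ("1800-2000 Ma", 1800.0, 2000.0),
    ("700-1000 Ma", 700.0, 1000.0),
    ("400-500 Ma", 400.0, 500.0),
    ("200-300 Ma", 200.0, 300.0),
    ("120-200 Ma", 120.0, 200.0),
    ("80-120 Ma", 80.0, 120.0),
    ("<60 Ma", 0.0, 60.0),
]


def classify_age(age_ma):
    """Return the population label for a U-Pb age in Ma, or 'unassigned'.

    Assumption: intervals are half-open, [lower, upper).
    """
    for label, lower, upper in AGE_POPULATIONS:
        if lower <= age_ma < upper:
            return label
    return "unassigned"


# Hypothetical ages, not values from the published dataset.
sample_ages = [2500.0, 1850.0, 820.0, 450.0, 250.0, 255.0, 150.0, 100.0, 30.0, 65.0, 1500.0]
counts = Counter(classify_age(age) for age in sample_ages)
for label, _, _ in AGE_POPULATIONS:
    print(f"{label}: {counts[label]}")
print(f"unassigned: {counts['unassigned']}")
```

Ages that fall between the listed ranges (65 Ma and 1,500 Ma in this toy sample) are reported as unassigned rather than forced into the nearest population.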
{"title":"Detrital zircon U–Pb ages and Hf isotope analyses of modern and Quaternary sediments in China: A new dataset with preliminary analysis","authors":"Xiyun Chen, Ping Wang, Hongsen Xie, Longchen Zhu, Xia Liao, Xinggong Kong","doi":"10.1002/gdj3.193","journal":{"name":"Geoscience Data Journal","volume":"11 4","pages":"374-384"},"PeriodicalIF":3.3,"publicationDate":"2023-05-08","publicationTypes":"Journal Article","openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1002/gdj3.193"}
Detrital zircon records significant information in the ‘source-sink’ system. With the application of in situ laser ablation technology, a large number of high-quality detrital zircon data have been published since 2000. In this study, a total of 41,342 detrital zircon U–Pb ages and 6,129 Hf isotope data were compiled from the published literature on the Middle East (Iranian and Arabian plates). After data filtering and recalculation, the valid data were used for further analysis. The detrital zircons from the Middle East show a Cambrian–Precambrian age population of 500–1,000 Ma, with a major age peak at ~620 Ma and dispersed εHf(t) values of −35 to +20. The Alborz Mountains and central Iran terrane show a Permo–Triassic age range of 200–300 Ma. Mesozoic–Cenozoic detrital zircons occur mostly in the Zagros orogenic belt and Makran accretionary complex, with three distinct age ranges of 145–180 Ma, 80–110 Ma and 15–65 Ma. The Mesozoic zircons yield positive εHf(t) values, while the Cenozoic zircons have varied εHf(t) values. This database supports further provenance analysis, helps constrain the timing of the major tectonic events in the Middle East, and may also help to explore the affinities of plates, thus guiding future palaeogeographic research.
{"title":"A database of detrital zircon U–Pb ages and Hf isotopes for the Middle East (Iranian and Arabian plates)","authors":"Gaoyuan Sun, Jianuo Chen","doi":"10.1002/gdj3.187","journal":{"name":"Geoscience Data Journal","volume":"11 2","pages":"107-117"},"PeriodicalIF":3.2,"publicationDate":"2023-04-10","publicationTypes":"Journal Article","openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1002/gdj3.187"}
The Qinghai-Tibet Plateau is a hot topic in Earth sciences, and a large amount of sedimentary data has accumulated for it. We constructed a dataset of detrital components for the Qinghai-Tibet Plateau from 63 peer-reviewed publications. The dataset comprises 1813 Late Proterozoic to Pleistocene sandstones from 84 stratigraphic units. For each sample, we present details on reference, detrital composition, GPS coordinates, geographic location, depositional age, tectonic setting and depositional environment. To ensure a high-quality dataset, the information on each sandstone sample was standardized and reviewed by sedimentary experts. The dataset can be used for regional geoscience studies and for exploring the general laws of the source-to-sink process. It may also be useful in applied fields, for example in finding suitable building stones and in supporting oil, gas and mineral exploration.
{"title":"A dataset of sandstone detrital composition from Qinghai-Tibet Plateau","authors":"Wen Lai, Xiumian Hu, Xiaolong Dong, Anlin Ma","doi":"10.1002/gdj3.184","journal":{"name":"Geoscience Data Journal","volume":"11 1","pages":"86-93"},"PeriodicalIF":3.2,"publicationDate":"2023-03-15","publicationTypes":"Journal Article","openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1002/gdj3.184"}
Ben R. Mather, R. Dietmar Müller, Sabin Zahirovic, John Cannon, Michael Chin, Lauren Ilano, Nicky M. Wright, Christopher Alfonso, Simon Williams, Michael Tetley, Andrew Merdith
PyGPlates is an open-source Python library to visualize and edit plate tectonic reconstructions created using GPlates. The Python API affords a greater level of flexibility than GPlates to interrogate plate reconstructions and integrate with other Python workflows. GPlately was created to accelerate spatio-temporal data analysis leveraging pyGPlates and PlateTectonicTools within a simplified Python interface. This object-oriented package enables the reconstruction of data through deep geologic time (points, lines, polygons and rasters), the interrogation of plate kinematic information (plate velocities, rates of subduction and seafloor spreading), the rapid comparison between multiple plate motion models, and the plotting of reconstructed output data on maps. All tools are designed to be parallel-safe to accelerate spatio-temporal analysis over multiple CPU processors.
{"title":"Deep time spatio-temporal data analysis using pyGPlates with PlateTectonicTools and GPlately","authors":"Ben R. Mather, R. Dietmar Müller, Sabin Zahirovic, John Cannon, Michael Chin, Lauren Ilano, Nicky M. Wright, Christopher Alfonso, Simon Williams, Michael Tetley, Andrew Merdith","doi":"10.1002/gdj3.185","journal":{"name":"Geoscience Data Journal","volume":"11 1","pages":"3-10"},"PeriodicalIF":3.2,"publicationDate":"2023-03-14","publicationTypes":"Journal Article","openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1002/gdj3.185"}
With the rapid development of big data science, the research paradigm in the geosciences has begun to shift to big data-driven scientific discovery. To build scientific databases that support such discovery, researchers need to read a huge amount of literature to locate, extract and aggregate relevant results and data that are published and stored in PDF format. In this paper, based on the findings of a study of how geoscientists annotate literature and extract and aggregate data, we propose GeoDeepShovel, a publicly available AI-assisted data extraction system to support their needs. GeoDeepShovel leverages state-of-the-art neural network models to help researchers easily and accurately annotate papers (in PDF format) and extract data from tables, figures, maps, etc., in a human–AI collaborative manner. As part of the Deep-Time Digital Earth (DDE) program, GeoDeepShovel has been deployed for 8 months. It already has 400 users from 44 geoscience research teams within the DDE program who use it daily to construct scientific databases, and more than 240 projects and 50,000 documents have been processed.
{"title":"GeoDeepShovel: A platform for building scientific database from geoscience literature with AI assistance","authors":"Shao Zhang, Hui Xu, Yuting Jia, Ying Wen, Dakuo Wang, Luoyi Fu, Xinbing Wang, Chenghu Zhou","doi":"10.1002/gdj3.186","journal":{"name":"Geoscience Data Journal","volume":"10 4","pages":"519-537"},"PeriodicalIF":3.2,"publicationDate":"2023-02-28","publicationTypes":"Journal Article","openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1002/gdj3.186"}
Marko Bermanec, Noa Vidović, Liubomyr Gavryliv, Shaunna M. Morrison, Robert M. Hazen
Crystal structures of minerals are defined by a specific atomic arrangement within the unit cell, which follows the laws of symmetry specific to each crystal system. The reasons why a mineral crystallizes in a given crystal system have been the subject of many studies, which show a dependency on formation conditions such as the presence of aqueous fluids, biotic activity and many others. Various attempts have been made to quantify and interpret the information that can be gathered from studying crystal symmetry and its distribution in the mineral kingdom. However, these methods are mostly outdated or at least not suited to the large datasets available today. Therefore, the symmetry index calculation has been revised in accordance with the growing understanding of mineral species and their characteristics. In the gathered data, we observe a gradual but significant decrease in crystal symmetry through the stages of mineral evolution, from the formation of the solar system to the modern day. However, this decrease is neither uniform nor linear, which has further implications for mineral evolution from the viewpoint of crystal symmetry. The temporal distribution of minerals based on the number of essential elements in their chemical formulae and their symmetry index has been calculated and compared to explore their behaviour. Minerals with four to eight essential elements have the lowest average symmetry index, while being the most abundant throughout all stages of mineral evolution. Many questions remain open, including whether biological activity on Earth has influenced the observed decrease in mineral symmetry through time and whether the trajectory of planetary evolution of a geologically active body is one of decreasing mineral symmetry and increasing complexity.
{"title":"Evolution of symmetry index in minerals","authors":"Marko Bermanec, Noa Vidović, Liubomyr Gavryliv, Shaunna M. Morrison, Robert M. Hazen","doi":"10.1002/gdj3.177","journal":{"name":"Geoscience Data Journal","volume":"11 1","pages":"69-85"},"PeriodicalIF":3.2,"publicationDate":"2023-01-14","publicationTypes":"Journal Article","openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1002/gdj3.177"}