Applied Computing and Geosciences: Latest Articles

Using a 3D heat map to explore the diverse correlations among elements and mineral species
IF 3.4 Q2 COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS Pub Date : 2024-01-05 DOI: 10.1016/j.acags.2024.100154
Jiyin Zhang, Xiang Que, Bhuwan Madhikarmi, Robert M. Hazen, Jolyon Ralph, Anirudh Prabhu, Shaunna M. Morrison, Xiaogang Ma

This paper presents an enhanced 3D heat map for exploratory data analysis (EDA) of open mineral data, addressing the challenges caused by rapidly evolving datasets and ensuring scientifically meaningful data exploration. The Mindat website, a crowd-sourced database of mineral species, provides a constantly updated open data source via its newly established application programming interface (API). To illustrate the potential usage of the API, we constructed an automatic workflow to retrieve and cleanse mineral data from it, thus feeding the 3D heat map with up-to-date records of mineral species. In the 3D heat map, we developed scientifically sound operations for data selection and visualization by incorporating knowledge from existing mineral classification systems and recent studies in mineralogy. The resulting 3D heat map has been shared as an online demo system, with the source code made open on GitHub. We hope this updated 3D heat map system will serve as a valuable resource for researchers, educators, and students in geosciences, demonstrating the potential for data-intensive research in mineralogy and broader geoscience disciplines.
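
As a minimal sketch of the idea behind a 3D heat map of element correlations, the snippet below counts how many mineral species contain each combination of three elements. The record layout (a name plus an element list) and the four hand-written records are assumptions made for illustration; this is neither the Mindat API response format nor the authors' implementation.

```python
from collections import Counter
from itertools import combinations

# Hypothetical, hand-written records; a real workflow would retrieve
# and cleanse these from an open data source before counting.
minerals = [
    {"name": "Pyrite", "elements": ["Fe", "S"]},
    {"name": "Chalcopyrite", "elements": ["Cu", "Fe", "S"]},
    {"name": "Bornite", "elements": ["Cu", "Fe", "S"]},
    {"name": "Quartz", "elements": ["Si", "O"]},
]


def triple_counts(records):
    """Count the species that contain each unordered triple of elements."""
    counts = Counter()
    for rec in records:
        # Sort so that every triple has one canonical key.
        elems = sorted(set(rec["elements"]))
        for triple in combinations(elems, 3):
            counts[triple] += 1
    return counts


counts = triple_counts(minerals)
print(counts[("Cu", "Fe", "S")])
```

Each key of `counts` corresponds to one cell of a three-axis grid, and the count is the value that a heat map would encode as color.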

Applied Computing and Geosciences, Volume 21, Article 100154.
Citations: 0
Neural network approach for shape-based euhedral pyrite identification in X-ray CT data with adversarial unsupervised domain adaptation
IF 3.4 Q2 COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS Pub Date : 2024-01-04 DOI: 10.1016/j.acags.2023.100153
Suraj Neelakantan, Jesper Norell, Alexander Hansson, Martin Längkvist, Amy Loutfi

We explore an attenuation- and shape-based identification of euhedral pyrites in high-resolution X-ray Computed Tomography (XCT) data using deep neural networks. To deal with the scarcity of annotated data, we generate a complementary training set of synthetic images. To investigate and address the domain gap between the synthetic and XCT data, several deep learning models, with and without domain adaptation, are trained and compared. We find that a model trained on a small set of human annotations, while displaying over-fitting, can rival the human annotators. The unsupervised domain adaptation approaches are successful in bridging the domain gap, which significantly improves their performance. A domain-adapted model, trained on a dataset that fuses synthetic and real data, is the overall best-performing model. This highlights the possibility of using synthetic datasets for the application of deep learning in mineralogy.

Applied Computing and Geosciences, Volume 21, Article 100153.
Citations: 0
GeoCoDA: Recognizing and validating structural processes in geochemical data. A workflow on compositional data analysis in lithogeochemistry
IF 3.4 Q2 COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS Pub Date : 2024-01-02 DOI: 10.1016/j.acags.2023.100149
Eric Grunsky, Michael Greenacre, Bruce Kjarsgaard

Geochemical data are compositional in nature and are subject to the problems typically associated with data that are restricted to the real non-negative number space with constant-sum constraint, that is, the simplex. Geochemistry can be considered a proxy for mineralogy, comprised of atomically ordered structures that define the placement and abundance of elements in the mineral lattice structure. Based on the innovative contributions of John Aitchison, who introduced the logratio transformation into compositional data analysis, this contribution provides a systematic workflow for assessing geochemical data in a simple and efficient way, such that significant geochemical (mineralogical) processes can be recognized and validated. This workflow, called GeoCoDA and presented here in the form of a tutorial, enables the recognition of processes from which models can be constructed based on the associations of elements that reflect mineralogy. Both the original compositional values and their transformation to logratios are considered. These models can reflect rock-forming processes, metamorphism, alteration and ore mineralization. Moreover, machine learning methods, both unsupervised and supervised, applied to an optimized set of subcompositions of the data, provide a systematic, accurate, efficient and defensible approach to geochemical data analysis. The workflow is illustrated on lithogeochemical data from exploration of the Star kimberlite, consisting of a series of eruptions with five recognized phases.
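
To make the logratio idea concrete, here is a small sketch of the centered logratio (CLR) transformation, one common member of the logratio family introduced by Aitchison. Whether the GeoCoDA tutorial uses this particular transform at a given step is an assumption, and the three-part composition is hypothetical.

```python
import math


def clr(composition):
    """Centered logratio: log of each part relative to the geometric mean."""
    if any(x <= 0 for x in composition):
        raise ValueError("all parts must be strictly positive")
    logs = [math.log(x) for x in composition]
    mean_log = sum(logs) / len(logs)
    return [v - mean_log for v in logs]


# Hypothetical three-part composition (fractions summing to 1).
sample = [0.6, 0.3, 0.1]
transformed = clr(sample)
print(transformed)
```

The transformed values sum to zero and are unchanged when the composition is rescaled (for example from fractions to percentages), which is why logratios sidestep the constant-sum constraint.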

Applied Computing and Geosciences, Volume 22, Article 100149.
Citations: 0
Comparative performance analysis of simple U-Net, residual attention U-Net, and VGG16-U-Net for inventory inland water bodies
IF 3.4 Q2 COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS Pub Date : 2023-12-19 DOI: 10.1016/j.acags.2023.100150
Ali Ghaznavi, Mohammadmehdi Saberioon, Jakub Brom, Sibylle Itzerott

Inland water bodies play a vital role at all scales in the terrestrial water balance and Earth’s climate variability. An inventory of inland waters is therefore crucially important for hydrologic and ecological studies and management. The main aim of this study was to develop a deep learning-based method for automatically and accurately inventorying and mapping inland water bodies using the RGB bands of high-resolution satellite imagery.

The Sentinel-2 Harmonized dataset, together with ZABAGED-validated ground truth, was used as the main dataset for the model training step. Three different deep learning algorithms based on the U-Net architecture were employed to segment inland waters: a simple U-Net, a Residual Attention U-Net, and a VGG16-U-Net. All three algorithms were trained using a combination of Sentinel-2 visible bands (Red [B04; 665 nm], Green [B03; 560 nm], and Blue [B02; 490 nm]) at a 10-meter spatial resolution.

The Residual Attention U-Net incurred the highest computational cost because of its larger number of trainable parameters. The VGG16-U-Net had the shortest run time and the fewest trainable parameters of the three, which is attributed to its architecture. The VGG16-U-Net also provided the best segmentation results, with a mean IoU score of 0.9850, a slight improvement over the other U-Net-based architectures.

Although the accuracy of the VGG16-U-Net model does not differ appreciably from that of the Residual Attention U-Net, its training cost was dramatically lower.
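
The mean IoU score quoted above can be illustrated with a short sketch. It treats masks as flat lists of 0 (land) and 1 (water) and averages the per-class intersection over union; the label convention and the two toy masks are assumptions, and the study's own evaluation code may differ.

```python
def iou(pred, truth, label):
    """Intersection over union for one class label."""
    inter = sum(1 for p, t in zip(pred, truth) if p == label and t == label)
    union = sum(1 for p, t in zip(pred, truth) if p == label or t == label)
    return inter / union if union else float("nan")


def mean_iou(pred, truth, labels=(0, 1)):
    """Average the per-class IoU, skipping classes absent from both masks."""
    scores = [iou(pred, truth, c) for c in labels]
    valid = [s for s in scores if s == s]  # NaN is the only value != itself
    return sum(valid) / len(valid) if valid else float("nan")


# Hypothetical flattened masks: 1 = water, 0 = everything else.
pred = [1, 1, 0, 0, 1, 0]
truth = [1, 0, 0, 0, 1, 1]
print(mean_iou(pred, truth))
```

Averaging over both classes keeps a model from scoring well simply by predicting the dominant background class everywhere.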

Applied Computing and Geosciences, Volume 21, Article 100150.
Citations: 0
Corrigendum to ‘Parallel investigations of remote sensing and ground-truth lake Chad's level data using statistical and machine learning methods’ [Appl. Comput. Geosci. 20 (2023) 100135]
IF 3.4 Q2 COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS Pub Date : 2023-12-01 DOI: 10.1016/j.acags.2023.100141
Kim-Ndor Djimadoumngar
Applied Computing and Geosciences, Volume 20, Article 100141.
Citations: 0
Evaluating Imputation Methods for rainfall data under high variability in Johor River Basin, Malaysia
IF 3.4 Q2 COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS Pub Date : 2023-12-01 DOI: 10.1016/j.acags.2023.100145
Zulfaqar Sa’adi, Zulkifli Yusop, Nor Eliza Alias, Ming Fai Chow, Mohd Khairul Idlan Muhammad, Muhammad Wafiy Adli Ramli, Zafar Iqbal, Mohammed Sanusi Shiru, Faizal Immaddudin Wira Rohmat, Nur Athirah Mohamad, Mohamad Faizal Ahmad

Missing values in rainfall records might result in erroneous predictions and inefficient management practices with significant economic, environmental, and social consequences. This is particularly important for rainfall datasets in Peninsular Malaysia (PM) due to the high level of missingness that can affect the inherent pattern in the highly variable time series. In this work, 21 target rainfall stations in the Johor River Basin (JRB) with daily data between 1970 and 2015 were used to examine 19 different multiple imputation methods that were carried out using the Multivariate Imputation by Chained Equations (MICE) package in R. For each station, artificial missing data were added at rates of up to 5%, 10%, 20%, and 30% for different types of missingness, namely, Missing Completely At Random (MCAR), Missing At Random (MAR), and Missing Not At Random (MNAR), leaving the original missing data intact. The imputation quality was evaluated based on several statistical performance metrics, namely mean absolute error (MAE), root mean square error (RMSE), normalized root mean square error (NRMSE), Nash-Sutcliffe efficiency (NSE), modified degree of agreement (MD), coefficient of determination (R²), Kling-Gupta efficiency (KGE), and volumetric efficiency (VE), which were later ranked and aggregated by using the compromise programming index (CPI) to select the best method. The results showed that linear regression predicted values (norm.predict) consistently ranked the highest under all types and levels of missingness. For example, under MAR, MNAR, and MCAR, this method showed the lowest MAE values, ranging over 0.78–2.25, 0.93–2.57, and 0.87–2.43, respectively. It also consistently showed higher NSE values (0.71–0.92, 0.6–0.92, and 0.66–0.91) and R² values (0.77–0.92, 0.71–0.93, and 0.75–0.92) under MAR, MCAR, and MNAR, respectively. The methods of mean, rf, and cart also appear to be efficient. The incorporation of the compromise programming index (CPI) as a decision-support tool has enabled an objective assessment of the output from the multiple performance metrics for the ranking and selection of the top-performing method. During validation, the Probability Density Function (PDF) demonstrated that even with up to 30% missingness, the shape of the distribution was retained after imputation compared to the actual data. The methodology proposed in this study can help in choosing suitable imputation methods for other tropical rainfall datasets, leading to improved accuracy in rainfall estimation and prediction.
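
The evaluation loop described above (hide known values, impute, score the hidden positions) can be sketched in a few lines. This is an illustration under stated assumptions, not the study's R workflow: the rainfall values are invented, only the MCAR mechanism is shown, and a plain mean fill stands in for the imputation methods of the R package.

```python
import math
import random


def mask_mcar(values, rate, seed=0):
    """Hide round(rate * n) entries chosen uniformly at random (MCAR)."""
    rng = random.Random(seed)
    n_missing = round(rate * len(values))
    hidden = set(rng.sample(range(len(values)), n_missing))
    return [None if i in hidden else v for i, v in enumerate(values)]


def mae(obs, est):
    return sum(abs(o - e) for o, e in zip(obs, est)) / len(obs)


def rmse(obs, est):
    return math.sqrt(sum((o - e) ** 2 for o, e in zip(obs, est)) / len(obs))


def nse(obs, est):
    """Nash-Sutcliffe efficiency; NaN when the observations have zero variance."""
    mean_obs = sum(obs) / len(obs)
    denom = sum((o - mean_obs) ** 2 for o in obs)
    if denom == 0:
        return float("nan")
    return 1.0 - sum((o - e) ** 2 for o, e in zip(obs, est)) / denom


# Hypothetical daily rainfall values in mm, invented for illustration.
observed = [0.0, 5.0, 12.0, 0.0, 3.0, 20.0, 1.0, 0.0, 7.0, 2.0]
masked = mask_mcar(observed, rate=0.3)

# Simplest possible imputation: fill each gap with the mean of what remains.
known = [v for v in masked if v is not None]
fill = sum(known) / len(known)
imputed = [fill if v is None else v for v in masked]

# Score only the positions that were artificially hidden.
held_out = [i for i, v in enumerate(masked) if v is None]
obs_h = [observed[i] for i in held_out]
est_h = [imputed[i] for i in held_out]
print(mae(obs_h, est_h), rmse(obs_h, est_h), nse(obs_h, est_h))
```

Scoring only the artificially hidden positions is what makes the comparison fair, because those are the only gaps for which the true value is known.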

Applied Computing and Geosciences, Volume 20, Article 100145.
Citations: 0
AnnRG - An artificial neural network solute geothermometer
IF 3.4 Q2 COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS Pub Date : 2023-11-15 DOI: 10.1016/j.acags.2023.100144
Lars H. Ystroem, Mark Vollmer, Thomas Kohl, Fabian Nitschke

Solute artificial neural network geothermometers offer the possibility to overcome the complexity given by the solute-mineral composition. Herein, we present a new concept, trained from high-quality hydrochemical data and verified by in-situ temperature measurements, with a total of 208 data pairs of geochemical input parameters (Na⁺, K⁺, Ca²⁺, Mg²⁺, Cl⁻, SiO₂, and pH) and reservoir temperature measurements being compiled. The data comprise nine geothermal sites with a broad variety of geochemical characteristics and enthalpies. Five sites with 163 samples (Upper Rhine Graben, Pannonian Basin, German Molasse Basin, Paris Basin, and Iceland) are used to develop the ANN geothermometer, while a further four sites with 45 samples (Azores, El Tatio, Miavalles, and Rotorua) are used to test the established artificial neural network in practice on unknown data. The setup of the application, as well as the optimisation of the network architecture and its hyperparameters, are introduced step by step. As a result, the solute ANN geothermometer, AnnRG (Artificial neural network Regression Geothermometer), provides precise reservoir temperature predictions (RMSE of 10.442 K) with a high prediction accuracy of R² = 0.978. In conclusion, the implementation and verification of the first adequate ANN geothermometer is an advancement in solute geothermometry. Our approach is also a basis for further broadening and refining applications in geochemistry.
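
The split described above holds out whole sites rather than random samples. A minimal sketch of that site-wise partition follows; the site names are taken from the abstract (spelled as given there), while the record layout and the sample identifiers are hypothetical.

```python
# Site names as listed in the abstract.
DEVELOPMENT_SITES = {
    "Upper Rhine Graben",
    "Pannonian Basin",
    "German Molasse Basin",
    "Paris Basin",
    "Iceland",
}
TEST_SITES = {"Azores", "El Tatio", "Miavalles", "Rotorua"}


def split_by_site(samples):
    """Partition samples so that no site contributes to both subsets."""
    dev = [s for s in samples if s["site"] in DEVELOPMENT_SITES]
    test = [s for s in samples if s["site"] in TEST_SITES]
    return dev, test


# Hypothetical placeholder records; real rows would also carry the
# seven geochemical inputs and the measured reservoir temperature.
samples = [
    {"sample_id": "a1", "site": "Iceland"},
    {"sample_id": "a2", "site": "Paris Basin"},
    {"sample_id": "b1", "site": "Rotorua"},
]
dev, test = split_by_site(samples)
print(len(dev), len(test))
```

Holding out entire sites gives a stricter check than a random split, since the network is evaluated on geochemical settings it never saw during development.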

Applied Computing and Geosciences, Volume 20, Article 100144.
Citations: 0
A comparative analysis of super-resolution techniques for enhancing micro-CT images of carbonate rocks
IF 3.4 Q2 COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS Pub Date: 2023-11-14 DOI: 10.1016/j.acags.2023.100143
Ramin Soltanmohammadi, Salah A. Faroughi

High-resolution digital rock micro-CT images captured from a wide field of view are essential for various geosystem engineering and geoscience applications. However, the resolution of these images is often constrained by the capabilities of scanners. To overcome this limitation and achieve superior image quality, advanced deep learning techniques have been used. This study compares four different super-resolution techniques, including super-resolution convolutional neural network (SRCNN), efficient sub-pixel convolutional neural networks (ESPCN), enhanced deep residual neural networks (EDRN), and super-resolution generative adversarial networks (SRGAN) to enhance the resolution of micro-CT images obtained from heterogeneous porous media. Our investigation employs a dataset consisting of 5000 micro-CT images acquired from a highly heterogeneous carbonate rock. The performance of each algorithm is evaluated based on its accuracy to reconstruct the pore geometry and connectivity, grain-pore edge sharpness, and preservation of petrophysical properties, such as porosity. Our findings indicate that EDRN outperforms other techniques in terms of the peak signal-to-noise ratio (PSNR) and structural similarity (SSIM) index, increased by nearly 4 dB and 17%, respectively, compared to bicubic interpolation. Furthermore, SRGAN exhibits superior performance compared to other techniques in terms of the learned perceptual image patch similarity (LPIPS) index and porosity preservation error. SRGAN shows a nearly 30% reduction in LPIPS compared to bicubic interpolation. Our results provide deeper insights into the practical applications of these techniques in the domain of porous media characterizations, facilitating the selection of optimal super-resolution CNN-based methodologies.
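The two kinds of metric named in this abstract, PSNR and porosity preservation, can be illustrated with a minimal Python sketch. It is not the authors' evaluation code: the function names `psnr` and `porosity`, the 8-bit peak value of 255, and the nested-list image layout are assumptions made only for illustration.

```python
import math


def psnr(reference, estimate, max_value=255.0):
    """Peak signal-to-noise ratio in dB between two equal-sized grayscale images.

    Images are nested lists of pixel intensities (an assumption of this sketch).
    """
    ref = [p for row in reference for p in row]
    est = [p for row in estimate for p in row]
    if len(ref) != len(est):
        raise ValueError("images must contain the same number of pixels")
    # Mean squared error over all pixels
    mse = sum((a - b) ** 2 for a, b in zip(ref, est)) / len(ref)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)


def porosity(binary_image):
    """Fraction of pixels labelled as pore (1) in a segmented image."""
    flat = [p for row in binary_image for p in row]
    return sum(flat) / len(flat)
```

Because PSNR is logarithmic in the mean squared error, a gain of about 4 dB over bicubic interpolation corresponds to an error roughly 10^0.4 ≈ 2.5 times smaller.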

Citations: 0
The cultural-social nucleus of an open community: A multi-level community knowledge graph and NASA application
IF 3.4 Q2 COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS Pub Date: 2023-11-03 DOI: 10.1016/j.acags.2023.100142
Ryan M. McGranaghan, Ellie Young, Cameron Powers, Swapnali Yadav, Edlira Vakaj

The challenges faced by science, engineering, and society are increasingly complex, requiring broad, cross-disciplinary teams to contribute to collective knowledge, cooperation, and sensemaking efforts. However, existing approaches to collaboration and knowledge sharing are largely manual, inadequate to meet the needs of teams that are not closely connected through personal ties or which lack the time to respond to dynamic requests for contextual information sharing. Nonetheless, in the current remote-first, complexity-driven, time-constrained workplace, such teams are both more common and more necessary. For example, the NASA Center for HelioAnalytics (CfHA) is a growing and cross-disciplinary community that is dedicated to aiding the application of emerging data science techniques and technologies, including AI/ML, to increase the speed, rigor, and depth of space physics scientific discovery. The members of that community possess innumerable skills and competencies and are involved in hundreds of projects, including proposals, committees, papers, presentations, conferences, groups, and missions. Traditional structures for information and knowledge representation do not permit the community to search and discover activities that are ongoing across the Center, nor to understand where skills and knowledge exist. The approaches that do exist are burdensome and result in inefficient use of resources, reinvention of solutions, and missed important connections. The challenge faced by the CfHA is a common one across modern groups and one that must be solved if we are to respond to the grand challenges that face our society, such as complex scientific phenomena, global pandemics and climate change. 
We present a solution to the problem: a community knowledge graph (KG) that aids an organization to better understand the resources (people, capabilities, affiliations, assets, content, data, models) available across its membership base, and thus supports a more cohesive community and more capable teams, enables robust and responsible application of new technologies, and provides the foundation for all members of the community to co-evolve the shared information space. We call this the Community Action and Understanding via Semantic Enrichment (CAUSE) ontology. We demonstrate the efficacy of KGs that can be instantiated from the ontology together with data from a given community (shown here for the CfHA). Finally, we discuss the implications, including the importance of the community KG for open science.

Citations: 0
The British Geological Survey Rock Classification Scheme, its representation as linked data, and a comparison with some other lithology vocabularies
IF 3.4 Q2 COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS Pub Date: 2023-10-17 DOI: 10.1016/j.acags.2023.100140
Tim McCormick, Rachel E. Heaven

Controlled vocabularies are critical to constructing FAIR (findable, accessible, interoperable, re-useable) data. One of the most widely required, yet complex, vocabularies in earth science is for rock and sediment type, or ‘lithology’. Since 1999 the British Geological Survey has used its own Rock Classification Scheme in many of its workflows and products including the national digital geological map. This scheme pre-dates others that have been published, and is deeply embedded in BGS’ processes. By publishing this classification scheme now as a Simple Knowledge Organisation System (SKOS) machine-readable informal ontology, we make it available for ourselves and third parties to use in modern semantic applications, and we open the future possibility of using the tools SKOS provides to align our scheme with other published schemes. These include the IUGS-CGI Simple Lithology Scheme, the European Commission INSPIRE Lithology Code List, the Queensland Geological Survey Lithotype Scheme, the USGS Lithologic Classification of Geologic Map Units, and Mindat.org. The BGS lithology classification was initially based on four narrative reports that can be downloaded from the BGS website, although it has been added to subsequently. The classification is almost entirely mono-hierarchical in nature and includes 3454 currently valid concepts in a classification 11 levels deep. It includes igneous rocks and sediments, metamorphic rocks, sediments and sedimentary rocks, and superficial deposits including anthropogenic deposits. The SKOS informal ontology built on it is stored in a triplestore and the triples are updated nightly by extracting from a relational database where the ontology is maintained. Bulk downloads and version history are available on github. The RCS concepts themselves are used in other BGS linked data, namely the Lexicon of Named Rock Units and the linked data representation of the 1:625 000 scale geological map of the UK. 
Comparing the RCS with the other published lithology schemes, all are broadly similar but show characteristics that reveal the interests and requirements of the groups that developed them, in terms of their level of detail both overall and in constituent parts. It should be possible to align the RCS with the other classifications, and future work will focus on automated mechanisms to do this, and possibly on constructing a formal ontology for the RCS.
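The mono-hierarchical structure described in this abstract can be pictured with a small Python sketch in which every concept has at most one broader concept, in the spirit of `skos:broader`. The concept identifiers below are hypothetical placeholders, not actual RCS codes, and the sketch assumes a strictly mono-hierarchical vocabulary, whereas the abstract says the scheme is only almost entirely so.

```python
# Hypothetical mini-vocabulary: each concept maps to its single broader concept.
# These identifiers are illustrative only and are not taken from the RCS.
BROADER = {
    "igneous-rock": "rock",
    "granitic-rock": "igneous-rock",
    "granite": "granitic-rock",
    "sedimentary-rock": "rock",
}


def depth(concept, broader):
    """Level of a concept in the hierarchy, counting the top concept as level 1."""
    level = 1
    seen = {concept}
    while concept in broader:
        concept = broader[concept]
        if concept in seen:
            raise ValueError("cycle detected in broader relations")
        seen.add(concept)
        level += 1
    return level


def max_depth(broader):
    """Deepest level reached by any concept in the vocabulary."""
    concepts = set(broader) | set(broader.values())
    return max(depth(c, broader) for c in concepts)
```

Applied to a full export of a vocabulary, a walk of this kind is one way to check a figure such as the 11 levels reported for the classification.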

Citations: 0