Pub Date: 2024-01-05
DOI: 10.1016/j.acags.2024.100154
Jiyin Zhang , Xiang Que , Bhuwan Madhikarmi , Robert M. Hazen , Jolyon Ralph , Anirudh Prabhu , Shaunna M. Morrison , Xiaogang Ma
This paper presents an enhanced 3D heat map for exploratory data analysis (EDA) of open mineral data, addressing the challenges caused by rapidly evolving datasets and ensuring scientifically meaningful data exploration. The Mindat website, a crowd-sourced database of mineral species, provides a constantly updated open data source via its newly established application programming interface (API). To illustrate the potential usage of the API, we constructed an automatic workflow to retrieve and cleanse mineral data from it, thus feeding the 3D heat map with up-to-date records of mineral species. In the 3D heat map, we developed scientifically sound operations for data selection and visualization by incorporating knowledge from existing mineral classification systems and recent studies in mineralogy. The resulting 3D heat map has been shared as an online demo system, with the source code made open on GitHub. We hope this updated 3D heat map system will serve as a valuable resource for researchers, educators, and students in geosciences, demonstrating the potential for data-intensive research in mineralogy and broader geoscience disciplines.
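The retrieve-and-cleanse workflow described above can be sketched roughly as follows. The endpoint path, field names (`geomaterials`, `ima_status`, `mindat_formula`), and token scheme are assumptions for illustration, not the paper's actual code:

```python
import json
import urllib.request

MINDAT_API = "https://api.mindat.org"  # token-authenticated API; endpoint below is an assumption


def fetch_minerals(token, page_size=100):
    """Page through mineral-species records from the (assumed) geomaterials endpoint."""
    url = f"{MINDAT_API}/geomaterials/?format=json&page_size={page_size}"
    while url:
        req = urllib.request.Request(url, headers={"Authorization": f"Token {token}"})
        with urllib.request.urlopen(req, timeout=30) as resp:
            payload = json.load(resp)
        yield from payload["results"]
        url = payload.get("next")  # paginated: follow the 'next' link until exhausted


def cleanse(records):
    """Keep only records with an IMA status and a chemical formula, so the heat
    map is fed valid mineral species (field names are assumptions)."""
    return [r for r in records if r.get("ima_status") and r.get("mindat_formula")]
```

In a scheduled job, `cleanse(fetch_minerals(token))` would refresh the heat map's species table whenever the upstream database changes.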
Title: Using a 3D heat map to explore the diverse correlations among elements and mineral species
Applied Computing and Geosciences, Volume 21, Article 100154
Pub Date: 2024-01-04
DOI: 10.1016/j.acags.2023.100153
Suraj Neelakantan , Jesper Norell , Alexander Hansson , Martin Längkvist , Amy Loutfi
We explore an attenuation- and shape-based identification of euhedral pyrites in high-resolution X-ray Computed Tomography (XCT) data using deep neural networks. To deal with the scarcity of annotated data, we generate a complementary training set of synthetic images. To investigate and address the domain gap between the synthetic and XCT data, several deep learning models, with and without domain adaptation, are trained and compared. We find that a model trained on a small set of human annotations, while displaying over-fitting, can rival the human annotators. The unsupervised domain adaptation approaches successfully bridge the domain gap, which significantly improves their performance. A domain-adapted model, trained on a dataset that fuses synthetic and real data, is the overall best-performing model. This highlights the potential of synthetic datasets for applying deep learning in mineralogy.
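The synthetic-training-set idea can be illustrated with a minimal generator: render a bright, sharply bounded "crystal" cross-section over a darker, noisy matrix, together with its segmentation mask. The shapes and grey levels here are an invented toy, not the paper's generator:

```python
import numpy as np


def synthetic_pyrite(size=64, half_width=10, attenuation=0.9, noise=0.05, seed=0):
    """Render one synthetic slice: a bright square cross-section (a crude proxy
    for a euhedral facet outline) over a darker noisy matrix, plus its mask."""
    rng = np.random.default_rng(seed)
    yy, xx = np.mgrid[0:size, 0:size]
    cy, cx = rng.integers(half_width, size - half_width, 2)  # keep square inside frame
    # L-infinity ball -> axis-aligned square
    mask = np.maximum(np.abs(yy - cy), np.abs(xx - cx)) <= half_width
    img = rng.normal(0.3, noise, (size, size))               # matrix attenuation + noise
    img[mask] = rng.normal(attenuation, noise, mask.sum())   # high-attenuation pyrite
    return img.clip(0, 1), mask
```

Generating thousands of such image/mask pairs (with varied sizes, orientations, and grey levels) gives a labelled set to pre-train a segmenter before domain adaptation to real XCT slices.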
Title: Neural network approach for shape-based euhedral pyrite identification in X-ray CT data with adversarial unsupervised domain adaptation
Applied Computing and Geosciences, Volume 21, Article 100153
Pub Date: 2024-01-02
DOI: 10.1016/j.acags.2023.100149
Eric Grunsky , Michael Greenacre , Bruce Kjarsgaard
Geochemical data are compositional in nature and are subject to the problems typically associated with data restricted to the real non-negative number space with a constant-sum constraint, that is, the simplex. Geochemistry can be considered a proxy for mineralogy, which comprises atomically ordered structures that define the placement and abundance of elements in the mineral lattice. Based on the innovative contributions of John Aitchison, who introduced the logratio transformation into compositional data analysis, this contribution provides a systematic workflow for assessing geochemical data in a simple and efficient way, such that significant geochemical (mineralogical) processes can be recognized and validated. This workflow, called GeoCoDA and presented here in the form of a tutorial, enables the recognition of processes from which models can be constructed based on the associations of elements that reflect mineralogy. Both the original compositional values and their transformation to logratios are considered. These models can reflect rock-forming processes, metamorphism, alteration and ore mineralization. Moreover, machine learning methods, both unsupervised and supervised, applied to an optimized set of subcompositions of the data, provide a systematic, accurate, efficient and defensible approach to geochemical data analysis. The workflow is illustrated on lithogeochemical data from exploration of the Star kimberlite, consisting of a series of eruptions with five recognized phases.
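Aitchison's logratio idea can be made concrete with the centred logratio (clr) transform, one of the standard logratio transformations, which maps compositions off the simplex into unconstrained real space. The toy numbers are illustrative:

```python
import numpy as np


def clr(x):
    """Centred logratio transform: log of each part relative to the geometric
    mean of its composition. Rows are compositions (parts must be > 0)."""
    x = np.asarray(x, dtype=float)
    log_x = np.log(x)
    return log_x - log_x.mean(axis=-1, keepdims=True)
```

Two properties make clr useful here: each transformed row sums to zero, and the transform is scale-invariant, so closing a composition to 100% (the constant-sum constraint) changes nothing: `clr([[1, 3, 6]])` equals `clr([[10, 30, 60]])`.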
Title: GeoCoDA: Recognizing and validating structural processes in geochemical data. A workflow on compositional data analysis in lithogeochemistry
Applied Computing and Geosciences, Volume 22, Article 100149
Pub Date: 2023-12-19
DOI: 10.1016/j.acags.2023.100150
Ali Ghaznavi , Mohammadmehdi Saberioon , Jakub Brom , Sibylle Itzerott
Inland water bodies play a vital role at all scales in the terrestrial water balance and Earth’s climate variability. An inventory of inland waters is therefore crucially important for hydrologic and ecological studies and management. The main aim of this study was thus to develop a deep learning-based method for inventorying and mapping inland water bodies automatically and accurately, using the RGB bands of high-resolution satellite imagery.
The Sentinel-2 Harmonized dataset, together with ZABAGED-validated ground truth, was used as the main dataset for the model training step. Three different deep learning algorithms based on the U-Net architecture were employed to segment inland waters: a simple U-Net, a Residual Attention U-Net, and a VGG16-U-Net. All three algorithms were trained using a combination of the Sentinel-2 visible bands (Red [B04; 665 nm], Green [B03; 560 nm], and Blue [B02; 490 nm]) at a 10-m spatial resolution.
The Residual Attention U-Net incurred the highest computational cost, owing to its larger number of trainable parameters. The VGG16-U-Net had the shortest run time and the fewest trainable parameters, a consequence of its architecture relative to the simple and Residual Attention U-Net designs. As a result, the VGG16-U-Net provided the best segmentation results, with a mean-IoU score of 0.9850, a slight improvement over the other U-Net-based architectures.
Although the VGG16-U-Net model was no more accurate than the Residual Attention U-Net, its training cost was dramatically lower.
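The mean-IoU score used to compare the three networks is simple to compute from two label maps; this sketch assumes binary water/non-water labels, averaging the per-class intersection-over-union:

```python
import numpy as np


def mean_iou(pred, truth, n_classes=2):
    """Mean intersection-over-union across classes for two integer label maps."""
    ious = []
    for c in range(n_classes):
        p, t = pred == c, truth == c
        inter = np.logical_and(p, t).sum()
        union = np.logical_or(p, t).sum()
        if union:                      # skip classes absent from both maps
            ious.append(inter / union)
    return float(np.mean(ious))
```

A perfect segmentation scores 1.0; the reported 0.9850 means predicted and reference water masks overlap almost completely in both classes.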
Title: Comparative performance analysis of simple U-Net, residual attention U-Net, and VGG16-U-Net for inventory inland water bodies
Applied Computing and Geosciences, Volume 21, Article 100150
Pub Date: 2023-12-01
DOI: 10.1016/j.acags.2023.100141
Kim-Ndor Djimadoumngar
Title: Corrigendum to ‘Parallel investigations of remote sensing and ground-truth lake Chad's level data using statistical and machine learning methods’ [Appl. Comput. Geosci. 20 (2023) 100135]
Applied Computing and Geosciences, Volume 20, Article 100141
Pub Date: 2023-12-01
DOI: 10.1016/j.acags.2023.100145
Zulfaqar Sa’adi , Zulkifli Yusop , Nor Eliza Alias , Ming Fai Chow , Mohd Khairul Idlan Muhammad , Muhammad Wafiy Adli Ramli , Zafar Iqbal , Mohammed Sanusi Shiru , Faizal Immaddudin Wira Rohmat , Nur Athirah Mohamad , Mohamad Faizal Ahmad
Missing values in rainfall records can result in erroneous predictions and inefficient management practices with significant economic, environmental, and social consequences. This is particularly important for rainfall datasets in Peninsular Malaysia (PM) due to the high level of missingness that can affect the inherent pattern in the highly variable time series. In this work, 21 target rainfall stations in the Johor River Basin (JRB) with daily data between 1970 and 2015 were used to examine 19 different multiple imputation methods carried out using the Multivariate Imputation by Chained Equations (MICE) package in R. For each station, artificial missing data were added at rates of 5%, 10%, 20%, and 30% for different types of missingness, namely Missing Completely At Random (MCAR), Missing At Random (MAR), and Missing Not At Random (MNAR), leaving the original missing data intact. The imputation quality was evaluated with several statistical performance metrics, namely mean absolute error (MAE), root mean square error (RMSE), normalized root mean square error (NRMSE), Nash-Sutcliffe efficiency (NSE), modified degree of agreement (MD), coefficient of determination (R2), Kling-Gupta efficiency (KGE), and volumetric efficiency (VE), which were then ranked and aggregated using the compromise programming index (CPI) to select the best method. The results showed that linear regression predicted values (norm.predict) consistently ranked the highest under all types and levels of missingness. For example, under MAR, MNAR, and MCAR, this method showed the lowest MAE values, ranging between 0.78 and 2.25, 0.93–2.57, and 0.87–2.43, respectively. It also consistently showed higher NSE and R2 values of 0.71–0.92, 0.6–0.92, and 0.66–0.91, and 0.77–0.92, 0.71–0.93, and 0.75–0.92 under MAR, MCAR, and MNAR, respectively. The methods of mean, rf, and cart also appeared efficient.
The incorporation of the compromise programming index (CPI) as a decision-support tool has enabled an objective assessment of the output from the multiple performance metrics for the ranking and selection of the top-performing method. During validation, the Probability Density Function (PDF) demonstrated that even with up to 30% missingness, the shape of the distribution was retained after imputation compared to the actual data. The methodology proposed in this study can help in choosing suitable imputation methods for other tropical rainfall datasets, leading to improved accuracy in rainfall estimation and prediction.
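Two concrete pieces of the evaluation can be sketched as follows: the NSE metric, and a simple compromise-programming aggregation across metrics. The CPI shown here is a generic ideal-point distance and may differ from the exact formulation used in the paper:

```python
import numpy as np


def nse(obs, sim):
    """Nash-Sutcliffe efficiency: 1 is a perfect fit; 0 is no better than
    predicting the mean of the observations."""
    obs, sim = np.asarray(obs, float), np.asarray(sim, float)
    return float(1 - np.sum((obs - sim) ** 2) / np.sum((obs - obs.mean()) ** 2))


def cpi(scores):
    """Generic compromise-programming index: Euclidean distance of each method's
    normalised metric vector from the ideal point (all metrics at their best).
    Rows are methods, columns are higher-is-better metrics; lower CPI is better."""
    s = np.asarray(scores, float)
    span = s.max(axis=0) - s.min(axis=0)
    gap = (s.max(axis=0) - s) / np.where(span == 0, 1, span)  # 0 = best, 1 = worst
    return np.sqrt((gap ** 2).sum(axis=1))
```

Lower-is-better metrics such as MAE or RMSE would be negated (or inverted) before being passed to `cpi`, so that all columns point in the same direction.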
Title: Evaluating Imputation Methods for rainfall data under high variability in Johor River Basin, Malaysia
Applied Computing and Geosciences, Volume 20, Article 100145
Pub Date: 2023-11-15
DOI: 10.1016/j.acags.2023.100144
Lars H. Ystroem, Mark Vollmer, Thomas Kohl, Fabian Nitschke
Solute artificial neural network geothermometers offer the possibility to overcome the complexity of solute-mineral compositions. Herein, we present a new concept trained on high-quality hydrochemical data and verified against in-situ temperature measurements; a total of 208 data pairs of geochemical input parameters (Na+, K+, Ca2+, Mg2+, Cl−, SiO2, and pH) and measured reservoir temperatures were compiled. The data comprise nine geothermal sites with a broad variety of geochemical characteristics and enthalpies. Five sites with 163 samples (Upper Rhine Graben, Pannonian Basin, German Molasse Basin, Paris Basin, and Iceland) are used to develop the ANN geothermometer, while a further four sites with 45 samples (Azores, El Tatio, Miravalles, and Rotorua) are used to confront the trained network with unseen data. The setup of the application, as well as the optimisation of the network architecture and its hyperparameters, is introduced stepwise. As a result, the solute ANN geothermometer, AnnRG (Artificial neural network Regression Geothermometer), provides precise reservoir temperature predictions (RMSE of 10.442 K) with a high prediction accuracy of R2 = 0.978. In conclusion, the implementation and verification of the first adequate ANN geothermometer is an advancement in solute geothermometry. Our approach is also a basis for further broadening and refining applications in geochemistry.
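The two headline figures, RMSE in kelvin and R2, are computed from measured versus predicted reservoir temperatures as below; a minimal sketch, not the authors' evaluation code:

```python
import numpy as np


def rmse(t_obs, t_pred):
    """Root-mean-square error between measured and predicted reservoir T (K)."""
    t_obs, t_pred = np.asarray(t_obs, float), np.asarray(t_pred, float)
    return float(np.sqrt(np.mean((t_obs - t_pred) ** 2)))


def r2(t_obs, t_pred):
    """Coefficient of determination: fraction of the variance in the measured
    temperatures explained by the predictions."""
    t_obs, t_pred = np.asarray(t_obs, float), np.asarray(t_pred, float)
    ss_res = np.sum((t_obs - t_pred) ** 2)
    ss_tot = np.sum((t_obs - t_obs.mean()) ** 2)
    return float(1 - ss_res / ss_tot)
```

An RMSE of 10.442 K thus means the network's reservoir-temperature predictions deviate from the in-situ measurements by roughly 10 K on average (in the quadratic-mean sense).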
Title: AnnRG - An artificial neural network solute geothermometer
Applied Computing and Geosciences, Volume 20, Article 100144
Pub Date: 2023-11-14
DOI: 10.1016/j.acags.2023.100143
Ramin Soltanmohammadi, Salah A. Faroughi
High-resolution digital rock micro-CT images captured over a wide field of view are essential for various geosystem engineering and geoscience applications. However, the resolution of these images is often constrained by the capabilities of scanners. To overcome this limitation and achieve superior image quality, advanced deep learning techniques have been used. This study compares four super-resolution techniques, namely the super-resolution convolutional neural network (SRCNN), efficient sub-pixel convolutional neural networks (ESPCN), enhanced deep residual neural networks (EDRN), and super-resolution generative adversarial networks (SRGAN), for enhancing the resolution of micro-CT images obtained from heterogeneous porous media. Our investigation employs a dataset of 5000 micro-CT images acquired from a highly heterogeneous carbonate rock. The performance of each algorithm is evaluated on its accuracy in reconstructing pore geometry and connectivity, grain-pore edge sharpness, and the preservation of petrophysical properties such as porosity. Our findings indicate that EDRN outperforms the other techniques in terms of the peak signal-to-noise ratio (PSNR) and structural similarity (SSIM) index, improving on bicubic interpolation by nearly 4 dB and 17%, respectively. Furthermore, SRGAN exhibits superior performance in terms of the learned perceptual image patch similarity (LPIPS) index and porosity preservation error, with a nearly 30% reduction in LPIPS compared to bicubic interpolation.
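Two of the evaluation criteria are easy to state exactly. PSNR is a pixelwise fidelity score in decibels, and porosity preservation can be checked as the relative error in pore fraction between reference and super-resolved segmentations. A sketch assuming intensities normalised to [0, 1]:

```python
import numpy as np


def psnr(ref, recon, data_range=1.0):
    """Peak signal-to-noise ratio (dB) of a reconstruction against its reference."""
    mse = np.mean((np.asarray(ref, float) - np.asarray(recon, float)) ** 2)
    if mse == 0:
        return float("inf")  # identical images
    return float(10 * np.log10(data_range ** 2 / mse))


def porosity_error(ref_pores, sr_pores):
    """Relative porosity error: pore-voxel fraction of the super-resolved
    segmentation versus the reference segmentation (boolean masks)."""
    phi_ref, phi_sr = ref_pores.mean(), sr_pores.mean()
    return float(abs(phi_sr - phi_ref) / phi_ref)
```

Higher PSNR is better; a porosity error near zero means the enhanced image can be segmented without biasing downstream petrophysical estimates.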
Title: A comparative analysis of super-resolution techniques for enhancing micro-CT images of carbonate rocks
Applied Computing and Geosciences, Volume 20, Article 100143
Pub Date: 2023-11-03
DOI: 10.1016/j.acags.2023.100142
Ryan M. McGranaghan , Ellie Young , Cameron Powers , Swapnali Yadav , Edlira Vakaj
The challenges faced by science, engineering, and society are increasingly complex, requiring broad, cross-disciplinary teams to contribute to collective knowledge, cooperation, and sensemaking efforts. However, existing approaches to collaboration and knowledge sharing are largely manual and inadequate to meet the needs of teams that are not closely connected through personal ties or that lack the time to respond to dynamic requests for contextual information sharing. Nonetheless, in the current remote-first, complexity-driven, time-constrained workplace, such teams are both more common and more necessary. For example, the NASA Center for HelioAnalytics (CfHA) is a growing, cross-disciplinary community dedicated to aiding the application of emerging data science techniques and technologies, including AI/ML, to increase the speed, rigor, and depth of space physics scientific discovery. The members of that community possess innumerable skills and competencies and are involved in hundreds of projects, including proposals, committees, papers, presentations, conferences, groups, and missions. Traditional structures for information and knowledge representation do not permit the community to search and discover activities that are ongoing across the Center, nor to understand where skills and knowledge exist. The approaches that do exist are burdensome and result in inefficient use of resources, reinvention of solutions, and missed important connections. The challenge faced by the CfHA is a common one across modern groups, and one that must be solved if we are to respond to the grand challenges that face our society, such as complex scientific phenomena, global pandemics, and climate change.
We present a solution to this problem: a community knowledge graph (KG) that helps an organization better understand the resources (people, capabilities, affiliations, assets, content, data, models) available across its membership base, and thus supports a more cohesive community and more capable teams, enables robust and responsible application of new technologies, and provides the foundation for all members of the community to co-evolve the shared information space. We call this the Community Action and Understanding via Semantic Enrichment (CAUSE) ontology. We demonstrate the efficacy of KGs that can be instantiated from the ontology together with data from a given community (shown here for the CfHA). Finally, we discuss the implications, including the importance of the community KG for open science.
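The core idea of a community knowledge graph, linking people to skills, projects, and affiliations via typed relations, can be sketched with plain (subject, predicate, object) triples. The entity and predicate names below are hypothetical illustrations, not terms from the actual CAUSE ontology:

```python
# Minimal triple store: a set of (subject, predicate, object) tuples.
triples = {
    ("alice", "hasSkill", "machine-learning"),
    ("alice", "memberOf", "CfHA"),
    ("bob", "hasSkill", "ontology-engineering"),
    ("bob", "contributesTo", "heliophysics-kg"),
    ("heliophysics-kg", "usesSkill", "machine-learning"),
}

def who(predicate: str, obj: str) -> set:
    """Return all subjects linked to `obj` via `predicate`."""
    return {s for (s, p, o) in triples if p == predicate and o == obj}

# Discover expertise across the community: who knows machine learning?
print(who("hasSkill", "machine-learning"))  # {'alice'}
```

A real deployment would typically encode such triples in RDF and query them with SPARQL against a triplestore; the sketch only shows why a graph representation makes skills and activities discoverable across a community.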
{"title":"The cultural-social nucleus of an open community: A multi-level community knowledge graph and NASA application","authors":"Ryan M. McGranaghan , Ellie Young , Cameron Powers , Swapnali Yadav , Edlira Vakaj","doi":"10.1016/j.acags.2023.100142","DOIUrl":"https://doi.org/10.1016/j.acags.2023.100142","url":null,"abstract":"<div><p>The challenges faced by science, engineering, and society are increasingly complex, requiring broad, cross-disciplinary teams to contribute to collective knowledge, cooperation, and sensemaking efforts. However, existing approaches to collaboration and knowledge sharing are largely manual, inadequate to meet the needs of teams that are not closely connected through personal ties or which lack the time to respond to dynamic requests for contextual information sharing. Nonetheless, in the current remote-first, complexity-driven, time-constrained workplace, such teams are both more common and more necessary. For example, the NASA Center for HelioAnalytics (CfHA) is a growing and cross-disciplinary community that is dedicated to aiding the application of emerging data science techniques and technologies, including AI/ML, to increase the speed, rigor, and depth of space physics scientific discovery. The members of that community possess innumerable skills and competencies and are involved in hundreds of projects, including proposals, committees, papers, presentations, conferences, groups, and missions. Traditional structures for information and knowledge representation do not permit the community to search and discover activities that are ongoing across the Center, nor to understand where skills and knowledge exist. The approaches that do exist are burdensome and result in inefficient use of resources, reinvention of solutions, and missed important connections. 
The challenge faced by the CfHA is a common one across modern groups and one that must be solved if we are to respond to the grand challenges that face our society, such as complex scientific phenomena, global pandemics and climate change. We present a solution to the problem: a community knowledge graph (KG) that aids an organization to better understand the resources (people, capabilities, affiliations, assets, content, data, models) available across its membership base, and thus supports a more cohesive community and more capable teams, enables robust and responsible application of new technologies, and provides the foundation for all members of the community to co-evolve the shared information space. We call this the Community Action and Understanding via Semantic Enrichment (CAUSE) ontology. We demonstrate the efficacy of KGs that can be instantiated from the ontology together with data from a given community (shown here for the CfHA). Finally, we discuss the implications, including the importance of the community KG for open science.</p></div>","PeriodicalId":33804,"journal":{"name":"Applied Computing and Geosciences","volume":"20 ","pages":"Article 100142"},"PeriodicalIF":3.4,"publicationDate":"2023-11-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.sciencedirect.com/science/article/pii/S2590197423000319/pdfft?md5=4019b0e03e4f84f5bfcd8583a36134a7&pid=1-s2.0-S2590197423000319-main.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"92043917","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2023-10-17
DOI: 10.1016/j.acags.2023.100140
Tim McCormick, Rachel E. Heaven
Controlled vocabularies are critical to constructing FAIR (findable, accessible, interoperable, re-usable) data. One of the most widely required, yet complex, vocabularies in earth science is for rock and sediment type, or ‘lithology’. Since 1999, the British Geological Survey has used its own Rock Classification Scheme in many of its workflows and products, including the national digital geological map. This scheme pre-dates others that have been published, and is deeply embedded in BGS’ processes. By publishing this classification scheme now as a Simple Knowledge Organisation System (SKOS) machine-readable informal ontology, we make it available for ourselves and third parties to use in modern semantic applications, and we open the future possibility of using the tools SKOS provides to align our scheme with other published schemes. These include the IUGS-CGI Simple Lithology Scheme, the European Commission INSPIRE Lithology Code List, the Queensland Geological Survey Lithotype Scheme, the USGS Lithologic Classification of Geologic Map Units, and Mindat.org. The BGS lithology classification was initially based on four narrative reports that can be downloaded from the BGS website, although it has been added to subsequently. The classification is almost entirely mono-hierarchical in nature and includes 3454 currently valid concepts in a classification 11 levels deep. It includes igneous rocks and sediments, metamorphic rocks, sediments and sedimentary rocks, and superficial deposits including anthropogenic deposits. The SKOS informal ontology built on it is stored in a triplestore, and the triples are updated nightly by extracting from a relational database where the ontology is maintained. Bulk downloads and version history are available on GitHub. The RCS concepts themselves are used in other BGS linked data, namely the Lexicon of Named Rock Units and the linked data representation of the 1:625 000 scale geological map of the UK.
A comparison of the RCS with the other published lithology schemes shows that all are broadly similar but display characteristics revealing the interests and requirements of the groups that developed them, in terms of their level of detail both overall and in constituent parts. It should be possible to align the RCS with the other classifications; future work will focus on automated mechanisms to do this, and possibly on constructing a formal ontology for the RCS.
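A mono-hierarchical SKOS vocabulary of the kind described can be modelled as a chain of `skos:broader` links, with a concept's depth given by the number of broader steps to the top concept. The concept labels below are invented examples, not actual BGS Rock Classification Scheme identifiers:

```python
# Each concept maps to its single broader concept (mono-hierarchy);
# the top concept ("rock") has no entry.
broader = {
    "granite": "granitoid",
    "granitoid": "plutonic-rock",
    "plutonic-rock": "igneous-rock",
    "igneous-rock": "rock",
}

def depth(concept: str) -> int:
    """Number of `broader` steps from a concept to the top of the hierarchy."""
    d = 0
    while concept in broader:
        concept = broader[concept]
        d += 1
    return d

print(depth("granite"))  # 4
```

In an actual SKOS serialization these links would be RDF triples (`ex:granite skos:broader ex:granitoid .`) queried via SPARQL; the sketch only illustrates how an 11-level mono-hierarchy is traversed.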
{"title":"The British Geological Survey Rock Classification Scheme, its representation as linked data, and a comparison with some other lithology vocabularies","authors":"Tim McCormick, Rachel E. Heaven","doi":"10.1016/j.acags.2023.100140","DOIUrl":"https://doi.org/10.1016/j.acags.2023.100140","url":null,"abstract":"<div><p>Controlled vocabularies are critical to constructing FAIR (findable, accessible, interoperable, re-useable) data. One of the most widely required, yet complex, vocabularies in earth science is for rock and sediment type, or ‘lithology’. Since 1999 the British Geological Survey has used its own Rock Classification Scheme in many of its workflows and products including the national digital geological map. This scheme pre-dates others that have been published, and is deeply embedded in BGS’ processes. By publishing this classification scheme now as a Simple Knowledge Organisation System (SKOS) machine-readable informal ontology, we make it available for ourselves and third parties to use in modern semantic applications, and we open the future possibility of using the tools SKOS provides to align our scheme with other published schemes. These include the IUGS-CGI Simple Lithology Scheme, the European Commission INSPIRE Lithology Code List, the Queensland Geological Survey Lithotype Scheme, the USGS Lithologic Classification of Geologic Map Units, and <span>Mindat.org</span><svg><path></path></svg>. The BGS lithology classification was initially based on four narrative reports that can be downloaded from the BGS website, although it has been added to subsequently. The classification is almost entirely mono-hierarchical in nature and includes 3454 currently valid concepts in a classification 11 levels deep. It includes igneous rocks and sediments, metamorphic rocks, sediments and sedimentary rocks, and superficial deposits including anthropogenic deposits. 
The SKOS informal ontology built on it is stored in a triplestore and the triples are updated nightly by extracting from a relational database where the ontology is maintained. Bulk downloads and version history are available on github. The RCS concepts themselves are used in other BGS linked data, namely the Lexicon of Named Rock Units and the linked data representation of the 1:625 000 scale geological map of the UK. Comparing the RCS with the other published lithology schemes, all are broadly similar but show characteristics that reveal the interests and requirements of the groups that developed them, in terms of their level of detail both overall and in constituent parts. It should be possible to align the RCS with the other classifications, and future work will focus on automated mechanisms to do this, and possibly on constructing a formal ontology for the RCS.</p></div>","PeriodicalId":33804,"journal":{"name":"Applied Computing and Geosciences","volume":"20 ","pages":"Article 100140"},"PeriodicalIF":3.4,"publicationDate":"2023-10-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"49758675","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}