Maxime Keutgen De Greef, Gert Jan Weltje, Irène Gijbels
Estimating Rock Composition from Replicate Geochemical Analyses: Theory and Application to Magmatic Rocks of the GeoPT Database
Journal: Mathematical Geosciences, volume 56
Published: 2024-04-08
DOI: 10.1007/s11004-024-10138-5 (https://doi.org/10.1007/s11004-024-10138-5)
Abstract
Chemical analyses of powdered rocks by different laboratories often yield varying results, requiring estimation of the rock’s true composition and associated uncertainty. Challenges arise from the peculiar nature of geochemical data. Traditionally, major and trace elements have been measured using different methods, resulting in chemical analyses where the sum of the parts fluctuates around 1 rather than precisely totaling 1. Additionally, all chemical analyses contain an undisclosed mass fraction representing undetected chemical elements. Because of this undisclosed and unknown mass fraction, geochemical data represent a particular kind of compositional data in which closure to unity is not guaranteed. We argue that chemical analyses exist in the hypercube while being sampled from a true composition residing in the simplex. Therefore, we propose an algorithm that generates random chemical analyses by simulating the data acquisition protocol in geochemistry. Using the algorithm’s output, we measure the bias and mean squared error (MSE) of various estimators of the true mean composition. Additionally, we explore the impact of missing values on estimator performance. Our findings reveal that the optimized binary log-ratio mean, a new estimator, exhibits the lowest MSE and bias. It performs well even with up to 70% missing values, in contrast to other classical estimators such as the arithmetic mean or the geometric mean. Applying our approach to the GeoPT database, which contains replicate analyses of igneous rocks from numerous geochemical laboratories, we introduce an outlier detection technique based on the Mahalanobis distance between a laboratory’s logit coordinates and the optimized mean estimate. This enables a probabilistic ranking of laboratories based on the atypicality of their performance. Finally, we offer an accessible R implementation of our findings through the GitHub repository linked to this paper [subject classification numbers: 10 (compositions) 85 (statistics)].
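The two ideas named in the abstract, averaging in binary log-ratio (logit) coordinates and scoring a laboratory by its Mahalanobis distance, can be illustrated with a minimal Python sketch. This is not the paper's optimized estimator or its R implementation: the function names, the plain (unoptimized) logit mean, the two-component restriction, and the example mass fractions are all assumptions made for illustration.

```python
import math

def logit(x):
    """Binary log-ratio of a mass fraction x against its complement 1 - x."""
    return math.log(x / (1.0 - x))

def inv_logit(z):
    """Map a logit coordinate back to a mass fraction in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def logit_mean(analyses):
    """Per-component mean taken in logit space, skipping missing values (None).

    Assumption: a plain average of logits, not the optimized binary
    log-ratio mean described in the paper.
    """
    n_parts = len(analyses[0])
    means = []
    for j in range(n_parts):
        z = [logit(row[j]) for row in analyses if row[j] is not None]
        # A component that no laboratory reported has no estimate.
        means.append(inv_logit(sum(z) / len(z)) if z else None)
    return means

def mahalanobis_2d(point, rows):
    """Mahalanobis distance of a 2-D point from the sample mean of rows.

    Uses the sample covariance of rows (divisor n - 1) and the closed-form
    inverse of a 2x2 matrix. Assumes the covariance is non-singular.
    """
    n = len(rows)
    mx = sum(r[0] for r in rows) / n
    my = sum(r[1] for r in rows) / n
    sxx = sum((r[0] - mx) ** 2 for r in rows) / (n - 1)
    syy = sum((r[1] - my) ** 2 for r in rows) / (n - 1)
    sxy = sum((r[0] - mx) * (r[1] - my) for r in rows) / (n - 1)
    det = sxx * syy - sxy ** 2
    dx, dy = point[0] - mx, point[1] - my
    d2 = (syy * dx * dx - 2.0 * sxy * dx * dy + sxx * dy * dy) / det
    return math.sqrt(d2)

if __name__ == "__main__":
    # Hypothetical replicate analyses: two mass fractions per laboratory,
    # with None marking a value that a laboratory did not report.
    labs = [
        (0.520, 0.150),
        (0.510, 0.160),
        (0.530, 0.155),
        (0.515, 0.148),
        (0.600, 0.100),
        (0.525, None),
    ]
    print("logit mean:", logit_mean(labs))

    # Distances are computed in logit coordinates, complete rows only.
    complete = [r for r in labs if None not in r]
    coords = [(logit(a), logit(b)) for a, b in complete]
    for lab, c in zip(complete, coords):
        print(lab, round(mahalanobis_2d(c, coords), 3))
```

Working in logit coordinates keeps every back-transformed estimate strictly between 0 and 1 without forcing the components to sum to 1, which matches the abstract's point that analyses live in the hypercube rather than the simplex. Note that in this sketch the scored laboratory also contributes to the mean and covariance, which caps how large its distance can get in small samples.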
About the journal:
Mathematical Geosciences (formerly Mathematical Geology) publishes original, high-quality, interdisciplinary papers in geomathematics focusing on quantitative methods and studies of the Earth, its natural resources and the environment. This international publication is the official journal of the IAMG. Mathematical Geosciences is an essential reference for researchers and practitioners of geomathematics who develop and apply quantitative models to earth science and geo-engineering problems.