Estimating Rock Composition from Replicate Geochemical Analyses: Theory and Application to Magmatic Rocks of the GeoPT Database

Mathematical Geosciences (IF 3.6; Zone 3, Earth Sciences; JCR Q2, GEOSCIENCES, MULTIDISCIPLINARY). Pub Date: 2024-04-08. DOI: 10.1007/s11004-024-10138-5

Maxime Keutgen De Greef, Gert Jan Weltje, Irène Gijbels


Citations: 0



Estimating Rock Composition from Replicate Geochemical Analyses: Theory and Application to Magmatic Rocks of the GeoPT Database

Chemical analyses of powdered rocks by different laboratories often yield varying results, requiring estimation of the rock’s true composition and associated uncertainty. Challenges arise from the peculiar nature of geochemical data. Traditionally, major and trace elements have been measured using different methods, resulting in chemical analyses where the sum of the parts fluctuates around 1 rather than precisely totaling 1. Additionally, all chemical analyses contain an undisclosed mass fraction representing undetected chemical elements. Because of this undisclosed and unknown mass fraction, geochemical data represent a particular kind of compositional data in which closure to unity is not guaranteed. We argue that chemical analyses exist in the hypercube while being sampled from a true composition residing in the simplex. Therefore, we propose an algorithm that generates random chemical analyses by simulating the data acquisition protocol in geochemistry. Using the algorithm’s output, we measure the bias and mean squared error (MSE) of various estimators of the true mean composition. Additionally, we explore the impact of missing values on estimator performance. Our findings reveal that the optimized binary log-ratio mean, a new estimator, exhibits the lowest MSE and bias. It performs well even with up to 70% missing values, in contrast to other classical estimators such as the arithmetic mean or the geometric mean. Applying our approach to the GeoPT database, which contains replicate analyses of igneous rocks from numerous geochemical laboratories, we introduce an outlier detection technique based on the Mahalanobis distance between a laboratory’s logit coordinates and the optimized mean estimate. This enables a probabilistic ranking of laboratories based on the atypicality of their performance. 
Finally, we offer an accessible R implementation of our findings through the GitHub repository linked to this paper [subject classification numbers: 10 (compositions) 85 (statistics)].
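The outlier-detection idea described above can be illustrated with a minimal Python sketch. It is not the authors' R implementation and does not reproduce the optimized binary log-ratio mean: as a stand-in, it assumes a plain column-wise mean in logit coordinates and a classical (non-robust) sample covariance. The function names and the six two-component "laboratory" analyses are hypothetical and chosen only for illustration.

```python
import numpy as np


def logit(p):
    """Map mass fractions in (0, 1) to the real line, element-wise."""
    p = np.asarray(p, dtype=float)
    return np.log(p / (1.0 - p))


def inv_logit(z):
    """Inverse of logit: map real values back to (0, 1)."""
    return 1.0 / (1.0 + np.exp(-np.asarray(z, dtype=float)))


def logit_mean(analyses):
    """Column-wise mean taken in logit space, skipping NaN (missing) entries.

    Assumption: a simple stand-in for the paper's optimized estimator.
    """
    return inv_logit(np.nanmean(logit(analyses), axis=0))


def mahalanobis_to_center(analyses):
    """Mahalanobis distance of each complete analysis to the logit-space mean."""
    z = logit(analyses)
    diff = z - z.mean(axis=0)
    cov = np.cov(z, rowvar=False)  # classical sample covariance (assumption)
    # Solve cov @ x = diff.T rather than forming an explicit inverse.
    d2 = np.einsum("ij,ij->i", diff, np.linalg.solve(cov, diff.T).T)
    return np.sqrt(d2)


# Hypothetical replicate analyses: rows are laboratories, columns are two
# components reported as mass fractions. Rows need not sum to 1.
analyses = np.array([
    [0.500, 0.150],
    [0.502, 0.149],
    [0.498, 0.151],
    [0.501, 0.152],
    [0.499, 0.148],
    [0.530, 0.140],  # deliberately atypical laboratory
])

center = logit_mean(analyses)
distances = mahalanobis_to_center(analyses)
ranking = np.argsort(distances)[::-1]  # most atypical laboratory first
print("logit-mean composition:", center)
print("laboratories ranked by atypicality:", ranking)
```

Working in logit coordinates treats each component separately on the interval (0, 1), which matches the abstract's point that analyses live in the hypercube and need not close to unity. Turning the distances into a probabilistic ranking would require a reference distribution, which this sketch leaves out.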


Source journal: Mathematical Geosciences (Earth Sciences: Geosciences, Multidisciplinary)

CiteScore: 5.30

Self-citation rate: 15.40%

Review time: >12 weeks

About the journal: Mathematical Geosciences (formerly Mathematical Geology) publishes original, high-quality, interdisciplinary papers in geomathematics focusing on quantitative methods and studies of the Earth, its natural resources and the environment. This international publication is the official journal of the IAMG. Mathematical Geosciences is an essential reference for researchers and practitioners of geomathematics who develop and apply quantitative models to earth science and geo-engineering problems.