Maxime Keutgen De Greef, Gert Jan Weltje, Irène Gijbels
DOI: https://doi.org/10.1007/s11004-024-10138-5. Published 2024-04-08.
Estimating Rock Composition from Replicate Geochemical Analyses: Theory and Application to Magmatic Rocks of the GeoPT Database
Chemical analyses of powdered rocks by different laboratories often yield varying results, requiring estimation of the rock’s true composition and associated uncertainty. Challenges arise from the peculiar nature of geochemical data. Traditionally, major and trace elements have been measured using different methods, resulting in chemical analyses where the sum of the parts fluctuates around 1 rather than precisely totaling 1. Additionally, all chemical analyses contain an undisclosed mass fraction representing undetected chemical elements. Because of this undisclosed and unknown mass fraction, geochemical data represent a particular kind of compositional data in which closure to unity is not guaranteed. We argue that chemical analyses exist in the hypercube while being sampled from a true composition residing in the simplex. Therefore, we propose an algorithm that generates random chemical analyses by simulating the data acquisition protocol in geochemistry. Using the algorithm’s output, we measure the bias and mean squared error (MSE) of various estimators of the true mean composition. Additionally, we explore the impact of missing values on estimator performance. Our findings reveal that the optimized binary log-ratio mean, a new estimator, exhibits the lowest MSE and bias. It performs well even with up to 70% missing values, in contrast to other classical estimators such as the arithmetic mean or the geometric mean. Applying our approach to the GeoPT database, which contains replicate analyses of igneous rocks from numerous geochemical laboratories, we introduce an outlier detection technique based on the Mahalanobis distance between a laboratory’s logit coordinates and the optimized mean estimate. This enables a probabilistic ranking of laboratories based on the atypicality of their performance. 
Finally, we offer an accessible R implementation of our findings through the GitHub repository linked to this paper [subject classification numbers: 10 (compositions) 85 (statistics)].
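
The workflow summarized above (simulate unclosed analyses, estimate the composition, rank laboratories by Mahalanobis distance in logit coordinates) can be illustrated with a minimal Python sketch. It is not the authors' R implementation and does not reproduce the optimized binary log-ratio mean. As stated assumptions, it uses independent log-normal measurement error per component, a plain mean of logit coordinates as a stand-in estimator, and a classical (non-robust) covariance. All function names, the example composition, and the noise level are hypothetical.

```python
import numpy as np


def logit(x):
    """Map mass fractions in (0, 1) to unbounded logit coordinates."""
    return np.log(x / (1.0 - x))


def inv_logit(z):
    """Map logit coordinates back to mass fractions in (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))


def simulate_analyses(true_comp, n_labs, rel_sd, rng):
    """Simulate replicate analyses of one rock (assumed error model).

    Every laboratory measures each component with independent
    multiplicative log-normal error. Rows are NOT re-closed, so each
    analysis lies in the hypercube and its sum only fluctuates near 1.
    """
    noise = rng.lognormal(mean=0.0, sigma=rel_sd, size=(n_labs, true_comp.size))
    return np.clip(true_comp * noise, 1e-12, 1.0 - 1e-12)


def mahalanobis_sq(z, center, cov):
    """Squared Mahalanobis distance of each row of z from center."""
    diff = z - center
    return np.einsum("ij,jk,ik->i", diff, np.linalg.inv(cov), diff)


rng = np.random.default_rng(0)

# Hypothetical true composition: the reported parts sum to 0.99,
# leaving 0.01 as the undisclosed (undetected) mass fraction.
true_comp = np.array([0.60, 0.25, 0.10, 0.04])

analyses = simulate_analyses(true_comp, n_labs=40, rel_sd=0.02, rng=rng)
analyses[0] *= np.array([1.0, 1.0, 1.5, 1.0])  # plant one atypical laboratory

z = logit(analyses)
center = z.mean(axis=0)          # stand-in estimator: mean of logit coordinates
cov = np.cov(z, rowvar=False)    # classical covariance (assumption)
estimate = inv_logit(center)     # back-transformed composition estimate

d2 = mahalanobis_sq(z, center, cov)
ranking = np.argsort(d2)[::-1]   # most atypical laboratory first

print("row sums (first 5):", analyses.sum(axis=1)[:5])
print("estimated composition:", estimate)
print("most atypical laboratories:", ranking[:3])
```

Because each component is transformed on its own, a missing value in one component does not prevent the others from being used; handling that case would require masked means and is left out of this sketch.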