Mingxi Zhang, Zefang Shen, Lewis Walden, Farid Sepanta, Zhongkui Luo, Lei Gao, Oscar Serrano, Raphael A. Viscarra Rossel
Title: Deep learning of the particulate and mineral-associated organic carbon fractions using a compositional transform and mid-infrared spectroscopy
Journal: Geoderma, volume 455, Article 117207 (journal article, published 2025-02-16)
DOI: 10.1016/j.geoderma.2025.117207
URL: https://www.sciencedirect.com/science/article/pii/S001670612500045X
Impact factor: 5.6; JCR: Q1 (Soil Science)
Citations: 0
Abstract
We need soil organic carbon (SOC) and the SOC fractions, the particulate and mineral-associated organic carbon (POC, MAOC), to understand SOC dynamics. They have implications for soil management, carbon sequestration and climate change mitigation. However, conventional laboratory measurements of the SOC fractions, which involve physical or chemical separations, are elaborate, time-consuming and expensive. Mid-infrared (MIR) spectroscopy combined with multivariate modelling can alleviate these limitations because the method can estimate SOC and its fractions rapidly, cost-effectively and accurately. Previous spectroscopic modelling has mostly ignored the compositional nature of the SOC fractions (i.e. SOC = $\sum$ fractions), causing discrepancies in the estimation such that the sum of the fractions does not equal the total SOC. We recorded the MIR spectra (4000–450 cm$^{-1}$) of 397 soil samples from across Australia and then performed a granulometric fractionation to derive three SOC fractions: the POC in the macroaggregates (250–2000 μm, POC$_{mac}$), the POC in the microaggregates (50–250 μm, POC$_{mic}$), and the MAOC ($<$50 μm). We used the centred log ratio (CLR) method to transform the data compositionally and then modelled POC$_{mac}$, POC$_{mic}$, POC (POC$_{mac}$ + POC$_{mic}$), and MAOC with the spectra, using convolutional neural networks (CNN) and cubist for benchmarking. We interpreted the models using the SHapley Additive exPlanations (SHAP) values and a land use classification of the data. Modelling the CLR-transformed SOC fractions with CNN maintained the composition of the fractions and improved the accuracy of the estimates (Lin's concordance correlation coefficient ($\rho_c$) of 0.58, 0.86, and 0.94 for the POC$_{mac}$, POC$_{mic}$, and MAOC), compared to CLR with cubist ($\rho_c$ of 0.49, 0.84, and 0.87 for the POC$_{mac}$, POC$_{mic}$, and MAOC) and cubist with no compositional transformation ($\rho_c$ of 0.53, 0.85, and 0.88 for the POC$_{mac}$, POC$_{mic}$, and MAOC). The SHAP values reflected the compositional modelling and identified important organic and inorganic functional groups that differed by fraction and land use. Our approach can complement conventional physical SOC fractionations and improve the cost-effectiveness of the measurements, especially when there are many samples to measure, thus enhancing our understanding of SOC dynamics.
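To make the compositional idea concrete, the following is a minimal NumPy sketch of a centred log ratio transform, its inverse, and Lin's concordance correlation coefficient. It is not the authors' pipeline: the function names (`clr`, `clr_inverse`, `lins_ccc`), the sample values, and the random noise standing in for model predictions are all assumptions made for illustration. It also assumes that all fractions are strictly positive and that the total SOC of each sample is known when back-transforming, neither of which the abstract specifies.

```python
import numpy as np


def clr(fractions):
    """Centred log ratio: log of each part minus the row mean of the logs.

    Assumes strictly positive parts; zeros would need handling beforehand.
    """
    log_x = np.log(np.asarray(fractions, dtype=float))
    return log_x - log_x.mean(axis=1, keepdims=True)


def clr_inverse(z, total):
    """Back-transform CLR values to parts that sum to `total` in each row."""
    z = np.asarray(z, dtype=float)
    # Subtracting the row maximum avoids overflow and does not change the result.
    e = np.exp(z - z.max(axis=1, keepdims=True))
    proportions = e / e.sum(axis=1, keepdims=True)
    return proportions * np.asarray(total, dtype=float)[:, None]


def lins_ccc(observed, predicted):
    """Lin's concordance correlation coefficient (population moments)."""
    o = np.asarray(observed, dtype=float)
    p = np.asarray(predicted, dtype=float)
    cov = np.mean((o - o.mean()) * (p - p.mean()))
    return 2.0 * cov / (o.var() + p.var() + (o.mean() - p.mean()) ** 2)


# Hypothetical fraction contents for three samples (illustrative values only).
# Columns: POC_mac, POC_mic, MAOC.
fractions = np.array([
    [0.10, 0.40, 1.50],
    [0.05, 0.20, 0.75],
    [0.30, 0.90, 2.80],
])
total_soc = fractions.sum(axis=1)

# Transform the fractions; a model would be trained to predict these values.
z = clr(fractions)

# Stand-in for model output in CLR space: the true values plus small noise.
rng = np.random.default_rng(0)
z_pred = z + rng.normal(scale=0.1, size=z.shape)

# Back-transformed estimates sum to the total SOC by construction.
pred = clr_inverse(z_pred, total_soc)
ccc_maoc = lins_ccc(fractions[:, 2], pred[:, 2])
```

Because the inverse transform closes each row to the sample total, the estimated fractions add up to total SOC regardless of the prediction error in CLR space, which is the property the abstract describes as maintaining the composition.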
About the journal:
Geoderma - the global journal of soil science - welcomes authors, readers and soil research from all parts of the world, encourages worldwide soil studies, and embraces all aspects of soil science and its associated pedagogy. The journal particularly welcomes interdisciplinary work focusing on dynamic soil processes and functions across space and time.