Causal hybrid modeling with double machine learning—applications in carbon flux modeling

IF 6.3 2区物理与天体物理 Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Machine Learning Science and Technology Pub Date : 2024-07-18 DOI:10.1088/2632-2153/ad5a60

Kai-Hendrik Cohrs, Gherardo Varando, Nuno Carvalhais, Markus Reichstein and Gustau Camps-Valls

{"title":"Causal hybrid modeling with double machine learning—applications in carbon flux modeling","authors":"Kai-Hendrik Cohrs, Gherardo Varando, Nuno Carvalhais, Markus Reichstein and Gustau Camps-Valls","doi":"10.1088/2632-2153/ad5a60","DOIUrl":null,"url":null,"abstract":"Hybrid modeling integrates machine learning with scientific knowledge to enhance interpretability, generalization, and adherence to natural laws. Nevertheless, equifinality and regularization biases pose challenges in hybrid modeling to achieve these purposes. This paper introduces a novel approach to estimating hybrid models via a causal inference framework, specifically employing double machine learning (DML) to estimate causal effects. We showcase its use for the Earth sciences on two problems related to carbon dioxide fluxes. In the Q10 model, we demonstrate that DML-based hybrid modeling is superior in estimating causal parameters over end-to-end deep neural network approaches, proving efficiency, robustness to bias from regularization methods, and circumventing equifinality. Our approach, applied to carbon flux partitioning, exhibits flexibility in accommodating heterogeneous causal effects. The study emphasizes the necessity of explicitly defining causal graphs and relationships, advocating for this as a general best practice. We encourage the continued exploration of causality in hybrid models for more interpretable and trustworthy results in knowledge-guided machine learning.","PeriodicalId":33757,"journal":{"name":"Machine Learning Science and Technology","volume":null,"pages":null},"PeriodicalIF":6.3000,"publicationDate":"2024-07-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Machine Learning Science and Technology","FirstCategoryId":"101","ListUrlMain":"https://doi.org/10.1088/2632-2153/ad5a60","RegionNum":2,"RegionCategory":"物理与天体物理","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}

引用次数: 0

Abstract

Hybrid modeling integrates machine learning with scientific knowledge to enhance interpretability, generalization, and adherence to natural laws. Nevertheless, equifinality and regularization biases pose challenges in hybrid modeling to achieve these purposes. This paper introduces a novel approach to estimating hybrid models via a causal inference framework, specifically employing double machine learning (DML) to estimate causal effects. We showcase its use for the Earth sciences on two problems related to carbon dioxide fluxes. In the Q10 model, we demonstrate that DML-based hybrid modeling is superior in estimating causal parameters over end-to-end deep neural network approaches, proving efficiency, robustness to bias from regularization methods, and circumventing equifinality. Our approach, applied to carbon flux partitioning, exhibits flexibility in accommodating heterogeneous causal effects. The study emphasizes the necessity of explicitly defining causal graphs and relationships, advocating for this as a general best practice. We encourage the continued exploration of causality in hybrid models for more interpretable and trustworthy results in knowledge-guided machine learning.

查看原文

微信好友朋友圈 QQ好友复制链接

本刊更多论文

双机器学习的因果混合建模--在碳通量建模中的应用

混合建模将机器学习与科学知识相结合，以增强可解释性、概括性和对自然规律的遵循。然而，等价性和正则化偏差给混合建模实现这些目的带来了挑战。本文介绍了一种通过因果推理框架来估计混合模型的新方法，特别是采用双重机器学习（DML）来估计因果效应。我们在两个与二氧化碳通量有关的问题上展示了这种方法在地球科学中的应用。在 Q10 模型中，我们证明了基于 DML 的混合建模在估计因果参数方面优于端到端深度神经网络方法，证明了其效率、对正则化方法产生的偏差的鲁棒性以及规避等效性。我们的方法适用于碳通量分区，在适应异质因果效应方面表现出灵活性。该研究强调了明确定义因果图和因果关系的必要性，并倡导将此作为一般最佳实践。我们鼓励在混合模型中继续探索因果关系，以便在知识引导的机器学习中获得更可解释、更可信的结果。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

求助全文

约1分钟内获得全文去求助

来源期刊

Machine Learning Science and Technology Computer Science-Artificial Intelligence

CiteScore

9.10

自引率

4.40%

发文量

审稿时长

5 weeks

期刊介绍： Machine Learning Science and Technology is a multidisciplinary open access journal that bridges the application of machine learning across the sciences with advances in machine learning methods and theory as motivated by physical insights. Specifically, articles must fall into one of the following categories: advance the state of machine learning-driven applications in the sciences or make conceptual, methodological or theoretical advances in machine learning with applications to, inspiration from, or motivated by scientific problems.

期刊最新文献

Transfer learning with generative models for object detection on limited datasets Trainability issues in quantum policy gradients Molecular relaxation by reverse diffusion with time step prediction Towards robust data-driven automated recovery of symbolic conservation laws from limited data Coincident learning for unsupervised anomaly detection of scientific instruments