A data-driven method for automated data superposition with applications in soft matter science

Data-Centric Engineering | IF 2.8 | Q3, Computer Science, Artificial Intelligence | Pub Date: 2022-04-20 | DOI: 10.1017/dce.2023.3

Kyle R. Lennon, G. McKinley, J. Swan


Citations: 3

Abstract

The superposition of data sets with internal parametric self-similarity is a longstanding and widespread technique for the analysis of many types of experimental data across the physical sciences. Typically, this superposition is performed manually or, more recently, through the application of one of a few automated algorithms. However, these methods are often heuristic in nature, are prone to user bias via manual data shifting or parameterization, and lack a native framework for handling uncertainty in both the data and the resulting model of the superposed data. In this work, we develop a data-driven, nonparametric method for superposing experimental data with arbitrary coordinate transformations, which employs Gaussian process regression to learn statistical models that describe the data, and then uses maximum a posteriori estimation to optimally superpose the data sets. This statistical framework is robust to experimental noise and automatically produces uncertainty estimates for the learned coordinate transformations. Moreover, it is distinguished from black-box machine learning in its interpretability: specifically, it produces a model that may itself be interrogated to gain insight into the system under study. We demonstrate these salient features of our method through its application to four representative data sets characterizing the mechanics of soft materials. In every case, our method replicates results obtained using other approaches, but with reduced bias and the addition of uncertainty estimates. This method enables a standardized, statistical treatment of self-similar data across many fields, producing interpretable data-driven models that may inform applications such as materials classification, design, and discovery.
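The two ingredients named in the abstract, Gaussian process regression followed by maximum a posteriori estimation of a coordinate transformation, can be illustrated with a small sketch. This is not the authors' implementation: it assumes a single horizontal shift (for example, in a log-scaled coordinate), a fixed squared-exponential kernel, a broad Gaussian prior on the shift, a grid search in place of a proper optimizer, and synthetic data drawn from a `tanh` master curve. All function names and parameter values below are hypothetical.

```python
import numpy as np

def rbf_kernel(a, b, length_scale=1.0):
    """Squared-exponential covariance (unit variance) between 1-D inputs."""
    d = a[:, None] - b[None, :]
    return np.exp(-0.5 * (d / length_scale) ** 2)

def gp_predict(x_train, y_train, x_test, noise=0.05, length_scale=1.0):
    """Posterior predictive mean and variance of a zero-mean GP at x_test."""
    K = rbf_kernel(x_train, x_train, length_scale)
    K += noise**2 * np.eye(len(x_train))          # observation noise
    Ks = rbf_kernel(x_train, x_test, length_scale)
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_train))
    mean = Ks.T @ alpha
    v = np.linalg.solve(L, Ks)
    var = 1.0 - np.sum(v**2, axis=0) + noise**2   # includes noise variance
    return mean, var

def neg_log_posterior(shift, x_ref, y_ref, x_new, y_new, prior_std=10.0):
    """Negative log posterior of a horizontal shift applied to (x_new, y_new).

    Likelihood: the shifted points are scored pointwise under the GP fitted
    to the reference set. Prior: zero-mean Gaussian on the shift (assumed).
    """
    mean, var = gp_predict(x_ref, y_ref, x_new + shift)
    nll = 0.5 * np.sum((y_new - mean) ** 2 / var + np.log(2 * np.pi * var))
    neg_log_prior = 0.5 * (shift / prior_std) ** 2
    return nll + neg_log_prior

def map_shift(x_ref, y_ref, x_new, y_new, candidates):
    """Grid-search MAP estimate of the shift (a stand-in for an optimizer)."""
    costs = [neg_log_posterior(s, x_ref, y_ref, x_new, y_new)
             for s in candidates]
    return float(candidates[int(np.argmin(costs))])

# Synthetic, self-similar data: the second set is the master curve
# observed at horizontally displaced coordinates.
rng = np.random.default_rng(0)
true_shift = 0.8
x_ref = np.linspace(-3.0, 3.0, 40)
y_ref = np.tanh(x_ref) + 0.02 * rng.standard_normal(x_ref.size)
x_new = np.linspace(-2.5, 1.0, 25)
y_new = np.tanh(x_new + true_shift) + 0.02 * rng.standard_normal(x_new.size)

candidates = np.linspace(-2.0, 2.0, 401)
estimated_shift = map_shift(x_ref, y_ref, x_new, y_new, candidates)
print(f"true shift = {true_shift}, estimated shift = {estimated_shift:.2f}")
```

In this sketch the posterior is evaluated over a grid of candidate shifts, so its curvature around the minimum could also be inspected to gauge how tightly the shift is constrained; the uncertainty treatment described in the abstract is not reproduced here.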


