Enhancing quantitative 1H NMR model generalizability on honey from different years through partial least squares subspace and optimal transport based unsupervised domain adaptation

Chemometrics and Intelligent Laboratory Systems (IF 3.7; Zone 2, Chemistry; JCR Q2, Automation & Control Systems). Pub Date: 2024-08-28. DOI: 10.1016/j.chemolab.2024.105221

Peng Shan, Hongming Xiao, Xiang Li, Ruige Yang, Lin Zhang, Yuliang Zhao

Volume 254, Article 105221. Full text: https://www.sciencedirect.com/science/article/pii/S0169743924001618

Abstract

Honey is a nourishing natural food product favored by a wide range of consumers. Proton nuclear magnetic resonance (¹H NMR) is a powerful tool for the quantitative analysis of honey and plays a crucial role in ensuring its quality. Quantifying the key compounds in honey from ¹H NMR spectra requires multivariate calibration models. However, measurement conditions can rarely be kept consistent across different years, so the distributions of the training and test spectra may differ substantially, which reduces the performance of predictive models. Unsupervised domain adaptation (UDA) methods have gained considerable attention because they can reduce distribution differences between labeled source spectra and unlabeled target spectra without costly annotation. To enhance the generalizability of quantitative models on honey from different years, we propose a UDA method called partial least squares subspace and optimal transport-based UDA (PLSS-OT-UDA). This approach eliminates distribution differences between the source subspace and the target subspace via partial least squares (PLS) dimensionality reduction and optimal transport (OT). First, the optimal latent variable weight matrix is extracted from the source domain (i.e., labeled ¹H NMR data from 2017) with PLS. Next, the dimensions of both the source domain and the target domain (i.e., unlabeled ¹H NMR data from 2018) are reduced with the weight matrix of the source domain to obtain their corresponding subspaces. Finally, OT is employed to align the distributions of the source and target domains within the subspace. Experimental results on the honey dataset demonstrate that PLSS-OT-UDA outperforms traditional methods, including transfer component analysis (TCA), optimal transport for domain adaptation (OTDA), domain adaptation based on principal component analysis and optimal transport (PCA-OTDA), and subspace alignment (SA), in generalization performance on three components: Baumé degree, sugar content, and water content.
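The three steps described in the abstract (source PLS weights, projection of both domains, OT alignment in the subspace) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the abstract does not specify the OT solver, the mapping, or the final regressor, so the entropic Sinkhorn iteration, the barycentric mapping of source scores, the least-squares regressor, the centering with the source mean, and all function names and the synthetic data below are assumptions made for illustration only.

```python
import numpy as np

def pls_weights(X, y, n_components):
    """NIPALS PLS1 on centered data; returns R so that scores = Xc @ R."""
    Xk, yk = X - X.mean(axis=0), y - y.mean()
    W, P = [], []
    for _ in range(n_components):
        w = Xk.T @ yk
        w /= np.linalg.norm(w)           # unit-norm weight vector
        t = Xk @ w                       # score vector
        tt = t @ t
        p = Xk.T @ t / tt                # X loading
        q = yk @ t / tt                  # y loading
        Xk = Xk - np.outer(t, p)         # deflate X
        yk = yk - q * t                  # deflate y
        W.append(w)
        P.append(p)
    W, P = np.array(W).T, np.array(P).T
    return W @ np.linalg.inv(P.T @ W)    # rotation (weight) matrix

def sinkhorn_plan(Zs, Zt, reg=0.1, n_iter=500):
    """Entropic OT plan between uniform empirical distributions (assumed solver)."""
    C = ((Zs[:, None, :] - Zt[None, :, :]) ** 2).sum(-1)
    K = np.exp(-(C / C.max()) / reg)     # Gibbs kernel on normalized cost
    a = np.full(len(Zs), 1.0 / len(Zs))
    b = np.full(len(Zt), 1.0 / len(Zt))
    u = np.ones_like(a)
    for _ in range(n_iter):
        v = b / (K.T @ u)
        u = a / (K @ v)
    return u[:, None] * K * v[None, :]

def plss_ot_predict(Xs, ys, Xt, n_components=3, reg=0.1):
    mu = Xs.mean(axis=0)                              # assumption: center both with source mean
    R = pls_weights(Xs, ys, n_components)             # step 1: source PLS weights
    Zs, Zt = (Xs - mu) @ R, (Xt - mu) @ R             # step 2: shared subspace
    G = sinkhorn_plan(Zs, Zt, reg)                    # step 3: OT alignment
    Zs_ot = (G @ Zt) / G.sum(axis=1, keepdims=True)   # barycentric mapping of source scores
    A = np.column_stack([Zs_ot, np.ones(len(Zs_ot))])
    coef, *_ = np.linalg.lstsq(A, ys, rcond=None)     # regressor on transported source
    y_pred = np.column_stack([Zt, np.ones(len(Zt))]) @ coef
    return y_pred, G, Zs_ot

# Synthetic stand-in for two "years" of spectra (hypothetical data, not honey NMR).
rng = np.random.default_rng(0)
n_s, n_t, n_feat = 40, 30, 50
L = rng.normal(size=(3, n_feat))
Hs, Ht = rng.normal(size=(n_s, 3)), rng.normal(size=(n_t, 3))
Xs = Hs @ L + 0.01 * rng.normal(size=(n_s, n_feat))
Xt = 1.1 * (Ht @ L) + 0.5 + 0.01 * rng.normal(size=(n_t, n_feat))  # shifted domain
ys = Hs @ np.array([1.0, -0.5, 0.2])
y_pred, G, Zs_ot = plss_ot_predict(Xs, ys, Xt)
```

In this sketch the target labels are never used, which matches the unsupervised setting: only the source labels enter the PLS step and the final regression. The regularization strength `reg` and the number of latent variables are free parameters here and would need to be tuned on real spectra.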

Source journal: Chemometrics and Intelligent Laboratory Systems (Engineering & Technology, Analytical Chemistry)
CiteScore: 7.50
Self-citation rate: 7.70%
Articles published: 169
Review time: 3.4 months

About the journal: Chemometrics and Intelligent Laboratory Systems publishes original research papers, short communications, reviews, tutorials and Original Software Publications reporting on the development of novel statistical, mathematical, or computer techniques in Chemistry and related disciplines. Chemometrics is the chemical discipline that uses mathematical and statistical methods to design or select optimal procedures and experiments, and to provide maximum chemical information by analysing chemical data. The journal deals with the following topics: 1) Development of new statistical, mathematical and chemometrical methods for Chemistry and related fields (Environmental Chemistry, Biochemistry, Toxicology, System Biology, -Omics, etc.). 2) Novel applications of chemometrics to all branches of Chemistry and related fields (typical domains of interest are: process data analysis, experimental design, data mining, signal processing, supervised modelling, decision making, robust statistics, mixture analysis, multivariate calibration, etc.). Routine applications of established chemometrical techniques will not be considered. 3) Development of new software that provides novel tools or truly advances the use of chemometrical methods. 4) Well characterized data sets to test performance for the new methods and software. The journal complies with the International Committee of Medical Journal Editors' Uniform Requirements for Manuscripts.