Enhancing quantitative 1H NMR model generalizability on honey from different years through partial least squares subspace and optimal transport based unsupervised domain adaptation

IF 3.7 | Zone 2 (Chemistry) | JCR Q2 (Automation & Control Systems) | Chemometrics and Intelligent Laboratory Systems | Pub Date: 2024-08-28 | DOI: 10.1016/j.chemolab.2024.105221
Peng Shan, Hongming Xiao, Xiang Li, Ruige Yang, Lin Zhang, Yuliang Zhao
Volume 254, Article 105221. Full text: https://www.sciencedirect.com/science/article/pii/S0169743924001618
Citations: 0

Abstract

Honey is a nourishing, natural food product favored by a wide range of consumers. Proton nuclear magnetic resonance (1H NMR) is a powerful tool for the quantitative analysis of honey and plays a crucial role in ensuring its quality. Quantifying key compounds in honey from 1H NMR spectra requires multivariate calibration models. However, keeping measurement conditions consistent across different years is rarely possible, so the distributions of the training and test spectra can differ significantly, which ultimately degrades the performance of predictive models. Unsupervised domain adaptation (UDA) methods have gained considerable attention for their ability to reduce distribution differences between labeled source spectra and unlabeled target spectra without costly annotation. To improve the generalizability of quantitative models to honey from different years, we propose a UDA method called partial least squares subspace and optimal transport-based UDA (PLSS-OT-UDA). This approach eliminates distribution differences between the source subspace and the target subspace through partial least squares (PLS) dimensionality reduction and optimal transport (OT). First, the optimal latent variable weight matrix is extracted from the source domain (i.e., labeled 1H NMR data from 2017) with PLS. Next, the dimensionality of both the source and target domains (the target being unlabeled 1H NMR data from 2018) is reduced, and their corresponding subspaces are obtained using the weight matrix of the source domain. Finally, OT is employed to align the distributions of the source and target domains within the subspace. Experimental results on the honey dataset demonstrate that PLSS-OT-UDA outperforms traditional methods, including transfer component analysis (TCA), optimal transport for domain adaptation (OTDA), domain adaptation based on principal component analysis and optimal transport (PCA-OTDA), and subspace alignment (SA), in generalization performance on three components: Baume degree, sugar content, and water content.
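The three steps described in the abstract (PLS weights from the source domain, projection of both domains with those weights, OT alignment in the subspace) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the synthetic data, the NIPALS PLS1 routine, the entropic (Sinkhorn) OT solver, the barycentric mapping of source scores onto the target, the final least-squares regressor, and all names and parameter values (`pls_weights`, `sinkhorn_plan`, `reg`, `n_iter`, and so on) are assumptions made for the example.

```python
import numpy as np

def pls_weights(X, y, n_components):
    """NIPALS PLS1 on source data. Returns the source mean and a
    projection matrix R such that scores = (X - mean) @ R."""
    mu = X.mean(axis=0)
    Xd = X - mu
    yd = y - y.mean()
    n_features = X.shape[1]
    W = np.zeros((n_features, n_components))
    P = np.zeros((n_features, n_components))
    for a in range(n_components):
        w = Xd.T @ yd
        w /= np.linalg.norm(w)          # unit-norm weight vector
        t = Xd @ w                      # score vector
        tt = t @ t
        p_a = Xd.T @ t / tt             # X loading
        q = yd @ t / tt                 # y loading
        Xd = Xd - np.outer(t, p_a)      # deflate X
        yd = yd - q * t                 # deflate y
        W[:, a], P[:, a] = w, p_a
    R = W @ np.linalg.inv(P.T @ W)      # maps centered X directly to scores
    return mu, R

def sinkhorn_plan(Zs, Zt, reg=0.1, n_iter=500):
    """Entropic OT plan between uniformly weighted point clouds."""
    C = ((Zs[:, None, :] - Zt[None, :, :]) ** 2).sum(axis=-1)
    K = np.exp(-(C / C.max()) / reg)    # Gibbs kernel on normalized cost
    a = np.full(len(Zs), 1.0 / len(Zs))
    b = np.full(len(Zt), 1.0 / len(Zt))
    u = np.ones_like(a)
    for _ in range(n_iter):
        v = b / (K.T @ u)
        u = a / (K @ v)
    return u[:, None] * K * v[None, :]

# Synthetic stand-in for two measurement years (hypothetical data).
rng = np.random.default_rng(0)
n_s, n_t, p, k = 60, 50, 40, 3
loadings = rng.normal(size=(k, p))
lat_s = rng.normal(size=(n_s, k))
lat_t = rng.normal(size=(n_t, k))
X_s = lat_s @ loadings + 0.05 * rng.normal(size=(n_s, p))
y_s = lat_s @ np.array([1.0, -0.5, 0.25])
# Target spectra: same latent structure, but with a gain change and an offset.
X_t = 1.2 * (lat_t @ loadings) + 0.5 + 0.05 * rng.normal(size=(n_t, p))

# Step 1: weight matrix from the labeled source domain.
mu, R = pls_weights(X_s, y_s, k)
# Step 2: project both domains with the source weights.
Z_s = (X_s - mu) @ R
Z_t = (X_t - mu) @ R
# Step 3: align the source scores to the target scores with OT.
G = sinkhorn_plan(Z_s, Z_t)
Z_s_mapped = (G @ Z_t) / G.sum(axis=1, keepdims=True)  # barycentric mapping

gap_before = np.linalg.norm(Z_s.mean(axis=0) - Z_t.mean(axis=0))
gap_after = np.linalg.norm(Z_s_mapped.mean(axis=0) - Z_t.mean(axis=0))

# Fit a linear model on the aligned source scores, predict the unlabeled target.
A = np.column_stack([Z_s_mapped, np.ones(n_s)])
coef = np.linalg.lstsq(A, y_s, rcond=None)[0]
y_t_pred = np.column_stack([Z_t, np.ones(n_t)]) @ coef
```

In this sketch both domains are centered with the source mean, so the shift between the two years stays visible in the subspace and is left for the OT step to correct. Comparing `gap_before` with `gap_after` gives a quick check that the mapped source scores have moved toward the target scores; it says nothing about prediction accuracy, which would need labeled target samples to evaluate.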
Source journal
CiteScore: 7.50
Self-citation rate: 7.70%
Articles published: 169
Review time: 3.4 months
About the journal: Chemometrics and Intelligent Laboratory Systems publishes original research papers, short communications, reviews, tutorials and Original Software Publications reporting on the development of novel statistical, mathematical, or computer techniques in Chemistry and related disciplines. Chemometrics is the chemical discipline that uses mathematical and statistical methods to design or select optimal procedures and experiments, and to provide maximum chemical information by analysing chemical data. The journal deals with the following topics: 1) Development of new statistical, mathematical and chemometrical methods for Chemistry and related fields (Environmental Chemistry, Biochemistry, Toxicology, System Biology, -Omics, etc.). 2) Novel applications of chemometrics to all branches of Chemistry and related fields (typical domains of interest are: process data analysis, experimental design, data mining, signal processing, supervised modelling, decision making, robust statistics, mixture analysis, multivariate calibration, etc.). Routine applications of established chemometrical techniques will not be considered. 3) Development of new software that provides novel tools or truly advances the use of chemometrical methods. 4) Well characterized data sets to test performance for the new methods and software. The journal complies with the International Committee of Medical Journal Editors' Uniform Requirements for Manuscripts.