Interpretation of high dimensional definitive screening designs assisted by bootstrapped partial least squares regression

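Chemometrics and Intelligent Laboratory Systems, volume 253, Article 105218 (IF 3.7; Zone 2, Chemistry; JCR Q2, Automation & Control Systems)
Pub Date: 2024-08-24
DOI: 10.1016/j.chemolab.2024.105218
URL: https://www.sciencedirect.com/science/article/pii/S0169743924001588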
Knut Dyrstad, Frank Westad
Citations: 0

Abstract

Interpretation of high dimensional definitive screening designs assisted by bootstrapped partial least squares regression

Definitive screening design (DSD) has become a widely used type of Design of Experiments (DOE) for chemical, pharmaceutical and biopharmaceutical process and product development because it allows main, interaction, and squared variable effects to be estimated with a minimum number of experiments. These high-dimensional DOEs, with more variables than samples and with partly correlated variables, frequently make statistical interpretation challenging. The purpose of the study was to test bootstrap partial least squares regression (PLSR) combined with a heredity procedure for selecting the variable subset that is finally evaluated by multiple linear regression (MLR). The heredity selection was applied to bootstrap T values, computed as the original PLSR coefficients (B) divided by the bootstrap-estimated standard deviation. The investigated fractional weighted and non-parametric bootstrap PLSR resulted in the same variable selection outcome and final models in this study.
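
The bootstrap T value described above (original PLSR coefficient divided by its bootstrap-estimated standard deviation) can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes a single response, a hand-written NIPALS PLS1 fit, non-parametric row resampling only (the fractional weighted variant is not shown), and a small synthetic design rather than a real DSD. The function names, the number of components, the number of resamples, and the data are all hypothetical.

```python
import numpy as np

def pls1_coefficients(X, y, n_components):
    """Regression coefficients of a single-response PLS model (NIPALS)."""
    Xk = X - X.mean(axis=0)          # centre predictors
    yk = y - y.mean()                # centre response
    W, P, q = [], [], []
    for _ in range(n_components):
        w = Xk.T @ yk
        w = w / np.linalg.norm(w)    # weight vector
        t = Xk @ w                   # scores
        tt = t @ t
        p = Xk.T @ t / tt            # X loadings
        qk = (yk @ t) / tt           # y loading
        Xk = Xk - np.outer(t, p)     # deflate X
        yk = yk - qk * t             # deflate y
        W.append(w)
        P.append(p)
        q.append(qk)
    W, P, q = np.array(W).T, np.array(P).T, np.array(q)
    # B = W (P'W)^-1 q
    return W @ np.linalg.solve(P.T @ W, q)

def bootstrap_t_values(X, y, n_components=2, n_boot=500, seed=0):
    """Original PLSR coefficients divided by their bootstrap standard deviation."""
    rng = np.random.default_rng(seed)
    b_orig = pls1_coefficients(X, y, n_components)
    n = X.shape[0]
    boot = np.empty((n_boot, X.shape[1]))
    for i in range(n_boot):
        idx = rng.integers(0, n, size=n)   # resample runs with replacement
        boot[i] = pls1_coefficients(X[idx], y[idx], n_components)
    return b_orig / boot.std(axis=0, ddof=1)

# Hypothetical three-level data: 3 main factors plus one interaction and one square.
demo_rng = np.random.default_rng(1)
main = demo_rng.choice([-1.0, 0.0, 1.0], size=(20, 3))
X = np.column_stack([main, main[:, 0] * main[:, 1], main[:, 0] ** 2])
y = 3 * main[:, 0] + 2 * main[:, 0] * main[:, 1] + demo_rng.normal(0, 0.1, 20)
t_values = bootstrap_t_values(X, y)
print(dict(zip(["x1", "x2", "x3", "x1*x2", "x1^2"], np.round(t_values, 2))))
```

Terms with a large absolute T value are the candidates that a subsequent selection step would consider; the cut-off is a modelling choice and is not specified here.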

A simulation study with 7 main variables and 12 real-data DSDs from the literature with 4, 5, 7 and 8 main variables showed improved model performance for small and, in particular, for large DSDs with the bootstrap PLSR MLR methods compared to two common DSD reference methods: DSD fit definitive screening and AICc forward stepwise regression (AICc FSR). Variable selection accuracy and predictive ability were significantly improved by the investigated method in 6 out of 13 DSDs compared to the best model from either of the two reference methods. The remaining 7 DSDs gave the same model as the best reference model. Strong heredity was found to provide the best models for all real data in this study. Applying the heredity procedure to the percent non-zero SVEM FSR variable effects, followed by MLR, showed promising results. AICc Lasso regression, one of several other methods that were partially tested, set almost all variable effects to zero when tested on three large minimum DSDs. While the DSD fit definitive screening method may often be the first choice for DSD, the heredity bootstrap PLSR MLR and heredity SVEM FSR MLR may be alternative methods to improve variable selection and model precision.
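
To make the heredity idea concrete, the sketch below filters candidate terms using the usual textbook definitions: under strong heredity an interaction or squared term is kept only if all of its parent main effects are selected, and under weak heredity one selected parent is enough. The abstract does not give the exact rule or cut-off used in the study, so the threshold of 2.0, the function name, and the T values are hypothetical and purely illustrative.

```python
def heredity_select(t_values, threshold=2.0, strong=True):
    """Keep terms with |T| >= threshold that also satisfy the heredity rule.

    Terms are tuples of factor names: ("A",) is a main effect,
    ("A", "B") an interaction, ("A", "A") a squared effect.
    """
    active = {term for term, t in t_values.items() if abs(t) >= threshold}
    mains = {term[0] for term in active if len(term) == 1}
    selected = []
    for term in t_values:
        if term not in active:
            continue
        if len(term) == 1:
            selected.append(term)          # main effects need no parents
            continue
        parents = set(term)
        if strong:
            keep = parents <= mains        # every parent must be selected
        else:
            keep = bool(parents & mains)   # at least one parent selected
        if keep:
            selected.append(term)
    return selected

# Hypothetical bootstrap T values for a three-factor design.
example_t = {
    ("A",): 9.1,
    ("B",): 3.0,
    ("C",): 0.4,
    ("A", "B"): 4.2,
    ("A", "C"): 2.5,
    ("C", "C"): 2.2,
    ("A", "A"): -3.1,
}
strong_terms = heredity_select(example_t, strong=True)
weak_terms = heredity_select(example_t, strong=False)
print(strong_terms)  # [('A',), ('B',), ('A', 'B'), ('A', 'A')]
print(weak_terms)    # [('A',), ('B',), ('A', 'B'), ('A', 'C'), ('A', 'A')]
```

In the workflow described in the abstract, the subset returned by such a filter would then be refitted by MLR; that final step is omitted here.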

Source journal: Chemometrics and Intelligent Laboratory Systems
CiteScore: 7.50
Self-citation rate: 7.70%
Articles published: 169
Review time: 3.4 months
Journal description: Chemometrics and Intelligent Laboratory Systems publishes original research papers, short communications, reviews, tutorials and Original Software Publications reporting on development of novel statistical, mathematical, or computer techniques in Chemistry and related disciplines. Chemometrics is the chemical discipline that uses mathematical and statistical methods to design or select optimal procedures and experiments, and to provide maximum chemical information by analysing chemical data. The journal deals with the following topics: 1) Development of new statistical, mathematical and chemometrical methods for Chemistry and related fields (Environmental Chemistry, Biochemistry, Toxicology, System Biology, -Omics, etc.). 2) Novel applications of chemometrics to all branches of Chemistry and related fields (typical domains of interest are: process data analysis, experimental design, data mining, signal processing, supervised modelling, decision making, robust statistics, mixture analysis, multivariate calibration, etc.). Routine applications of established chemometrical techniques will not be considered. 3) Development of new software that provides novel tools or truly advances the use of chemometrical methods. 4) Well characterized data sets to test performance for the new methods and software. The journal complies with the International Committee of Medical Journal Editors' Uniform requirements for manuscripts.