Oliver Polushkina-Merchanskaya, Michael D. Sorochan Armstrong, Carolina Gómez-Llorente, Patricia Ferrer, Sergi Fernandez-Gonzalez, Miriam Perez-Cruz, María Dolores Gómez-Roig, José Camacho
Considerations for missing data, outliers and transformations in permutation testing for ANOVA with multivariate responses
Chemometrics and Intelligent Laboratory Systems, vol. 258, Article 105320. Published 2025-01-17. DOI: 10.1016/j.chemolab.2025.105320. URL: https://www.sciencedirect.com/science/article/pii/S016974392500005X
Abstract
Multifactorial experimental designs allow us to assess the contribution of several factors, and potentially their interactions, to one or several responses of interest. Following the principle of partitioning the variance advocated by Sir R.A. Fisher, the experimental responses are decomposed into the quantitative contributions of main factors and interactions. A popular way to perform this factorization in ANOVA, and in related methods such as ASCA(+), is through General Linear Models. Different inferential approaches can then be used to determine whether the contributions are statistically significant. Unfortunately, the performance of these inferential approaches in terms of Type I and Type II errors can be heavily affected by missing data, outliers and/or departures from normality in the distribution of the responses, all of which are commonplace in modern analytical experiments. In this paper, we study these problems and suggest good practices for application.
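As a rough illustration of the kind of inference the abstract refers to, the following is a minimal sketch of a permutation test for a single factor acting on a multivariate response matrix. It is not the authors' procedure: the function name `permutation_test_one_factor`, the choice of the factor sum of squares as test statistic, and the simulated data are all assumptions made for this example, and the sketch does not handle missing data, outliers or transformations.

```python
import numpy as np


def permutation_test_one_factor(X, labels, n_perm=999, seed=0):
    """Permutation test for one factor on a response matrix X (n_obs x n_vars).

    Hypothetical helper for illustration only. The statistic is the sum of
    squares of the level means around the grand mean, summed over variables.
    """
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    Xc = X - X.mean(axis=0)  # remove the grand mean of each variable

    def factor_ss(lab):
        # n_level * ||level mean - grand mean||^2, accumulated over levels
        ss = 0.0
        for level in np.unique(lab):
            rows = Xc[lab == level]
            ss += rows.shape[0] * np.sum(rows.mean(axis=0) ** 2)
        return ss

    observed = factor_ss(labels)
    # Null distribution: recompute the statistic after shuffling the labels
    null = np.array([factor_ss(rng.permutation(labels)) for _ in range(n_perm)])
    # The +1 terms count the observed arrangement as one of the permutations
    p_value = (1 + np.sum(null >= observed)) / (n_perm + 1)
    return observed, p_value


# Simulated example (assumed data): 2 levels, 10 observations each, 5 variables
demo_rng = np.random.default_rng(1)
demo_labels = np.repeat([0, 1], 10)
demo_X = demo_rng.normal(size=(20, 5))
demo_X[demo_labels == 1] += 5.0  # shift the second level in every variable
demo_ss, demo_p = permutation_test_one_factor(demo_X, demo_labels, n_perm=199)
print(f"factor SS = {demo_ss:.2f}, permutation p-value = {demo_p:.3f}")
```

Shuffling the raw labels is only exchangeable under the null hypothesis for simple designs like this one; designs with several factors or interactions generally need a more careful permutation scheme.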
About the journal:
Chemometrics and Intelligent Laboratory Systems publishes original research papers, short communications, reviews, tutorials and Original Software Publications reporting on development of novel statistical, mathematical, or computer techniques in Chemistry and related disciplines.
Chemometrics is the chemical discipline that uses mathematical and statistical methods to design or select optimal procedures and experiments, and to provide maximum chemical information by analysing chemical data.
The journal deals with the following topics:
1) Development of new statistical, mathematical and chemometrical methods for Chemistry and related fields (Environmental Chemistry, Biochemistry, Toxicology, Systems Biology, -Omics, etc.)
2) Novel applications of chemometrics to all branches of Chemistry and related fields (typical domains of interest are: process data analysis, experimental design, data mining, signal processing, supervised modelling, decision making, robust statistics, mixture analysis, multivariate calibration, etc.). Routine applications of established chemometrical techniques will not be considered.
3) Development of new software that provides novel tools or truly advances the use of chemometrical methods.
4) Well-characterized data sets for testing the performance of new methods and software.
The journal complies with the International Committee of Medical Journal Editors' Uniform Requirements for Manuscripts.