Pub Date: 2021-07-12 | DOI: 10.1080/00224065.2021.1947163
Shuai Huang
This book offers a good introduction to some statistical methods used in elections. It has two parts. The first part contains four chapters that cover estimation methods for polls. The main technical problem is estimating the proportion of the population holding a particular voting preference. The analytic core of the problem is the binomial distribution, and the sampling and estimation procedures center around it. As always in the real world, there are complications, and various remedies are provided to address them. The author does a great job of introducing the critical concepts and considerations in both the problem formulation and its solution. For example, when introducing the importance of weighting in deriving a poll estimate, the author drafts a mock press release of his poll result on the issue of gender fairness in the military; it becomes clear that if a different source of demographic statistics is used, the poll result is quite different. Examples such as this are quite useful for helping readers understand the subject matter and its complexity. The second part of the book covers a few techniques for detecting fraud and anomalies by examining election results. Some techniques build on the interesting premise that humans are bad at mimicking randomness. This echoes what Fisher (1958) said: “if one tries to think of numbers at random, one thinks of numbers very far from at random.” The Benford test is introduced in detail, including its history and its interesting application to election data, where anomalies are detected from the distributions of the leading digits reported by different precincts. Differential invalidation and some regression models are introduced as well, and spatial correlation can be modeled using the geographical information in the data. The book concludes with a detailed discussion of election data from Sri Lanka since 1994. This is a useful book that can help a broad range of readers appreciate the power of statistics in understanding the election process from an analytic and scientific perspective. Beyond the techniques it introduces, the book offers anecdotes, comments, and insights that enrich the reading experience, for example, the statement in the preface attributed to a Nicaraguan leader, “Indeed, you won the elections, but I won the count,” or the comment at the end of Chapter 4 that “as with many things in statistics, increasing quality in one area tends to reduce quality in another.” The statistical techniques in this book are tightly bound to the contexts and backgrounds of their application. After reading it, I appreciate how the book has helped me understand a complex problem in a complex world. Not everything is what it appears to be, but we can equip ourselves with sufficient knowledge and useful tools to look at the data from every angle and really feel the data as they are.
{"title":"Understanding elections through statistics by Ole J. Forsberg, CRC press, Taylor & Francis group, boca Raton, FL, 2020, 225 pp., $69.95, ISBN 978-0367895372","authors":"Shuai Huang","doi":"10.1080/00224065.2021.1947163","DOIUrl":"https://doi.org/10.1080/00224065.2021.1947163","url":null,"abstract":"This book offers a good introduction to some statistical methods used in elections. It has two parts. The first part contains four chapters that cover estimation methods for polls. The main technical problem is the estimation of the proportion of the population holding a particular preference in voting. The analytic core of the problem is binomial distribution and the sampling and estimation procedures center around this distribution. As in the real world there are always complications. Various remedies are provided to address these complications. The author has done a great job of introducing the critical concepts and considerations in both the problem formulation and solution. For example, when introducing the importance of weighting in deriving the estimate of the poll, the author pretends to write a press release of his poll result on the issue of gender fairness in the military. It is clear that if a different source of demographics statistics is used, the poll result is quite different. Examples as such are quite useful for readers to understand the subject matter and its complexity. The second part of the book covers a few techniques to detect frauds and anomalies by examining the election results. Some techniques build on an interesting premise that humans are bad at mimicking randomness. This echoes what Fisher (1958) had said, “if one tries to think of numbers at random, one thinks of numbers very far from at random.” The Benford test is introduced in detail, including its history and its interesting applications in analyzing election data to detect anomaly based on the distributions of the leading digits reported by different precincts. The differential invalidation and some regression models are introduced as well. Spatial correlations could be modeled by using the geographical information in the data. The book concludes with a detailed discussion on data from Sri Lanka since 1994. This is a useful book that can help a broad range of readers to appreciate the power of statistics in understanding the election process from an analytic and scientific perspective. On top of the techniques introduced in the book, there are anecdotes and comments and insights that can enrich the reading experience. E.g., as in the preface the statement from a Nicaraguan leader “Indeed, you won the elections, but I won the count.” or the comment in the end of Chapter 4 “as with many things in statistics, increasing quality in one area tends to reduce quality in another.” Statistical techniques in this book are tightly bonded with the contexts and the backgrounds of their application. After reading the book, I appreciate the book has helped me understand a complex problem in a complex world. 
Not everything is what it appears to be, but we can equip ourselves with sufficient knowledge and useful tools to help us look at the data in every angle and really feel the data as it is.","PeriodicalId":54769,"journal":{"name":"Journal of Quality Technology","volume":"9 1","pages":"122 - 122"},"PeriodicalIF":2.5,"publicationDate":"2021-07-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"80102706","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
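As a companion to the Benford test discussed in the review above, here is a minimal sketch of a first-digit Benford check in Python. The precinct totals are simulated purely for illustration, and the function name `benford_test` is my own; the book also covers second-digit and other variants, so treat this as a demonstration of the idea rather than the book's procedure.

```python
import numpy as np
from scipy import stats

def benford_test(counts):
    """Chi-square goodness-of-fit test of leading digits against Benford's law."""
    counts = np.asarray(counts)
    leading = np.array([int(str(c)[0]) for c in counts if c > 0])
    observed = np.array([(leading == d).sum() for d in range(1, 10)])
    expected = np.log10(1.0 + 1.0 / np.arange(1, 10)) * leading.size
    return stats.chisquare(observed, expected)

# Hypothetical precinct vote totals; heavy-tailed sizes tend to follow Benford's law closely.
rng = np.random.default_rng(1)
precinct_votes = np.round(np.exp(rng.normal(6.0, 1.5, size=500))).astype(int)
chi2, pval = benford_test(precinct_votes)
print(f"chi-square = {chi2:.2f}, p-value = {pval:.3f}")
```

A very small p-value would suggest the leading-digit distribution departs from Benford's law, which in the election setting is treated as a flag for further scrutiny rather than proof of fraud.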
Pub Date: 2021-07-03 | DOI: 10.1080/00224065.2021.1948373
Xiao Liu, Rong Pan
Abstract Recurrence data arise from multi-disciplinary domains spanning reliability, cyber security, healthcare, online retailing, etc. This paper investigates an additive-tree-based approach, known as Boost-R (Boosting for Recurrence Data), for recurrent event data with both static and dynamic features. Boost-R constructs an ensemble of gradient boosted additive trees to estimate the cumulative intensity function of the recurrent event process, where a new tree is added to the ensemble by minimizing the regularized L2 distance between the observed and predicted cumulative intensity. Unlike conventional regression trees, a time-dependent function is constructed by Boost-R on each tree leaf. The sum of these functions, from multiple trees, yields the ensemble estimator of the cumulative intensity. The divide-and-conquer nature of tree-based methods is appealing when hidden sub-populations exist within a heterogeneous population. The non-parametric nature of regression trees helps to avoid parametric assumptions on the complex interactions between event processes and features. Critical insights and advantages of Boost-R are investigated through comprehensive numerical examples. Datasets and computer code for Boost-R are made available on GitHub. To the best of our knowledge, Boost-R is the first gradient boosted additive-tree-based approach for modeling large-scale recurrent event data with both static and dynamic feature information.
{"title":"Boost-R: Gradient boosted trees for recurrence data","authors":"Xiao Liu, Rong Pan","doi":"10.1080/00224065.2021.1948373","DOIUrl":"https://doi.org/10.1080/00224065.2021.1948373","url":null,"abstract":"Abstract Recurrence data arise from multi-disciplinary domains spanning reliability, cyber security, healthcare, online retailing, etc. This paper investigates an additive-tree-based approach, known as Boost-R (Boosting for Recurrence Data), for recurrent event data with both static and dynamic features. Boost-R constructs an ensemble of gradient boosted additive trees to estimate the cumulative intensity function of the recurrent event process, where a new tree is added to the ensemble by minimizing the regularized L 2 distance between the observed and predicted cumulative intensity. Unlike conventional regression trees, a time-dependent function is constructed by Boost-R on each tree leaf. The sum of these functions, from multiple trees, yields the ensemble estimator of the cumulative intensity. The divide-and-conquer nature of tree-based methods is appealing when hidden sub-populations exist within a heterogeneous population. The non-parametric nature of regression trees helps to avoid parametric assumptions on the complex interactions between event processes and features. Critical insights and advantages of Boost-R are investigated through comprehensive numerical examples. Datasets and computer code of Boost-R are made available on GitHub. To our best knowledge, Boost-R is the first gradient boosted additive-tree-based approach for modeling large-scale recurrent event data with both static and dynamic feature information.","PeriodicalId":54769,"journal":{"name":"Journal of Quality Technology","volume":"11 1","pages":"545 - 565"},"PeriodicalIF":2.5,"publicationDate":"2021-07-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"78715817","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2021-06-30 | DOI: 10.1080/00224065.2021.1937409
Zhanpan Zhang, N. Doganaksoy
Abstract Modern industrial assets (e.g., generators, turbines, engines) are outfitted with numerous sensors to monitor key operating and environmental variables. Unusual sensor readings, such as high temperature, excessive vibration, or low current, could trigger rule-based actions (also known as faults) that range from warning alarms to immediate shutdown of the asset to prevent potential damage. In the case study of this article, a wind park experienced a sudden surge in vibration-induced shutdowns. We utilize fault data logs from the park with the goal of detecting common change points across turbines. Another important goal is the localization of fault occurrences to an identifiable set of turbines. The literature on change point detection and localization for multiple assets is highly sparse. Our technical development is based on the generalized linear modeling framework. We combine well-known solutions to change point detection for a single asset with a heuristics-based approach to identify a common change point(s) for multiple assets. The performance of the proposed detection and localization algorithms is evaluated through synthetic (Monte Carlo) fault data streams. Several novel performance metrics are defined to characterize different aspects of a change point detection algorithm for multiple assets. For the case study example, the proposed methodology identified the change point and the subset of affected turbines with a high degree of accuracy. The problem described here warrants further study to accommodate general fault distributions, change point detection algorithms, and very large fleet sizes.
{"title":"Change point detection and issue localization based on fleet-wide fault data","authors":"Zhanpan Zhang, N. Doganaksoy","doi":"10.1080/00224065.2021.1937409","DOIUrl":"https://doi.org/10.1080/00224065.2021.1937409","url":null,"abstract":"Abstract Modern industrial assets (e.g., generators, turbines, engines) are outfitted with numerous sensors to monitor key operating and environmental variables. Unusual sensor readings, such as high temperature, excessive vibration, or low current, could trigger rule-based actions (also known as faults) that range from warning alarms to immediate shutdown of the asset to prevent potential damage. In the case study of this article, a wind park experienced a sudden surge in vibration-induced shutdowns. We utilize fault data logs from the park with the goal of detecting common change points across turbines. Another important goal is the localization of fault occurrences to an identifiable set of turbines. The literature on change point detection and localization for multiple assets is highly sparse. Our technical development is based on the generalized linear modeling framework. We combine well-known solutions to change point detection for a single asset with a heuristics-based approach to identify a common change point(s) for multiple assets. The performance of the proposed detection and localization algorithms is evaluated through synthetic (Monte Carlo) fault data streams. Several novel performance metrics are defined to characterize different aspects of a change point detection algorithm for multiple assets. For the case study example, the proposed methodology identified the change point and the subset of affected turbines with a high degree of accuracy. The problem described here warrants further study to accommodate general fault distributions, change point detection algorithms, and very large fleet sizes.","PeriodicalId":54769,"journal":{"name":"Journal of Quality Technology","volume":"10 1","pages":"453 - 465"},"PeriodicalIF":2.5,"publicationDate":"2021-06-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"90219229","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2021-06-30 | DOI: 10.1080/00224065.2021.1926377
B. Colosimo, M. Grasso, Federica Garghetti, B. Rossi
Abstract The production of novel types of complex shapes is nowadays enabled by new manufacturing paradigms such as additive manufacturing, also known as 3D printing. The continuous increase in shape complexity imposes new challenges in terms of inspection, product qualification and process monitoring methodologies. Previously proposed methods for 2.5D free-form surfaces are no longer applicable in the presence of these new, fully 3D geometries. This paper aims to tackle this challenge by presenting a statistical quality monitoring approach for structures that cannot be described in terms of parametric models. The goal is to identify out-of-control geometrical distortions by analyzing either local variations within the part or changes from part to part. The proposed approach involves an innovative solution for modeling the deviation between the nominal geometry (the originating 3D model) and the real geometry (measured via x-ray computed tomography) by slicing the shapes and estimating the deviation slice by slice. 3D deviation maps are then transformed into 1D deviation profiles, enabling the use of a profile monitoring scheme for local defect detection. The feasibility and potential of this method are demonstrated by focusing on a category of complex shapes in which an elemental geometry repeats regularly in space. These shapes are known as lattice structures, or metamaterials, and their trabecular shape is thought to provide innovative mechanical and functional performance. The performance of the proposed method is shown in real and simulated case studies.
{"title":"Complex geometries in additive manufacturing: A new solution for lattice structure modeling and monitoring","authors":"B. Colosimo, M. Grasso, Federica Garghetti, B. Rossi","doi":"10.1080/00224065.2021.1926377","DOIUrl":"https://doi.org/10.1080/00224065.2021.1926377","url":null,"abstract":"Abstract The production of novel types of complex shapes is nowadays enabled by new manufacturing paradigms such as additive manufacturing, also known as 3D printing. The continuous increase of shape complexity imposes new challenges in terms of inspection, product qualification and process monitoring methodologies. Previously proposed methods for 2.5D free-form surfaces are no longer applicable in the presence of this kind of new full 3D geometries. This paper aims to tackle this challenge by presenting a statistical quality monitoring approach for structures that cannot be described in terms of parametric models. The goal consists of identifying out-of-control geometrical distortions by analyzing either local variations within the part or changes from part to part. The proposed approach involves an innovative solution for modeling the deviation between the nominal geometry (the originating 3D model) and the real geometry (measured via x-ray computed tomography) by slicing the shapes and estimating the deviation slice by slice. 3D deviation maps are then transformed into 1D deviation profiles enabling the use of a profile monitoring scheme for local defect detection. The feasibility and potential of this method are demonstrated by focusing on a category of complex shapes where an elemental geometry regularly repeats in space. These shapes are known as lattice structures, or metamaterials, and their trabecular shape is thought to provide innovative mechanical and functional performance. The performance of the proposed method is shown in real and simulated case studies.","PeriodicalId":54769,"journal":{"name":"Journal of Quality Technology","volume":"2 1","pages":"392 - 414"},"PeriodicalIF":2.5,"publicationDate":"2021-06-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"78524869","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2021-06-25 | DOI: 10.1080/00224065.2022.2041376
S. Mathieu, L. Lefèvre, R. von Sachs, V. Delouille, Christian Ritter, F. Clette
Abstract Solar activity is an important driver of long-term climate trends and must be accounted for in climate models. Unfortunately, direct measurements of this quantity over long periods do not exist. The only observations related to solar activity whose records reach back to the seventeenth century are sunspots. Surprisingly, determining the number of sunspots consistently over time has remained a challenging statistical problem to this day. It arises from the need to consolidate data from multiple observing stations around the world in a context of low signal-to-noise ratios, non-stationarity, missing data, nonstandard distributions and errors of different kinds. The data from some stations therefore experience severe and varied deviations over time. In this paper, we apply a systematic statistical approach for monitoring these complex and important series. It consists of three steps essential for successful treatment of the data: smoothing on multiple time-scales, monitoring using block-bootstrap-calibrated CUSUM charts, and classification of out-of-control situations by support vector techniques. This approach allows us to detect a wide range of anomalies (such as sudden jumps or more progressive drifts) unseen in previous analyses. It helps us identify the causes of major deviations, which are often observer or equipment related. Their detection and identification will contribute to improving future observations. Their elimination or correction in past data will lead to a more precise reconstruction of the world reference index for solar activity: the International Sunspot Number.
{"title":"Nonparametric monitoring of sunspot number observations","authors":"S. Mathieu, L. Lefèvre, R. von Sachs, V. Delouille, Christian Ritter, F. Clette","doi":"10.1080/00224065.2022.2041376","DOIUrl":"https://doi.org/10.1080/00224065.2022.2041376","url":null,"abstract":"Abstract Solar activity is an important driver of long-term climate trends and must be accounted for in climate models. Unfortunately, direct measurements of this quantity over long periods do not exist. The only observation related to solar activity whose records reach back to the seventeenth century are sunspots. Surprisingly, determining the number of sunspots consistently over time has remained until today a challenging statistical problem. It arises from the need of consolidating data from multiple observing stations around the world in a context of low signal-to-noise ratios, non-stationarity, missing data, nonstandard distributions and errors of different kind. The data from some stations experience therefore severe and various deviations over time. In this paper, we apply a systematic statistical approach for monitoring these complex and important series. It consists of three steps essential for successful treatment of the data: smoothing on multiple time-scales, monitoring using block bootstrap calibrated CUSUM charts and classifying of out-of-control situations by support vector techniques. This approach allows us to detect a wide range of anomalies (such as sudden jumps or more progressive drifts), unseen in previous analyses. It helps us to identify the causes of major deviations, which are often observer or equipment related. Their detection and identification will contribute to improve future observations. Their elimination or correction in past data will lead to a more precise reconstruction of the world reference index for solar activity: the International Sunspot Number.","PeriodicalId":54769,"journal":{"name":"Journal of Quality Technology","volume":"21 1","pages":"104 - 118"},"PeriodicalIF":2.5,"publicationDate":"2021-06-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"81646009","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2021-06-14 | DOI: 10.1080/00224065.2021.1930617
Elisa Cabana, R. Lillo
Abstract A robust multivariate quality control technique for individual observations is proposed, based on robust reweighted shrinkage estimators. A simulation study is conducted to assess its performance and to compare the method with the classical Hotelling approach and with the robust alternative based on the reweighted minimum covariance determinant estimator. The results show the appropriateness of the method even when the dimension or the Phase I contamination is high, with both independent and correlated variables, and reveal additional advantages in computational efficiency. The approach is illustrated with two real data set examples from production processes.
{"title":"Robust multivariate control chart based on shrinkage for individual observations","authors":"Elisa Cabana, R. Lillo","doi":"10.1080/00224065.2021.1930617","DOIUrl":"https://doi.org/10.1080/00224065.2021.1930617","url":null,"abstract":"Abstract A robust multivariate quality control technique for individual observations is proposed, based on the robust reweighted shrinkage estimators. A simulation study is done to check the performance and compare the method with the classical Hotelling approach, and the robust alternative based on the reweighted minimum covariance determinant estimator. The results show the appropriateness of the method even when the dimension or the Phase I contamination are high, with both independent and correlated variables, showing additional advantages about computational efficiency. The approach is illustrated with two real data-set examples from production processes.","PeriodicalId":54769,"journal":{"name":"Journal of Quality Technology","volume":"1 1","pages":"415 - 440"},"PeriodicalIF":2.5,"publicationDate":"2021-06-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"79893083","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2021-05-24 | DOI: 10.1080/00224065.2022.2053795
D. Cole, R. Gramacy, J. Warner, G. Bomarito, P. Leser, W. Leser
Abstract In reliability analysis, methods used to estimate failure probability are often limited by the costs associated with model evaluations. Many of these methods, such as multifidelity importance sampling (MFIS), rely upon a computationally efficient surrogate model like a Gaussian process (GP) to quickly generate predictions. The quality of the GP fit, particularly in the vicinity of the failure region(s), is instrumental in supplying accurately predicted failures for such strategies. We introduce an entropy-based GP adaptive design that, when paired with MFIS, provides more accurate failure probability estimates and with higher confidence. We show that our greedy data acquisition strategy better identifies multiple failure regions compared to existing contour-finding schemes. We then extend the method to batch selection, without sacrificing accuracy. Illustrative examples are provided on benchmark data as well as an application to an impact damage simulator for National Aeronautics and Space Administration (NASA) spacesuits.
{"title":"Entropy-based adaptive design for contour finding and estimating reliability","authors":"D. Cole, R. Gramacy, J. Warner, G. Bomarito, P. Leser, W. Leser","doi":"10.1080/00224065.2022.2053795","DOIUrl":"https://doi.org/10.1080/00224065.2022.2053795","url":null,"abstract":"Abstract In reliability analysis, methods used to estimate failure probability are often limited by the costs associated with model evaluations. Many of these methods, such as multifidelity importance sampling (MFIS), rely upon a computationally efficient surrogate model like a Gaussian process (GP) to quickly generate predictions. The quality of the GP fit, particularly in the vicinity of the failure region(s), is instrumental in supplying accurately predicted failures for such strategies. We introduce an entropy-based GP adaptive design that, when paired with MFIS, provides more accurate failure probability estimates and with higher confidence. We show that our greedy data acquisition strategy better identifies multiple failure regions compared to existing contour-finding schemes. We then extend the method to batch selection, without sacrificing accuracy. Illustrative examples are provided on benchmark data as well as an application to an impact damage simulator for National Aeronautics and Space Administration (NASA) spacesuits.","PeriodicalId":54769,"journal":{"name":"Journal of Quality Technology","volume":"37 1","pages":"43 - 60"},"PeriodicalIF":2.5,"publicationDate":"2021-05-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"80598161","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2021-05-17 | DOI: 10.1080/00224065.2021.1920347
Caleb King, T. Bzik, P. Parker
Abstract In design of experiments, setting exact replicates of factor settings enables estimation of pure-error, a model-independent estimate of experimental error useful in communicating inherent system noise and testing model lack-of-fit. Often in practice, the factor levels for replicates are precisely measured rather than precisely set, resulting in near-replicates. This can result in inflated estimates of pure-error due to uncompensated set-point variation. In this article, we review previous strategies for estimating pure-error from near-replicates and propose a simple alternative. We derive key analytical properties and investigate them via simulation. Finally, we illustrate the new approach with an application.
{"title":"Estimating pure-error from near replicates in design of experiments","authors":"Caleb King, T. Bzik, P. Parker","doi":"10.1080/00224065.2021.1920347","DOIUrl":"https://doi.org/10.1080/00224065.2021.1920347","url":null,"abstract":"Abstract In design of experiments, setting exact replicates of factor settings enables estimation of pure-error; a model-independent estimate of experimental error useful in communicating inherent system noise and testing model lack-of-fit. Often in practice, the factor levels for replicates are precisely measured rather than precisely set, resulting in near-replicates. This can result in inflated estimates of pure-error due to uncompensated set-point variation. In this article, we review previous strategies for estimating pure-error from near-replicates and propose a simple alternative. We derive key analytical properties and investigate them via simulation. Finally, we illustrate the new approach with an application.","PeriodicalId":54769,"journal":{"name":"Journal of Quality Technology","volume":"24 1","pages":"102 - 117"},"PeriodicalIF":2.5,"publicationDate":"2021-05-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"81562697","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2021-05-13 | DOI: 10.1080/00224065.2021.1916413
Konstantinos Bourazas, Dimitrios Kiagias, P. Tsiamyrtzis
Abstract Performing online monitoring for short-horizon data is challenging, though it offers a cost-effective benefit. Self-starting methods attempt to address this issue by adopting a hybrid scheme that executes calibration and monitoring simultaneously. In this work, we propose a Bayesian alternative that utilizes prior information and possible historical data (via power priors), offering a head start in online monitoring, with emphasis on outlier detection. For cases of complete prior ignorance, an objective Bayesian version is provided. Charting is based on the predictive distribution, and the methodological framework is derived in a general way to accommodate discrete and continuous data from any distribution that belongs to the regular exponential family (with the Normal, Poisson and Binomial being the most representative). Being in the Bayesian arena, we are able not only to perform process monitoring, but also to draw online inference regarding the unknown process parameter(s). An extended simulation study evaluates the proposed methodology against frequentist-based competitors and covers topics regarding prior sensitivity and robustness to model misspecification. A continuous and a discrete real data set illustrate its use in practice. Technical details, algorithms, guidelines on prior elicitation and R codes are provided in appendices and supplementary material. Short production runs and online Phase I monitoring are among the best candidates to benefit from the developed methodology.
{"title":"Predictive Control Charts (PCC): A Bayesian approach in online monitoring of short runs","authors":"Konstantinos Bourazas, Dimitrios Kiagias, P. Tsiamyrtzis","doi":"10.1080/00224065.2021.1916413","DOIUrl":"https://doi.org/10.1080/00224065.2021.1916413","url":null,"abstract":"Abstract Performing online monitoring for short horizon data is a challenging, though cost effective benefit. Self-starting methods attempt to address this issue adopting a hybrid scheme that executes calibration and monitoring simultaneously. In this work, we propose a Bayesian alternative that will utilize prior information and possible historical data (via power priors), offering a head-start in online monitoring, putting emphasis on outlier detection. For cases of complete prior ignorance, the objective Bayesian version will be provided. Charting will be based on the predictive distribution and the methodological framework will be derived in a general way, to facilitate discrete and continuous data from any distribution that belongs to the regular exponential family (with Normal, Poisson and Binomial being the most representative). Being in the Bayesian arena, we will be able to not only perform process monitoring, but also draw online inference regarding the unknown process parameter(s). An extended simulation study will evaluate the proposed methodology against frequentist based competitors and it will cover topics regarding prior sensitivity and model misspecification robustness. A continuous and a discrete real data set will illustrate its use in practice. Technical details, algorithms, guidelines on prior elicitation and R-codes are provided in appendices and supplementary material. Short production runs and online phase I monitoring are among the best candidates to benefit from the developed methodology.","PeriodicalId":54769,"journal":{"name":"Journal of Quality Technology","volume":"31 1","pages":"367 - 391"},"PeriodicalIF":2.5,"publicationDate":"2021-05-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"89514912","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2021-05-11 | DOI: 10.1080/00224065.2021.1916412
Alan R. Vazquez, E. Schoen, P. Goos
Abstract Due to recent advances in the development of laboratory equipment, large screening experiments can now be conducted to study the joint impact of up to a few dozen factors. While much is known about orthogonal designs involving 64 and 128 runs, there is a lack of literature on screening designs with intermediate run sizes. In this article, we therefore construct screening designs with 80, 96, and 112 runs that allow the main effects to be estimated independently of the two-factor interactions and that limit the aliasing among the interactions. We motivate our work using a 14-factor tuberculosis inhibition experiment and compare our new designs with alternatives from the literature using simulations.
{"title":"Two-level orthogonal screening designs with 80, 96, and 112 runs, and up to 29 factors","authors":"Alan R. Vazquez, E. Schoen, P. Goos","doi":"10.1080/00224065.2021.1916412","DOIUrl":"https://doi.org/10.1080/00224065.2021.1916412","url":null,"abstract":"Abstract Due to recent advances in the development of laboratory equipment, large screening experiments can now be conducted to study the joint impact of up to a few dozen factors. While much is known about orthogonal designs involving 64 and 128 runs, there is a lack of literature on screening designs with intermediate run sizes. In this article, we therefore construct screening designs with 80, 96 and 112 runs which allow the main effects to be estimated independently from the two-factor interactions and limit the aliasing among the interactions. We motivate our work using a 14-factor tuberculosis inhibition experiment and compare our new designs with alternatives from the literature using simulations.","PeriodicalId":54769,"journal":{"name":"Journal of Quality Technology","volume":"78 1","pages":"338 - 358"},"PeriodicalIF":2.5,"publicationDate":"2021-05-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"90084561","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}