Pub Date: 2022-02-25 | DOI: 10.1080/00224065.2022.2041379
Bayesian Modeling and Computation in Python
Shuai Huang
This book is useful for readers who want to hone their skills in Bayesian modeling and computation. Written by experts in Bayesian software and major contributors to some widely used Bayesian computational tools, it covers not only basic Bayesian probabilistic inference but also a range of models, from linear models (and mixed-effects models, hierarchical models, splines, etc.) to time series models such as the state space model. It also covers Bayesian additive regression trees. Almost all the concepts and techniques are implemented using PyMC3, TensorFlow Probability (TFP), ArviZ, and other libraries. By working through all the modeling, computation, and data analysis, the authors not only show how these things work but also, by emphasizing exploratory data analysis, model comparison, and diagnostics, show how and why things don't work. To learn from the book, readers need some statistical background, such as basic training in statistics and probability theory. Some understanding of Bayesian modeling and inference is also needed, such as the concepts of prior, likelihood, posterior, Bayes' law, and Monte Carlo sampling. Some experience with Python would also be very beneficial for getting started on this journey of Bayesian modeling. The authors suggest a few books as possible preliminaries to theirs; I feel that readers may also benefit from reading Bayesian Data Analysis by Andrew Gelman et al., Chapman & Hall/CRC, 3rd Edition, 2013. Of course, as the authors point out, this book is not for the Bayesian reader but for the Bayesian practitioner. It is more of an interactive experience in which practitioners learn the computational tools to model and to negotiate with data for good modeling practice. On the other hand, readers who already have experience with real-world data analysis in Python, R, or similar tools may still learn a great deal even if this is their first exposure to Bayesian modeling and computation. There is an abundance of figures and detailed explanations of how things are done and how the results are interpreted. Picking up these details requires some trained sensibility when dealing with real-world data, but aspiring and experienced practitioners alike should find them useful and impressive. There are also many big-picture schematic drawings that help readers connect the details with overall concepts such as end-to-end workflows; Figure 9.1 is a remarkable example. Overall, as Kevin Murphy points out in the Foreword, "this is a valuable addition to the literature, which should hopefully further the adoption of Bayesian methods". I highly recommend that readers interested in learning Bayesian models and their applications in practice have this book on their bookshelf.
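To give a flavor of the workflow the book teaches, here is a minimal sketch (not an example from the book; the data and variable names are made up for illustration) of specifying a prior and likelihood in PyMC3, sampling the posterior, and inspecting the fit with ArviZ:

```python
import numpy as np
import pymc3 as pm
import arviz as az

# Synthetic data standing in for a real-world measurement series.
rng = np.random.default_rng(42)
y = rng.normal(loc=1.5, scale=0.8, size=200)

with pm.Model() as model:
    # Prior, likelihood, posterior: the core objects discussed in the book.
    mu = pm.Normal("mu", mu=0.0, sigma=10.0)      # weakly informative prior
    sigma = pm.HalfNormal("sigma", sigma=5.0)
    pm.Normal("y_obs", mu=mu, sigma=sigma, observed=y)
    idata = pm.sample(1000, tune=1000, return_inferencedata=True)

# Diagnostics and model criticism, the part of the workflow the book emphasizes.
print(az.summary(idata, var_names=["mu", "sigma"]))
az.plot_trace(idata)
```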
{"title":"Bayesian Modeling and Computation in Python","authors":"Shuai Huang","doi":"10.1080/00224065.2022.2041379","DOIUrl":"https://doi.org/10.1080/00224065.2022.2041379","url":null,"abstract":"This book is useful for readers who want to hone their skills in Bayesian modeling and computation. Written by experts in the area of Bayesian software and major contributors to some existing widely used Bayesian computational tools, this book covers not only basic Bayesian probabilistic inference but also a range of models from linear models (and mixed effect models, hierarchical models, splines, etc) to time series models such as the state space model. It also covers the Bayesian additive regression trees. Almost all the concepts and techniques are implemented using PyMC3, Tensorflow Probability (TFP), ArviZ and other libraries. By doing all the modeling, computation, and data analysis, the authors not only show how these things work, but also show how and why things don’t work by emphasis on exploratory data analysis, model comparison, and diagnostics. To learn from the book, readers may need some statistical background such as basic training in statistics and probability theory. Some understanding of Bayesian modeling and inference is also needed, such as the concepts of prior, likelihood, posterior, the bayes’s law, and Monte Carlo sampling. Some experience with Python would also be very beneficial for readers to get started on this journey of Bayesian modeling. The authors suggested a few books as possible preliminaries for their book. I feel that the readers may also benefit from reading Andrew Gelman’s book, Bayesian Data Analysis, Chapman & Hall/CRC, 3rd Edition, 2013. Of course, as the authors pointed it out, this book is not for a Bayesian Reader but a Bayesian practitioner. The book is more of an interactive experience for Bayesian practitioners by learning all the computational tools to model and to negotiate with data for a good modeling practice. On the other hand, if readers have already had experience with real-world data analysis using Python or R or other similar tools, even if this book is their first experience with Bayesian modeling and computation, readers may still learn a lot from this book. There are an abundance of figures and detailed explanations of how things are done and how the results are interpreted. Picking up these details would need some trained sensibility when dealing with real-world data, but aspiring and experienced practitioners should find all the details useful and impressive. And there are also many big picture schematic drawings to help readers connect all the details with overall concepts such as end-to-end workflows. The Figure 9.1 is a remarkable example. Overall, as Kevin Murphy pointed out in the Forward, “this is a valuable addition to the literature, which should hopefully further the adoption of Bayesian methods”. 
I highly recommend readers who are interested in learning Bayesian models and their applications in practice to have this book on their bookshelf.","PeriodicalId":54769,"journal":{"name":"Journal of Quality Technology","volume":"30 1","pages":"266 - 266"},"PeriodicalIF":2.5,"publicationDate":"2022-02-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"78872910","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2022-02-23 | DOI: 10.1080/00224065.2022.2035284
cpss: an R package for change-point detection by sample-splitting methods
Guanghui Wang, Changliang Zou
Abstract: Change-point detection is a popular statistical method for Phase I analysis in statistical process control. The cpss package provides users with multiple choices of change-point searching algorithms for a variety of frequently considered parametric change-point models, including univariate and multivariate mean and/or (co)variance change models, changes in linear models and generalized linear models, and change models in exponential families. In particular, it integrates the recently proposed COPSS criterion to determine the number of change-points in a data-driven fashion, avoiding the selection or specification of additional tuning parameters required by existing approaches and making it more convenient in practical applications. In addition, the cpss package offers considerable flexibility for handling user-customized change-point models.
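The package itself is written in R, but the sample-splitting idea can be sketched in Python for the simplest case, a univariate mean-change model. The sketch below is illustrative only: it uses greedy binary segmentation and an odd/even split to pick the number of change-points, not the exact COPSS criterion or the cpss implementation.

```python
import numpy as np

def best_split(x):
    """Best single split of segment x, scored by reduction in within-segment SSE."""
    n = len(x)
    if n < 2:
        return 0.0, None
    csum, csq = np.cumsum(x), np.cumsum(x ** 2)
    total = csq[-1] - csum[-1] ** 2 / n
    best_gain, best_idx = 0.0, None
    for i in range(1, n):
        left = csq[i - 1] - csum[i - 1] ** 2 / i
        right = (csq[-1] - csq[i - 1]) - (csum[-1] - csum[i - 1]) ** 2 / (n - i)
        gain = total - (left + right)
        if gain > best_gain:
            best_gain, best_idx = gain, i
    return best_gain, best_idx

def binary_segmentation(x, k):
    """Greedily place up to k change-points in x."""
    cps = []
    for _ in range(k):
        pts = [0] + sorted(cps) + [len(x)]
        candidates = []
        for s, e in zip(pts[:-1], pts[1:]):
            gain, idx = best_split(x[s:e])
            if idx is not None:
                candidates.append((gain, s + idx))
        if not candidates:
            break
        gain, cp = max(candidates)
        if gain <= 0:
            break
        cps.append(cp)
    return sorted(cps)

def choose_k_by_splitting(x, k_max=5):
    """Pick the number of change-points via an odd/even sample split."""
    n2 = len(x) // 2
    train, valid = x[0:2 * n2:2], x[1:2 * n2:2]   # equal lengths, order preserved
    best_k, best_loss = 0, np.inf
    for k in range(k_max + 1):
        pts = [0] + binary_segmentation(train, k) + [n2]
        # Segment means are fitted on one half and scored on the other half.
        loss = sum(np.sum((valid[s:e] - train[s:e].mean()) ** 2)
                   for s, e in zip(pts[:-1], pts[1:]))
        if loss < best_loss:
            best_k, best_loss = k, loss
    return best_k

rng = np.random.default_rng(1)
x = np.concatenate([rng.normal(0, 1, 150), rng.normal(2, 1, 150), rng.normal(-1, 1, 150)])
print(choose_k_by_splitting(x))   # expected to report 2 change-points
```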
{"title":"cpss: an package for change-point detection by sample-splitting methods","authors":"Guanghui Wang, Changliang Zou","doi":"10.1080/00224065.2022.2035284","DOIUrl":"https://doi.org/10.1080/00224065.2022.2035284","url":null,"abstract":"Abstract Change-point detection is a popular statistical method for Phase I analysis in statistical process control. The cpss package has been developed to provide users with multiple choices of change-point searching algorithms for a variety of frequently considered parametric change-point models, including the univariate and multivariate mean and/or (co)variance change models, changes in linear models and generalized linear models, and change models in exponential families. In particular, it integrates the recently proposed COPSS criterion to determine the number of change-points in a data-driven fashion that avoids selecting or specifying additional tuning parameters in existing approaches. Hence it is more convenient to use in practical applications. In addition, the cpss package brings great possibilities to handle user-customized change-point models.","PeriodicalId":54769,"journal":{"name":"Journal of Quality Technology","volume":"100 1","pages":"61 - 74"},"PeriodicalIF":2.5,"publicationDate":"2022-02-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"85866418","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2022-01-24 | DOI: 10.1080/00224065.2022.2035285
Design strategies and approximation methods for high-performance computing variability management
Yueyao Wang, Li Xu, Yili Hong, Rong Pan, Tyler H. Chang, T. Lux, Jon Bernard, L. Watson, K. Cameron
Abstract: Performance variability management is an active research area in high-performance computing (HPC). In this article, we focus on input/output (I/O) variability, a complicated function that is affected by many system factors. To study performance variability, computer scientists often use grid-based designs (GBDs), which are equivalent to full factorial designs, to collect I/O variability data, and use mathematical approximation methods to build a prediction model. Mathematical approximation models, being deterministic, can be biased, particularly when extrapolation is needed. In the statistics literature, space-filling designs (SFDs) and surrogate models such as the Gaussian process (GP) are popular for data collection and building predictive models. The applicability of SFDs and surrogates in the HPC variability management setting, however, needs investigation. In this case study, we investigate their applicability in the HPC setting in terms of design efficiency, prediction accuracy, and scalability. We first customize existing SFDs so that they can be applied in the HPC setting, and we conduct a comprehensive investigation of design strategies and the prediction ability of approximation methods, using both synthetic data simulated from three test functions and real data from the HPC setting. We then compare the different methods in terms of design efficiency, prediction accuracy, and scalability. In our synthetic and real data analyses, GP with SFDs outperforms the alternatives in most scenarios. With respect to the choice of approximation model, GP is recommended if the data are collected by SFDs; if the data are collected using GBDs, both GP and Delaunay can be considered. With the best choice of approximation method, the relative performance of SFDs and GBDs depends on the properties of the underlying surface. In the cases where SFDs perform better, the number of design points needed for SFDs is about half of, or less than, that needed by GBDs to achieve the same prediction accuracy. Although we observe that GBDs can also outperform SFDs for smooth underlying surfaces, GBDs are not scalable to high-dimensional experimental regions. Therefore, SFDs that can be tailored to high dimensions and non-smooth surfaces are recommended, especially when large numbers of input factors need to be considered in the model. This article has online supplementary materials.
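As a rough illustration of the kind of comparison the case study performs (a toy sketch, not the authors' code; the test function, sample sizes, and kernel choice are arbitrary assumptions), one can fit a GP surrogate to points from a space-filling design and from a grid-based design over the same region and compare held-out prediction error:

```python
import numpy as np
from scipy.stats import qmc
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

def test_function(x):
    """A smooth synthetic surface standing in for I/O variability."""
    return np.sin(3 * x[:, 0]) + np.cos(2 * x[:, 1]) + 0.5 * x[:, 0] * x[:, 1]

rng = np.random.default_rng(0)

# Space-filling design: 36 Latin hypercube points in [0, 1]^2.
sfd = qmc.LatinHypercube(d=2, seed=0).random(n=36)

# Grid-based design: a 6 x 6 full factorial over the same region.
g = np.linspace(0, 1, 6)
gbd = np.array([(a, b) for a in g for b in g])

# Held-out test points for assessing prediction accuracy.
x_test = rng.uniform(size=(500, 2))
y_test = test_function(x_test)

for name, design in [("SFD (Latin hypercube)", sfd), ("GBD (6x6 grid)", gbd)]:
    gp = GaussianProcessRegressor(kernel=RBF() + WhiteKernel(), normalize_y=True)
    gp.fit(design, test_function(design))
    rmse = np.sqrt(np.mean((gp.predict(x_test) - y_test) ** 2))
    print(f"{name}: test RMSE = {rmse:.4f}")
```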
{"title":"Design strategies and approximation methods for high-performance computing variability management","authors":"Yueyao Wang, Li Xu, Yili Hong, Rong Pan, Tyler H. Chang, T. Lux, Jon Bernard, L. Watson, K. Cameron","doi":"10.1080/00224065.2022.2035285","DOIUrl":"https://doi.org/10.1080/00224065.2022.2035285","url":null,"abstract":"Abstract Performance variability management is an active research area in high-performance computing (HPC). In this article, we focus on input/output (I/O) variability, which is a complicated function that is affected by many system factors. To study the performance variability, computer scientists often use grid-based designs (GBDs) which are equivalent to full factorial designs to collect I/O variability data, and use mathematical approximation methods to build a prediction model. Mathematical approximation models, as deterministic methods, could be biased particularly if extrapolations are needed. In statistics literature, space-filling designs (SFDs) and surrogate models such as Gaussian process (GP) are popular for data collection and building predictive models. The applicability of SFDs and surrogates in the HPC variability management setting, however, needs investigation. In this case study, we investigate their applicability in the HPC setting in terms of design efficiency, prediction accuracy, and scalability. We first customize the existing SFDs so that they can be applied in the HPC setting. We conduct a comprehensive investigation of design strategies and the prediction ability of approximation methods. We use both synthetic data simulated from three test functions and the real data from the HPC setting. We then compare different methods in terms of design efficiency, prediction accuracy, and scalability. In our synthetic and real data analysis, GP with SFDs outperforms in most scenarios. With respect to the choice of approximation models, GP is recommended if the data are collected by SFDs. If data are collected using GBDs, both GP and Delaunay can be considered. With the best choice of approximation method, the performance of SFDs and GBD depends on the property of the underlying surface. For the cases in which SFDs perform better, the number of design points needed for SFDs is about half of or less than that of the GBD to achieve the same prediction accuracy. Although we observe that the GBD can also outperform SFDs for smooth underlying surface, GBD is not scalable to high dimensional experimental regions. Therefore, SFDs that can be tailored to high dimension and non-smooth surface are recommended especially when large numbers of input factors need to be considered in the model. This article has online supplementary materials.","PeriodicalId":54769,"journal":{"name":"Journal of Quality Technology","volume":"39 1","pages":"88 - 103"},"PeriodicalIF":2.5,"publicationDate":"2022-01-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"82735014","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2022-01-10 | DOI: 10.1080/00224065.2023.2180457
Knots and their effect on the tensile strength of lumber: A case study
Shuxiang Fan, S. Wong, J. Zidek
Abstract: When assessing the strength of sawn lumber for use in engineering applications, the sizes and locations of knots are an important consideration. Knots, which result from the growth of tree branches, are the most common visual characteristics of lumber. Large individual knots, as well as clusters of distinct knots, are known to have strength-reducing effects. However, the industry grading rules that govern knots are informed to some extent by subjective judgment, particularly regarding the spatial interaction of knots and their relationship with lumber strength. This case study reports the results of an experiment that investigated and modeled the strength-reducing effects of knots on a sample of Douglas Fir lumber. Experimental data were obtained by scanning lumber surfaces and applying tensile strength testing. The modeling approach presented incorporates all relevant knot information in a Bayesian framework, thereby contributing a more refined way of managing the quality of manufactured lumber.
{"title":"Knots and their effect on the tensile strength of lumber: A case study","authors":"Shuxiang Fan, S. Wong, J. Zidek","doi":"10.1080/00224065.2023.2180457","DOIUrl":"https://doi.org/10.1080/00224065.2023.2180457","url":null,"abstract":"Abstract When assessing the strength of sawn lumber for use in engineering applications, the sizes and locations of knots are an important consideration. Knots are the most common visual characteristics of lumber, that result from the growth of tree branches. Large individual knots, as well as clusters of distinct knots, are known to have strength-reducing effects. However, industry grading rules that govern knots are informed by subjective judgment to some extent, particularly the spatial interaction of knots and their relationship with lumber strength. This case study reports the results of an experiment that investigated and modeled the strength-reducing effects of knots on a sample of Douglas Fir lumber. Experimental data were obtained by taking scans of lumber surfaces and applying tensile strength testing. The modeling approach presented incorporates all relevant knot information in a Bayesian framework, thereby contributing a more refined way of managing the quality of manufactured lumber.","PeriodicalId":54769,"journal":{"name":"Journal of Quality Technology","volume":"8 1","pages":"510 - 522"},"PeriodicalIF":2.5,"publicationDate":"2022-01-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"85147875","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2021-12-30 | DOI: 10.1080/00224065.2020.1764418
Mathematical Statistics
Shuai Huang
As a sister volume to the book Probability by the same author, this book is intended for the second course in a mathematical statistics sequence. Readers should have learned calculus and completed a calculus-based course in probability. As with most mathematical statistics textbooks, point estimation, interval estimation, and hypothesis testing are the core concepts. The book is written particularly for students having their first exposure to mathematical statistics, so the author has carefully selected his materials and focused on conceptual understanding, such as the fact that the sample mean and sample variance are themselves random variables. R is used throughout the text for graphics, computation, and Monte Carlo simulation. The homework is comprehensive. In all these respects, the book has a style similar to the author's Probability. The book's organization is deceptively simple: it has only four chapters. Chapter 1, almost 100 pages, is about random sampling. Chapter 2, another 100 pages, is about point estimation. Chapter 3, 135 pages, is about interval estimation. Chapter 4, 133 pages, is about hypothesis testing. This "simple" structure makes the four pillars of mathematical statistics very clear to readers learning the topic for the first time. Within each chapter, just as in Probability, each concept is presented in detail and from multiple angles, and when calculation is involved, enough intermediate steps are preserved that readers can easily follow them. One notable example is the presentation of hypothesis testing. Unlike many other textbooks that start with proven methods such as the Z-test, this book introduces the big picture first, and this big picture includes "a hunch": it presents at the very beginning a clear outline of the 12 steps of hypothesis testing, starting with "a hunch, or theory, concerning a problem of interest," then moving to the second step, "translate the theory into a question concerning an unknown parameter theta," then "state the null hypothesis of theta," and so on. Technical explanations of many of these steps are then given in detail, and the type 1 and type 2 errors are presented right alongside this 12-step outline. What is more, strange (i.e., idiosyncratic) forms of hypothesis testing are presented! One example concerns three brothers, Chico, Harpo, and Groucho. Each of them comes up with his own test statistic, namely x1 + x2, min(x1, x2), or max(x1, x2), where x1 and x2 form a random sample of size 2 from a uniform distribution U(0, theta). Is theta = 5, or theta = 2? Is this an allusion to the three little pigs? Nonetheless, this is a hilarious example that very effectively teaches the technical details of hypothesis testing, and it also revives this "ancient" technique, reminding readers that, in using the proven hypothesis testing methods, we have actually made choices (i.e., each of the three brothers' proposals has pros and cons in terms of the type 1 and type 2 errors).
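The three brothers' tests are easy to explore by simulation. The book uses R; the sketch below is an illustrative Python rendering (not from the book) that calibrates each statistic's rejection region to roughly 5 percent type 1 error under the null theta = 5 and then estimates its power against theta = 2:

```python
import numpy as np

rng = np.random.default_rng(0)
n_sim = 200_000

def simulate(theta):
    """Draw n_sim samples of size 2 from U(0, theta) and compute the three statistics."""
    x = rng.uniform(0, theta, size=(n_sim, 2))
    return {"Chico:   x1 + x2     ": x.sum(axis=1),
            "Harpo:   min(x1, x2) ": x.min(axis=1),
            "Groucho: max(x1, x2) ": x.max(axis=1)}

null = simulate(5.0)   # H0: theta = 5
alt = simulate(2.0)    # competing value theta = 2

for name in null:
    # Reject H0 for small values of the statistic; calibrate to ~5% type 1 error.
    c = np.quantile(null[name], 0.05)
    power = np.mean(alt[name] <= c)
    print(f"{name} critical value = {c:.3f}, power against theta = 2: {power:.3f}")
```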
{"title":"Mathematical Statistics","authors":"Shuai Huang","doi":"10.1080/00224065.2020.1764418","DOIUrl":"https://doi.org/10.1080/00224065.2020.1764418","url":null,"abstract":"As a sister book of the book Probability by the same author, this book is supposed to be the second course in a mathematical statistics sequence of classes. The readers should have learned calculus and completed a calculusbased course in probability. As with most mathematical statistics textbooks, point estimations, interval estimation, and hypothesis testing are the core concepts. This book is particularly written for students who would have their first exposure to mathematical statistics, so the author carefully selected his materials and had focused on the understanding of statistics such as the sample mean and sample variance being also random variables as well. R is used throughout the text for graphics, computation, and Monte Carlo simulation. The homework is comprehensive. From all these aspects, this book has a similar style as the other book of Probability by the same author. The book’s organization is deceptively simple: it only has four chapters. Chapter 1, almost 100 pages, is about random sampling. Chapter 2, another 100 pages, is about point estimation. Chapter 3, 135 pages, is about interval estimation. Chapter 4, 133 pages, is about hypothesis testing. This “simple” structure makes the four pillars of mathematical statistics very clear to readers who first learn the topic. Within each chapter, just like in the book Probability, each concept is presented in detail and in multiple aspects. And when calculation is involved, enough middle steps are preserved so readers can easily follow the steps. One notable example is the presentation of the hypothesis testing. Not like many other textbooks that start with proven methods such as the Z-test, this book introduces the big picture first, and this big picture includes “a hunch”: it presents in the very beginning a clear outline of the 12 steps for hypothesis testing, starts with “a hunch, or theory, concerning a problem of interest,” then moves to the second step “translate the theory into a question concerning an unknown parameter theta,” then “state the null hypothesis of theta.” ... Then technical explanation of many of these steps is given in detail. The type 1 and type 2 errors are also presented right along with this 12-step outline. What is more, strange (i.e., idiosyncratic) forms of hypothesis testing are presented! It concerns three brothers, Chico, Harpo, and Groucho. Each of them comes up with their own testing statistics, e.g., x1þ x2, min (x1, x2), or max(x1, x2), where x1 and x2 are random samples of size 2 from a uniform distribution U(0, theta). Is theta 1⁄4 5, or theta 1⁄4 2? Is this an allusion to the three little pigs? 
Nonetheless, this is a hilarious example that very effectively instructs the technical details of hypothesis testing, but also revives this “ancient” technique that tells readers that, in using the proven hypothesis testing methods, we actually have made choices (i.e., each of the three brothers’ proposals have pros and cons, in terms of the type 1 and ","PeriodicalId":54769,"journal":{"name":"Journal of Quality Technology","volume":"78 1 1","pages":"118 - 118"},"PeriodicalIF":2.5,"publicationDate":"2021-12-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"85795417","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2021-12-30 | DOI: 10.1080/00224065.2021.2019569
Addendum to “Estimating pure-error from near replicates in design of experiments”
Caleb King, T. Bzik, Peter A. Parker, M. Wells, Benjamin R. Baer
{"title":"Addendum to “Estimating pure-error from near replicates in design of experiments”","authors":"Caleb King, T. Bzik, Peter A. Parker, M. Wells, Benjamin R. Baer","doi":"10.1080/00224065.2021.2019569","DOIUrl":"https://doi.org/10.1080/00224065.2021.2019569","url":null,"abstract":"","PeriodicalId":54769,"journal":{"name":"Journal of Quality Technology","volume":"385 1","pages":"123 - 123"},"PeriodicalIF":2.5,"publicationDate":"2021-12-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"76614565","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2021-12-08 | DOI: 10.1080/00224065.2021.2006583
Industrial Data Analytics for Diagnosis and Prognosis: A Random Effects Modeling Approach
Jing Li
In Industrial Data Analytics for Diagnosis and Prognosis: A Random Effects Modeling Approach, distinguished engineers Shiyu Zhou and Yong Chen deliver a rigorous and practical introduction to the random effects modeling approach for industrial system diagnosis and prognosis. In the book’s two parts, general statistical concepts and useful theory are described and explained, as are industrial diagnosis and prognosis methods. The accomplished authors describe and model fixed effects, random effects, and variation in univariate and multivariate datasets, and cover the application of the random effects approach to the diagnosis of variation sources in industrial processes. They offer a detailed performance comparison of different diagnosis methods before moving on to the application of the random effects approach to failure prognosis in industrial processes and systems.
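As a minimal illustration of the random effects idea the book builds on (a sketch with simulated data, not an example from the book), a random-intercept model lets each unit deviate from a population-level fixed effect, which is the kind of unit-to-unit variation exploited in diagnosis and prognosis:

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Simulated degradation-style data: 20 units, 10 measurements each.
rng = np.random.default_rng(7)
n_units, n_obs = 20, 10
unit = np.repeat(np.arange(n_units), n_obs)
time = np.tile(np.arange(n_obs), n_units)
unit_effect = rng.normal(0.0, 0.5, size=n_units)        # random effect per unit
y = 1.0 + 0.3 * time + unit_effect[unit] + rng.normal(0.0, 0.2, size=n_units * n_obs)
data = pd.DataFrame({"y": y, "time": time, "unit": unit})

# Fixed effect: common trend over time; random effect: unit-specific intercept.
model = smf.mixedlm("y ~ time", data, groups=data["unit"])
result = model.fit()
print(result.summary())
```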
{"title":"Industrial Data Analytics for Diagnosis and Prognosis: A Random Effects Modeling Approach","authors":"Jing Li","doi":"10.1080/00224065.2021.2006583","DOIUrl":"https://doi.org/10.1080/00224065.2021.2006583","url":null,"abstract":"In Industrial Data Analytics for Diagnosis and Prognosis A Random Effects Modelling Approach, distinguished engineers Shiyu Zhou and Yong Chen deliver a rigorous and practical introduction to the random effects modeling approach for industrial system diagnosis and prognosis. In the book’s two parts, general statistical concepts and useful theory are described and explained, as are industrial diagnosis and prognosis methods. The accomplished authors describe and model fixed effects, random effects, and variation in univariate and multivariate datasets and cover the application of the random effects approach to diagnosis of variation sources in industrial processes. They offer a detailed performance comparison of different diagnosis methods before moving on to the application of the random effects approach to failure prognosis in industrial processes and systems.","PeriodicalId":54769,"journal":{"name":"Journal of Quality Technology","volume":"10 1","pages":"606 - 606"},"PeriodicalIF":2.5,"publicationDate":"2021-12-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"89676795","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2021-11-01 | DOI: 10.1080/00224065.2021.1991863
Utilizing individual clear effects for intelligent factor allocations and design selections
Qi Zhou, William Li, Hongquan Xu
Abstract: Extensive studies have been conducted on how to select efficient designs with respect to a given criterion. Most design criteria aim to capture the overall efficiency of the design across all columns. When prior information indicates that a small number of factors and their two-factor interactions (2fi’s) are likely to be more significant than other effects, commonly used minimum aberration designs may no longer be the best choice. Motivated by a real-life experiment, we propose a new class of regular fractional factorial designs that focus on estimating a subset of columns and their corresponding 2fi’s clear of other important effects. After introducing the concept of individual clear effects (iCE) to describe clear 2fi’s involving a specific factor, we define the clear effect pattern criterion to characterize the distribution of iCE’s over all columns. We then obtain a new class of designs that sequentially maximize the clear effect pattern. These newly constructed designs are often different from existing optimal designs. We develop a series of theoretical results that are particularly useful for constructing designs with large run sizes, for which algorithmic construction becomes computationally challenging. We also provide practical guidelines on how to choose appropriate designs with respect to the run size, the number of factors, and the number of 2fi’s that need to be clear.
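The notion of a "clear" two-factor interaction underlying the iCE concept can be illustrated with a small script (an illustrative sketch, not the authors' construction algorithm): represent effects of a regular 2^(k-p) design as bit masks, build the defining contrast subgroup from the generator words, and flag a 2fi as clear when none of its aliases is a main effect or another 2fi.

```python
from itertools import combinations

FACTORS = "ABCDEFGH"

def word(s):
    """Encode a defining word such as 'ABCE' as a bit mask over the factors."""
    return sum(1 << FACTORS.index(c) for c in s)

def defining_subgroup(generators):
    """All nonzero products (XORs) of the generator words of a regular design."""
    group = {0}
    for w in generators:
        group |= {g ^ w for g in group}
    return group - {0}

def clear_two_factor_interactions(generators, k):
    """Return the 2fi's not aliased with any main effect or any other 2fi."""
    subgroup = defining_subgroup(generators)
    clear = []
    for a, b in combinations(range(k), 2):
        fi = (1 << a) | (1 << b)
        aliases = {fi ^ w for w in subgroup}
        if all(bin(e).count("1") > 2 for e in aliases):
            clear.append(FACTORS[a] + FACTORS[b])
    return clear

# Example: a 32-run 2^(7-2) design with generators F = ABCD and G = ABDE,
# i.e., defining words ABCDF and ABDEG. For this resolution IV design,
# every 2fi except those among C, E, F, G turns out to be clear.
print(clear_two_factor_interactions([word("ABCDF"), word("ABDEG")], k=7))
```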
{"title":"Utilizing individual clear effects for intelligent factor allocations and design selections","authors":"Qi Zhou, William Li, Hongquan Xu","doi":"10.1080/00224065.2021.1991863","DOIUrl":"https://doi.org/10.1080/00224065.2021.1991863","url":null,"abstract":"Abstract Extensive studies have been conducted on how to select efficient designs with respect to a criterion. Most design criteria aim to capture the overall efficiency of the design across all columns. When prior information indicated that a small number of factors and their two-factor interactions (2fi’s) are likely to be more significant than other effects, commonly used minimum aberration designs may no longer be the best choice. Motivated by a real-life experiment, we propose a new class of regular fractional factorial designs that focus on estimating a subset of columns and their corresponding 2fi’s clear of other important effects. After introducing the concept of individual clear effects (iCE) to describe clear 2fi’s involving a specific factor, we define the clear effect pattern criterion to characterize the distribution of iCE’s over all columns. We then obtain a new class of designs that sequentially maximize the clear effect pattern. These newly constructed designs are often different from existing optimal designs. We develop a series of theoretical results that can be particularly useful for constructing designs with large run sizes, for which algorithmic construction becomes computationally challenging. We also provide some practical guidelines on how to choose appropriate designs with respect to different run size, the number of factors, and the number of 2fi’s that need to be clear.","PeriodicalId":54769,"journal":{"name":"Journal of Quality Technology","volume":"171 1","pages":"3 - 17"},"PeriodicalIF":2.5,"publicationDate":"2021-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"78446053","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2021-10-28 | DOI: 10.1080/00224065.2021.1991251
Self-starting process monitoring based on transfer learning
Zhijun Wang, Chunjie Wu, Miaomiao Yu, F. Tsung
Abstract: Conventional self-starting control schemes can perform poorly when monitoring processes with early shifts, being limited by the number of historical observations sampled. In real applications, pre-observed data sets from other production lines are often available, prompting us to propose a scheme that monitors the target process using historical data obtained from other sources. The methodology of self-taught clustering from unsupervised transfer learning is adapted to transfer knowledge from previous observations and improve out-of-control (OC) performance, especially for processes with early shifts. However, if the difference in distribution between the target process and the pre-observed data set is large, our scheme may not be the best choice. Simulation results and two illustrative examples demonstrate the superiority of the proposed scheme.
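For context, a standard self-starting scheme for individual observations, which is the baseline whose early-shift weakness motivates the paper, standardizes each new observation with the mean and standard deviation of all earlier ones. The sketch below is the classical Q-statistic approach, not the authors' transfer-learning method:

```python
import numpy as np
from scipy import stats

def q_statistics(x):
    """Self-starting Q statistics for individual observations (normal model,
    unknown mean and variance); approximately N(0, 1) when in control."""
    x = np.asarray(x, dtype=float)
    q = []
    for i in range(2, len(x)):               # need at least two prior observations
        prior = x[:i]
        xbar, s = prior.mean(), prior.std(ddof=1)
        t = (x[i] - xbar) / (s * np.sqrt(1.0 + 1.0 / i))
        q.append(stats.norm.ppf(stats.t.cdf(t, df=i - 1)))
    return np.array(q)

rng = np.random.default_rng(3)
obs = np.concatenate([rng.normal(0, 1, 15), rng.normal(3, 1, 15)])   # sizable shift after 15 points
q = q_statistics(obs)
print(np.where(np.abs(q) > 3)[0] + 2)   # indices of observations signaling on a 3-sigma chart
```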
{"title":"Self-starting process monitoring based on transfer learning","authors":"Zhijun Wang, Chunjie Wu, Miaomiao Yu, F. Tsung","doi":"10.1080/00224065.2021.1991251","DOIUrl":"https://doi.org/10.1080/00224065.2021.1991251","url":null,"abstract":"Abstract Conventional self-starting control schemes can perform poorly when monitoring processes with early shifts, being limited by the number of historical observations sampled. In real applications, pre-observed data sets from other production lines are always available, prompting us to propose a scheme that monitors the target process using historical data obtained from other sources. The methodology of self-taught clustering from unsupervised transfer learning is revised to transfer knowledge from previous observations and improve out-of-control (OC) performance, especially for processes with early shifts. However, if the difference in distribution between the target process and the pre-observed data set is large, our scheme may not be the best. Simulation results and two illustrative examples demonstrate the superiority of the proposed scheme.","PeriodicalId":54769,"journal":{"name":"Journal of Quality Technology","volume":"8 1","pages":"589 - 604"},"PeriodicalIF":2.5,"publicationDate":"2021-10-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"82710881","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2021-10-26 | DOI: 10.1080/00224065.2023.2196034
Phase I analysis of high-dimensional processes in the presence of outliers
M. Ebadi, Shoja'eddin Chenouri, Stefan H. Steiner
Abstract: One of the significant challenges in monitoring product quality today is the high dimensionality of quality characteristics. In this paper, we address Phase I analysis of high-dimensional processes with individual observations when the number of samples collected over time is limited. Using a new charting statistic, we propose a robust procedure for parameter estimation in Phase I that remains efficient in the presence of outliers or contamination in the data. A consistent estimator is proposed for parameter estimation, and a finite-sample correction coefficient is derived and evaluated through simulation. We assess the statistical performance of the proposed method in Phase I, both in the absence and in the presence of outliers, and show that in both cases the proposed control chart scheme effectively detects various kinds of shifts in the process mean. In addition, we present two real-world examples to illustrate the applicability of the proposed method.
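For orientation, a conventional (low-to-moderate dimensional) robust Phase I screen estimates location and scatter robustly and flags observations with large robust Mahalanobis distances. The sketch below is only that generic baseline, not the paper's charting statistic, and the estimator it uses is not designed for the high-dimension, limited-sample regime the paper targets:

```python
import numpy as np
from scipy import stats
from sklearn.covariance import MinCovDet

rng = np.random.default_rng(5)
p, m = 5, 60                                    # dimension and number of Phase I samples
x = rng.multivariate_normal(np.zeros(p), np.eye(p), size=m)
x[::15] += 4.0                                  # contaminate a few observations

# Robust location/scatter via the minimum covariance determinant estimator.
mcd = MinCovDet(random_state=0).fit(x)
d2 = mcd.mahalanobis(x)                         # squared robust distances

# Flag observations beyond an approximate chi-square control limit.
limit = stats.chi2.ppf(0.999, df=p)
print("flagged observations:", np.where(d2 > limit)[0])
```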
{"title":"Phase I analysis of high-dimensional processes in the presence of outliers","authors":"M. Ebadi, Shoja'eddin Chenouri, Stefan H. Steiner","doi":"10.1080/00224065.2023.2196034","DOIUrl":"https://doi.org/10.1080/00224065.2023.2196034","url":null,"abstract":"Abstract One of the significant challenges in monitoring the quality of products today is the high dimensionality of quality characteristics. In this paper, we address Phase I analysis of high-dimensional processes with individual observations when the available number of samples collected over time is limited. Using a new charting statistic, we propose a robust procedure for parameter estimation in Phase I. This robust procedure is efficient in parameter estimation in the presence of outliers or contamination in the data. A consistent estimator is proposed for parameter estimation and a finite sample correction coefficient is derived and evaluated through simulation. We assess the statistical performance of the proposed method in Phase I. This assessment is carried out in the absence and presence of outliers. We show that, in both cases, the proposed control chart scheme effectively detects various kinds of shifts in the process mean. Besides, we present two real-world examples to illustrate the applicability of our proposed method.","PeriodicalId":54769,"journal":{"name":"Journal of Quality Technology","volume":"35 1","pages":"469 - 488"},"PeriodicalIF":2.5,"publicationDate":"2021-10-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"80554616","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}