Naive Bayes using the expectation-maximization algorithm for reject inference
Pub Date: 2022-08-03 | DOI: 10.1080/23737484.2022.2106325 | pp. 484-504
Billie Anderson
Abstract In the last several years, there has been significant research in applying semi-supervised machine learning models to the reject inference problem. When a financial institution wants to build a model to predict the default of credit applicants, the institution only has a known good/bad outcome loan status for the accepted applicants; this causes an inherent bias in the model. Reject inference is used to infer the good or bad loan status of credit applicants that were rejected by a financial institution. This paper presents a reject inference technique in which a semi-supervised framework is developed using a Naive Bayes model. The framework uses the expectation-maximization (EM) algorithm to incorporate rejected applicants into the parameter estimation of the model using a bootstrapping approach. The proposed method has an advantage over traditional reject inference methods because the rejected applicant data will participate in the estimation of the model parameters, thus avoiding the extrapolation problem. The Naive Bayes model using the EM algorithm is compared to logistic regression and several semi-supervised techniques.
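A compact sketch of the kind of semi-supervised Naive Bayes/EM loop the abstract describes may help: accepted applicants keep their observed good/bad labels, while rejected applicants re-enter each M-step with soft labels equal to their current posterior class probabilities. This is a generic illustration with scikit-learn, not the author's implementation; the data matrices `X_acc`, `y_acc` (accepts with known outcomes) and `X_rej` (rejects, unlabeled) are hypothetical, and the paper's bootstrapping wrapper is omitted.

```python
import numpy as np
from sklearn.naive_bayes import GaussianNB

def nb_em_reject_inference(X_acc, y_acc, X_rej, n_iter=20, tol=1e-6):
    """Semi-supervised Naive Bayes via EM: accepts carry hard labels,
    rejects contribute to both classes with posterior weights."""
    clf = GaussianNB().fit(X_acc, y_acc)          # initialize on accepts only
    for _ in range(n_iter):
        prev_theta = clf.theta_.copy()
        # E-step: posterior class probabilities for the rejected applicants
        resp = clf.predict_proba(X_rej)           # shape (n_rej, n_classes)
        # M-step: refit on accepts (weight 1) plus rejects duplicated once per
        # class, each copy weighted by its responsibility
        X_all = np.vstack([X_acc, X_rej, X_rej])
        y_all = np.concatenate([y_acc,
                                np.full(len(X_rej), clf.classes_[0]),
                                np.full(len(X_rej), clf.classes_[1])])
        w_all = np.concatenate([np.ones(len(X_acc)), resp[:, 0], resp[:, 1]])
        clf = GaussianNB().fit(X_all, y_all, sample_weight=w_all)
        if np.max(np.abs(clf.theta_ - prev_theta)) < tol:
            break
    return clf
```

The fitted model then scores the full through-the-door population with `clf.predict_proba`, so the rejected applicants have participated in parameter estimation rather than being scored by a model extrapolated from accepts only.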
{"title":"Naive Bayes using the expectation-maximization algorithm for reject inference","authors":"Billie Anderson","doi":"10.1080/23737484.2022.2106325","DOIUrl":"https://doi.org/10.1080/23737484.2022.2106325","url":null,"abstract":"Abstract In the last several years, there has been significant research in applying semi-supervised machine learning models to the reject inference problem. When a financial institution wants to build a model to predict the default of credit applicants, the institution only has a known good/bad outcome loan status for the accepted applicants; this causes an inherent bias in the model. Reject inference is used to infer the good or bad loan status of credit applicants that were rejected by a financial institution. This paper presents a reject inference technique in which a semi-supervised framework is developed using a Naive Bayes model. The framework uses the expectation-maximization (EM) algorithm to incorporate rejected applicants into the parameter estimation of the model using a bootstrapping approach. The proposed method has an advantage over traditional reject inference methods because the rejected applicant data will participate in the estimation of the model parameters, thus avoiding the extrapolation problem. The Naive Bayes model using the EM algorithm is compared to logistic regression and several semi-supervised techniques.","PeriodicalId":36561,"journal":{"name":"Communications in Statistics Case Studies Data Analysis and Applications","volume":"196 1","pages":"484 - 504"},"PeriodicalIF":0.0,"publicationDate":"2022-08-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"78094282","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Evaluation of the forecasting accuracy of stochastic mortality models: An analysis of developed and developing countries
Pub Date: 2022-07-22 | DOI: 10.1080/23737484.2022.2093294 | pp. 434-462
Oopashna Devi Fokeer, J. Narsoo
Abstract This paper evaluates the accuracy of eight stochastic mortality models in forecasting male mortality rates for different age groups and countries. Mortality datasets for three developed countries (Canada, France and Japan) and two developing countries (Taiwan and Ukraine) are employed in this study. For each country, the age range is split into three age groups: A (0–19), B (20–60) and C (61–90). The forecasting accuracy of the mortality models is evaluated using the RMSE, MAE, MPE and MAPE metrics. Mortality models with more complex specifications perform better for age groups B and C than for age group A. The cohort feature is more significant for age categories B and C, especially for the developed countries, where there have been significant medical and health advances. From an overall perspective, the Lee-Carter, Renshaw-Haberman and Age-Period-Cohort models are superior for age group A, while the Plat model proves to be the best forecasting model for age categories B and C. The empirical analysis concludes that mortality patterns diverge across age categories and across countries with different development status. The occurrence of extreme mortality events also negatively affects patterns of human mortality.
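For reference, the four accuracy criteria named in the abstract have standard definitions, sketched below. Whether the paper applies them to raw or log mortality rates is not stated in the abstract, so the helper simply uses whatever observed and forecast series are supplied.

```python
import numpy as np

def forecast_accuracy(actual, forecast):
    """Standard definitions of RMSE, MAE, MPE and MAPE (percentage errors
    assume no zero observed values)."""
    actual, forecast = np.asarray(actual, float), np.asarray(forecast, float)
    err = actual - forecast
    pct = err / actual
    return {
        "RMSE": np.sqrt(np.mean(err ** 2)),
        "MAE":  np.mean(np.abs(err)),
        "MPE":  100 * np.mean(pct),
        "MAPE": 100 * np.mean(np.abs(pct)),
    }

# e.g. forecast_accuracy(observed_mx, model_forecast_mx) for one age group
```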
{"title":"Evaluation of the forecasting accuracy of stochastic mortality models: An analysis of developed and developing countries","authors":"Oopashna Devi Fokeer, J. Narsoo","doi":"10.1080/23737484.2022.2093294","DOIUrl":"https://doi.org/10.1080/23737484.2022.2093294","url":null,"abstract":"Abstract This paper evaluates the accuracy performance of eight stochastic mortality models in the forecasting of the male mortality rates pertaining to different age groups and countries. The mortality datasets for three developed countries (Canada, France and Japan) and two developing countries (Taiwan and Ukraine) are employed in this study. For each country, the age range is split into three age groups – A (0–19), B (20–60) and C (61–90). The forecasting accuracy of the mortality models is evaluated using the RMSE, MAE, MPE and MAPE metrics. Mortality models with more complex specifications perform better for the age groups B and C, than for the age group A. The cohort feature is more significant for age categories B and C, especially for the developed countries where there are significant medical and health advances. From an overall perspective, the Lee-Carter, Renshaw-Haberman and Age-Period-Cohort models are superior for the age group A while the Plat model proves to be the best forecasting model for the age categories B and C. The empirical analysis concludes that the mortality patterns diverge for different age categories and countries with different development status. The occurrence of extreme mortality events also negatively affects the patterns of human mortality.","PeriodicalId":36561,"journal":{"name":"Communications in Statistics Case Studies Data Analysis and Applications","volume":"8 1","pages":"434 - 462"},"PeriodicalIF":0.0,"publicationDate":"2022-07-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"78071359","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
An adapted linear modeling method for interval-valued responses: Golden center and range method
Pub Date: 2022-07-12 | DOI: 10.1080/23737484.2022.2093801 | pp. 463-483
Özlem Türkşen, Gözde Ulu Metin
Abstract Response variables may have replicated measures in experimental studies. The replications of the responses may cause variability for several reasons, e.g., uncertainty or randomness, so it is not proper to define the replicated response measures as a single numerical quantity. In this case, an interval-valued response can be used to represent the replicated response values. Popular modeling methods for interval-valued responses are widely used in the literature, e.g., the Center method, the MinMax method and the Center and Range (CR) method. This paper introduces an adapted linear modeling method based on the CR method. The spread of the replicated response measures and the golden ratio are used for the center point calculation of the CR method. The proposed modeling method is called the Golden Center and Range (GCR) method. Three data sets from the literature, polyphenol extraction, wheel cover component and printing ink, were used for application purposes. The performances of the fitted linear regression models were compared using the Mean Absolute Error (MAE) and Root Mean Squared Error (RMSE) criteria with 5-fold cross-validation (CV). The comparison results and a nonparametric statistical test show that the proposed GCR method has prediction performance similar to that of the CR method on the interval-valued response data sets.
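The CR method referred to in the abstract fits one linear regression to the interval centers and another to the (half-)ranges. A minimal sketch follows, with a `golden=True` option standing in for the GCR idea of replacing the usual midpoint by a golden-ratio split of the interval; the exact GCR center formula, which the abstract says also involves the spread of the replicated measures, is an assumption here and may differ from the paper's construction.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

PHI = (1 + np.sqrt(5)) / 2                    # golden ratio

def fit_center_range(X_low, X_up, y_low, y_up, golden=False):
    """Center-and-Range regression for interval-valued data: one linear model
    for the centers, one for the half-ranges.  With golden=True the midpoint is
    replaced by a golden-ratio split (assumed reading of the GCR center)."""
    if golden:
        Xc, yc = X_low + (X_up - X_low) / PHI, y_low + (y_up - y_low) / PHI
    else:
        Xc, yc = (X_low + X_up) / 2, (y_low + y_up) / 2
    Xr, yr = (X_up - X_low) / 2, (y_up - y_low) / 2
    return LinearRegression().fit(Xc, yc), LinearRegression().fit(Xr, yr)

def predict_interval(models, X_low, X_up, golden=False):
    """Predicted lower/upper bounds recovered as center -/+ range."""
    center_model, range_model = models
    Xc = X_low + (X_up - X_low) / PHI if golden else (X_low + X_up) / 2
    c = center_model.predict(Xc)
    r = range_model.predict((X_up - X_low) / 2)
    return c - r, c + r
```

The MAE and RMSE comparisons over 5-fold CV mentioned in the abstract would then be computed on these predicted lower and upper bounds.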
{"title":"An adapted linear modeling method for interval-valued responses: Golden center and range method","authors":"Özlem Türkşen, Gözde Ulu Metin","doi":"10.1080/23737484.2022.2093801","DOIUrl":"https://doi.org/10.1080/23737484.2022.2093801","url":null,"abstract":"Abstract Response variables may have replicated measures in experimental studies. The replications of the responses may cause variability due to several reasons, e.g., uncertainty, randomness. It is not proper to define the replicated response measures as a single numerical quantity. In this case, interval-valued response can be used to represent the replicated response values. There have been widely used popular modeling methods for the interval-valued responses in the literature, e.g., Center method, MinMax method and Center and Range (CR) method. This paper introduces an adapted linear modeling method based on CR method. The spread of replicated response measures and golden ratio are used for center point calculation of the CR method. The proposed modeling method is called Golden Center and Range (GCR) method. Three data sets from the literature, polyphenol extraction, wheel cover component and printing ink, were used for application purpose. The performances of the predicted linear regression models were compared by using Mean Absolute Error (MAE) and Root Mean Squared Error (RMSE) criteria with 5-fold cross-validation (CV). It is seen from the comparison results that the proposed GCR method has similar prediction performance with the CR method for interval-valued response measured data sets according to nonparametric statistical test.","PeriodicalId":36561,"journal":{"name":"Communications in Statistics Case Studies Data Analysis and Applications","volume":"42 1","pages":"463 - 483"},"PeriodicalIF":0.0,"publicationDate":"2022-07-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"78102790","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Surveilling public health through statistical process monitoring: A literature review and a unified framework
Pub Date: 2022-07-05 | DOI: 10.1080/23737484.2022.2087121 | pp. 515-543
S. Bersimis, A. Sachlas
Abstract A challenge, in an era of economic crisis and uncertainty, is to provide health care services in an efficient and effective manner. The protection of public health, the provision of quality healthcare services to patients, the location of health centers, the geographical distribution of patients, and the provision of specialist services are some of the topics that the government and/or a health organization responsible for health care provision has to arrange. Other topics are the assessment of the quality, safety, and effectiveness of healthcare services provided by healthcare providers. Moreover, a central pillar in designing healthcare policy is expenditure monitoring and control. However, among all these topics the most significant is the protection of public health, especially now that viruses such as the Coronavirus spread rapidly worldwide. This paper reviews the use of Statistical Process Monitoring techniques in the public health domain in order to improve health care decision-making under uncertainty, and further provides an innovative three-layer framework for the collection, processing, and real-time analysis of data on Coronavirus or any other infectious disease that may emerge in the future, for both proper and effective case management and effective health policy planning.
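As a flavor of the Statistical Process Monitoring tools such a review surveys, a Shewhart-type c-chart for case counts is sketched below. It is only an illustration of the family of techniques, not the three-layer framework proposed in the paper, and the baseline/monitoring split is assumed.

```python
import numpy as np

def c_chart_limits(baseline_counts, k=3.0):
    """Shewhart c-chart for counts: center line and k-sigma control limits
    estimated from an in-control baseline period."""
    c_bar = np.mean(baseline_counts)
    ucl = c_bar + k * np.sqrt(c_bar)
    lcl = max(0.0, c_bar - k * np.sqrt(c_bar))
    return lcl, c_bar, ucl

def monitor(new_counts, lcl, ucl):
    """Indices of monitored periods whose count falls outside the limits."""
    new_counts = np.asarray(new_counts)
    return np.where((new_counts > ucl) | (new_counts < lcl))[0]

# baseline = weekly counts from a period judged in control
# lcl, centre, ucl = c_chart_limits(baseline)
# alarms = monitor(current_weeks, lcl, ucl)
```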
{"title":"Surveilling public health through statistical process monitoring: A literature review and a unified framework","authors":"S. Bersimis, A. Sachlas","doi":"10.1080/23737484.2022.2087121","DOIUrl":"https://doi.org/10.1080/23737484.2022.2087121","url":null,"abstract":"Abstract A challenge, in the era of economic crisis and uncertainty, is to provide health care services in an efficient and effective manner. The protection of public health, the provision of quality healthcare services to patients, the location of health centers, the geographical distribution of patients, and the provision of specialist services are some of the topics that the government and/or a health organization responsible for health care services provision has to arrange. Other topics are the assessment of quality, safety, and effectiveness of healthcare services provided by healthcare providers. Moreover, a central pylon in designing healthcare policy is expenditure monitoring and control. However, among all these topics the most significant is the protection of public health; especially now that viruses such as Coronavirus are spreading rapidly worldwide. This paper aims to review the use of Statistical Process Monitoring techniques in the public health domain in order to improve health care decision-making under uncertainty and further on to provide an innovative three-layer framework for the collection, processing, and real-time analysis of related data like Coronavirus or any other infectious disease that will emerge in the future for both proper and effective case management and effective health policy planning.","PeriodicalId":36561,"journal":{"name":"Communications in Statistics Case Studies Data Analysis and Applications","volume":"35 1","pages":"515 - 543"},"PeriodicalIF":0.0,"publicationDate":"2022-07-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"77231435","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Stochastic volatility with missing data: Assessing the effects of holidays
Pub Date: 2022-06-14 | DOI: 10.1080/23737484.2022.2087122 | pp. 423-433
Omar Abbara, M. Zevallos
Abstract In empirical finance, it is usual to consider holidays as if they do not exist. The main goal of this paper is to assess the effects of holidays on volatility estimation and prediction. Holidays are taken into account by assuming they are missing values in a time series of returns generated by a Stochastic volatility (SV) model. Estimation is evaluated through Monte Carlo experiments. In addition, we assess the effects of holidays on one-step ahead Value-at-Risk forecasting using several time series returns. The results are slightly better when we take into account the missing values, especially for VaR forecasting.
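One way to make the setup concrete is a bootstrap particle filter for a standard SV model in which holiday returns are coded as NaN: on those days the particles are propagated through the log-volatility equation but not reweighted, which treats the holidays as missing values in the spirit of the abstract. The sketch below assumes the SV parameters are already known (the paper estimates them) and produces one-step-ahead normal VaR forecasts; it is illustrative, not the authors' estimation procedure.

```python
import numpy as np
from scipy.stats import norm

def sv_var_forecast(returns, mu, phi, sigma_eta, n_part=5000, alpha=0.01, seed=0):
    """Bootstrap particle filter for the SV model
         h_t = mu + phi (h_{t-1} - mu) + sigma_eta * eta_t,   y_t = exp(h_t / 2) * eps_t,
    with NaN returns (holidays) treated as missing: particles are propagated
    but not reweighted on those days.  Returns one-step-ahead VaR at level alpha."""
    rng = np.random.default_rng(seed)
    h = mu + np.sqrt(sigma_eta**2 / (1 - phi**2)) * rng.standard_normal(n_part)
    pred_var = []
    for y in np.asarray(returns, float):
        # propagate particles through the AR(1) log-volatility equation
        h = mu + phi * (h - mu) + sigma_eta * rng.standard_normal(n_part)
        pred_var.append(np.mean(np.exp(h)))       # predictive return variance for this day
        if not np.isnan(y):                       # skip the update on holidays
            logw = -0.5 * (np.log(2 * np.pi) + h + y**2 * np.exp(-h))
            w = np.exp(logw - logw.max())
            w /= w.sum()
            h = rng.choice(h, size=n_part, p=w)   # multinomial resampling
    return norm.ppf(alpha) * np.sqrt(np.array(pred_var))
```

A day whose realized return falls below the reported VaR counts as a violation, which is how the one-step-ahead VaR forecasts would be backtested.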
{"title":"Stochastic volatility with missing data: Assessing the effects of holidays","authors":"Omar Abbara, M. Zevallos","doi":"10.1080/23737484.2022.2087122","DOIUrl":"https://doi.org/10.1080/23737484.2022.2087122","url":null,"abstract":"Abstract In empirical finance, it is usual to consider holidays as if they do not exist. The main goal of this paper is to assess the effects of holidays on volatility estimation and prediction. Holidays are taken into account by assuming they are missing values in a time series of returns generated by a Stochastic volatility (SV) model. Estimation is evaluated through Monte Carlo experiments. In addition, we assess the effects of holidays on one-step ahead Value-at-Risk forecasting using several time series returns. The results are slightly better when we take into account the missing values, especially for VaR forecasting.","PeriodicalId":36561,"journal":{"name":"Communications in Statistics Case Studies Data Analysis and Applications","volume":"11 1","pages":"423 - 433"},"PeriodicalIF":0.0,"publicationDate":"2022-06-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"87143684","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A flexible model to fit over-dispersed longitudinal count data
Pub Date: 2022-05-19 | DOI: 10.1080/23737484.2022.2073925 | pp. 407-422
F. E. Salama, Ahmed M. Gad, A. A. E. Sheikh, A. M. Mohamed
Abstract A common way to deal with count data is to fit a generalized linear model. The most common approaches are the Poisson regression model and the negative binomial regression model. However, the Conway-Maxwell-Poisson (COM-Poisson) regression model is more flexible for fitting count data. This model has been widely used to describe the under- or over-dispersion problem for count data in the cross-sectional setting, but there has been no application of the COM-Poisson model to longitudinal data. We propose and develop the COM-Poisson regression model to fit longitudinal count data. We compare this model with the Poisson regression model and the negative binomial model under two different working correlation structures: exchangeable and autoregressive of order 1, AR(1). The results show that the COM-Poisson model is very suitable for longitudinal count data, even in the presence of dispersion; it gives the smallest AIC values. Also, it is insensitive to the choice of the working structure. Extensive simulation is conducted for small, moderate and large sample sizes to evaluate the proposed model. The proposed approach gives good results compared with the other models under different criteria.
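The COM-Poisson distribution adds a dispersion parameter nu to the Poisson pmf, P(Y = y) proportional to lambda^y / (y!)^nu, with nu < 1 giving over-dispersion, nu > 1 under-dispersion and nu = 1 recovering the Poisson. A small sketch of the pmf and a crude maximum-likelihood fit for a single cross-sectional sample is shown below; the paper's longitudinal estimation with exchangeable or AR(1) working correlation structures is not reproduced.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.special import gammaln

def com_poisson_logpmf(y, lam, nu, max_terms=200):
    """log P(Y=y) for the COM-Poisson: lam^y / (y!)^nu / Z(lam, nu),
    with Z(lam, nu) = sum_{j>=0} lam^j / (j!)^nu (series truncated here)."""
    j = np.arange(max_terms)
    logZ = np.logaddexp.reduce(j * np.log(lam) - nu * gammaln(j + 1))
    return y * np.log(lam) - nu * gammaln(y + 1) - logZ

def fit_com_poisson(y):
    """Crude ML fit of (lam, nu) to one cross-sectional sample."""
    y = np.asarray(y)
    def nll(params):
        log_lam, log_nu = params
        return -np.sum(com_poisson_logpmf(y, np.exp(log_lam), np.exp(log_nu)))
    res = minimize(nll, x0=[np.log(max(y.mean(), 0.1)), 0.0], method="Nelder-Mead")
    lam_hat, nu_hat = np.exp(res.x)
    return lam_hat, nu_hat    # nu < 1 over-dispersed, nu > 1 under-dispersed
```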
{"title":"A flexible model to fit over-dispersed longitudinal count data","authors":"F. E. Salama, Ahmed M. Gad, A. A. E. Sheikh, A. M. Mohamed","doi":"10.1080/23737484.2022.2073925","DOIUrl":"https://doi.org/10.1080/23737484.2022.2073925","url":null,"abstract":"Abstract A common way to deal with count data is to fit a generalized linear model. The most common approaches are the Poisson regression model and the negative binomial regression model. However, Conway-Maxwell Poisson (COM-Poisson) regression model is more flexible to fit count data. This model has been widely used to describe under- or over-dispersion problem for count data in cross-sectional setting. However, there is no application of the COM-Poisson model in longitudinal data. We propose and develop the COM-Poisson regression model to fit longitudinal count data. We compare this model with the Poisson regression model and the negative binomial model, under two different working correlation structures; exchangeable and autoregressive of order 1, AR(1). The results show that the COM-Poisson model is very suitable to longitudinal count data, even in presence of dispersion; it gives the smallest AIC values. Also, it is insensitive to the choice of the working structure. Extensive simulation is conducted for small, moderate and large sample sizes, to evaluate the proposed model. The proposed approach has good results compared with other models using different criteria.","PeriodicalId":36561,"journal":{"name":"Communications in Statistics Case Studies Data Analysis and Applications","volume":"29 2 1","pages":"407 - 422"},"PeriodicalIF":0.0,"publicationDate":"2022-05-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"89309403","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Mixed-effect logit modeling of red-light violations among motorcyclists
Pub Date: 2022-05-13 | DOI: 10.1080/23737484.2022.2074913 | pp. 505-514
Yahya A. Nkrumah, E. Aidoo, Williams Ackaah
Abstract Red-light violations have been associated with road traffic crashes across the globe. This study was conducted to determine the rate of red-light violations among motorcyclists in the Accra metropolis, Ghana, and the associated risk factors. Observational data collected at four signalized intersections were used. Possible risk factors for red-light violation were determined using mixed-effect logistic regression model. The results showed that 64% of motorcyclists violated the red-light. The results further revealed that motorcyclists with pillion passengers were more likely to violate red-lights. Also, motorcyclists were more likely to violate red-lights in the evenings, on weekends and when the traffic cycle length was more than two minutes. The study also found that motorcyclists were less likely to violate red-lights at T-junctions and during times that other motorcyclists stop when a red traffic signal is on.
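A random-intercept logistic regression of the kind used in the study can be written as logit P(violation) = X beta + b_intersection with b ~ N(0, sigma^2); the sketch below fits such a model on simulated data by maximizing the marginal likelihood with Gauss-Hermite quadrature. The covariates (pillion passenger, evening indicator) and the data are invented for illustration and are not the authors' dataset or estimation routine.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.special import expit

def random_intercept_logit_nll(params, y, X, groups, n_quad=20):
    """Negative marginal log-likelihood of a random-intercept logit:
       logit P(y_ij = 1 | b_i) = X_ij beta + b_i,   b_i ~ N(0, sigma^2),
    with the integral over b_i approximated by Gauss-Hermite quadrature."""
    beta, sigma = params[:-1], np.exp(params[-1])
    nodes, weights = np.polynomial.hermite.hermgauss(n_quad)
    b = np.sqrt(2.0) * sigma * nodes              # change of variables for N(0, sigma^2)
    w = weights / np.sqrt(np.pi)
    eta = X @ beta                                # fixed-effect linear predictor
    ll = 0.0
    for g in np.unique(groups):
        idx = groups == g
        p = expit(eta[idx, None] + b[None, :])    # conditional P(y=1) at each node
        cond = np.prod(np.where(y[idx, None] == 1, p, 1 - p), axis=0)
        ll += np.log(np.dot(cond, w) + 1e-300)
    return -ll

# --- illustrative fit on simulated data (hypothetical covariates) -----------
rng = np.random.default_rng(1)
n_groups, n_per = 4, 250                          # e.g. four signalized intersections
groups = np.repeat(np.arange(n_groups), n_per)
X = np.column_stack([np.ones(n_groups * n_per),
                     rng.integers(0, 2, n_groups * n_per),   # pillion passenger
                     rng.integers(0, 2, n_groups * n_per)])  # evening indicator
b_true = rng.normal(0, 0.5, n_groups)[groups]
y = rng.binomial(1, expit(X @ np.array([0.4, 0.6, 0.5]) + b_true))

res = minimize(random_intercept_logit_nll, x0=np.zeros(X.shape[1] + 1),
               args=(y, X, groups), method="Nelder-Mead")
beta_hat, sigma_hat = res.x[:-1], np.exp(res.x[-1])
```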
{"title":"Mixed-effect logit modeling of red-light violations among motorcyclists","authors":"Yahya A. Nkrumah, E. Aidoo, Williams Ackaah","doi":"10.1080/23737484.2022.2074913","DOIUrl":"https://doi.org/10.1080/23737484.2022.2074913","url":null,"abstract":"Abstract Red-light violations have been associated with road traffic crashes across the globe. This study was conducted to determine the rate of red-light violations among motorcyclists in the Accra metropolis, Ghana, and the associated risk factors. Observational data collected at four signalized intersections were used. Possible risk factors for red-light violation were determined using mixed-effect logistic regression model. The results showed that 64% of motorcyclists violated the red-light. The results further revealed that motorcyclists with pillion passengers were more likely to violate red-lights. Also, motorcyclists were more likely to violate red-lights in the evenings, on weekends and when the traffic cycle length was more than two minutes. The study also found that motorcyclists were less likely to violate red-lights at T-junctions and during times that other motorcyclists stop when a red traffic signal is on.","PeriodicalId":36561,"journal":{"name":"Communications in Statistics Case Studies Data Analysis and Applications","volume":"46 1","pages":"505 - 514"},"PeriodicalIF":0.0,"publicationDate":"2022-05-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"74937149","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Estimation of the area under the ROC curve for non-normal data
Pub Date: 2022-05-13 | DOI: 10.1080/23737484.2022.2072410 | pp. 393-406
S. Balaswamy, R. Vishnu Vardhan
Abstract The Receiver Operating Characteristic (ROC) curve is one of the widely used classification tools. It helps in assessing the performance of a diagnostic test and allows two diagnostic tests or statistical procedures to be compared using its intrinsic and accuracy measures, such as sensitivity, specificity, and the Area Under the Curve. The conventional and standard ROC model is the bi-normal ROC model, which is based on the assumption that the test scores/marker values follow Normal distributions. Over the years, several researchers have developed various bi-distributional ROC models where the data follow the Exponential, the Gamma, the combination of Half Normal and Rayleigh, etc. However, there are many practical situations, particularly in the field of medicine, where these available distributions may not fit the data at hand. In this article, we propose two new ROC models and show that these models provide a better fit and better accuracy than the existing ROC models. The work is supported by a real dataset and simulated datasets.
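As a concrete contrast between the bi-normal assumption and a non-normal alternative, the sketch below computes the nonparametric (Mann-Whitney) AUC alongside a parametric AUC under a bi-exponential model, for which AUC = lambda_H / (lambda_H + lambda_D) when larger scores indicate disease. The bi-exponential choice is only one example of a non-normal ROC model; the two models actually proposed in the article are not specified in the abstract.

```python
import numpy as np

def empirical_auc(scores_diseased, scores_healthy):
    """Nonparametric AUC: proportion of (diseased, healthy) pairs correctly
    ordered, ties counted as 1/2 (the Mann-Whitney statistic / (n_D * n_H))."""
    d = np.asarray(scores_diseased)[:, None]
    h = np.asarray(scores_healthy)[None, :]
    return np.mean((d > h) + 0.5 * (d == h))

def biexponential_auc(scores_diseased, scores_healthy):
    """Parametric AUC assuming positive-valued scores with
    X_H ~ Exp(rate l_H), X_D ~ Exp(rate l_D) and larger scores = diseased:
    AUC = P(X_D > X_H) = l_H / (l_H + l_D)."""
    l_d = 1.0 / np.mean(scores_diseased)      # ML estimates of the rates
    l_h = 1.0 / np.mean(scores_healthy)
    return l_h / (l_h + l_d)
```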
{"title":"Estimation of the area under the ROC curve for non-normal data","authors":"S. Balaswamy, R. Vishnu Vardhan","doi":"10.1080/23737484.2022.2072410","DOIUrl":"https://doi.org/10.1080/23737484.2022.2072410","url":null,"abstract":"Abstract The Receiver Operating Characteristic curve is one of the widely used classification tools that helps in assessing the performance of the diagnostic test as well as accommodates for comparing two diagnostic tests/statistical procedures using its intrinsic and accuracy measures, such as, sensitivity; specificity, and the Area under the Curve. The conventional and standard ROC model is the Bi-normal ROC model which is based on the assumption that the test scores/marker values underlie Normal distributions. Over the years, several researchers have developed various bi-distributional ROC models where the data possess the pattern of Exponential, Gamma, the combination of Half Normal and Rayleigh, etc. However, there are many practical situations, particularly in the field of medicine, where these available distributions may not be of fit for the data at hand. In this article, we attempted to propose two new ROC models and showed that these models have a better fit and explain better accuracy than that of the existing ROC models. The work is supported by a real dataset and simulated datasets.","PeriodicalId":36561,"journal":{"name":"Communications in Statistics Case Studies Data Analysis and Applications","volume":"109 1","pages":"393 - 406"},"PeriodicalIF":0.0,"publicationDate":"2022-05-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"80817495","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Statistical analysis of teleconsultations in the context of the COVID-19 epidemic
Pub Date: 2022-04-08 | DOI: 10.1080/23737484.2022.2056547 | pp. 381-392
Eddie Sainte-Rose, J. Vaillant
Abstract We consider the spatial and temporal distribution of teleconsultations associated with the COVID-19 epidemic in Martinique, French West Indies, from March to May 2020. Statistical tools for the detection of high-frequency areas are presented. The mathematical modeling underlying the so-called scanning methods is discussed, taking into account the influence of covariates on teleconsultation occurrences and their evolution over time. Some tools available in the R programming environment and the SaTScan software are presented. The spatio-temporal statistical analysis of COVID-19 teleconsultations is performed. Areas in which the frequencies of people using teleconsultations are significantly higher than elsewhere are identified, and these results are discussed with respect to covariates providing relevant information on specific characteristics of Martinique island.
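The scanning methods mentioned in the abstract (as implemented in SaTScan and related R packages) evaluate cylindrical windows, a circle in space crossed with an interval in time, and rank them by Kulldorff's Poisson log-likelihood ratio. A brute-force sketch of that scan is given below, assuming the expected counts have been scaled to sum to the total observed count; the Monte Carlo significance testing and the covariate adjustment discussed in the paper are omitted.

```python
import numpy as np

def poisson_llr(c, e, C):
    """Kulldorff log-likelihood ratio for a window with c observed and e
    expected cases out of C total (only excess-risk windows are of interest)."""
    if c <= e or e <= 0:
        return 0.0
    inside = c * np.log(c / e)
    outside = (C - c) * np.log((C - c) / (C - e)) if c < C else 0.0
    return inside + outside

def space_time_scan(coords, counts, expected, radii, max_len):
    """Brute-force space-time scan: cylinders = circle around each location x
    trailing time window.  counts/expected are (n_locations, n_periods) arrays
    with expected scaled so expected.sum() == counts.sum().  Returns the most
    likely cluster; Monte Carlo inference is omitted."""
    C = counts.sum()
    dists = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    best = (0.0, None)
    for i in range(len(coords)):
        for r in radii:
            inside = dists[i] <= r
            for t_end in range(counts.shape[1]):
                for length in range(1, max_len + 1):
                    t_start = t_end - length + 1
                    if t_start < 0:
                        continue
                    c = counts[inside, t_start:t_end + 1].sum()
                    e = expected[inside, t_start:t_end + 1].sum()
                    llr = poisson_llr(c, e, C)
                    if llr > best[0]:
                        best = (llr, (i, r, t_start, t_end))
    return best   # (max LLR, (center index, radius, start period, end period))
```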
{"title":"Statistical analysis of teleconsultations in the context of the COVID-19 epidemic","authors":"Eddie Sainte-Rose, J. Vaillant","doi":"10.1080/23737484.2022.2056547","DOIUrl":"https://doi.org/10.1080/23737484.2022.2056547","url":null,"abstract":"Abstract We consider the spatial and temporal distribution of teleconsultations associated with the COVID-19 epidemic in Martinique, French West Indies from March to May 2020. Statistical tools for the detection of high-frequency areas are presented. The mathematical modeling underlying the so-called scanning methods are discussed taking into account the influence of covariates on teleconsultation occurrences and their evolution over time. Some tools available in the R programming environment and the SaTScan software are presented. The spatio-temporal statistical analysis of COVID-19 teleconsultations is performed. Areas for which the frequencies of people using teleconsultations are significantly higher than elsewhere are presented and these results are discussed with respect to covariates providing relevant information on specific characteristics of Martinique island.","PeriodicalId":36561,"journal":{"name":"Communications in Statistics Case Studies Data Analysis and Applications","volume":"19 1","pages":"381 - 392"},"PeriodicalIF":0.0,"publicationDate":"2022-04-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"86033521","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Assessing a binary measurement system with an underlying continuous measurand using targeted verification
Pub Date: 2022-04-03 | DOI: 10.1080/23737484.2022.2044411 | pp. 308-330
Sigeng Chen, S. Steiner, R. J. MacKay, Asokan Mulayath Variyath
Abstract Suppose an important continuous quality characteristic with specification is expensive to measure with a gold standard measurement system. A 100% pass/fail inspection scheme uses a binary measurement system such as a no-go gauge to avoid the expensive gold standard measurements. The inspection scheme makes some errors and we are interested in estimating both the probability of passing a bad part and of failing a good part. We assume that the inspection system is not destructive so we can inspect parts multiple times if we so choose. A part is verified if we use the gold standard system to determine if the part is within specification or not. We propose and quantify the benefits of a new cost-effective assessment plan that verifies only a small fraction of the parts selected for the study.
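A small simulation makes the estimands concrete: parts with a continuous measurand are classified good/bad against specification limits, an error-prone binary gauge passes or fails them, and a verified subsample measured with the gold standard yields estimates of P(pass | bad) and P(fail | good). The sketch below verifies a simple random 5% of parts as a baseline; the paper's targeted-verification plan chooses which parts to verify more efficiently and is not reproduced here. All numbers are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(7)

# --- simulate parts, a gold standard, and an error-prone pass/fail gauge ----
n = 10_000
measurand = rng.normal(0.0, 1.0, n)              # true continuous characteristic
good = np.abs(measurand) <= 1.5                  # within the specification limits
gauge_reading = measurand + rng.normal(0.0, 0.3, n)
passed = np.abs(gauge_reading) <= 1.5            # binary inspection decision

# --- verify only a small random fraction with the gold standard -------------
verify = rng.random(n) < 0.05
v_good, v_pass = good[verify], passed[verify]

p_pass_given_bad = np.mean(v_pass[~v_good])      # consumer's-risk estimate
p_fail_given_good = np.mean(~v_pass[v_good])     # producer's-risk estimate
print(p_pass_given_bad, p_fail_given_good)
```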
{"title":"Assessing a binary measurement system with an underlying continuous measurand using targeted verification","authors":"Sigeng Chen, S. Steiner, R. J. MacKay, Asokan Mulayath Variyath","doi":"10.1080/23737484.2022.2044411","DOIUrl":"https://doi.org/10.1080/23737484.2022.2044411","url":null,"abstract":"Abstract Suppose an important continuous quality characteristic with specification is expensive to measure with a gold standard measurement system. A 100% pass/fail inspection scheme uses a binary measurement system such as a no-go gauge to avoid the expensive gold standard measurements. The inspection scheme makes some errors and we are interested in estimating both the probability of passing a bad part and of failing a good part. We assume that the inspection system is not destructive so we can inspect parts multiple times if we so choose. A part is verified if we use the gold standard system to determine if the part is within specification or not. We propose and quantify the benefits of a new cost-effective assessment plan that verifies only a small fraction of the parts selected for the study.","PeriodicalId":36561,"journal":{"name":"Communications in Statistics Case Studies Data Analysis and Applications","volume":"12 1","pages":"308 - 330"},"PeriodicalIF":0.0,"publicationDate":"2022-04-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"88765396","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}