Communications in Statistics: Case Studies, Data Analysis and Applications - Latest Articles

Naive Bayes using the expectation-maximization algorithm for reject inference
Q4 Mathematics | Pub Date: 2022-08-03 | DOI: 10.1080/23737484.2022.2106325
Billie Anderson
Abstract: In recent years, there has been significant research on applying semi-supervised machine learning models to the reject inference problem. When a financial institution builds a model to predict default among credit applicants, it knows the good/bad loan outcome only for accepted applicants, which introduces an inherent bias into the model. Reject inference is used to infer the good or bad loan status of applicants whom the institution rejected. This paper presents a reject inference technique in which a semi-supervised framework is built on a Naive Bayes model. The framework uses the expectation-maximization (EM) algorithm, together with a bootstrapping approach, to incorporate rejected applicants into the estimation of the model parameters. The proposed method has an advantage over traditional reject inference methods because the rejected-applicant data participate in parameter estimation, which avoids the extrapolation problem. The Naive Bayes model fitted with the EM algorithm is compared with logistic regression and several semi-supervised techniques.
Pages: 484-504
Citations: 0
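The EM idea in the reject-inference abstract above can be sketched in a few lines. This is only a minimal illustration under stated assumptions: Gaussian Naive Bayes on numeric features, labels coded 0 = good and 1 = bad, both classes present among the accepted applicants, and plain EM without the bootstrapping step the paper mentions. The function names `m_step`, `e_step` and `em_naive_bayes` are hypothetical and not taken from the paper.

```python
import math

def m_step(rows, resp):
    """Weighted Gaussian Naive Bayes estimates: class priors, per-feature means and variances."""
    n_feat = len(rows[0])
    total = sum(sum(r) for r in resp)
    priors, means, variances = [], [], []
    for k in range(2):
        w = [r[k] for r in resp]          # responsibility of class k for each row
        wk = sum(w)                       # effective number of rows in class k
        mu = [sum(wi * x[j] for wi, x in zip(w, rows)) / wk for j in range(n_feat)]
        var = [sum(wi * (x[j] - mu[j]) ** 2 for wi, x in zip(w, rows)) / wk + 1e-6
               for j in range(n_feat)]    # small floor keeps variances positive
        priors.append(wk / total)
        means.append(mu)
        variances.append(var)
    return priors, means, variances

def e_step(rows, priors, means, variances):
    """Posterior P(class | x) per row, assuming independent Gaussian features."""
    out = []
    for x in rows:
        logp = []
        for k in range(2):
            ll = math.log(priors[k])
            for xj, mu, var in zip(x, means[k], variances[k]):
                ll += -0.5 * (math.log(2 * math.pi * var) + (xj - mu) ** 2 / var)
            logp.append(ll)
        top = max(logp)                   # subtract the max for numerical stability
        unnorm = [math.exp(v - top) for v in logp]
        z = sum(unnorm)
        out.append([u / z for u in unnorm])
    return out

def em_naive_bayes(x_acc, y_acc, x_rej, n_iter=30):
    """Accepted rows keep their known labels; rejected rows get soft labels from the E-step."""
    resp_acc = [[1.0, 0.0] if y == 0 else [0.0, 1.0] for y in y_acc]
    params = m_step(x_acc, resp_acc)      # initialise from accepted applicants only
    for _ in range(n_iter):
        resp_rej = e_step(x_rej, *params)
        params = m_step(x_acc + x_rej, resp_acc + resp_rej)
    return params
```

Because the rejected rows enter the M-step with their posterior weights, they contribute to every parameter estimate rather than being scored by a model fitted on accepted applicants alone.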
Evaluation of the forecasting accuracy of stochastic mortality models: An analysis of developed and developing countries
Q4 Mathematics | Pub Date: 2022-07-22 | DOI: 10.1080/23737484.2022.2093294
Oopashna Devi Fokeer, J. Narsoo
Abstract: This paper evaluates the accuracy of eight stochastic mortality models in forecasting male mortality rates for different age groups and countries. Mortality datasets for three developed countries (Canada, France and Japan) and two developing countries (Taiwan and Ukraine) are used. For each country, the age range is split into three groups: A (0–19), B (20–60) and C (61–90). Forecasting accuracy is evaluated using the RMSE, MAE, MPE and MAPE metrics. Mortality models with more complex specifications perform better for age groups B and C than for age group A. The cohort feature is more significant for age groups B and C, especially in the developed countries, where there have been significant medical and health advances. Overall, the Lee-Carter, Renshaw-Haberman and Age-Period-Cohort models are superior for age group A, while the Plat model proves to be the best forecasting model for age groups B and C. The empirical analysis concludes that mortality patterns diverge across age groups and across countries with different development status. The occurrence of extreme mortality events also negatively affects the patterns of human mortality.
Pages: 434-462
Citations: 1
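The four accuracy metrics named in the mortality-forecasting abstract above have standard definitions, sketched here for paired observed and forecast rates. The sign convention for MPE (observed minus forecast, divided by observed) is an assumption, since the abstract does not state one, and `forecast_errors` is a hypothetical helper name.

```python
import math

def forecast_errors(actual, forecast):
    """Return RMSE, MAE, MPE (%) and MAPE (%) for paired observed/forecast values."""
    n = len(actual)
    err = [a - f for a, f in zip(actual, forecast)]
    rmse = math.sqrt(sum(e * e for e in err) / n)
    mae = sum(abs(e) for e in err) / n
    # MPE keeps the sign, so it reveals systematic over- or under-forecasting.
    mpe = 100.0 * sum(e / a for e, a in zip(err, actual)) / n
    # MAPE drops the sign and measures the typical relative error size.
    mape = 100.0 * sum(abs(e / a) for e, a in zip(err, actual)) / n
    return {"RMSE": rmse, "MAE": mae, "MPE": mpe, "MAPE": mape}
```

The percentage metrics divide by the observed value, so they are undefined when an observed rate is exactly zero.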
An adapted linear modeling method for interval-valued responses: Golden center and range method
Q4 Mathematics | Pub Date: 2022-07-12 | DOI: 10.1080/23737484.2022.2093801
Özlem Türkşen, Gözde Ulu Metin
Abstract: In experimental studies, response variables may have replicated measurements. The replicates can vary for several reasons, e.g., uncertainty and randomness, so it is not appropriate to summarize the replicated response measurements as a single numerical quantity. Instead, an interval-valued response can be used to represent the replicated values. Several modeling methods for interval-valued responses are widely used in the literature, e.g., the Center method, the MinMax method and the Center and Range (CR) method. This paper introduces an adapted linear modeling method based on the CR method, in which the spread of the replicated response measurements and the golden ratio are used to calculate the center point. The proposed method is called the Golden Center and Range (GCR) method. Three data sets from the literature (polyphenol extraction, wheel cover component and printing ink) were used for application. The performance of the fitted linear regression models was compared using the Mean Absolute Error (MAE) and Root Mean Squared Error (RMSE) criteria with 5-fold cross-validation (CV). According to a nonparametric statistical test, the proposed GCR method has prediction performance similar to that of the CR method on data sets with interval-valued responses.
Pages: 463-483
Citations: 0
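For orientation, the baseline Center and Range (CR) idea mentioned in the abstract above can be sketched as two ordinary least-squares fits, one on the interval midpoints and one on the half-ranges. This sketch assumes a single crisp (non-interval) predictor and uses hypothetical function names. It does not implement the GCR center-point calculation, because the abstract does not give that formula.

```python
def ols_line(x, y):
    """Least-squares intercept and slope for a single predictor."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope

def fit_cr(x, lower, upper):
    """Fit separate lines to the interval centers and to the half-ranges."""
    centers = [(lo + up) / 2 for lo, up in zip(lower, upper)]
    half_ranges = [(up - lo) / 2 for lo, up in zip(lower, upper)]
    return ols_line(x, centers), ols_line(x, half_ranges)

def predict_cr(model, x_new):
    """Rebuild the predicted interval from the predicted center and half-range."""
    (a_c, b_c), (a_r, b_r) = model
    center = a_c + b_c * x_new
    half_range = a_r + b_r * x_new
    return center - half_range, center + half_range
```

Nothing in this plain version stops the fitted half-range from becoming negative outside the observed range of the predictor, so predictions far from the data should be checked.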
Surveilling public health through statistical process monitoring: A literature review and a unified framework
Q4 Mathematics | Pub Date: 2022-07-05 | DOI: 10.1080/23737484.2022.2087121
S. Bersimis, A. Sachlas
Abstract: A challenge in an era of economic crisis and uncertainty is to provide health care services efficiently and effectively. The protection of public health, the provision of quality healthcare services to patients, the location of health centers, the geographical distribution of patients, and the provision of specialist services are some of the matters that a government and/or a health organization responsible for providing health care services has to manage. Others include assessing the quality, safety, and effectiveness of the services delivered by healthcare providers. Moreover, expenditure monitoring and control is a central pillar of healthcare policy design. Among all these topics, however, the most significant is the protection of public health, especially now that viruses such as Coronavirus are spreading rapidly worldwide. This paper reviews the use of Statistical Process Monitoring techniques in the public health domain with the aim of improving health care decision-making under uncertainty. It then proposes an innovative three-layer framework for the collection, processing, and real-time analysis of data on Coronavirus or any other infectious disease that may emerge in the future, to support proper and effective case management and effective health policy planning.
Pages: 515-543
Citations: 3
Stochastic volatility with missing data: Assessing the effects of holidays
Q4 Mathematics | Pub Date: 2022-06-14 | DOI: 10.1080/23737484.2022.2087122
Omar Abbara, M. Zevallos
Abstract: In empirical finance, holidays are usually treated as if they did not exist. The main goal of this paper is to assess the effects of holidays on volatility estimation and prediction. Holidays are taken into account by treating them as missing values in a time series of returns generated by a stochastic volatility (SV) model. Estimation is evaluated through Monte Carlo experiments. In addition, we assess the effects of holidays on one-step-ahead Value-at-Risk (VaR) forecasting using several return series. The results are slightly better when the missing values are taken into account, especially for VaR forecasting.
Pages: 423-433
Citations: 1
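The data setup described in the stochastic-volatility abstract above, holidays as missing values in a series generated by an SV model, can be illustrated with a short simulation. This sketch assumes the common textbook SV form with an AR(1) latent log-variance; the parameter values and the name `simulate_sv_returns` are hypothetical, and no estimation method from the paper is reproduced.

```python
import math
import random

def simulate_sv_returns(n, holidays, mu=-9.0, phi=0.97, sigma_eta=0.15, seed=0):
    """Simulate returns y_t = exp(h_t / 2) * eps_t with AR(1) log-variance h_t.

    Days listed in `holidays` are returned as None: the latent volatility keeps
    evolving on those days, but the return itself is treated as unobserved.
    """
    rng = random.Random(seed)
    h = mu                                   # start at the long-run log-variance level
    returns = []
    for t in range(n):
        h = mu + phi * (h - mu) + sigma_eta * rng.gauss(0.0, 1.0)
        r = math.exp(h / 2.0) * rng.gauss(0.0, 1.0)
        returns.append(None if t in holidays else r)
    return returns
```

Dropping the `None` entries instead of keeping them reproduces the usual practice the abstract questions, in which the calendar gap between trading days is ignored.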
A flexible model to fit over-dispersed longitudinal count data
Q4 Mathematics | Pub Date: 2022-05-19 | DOI: 10.1080/23737484.2022.2073925
F. E. Salama, Ahmed M. Gad, A. A. E. Sheikh, A. M. Mohamed
Abstract: A common way to deal with count data is to fit a generalized linear model. The most common choices are the Poisson regression model and the negative binomial regression model. However, the Conway-Maxwell Poisson (COM-Poisson) regression model is more flexible for fitting count data. This model has been widely used to describe under- or over-dispersion in count data in cross-sectional settings, but it has not been applied to longitudinal data. We propose and develop a COM-Poisson regression model to fit longitudinal count data. We compare this model with the Poisson regression model and the negative binomial model under two working correlation structures: exchangeable and autoregressive of order 1, AR(1). The results show that the COM-Poisson model is well suited to longitudinal count data, even in the presence of dispersion; it gives the smallest AIC values. It is also insensitive to the choice of working correlation structure. An extensive simulation study is conducted for small, moderate and large sample sizes to evaluate the proposed model. The proposed approach performs well compared with the other models under different criteria.
Pages: 407-422
Citations: 0
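The flexibility claimed for the COM-Poisson distribution in the abstract above comes from its extra dispersion parameter. In the usual parameterization, P(Y = y) is proportional to lam**y / (y!)**nu, so nu = 1 gives the Poisson distribution and nu < 1 gives over-dispersion. The sketch below evaluates the probability mass function by truncating the normalizing series; the truncation length and the function names are assumptions, and the longitudinal regression model of the paper is not reproduced.

```python
import math

def com_poisson_pmf(y, lam, nu, max_terms=200):
    """P(Y = y) for the COM-Poisson distribution, with the normalizer truncated."""
    # Work in log space: log of lam**j / (j!)**nu for j = 0 .. max_terms - 1.
    log_terms = [j * math.log(lam) - nu * math.lgamma(j + 1) for j in range(max_terms)]
    top = max(log_terms)
    log_z = top + math.log(sum(math.exp(t - top) for t in log_terms))
    return math.exp(y * math.log(lam) - nu * math.lgamma(y + 1) - log_z)

def com_poisson_moments(lam, nu, max_terms=200):
    """Mean and variance computed from the truncated probability mass function."""
    probs = [com_poisson_pmf(y, lam, nu, max_terms) for y in range(max_terms)]
    mean = sum(y * p for y, p in enumerate(probs))
    var = sum((y - mean) ** 2 * p for y, p in enumerate(probs))
    return mean, var
```

Comparing the two moments for a given `nu` shows directly whether the distribution is over- or under-dispersed relative to the Poisson case, where they are equal. The fixed truncation suits moderate `lam`; a heavy-tailed case (small `nu`, large `lam`) would need more terms.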
Mixed-effect logit modeling of red-light violations among motorcyclists
Q4 Mathematics | Pub Date: 2022-05-13 | DOI: 10.1080/23737484.2022.2074913
Yahya A. Nkrumah, E. Aidoo, Williams Ackaah
Abstract: Red-light violations have been associated with road traffic crashes across the globe. This study was conducted to determine the rate of red-light violations among motorcyclists in the Accra metropolis, Ghana, and the associated risk factors. Observational data collected at four signalized intersections were used. Possible risk factors for red-light violation were identified using a mixed-effect logistic regression model. The results showed that 64% of motorcyclists violated the red light. They further revealed that motorcyclists carrying pillion passengers were more likely to violate red lights. Motorcyclists were also more likely to violate red lights in the evenings, on weekends and when the traffic cycle length was more than two minutes. The study also found that motorcyclists were less likely to violate red lights at T-junctions and when other motorcyclists had stopped at the red signal.
Pages: 505-514
Citations: 1
Estimation of the area under the ROC curve for non-normal data
Q4 Mathematics | Pub Date: 2022-05-13 | DOI: 10.1080/23737484.2022.2072410
S. Balaswamy, R. Vishnu Vardhan
Abstract: The Receiver Operating Characteristic (ROC) curve is a widely used classification tool that helps assess the performance of a diagnostic test and allows two diagnostic tests or statistical procedures to be compared using its intrinsic and accuracy measures, such as sensitivity, specificity, and the Area under the Curve (AUC). The conventional ROC model is the Bi-normal ROC model, which assumes that the test scores/marker values follow Normal distributions. Over the years, researchers have developed various bi-distributional ROC models for data that follow Exponential or Gamma distributions, a combination of Half-Normal and Rayleigh distributions, etc. However, in many practical situations, particularly in medicine, these available distributions may not fit the data at hand. In this article, we propose two new ROC models and show that they fit better and give better accuracy than the existing ROC models. The work is supported by a real dataset and simulated datasets.
Pages: 393-406
Citations: 1
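As a point of reference for the ROC abstract above, the conventional Bi-normal model has a closed-form AUC: if healthy scores follow N(mu0, sigma0^2) and diseased scores follow N(mu1, sigma1^2), then AUC = Phi((mu1 - mu0) / sqrt(sigma0^2 + sigma1^2)). The sketch below computes that baseline together with a distribution-free empirical AUC for comparison. It assumes higher scores indicate disease, the function names are hypothetical, and the two new models proposed in the paper are not shown because the abstract does not specify them.

```python
import math
from statistics import NormalDist

def binormal_auc(mu0, sigma0, mu1, sigma1):
    """Closed-form AUC when both groups' scores are normally distributed."""
    return NormalDist().cdf((mu1 - mu0) / math.sqrt(sigma0 ** 2 + sigma1 ** 2))

def empirical_auc(healthy, diseased):
    """Share of (diseased, healthy) pairs ranked correctly; ties count one half."""
    wins = sum((d > h) + 0.5 * (d == h) for d in diseased for h in healthy)
    return wins / (len(healthy) * len(diseased))
```

A large gap between the two values on the same data is one informal sign that the normality assumption may not hold, which is the situation the paper addresses.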
Statistical analysis of teleconsultations in the context of the COVID-19 epidemic
Q4 Mathematics | Pub Date: 2022-04-08 | DOI: 10.1080/23737484.2022.2056547
Eddie Sainte-Rose, J. Vaillant
Abstract: We consider the spatial and temporal distribution of teleconsultations associated with the COVID-19 epidemic in Martinique, French West Indies, from March to May 2020. Statistical tools for detecting high-frequency areas are presented. The mathematical modeling underlying the so-called scanning methods is discussed, taking into account the influence of covariates on teleconsultation occurrences and their evolution over time. Some tools available in the R programming environment and the SaTScan software are presented. A spatio-temporal statistical analysis of COVID-19 teleconsultations is performed. Areas where the frequency of people using teleconsultations is significantly higher than elsewhere are identified, and these results are discussed with respect to covariates that provide relevant information on specific characteristics of the island of Martinique.
Pages: 381-392
Citations: 0
Assessing a binary measurement system with an underlying continuous measurand using targeted verification
Q4 Mathematics Pub Date : 2022-04-03 DOI: 10.1080/23737484.2022.2044411
Sigeng Chen, S. Steiner, R. J. MacKay, Asokan Mulayath Variyath
Abstract Suppose an important continuous quality characteristic with a specification is expensive to measure with a gold standard measurement system. A 100% pass/fail inspection scheme uses a binary measurement system, such as a no-go gauge, to avoid the expensive gold standard measurements. The inspection scheme makes some errors, and we are interested in estimating both the probability of passing a bad part and the probability of failing a good part. We assume that the inspection system is not destructive, so we can inspect parts multiple times if we so choose. A part is verified if we use the gold standard system to determine whether it is within specification. We propose and quantify the benefits of a new cost-effective assessment plan that verifies only a small fraction of the parts selected for the study.
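To make the two error probabilities concrete, here is a minimal sketch of the naive baseline in which every part in the study is verified with the gold standard. This is not the authors' targeted verification plan, which verifies only a fraction of the parts. It also assumes the two probabilities are conditional on the true state of the part. The function name and the counts are hypothetical.

```python
def misclassification_rates(records):
    """Estimate binary inspection error rates from fully verified parts.

    records is an iterable of (passed, conforming) boolean pairs, where
    passed is the binary inspection result and conforming is the gold
    standard verdict. Returns (P(pass | bad part), P(fail | good part)).
    """
    bad_total = bad_passed = good_total = good_failed = 0
    for passed, conforming in records:
        if conforming:
            good_total += 1
            good_failed += not passed
        else:
            bad_total += 1
            bad_passed += passed
    if bad_total == 0 or good_total == 0:
        raise ValueError("need at least one good and one bad verified part")
    return bad_passed / bad_total, good_failed / good_total


# Hypothetical study: 95 good parts (5 wrongly failed), 5 bad parts (1 wrongly passed).
records = (
    [(True, True)] * 90
    + [(False, True)] * 5
    + [(True, False)] * 1
    + [(False, False)] * 4
)
pass_bad, fail_good = misclassification_rates(records)
```

With few bad parts in the sample, the estimate of the probability of passing a bad part rests on very little data, which is one reason a plan that chooses which parts to verify can be attractive.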
Citations: 0