Journal of Probability and Statistics: Latest Articles

Estimation of Risk Factors Affecting Screening Outcomes of Prostate Cancer Using the Bayesian Ordinal Logistic Model
IF 1.1 Pub Date : 2023-03-06 DOI: 10.1155/2023/4987764
J. Sirengo, D. Alilah, D. Mbete, R. Keli
Prostate cancer occurs when cells in the prostate gland grow out of control. Almost all prostate cancers are adenocarcinomas. The survival rate for prostate cancer patients depends on the screening outcome, which can be no prostate cancer, early detection, or late (advanced-stage) detection. The main objective of this study was to estimate the risk factors affecting the screening outcome of prostate cancer. Because the outcomes are ordinal, a generalized Bayesian ordinal logistic model was used in the analysis. This model made it possible to estimate the coefficients of the risk factors affecting each level of the prostate cancer screening outcome. In the study, a positive coefficient, that is, $\beta_k > 0$, indicated that higher values of the explanatory variable increased the chances of the respondent being in a higher category of the dependent variable than the current one, while a negative coefficient, that is, $\beta_k < 0$, signified that higher values of the explanatory variable increased the likelihood of being in the current or a lower category of prostate cancer. For instance, the analysis of positive versus negative prostate cancer outcomes showed that an increase in weight lowered the chances of an individual having the disease.
Citations: 0
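The sign interpretation of the coefficients described in the abstract above can be illustrated with a small sketch. It assumes the simpler proportional-odds form, logit P(Y <= j) = c_j - beta * x, rather than the generalized Bayesian model the authors fit; the function name `category_probs`, the cutpoints, and the coefficient value are hypothetical.

```python
import math

def category_probs(x, beta, cutpoints):
    """Return P(Y = j | x) for ordered categories, assuming
    logit P(Y <= j) = cutpoint_j - beta * x (proportional odds)."""
    def sigmoid(z):
        return 1.0 / (1.0 + math.exp(-z))
    # Cumulative probabilities P(Y <= j); the last category closes at 1.
    cum = [sigmoid(c - beta * x) for c in cutpoints] + [1.0]
    return [cum[0]] + [cum[j] - cum[j - 1] for j in range(1, len(cum))]

# Hypothetical thresholds for three ordered screening outcomes.
cutpoints = [0.0, 1.5]
low = category_probs(x=0.0, beta=0.8, cutpoints=cutpoints)
high = category_probs(x=1.0, beta=0.8, cutpoints=cutpoints)
print(low, high)
```

With a positive `beta`, raising `x` shifts probability toward the highest category; with a negative `beta`, it shifts probability toward the lowest one.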
New Test for the Comparison of Survival Curves to Detect Late Differences
IF 1.1 Pub Date : 2023-03-01 DOI: 10.1155/2023/9945446
Ildephonse Nizeyimana, S. Mwalili, G. Orwa
Background. Survival analysis has attracted the attention of scientists from various domains such as engineering, health, and social sciences. It is widely used in clinical trials to compare treatments through their survival probabilities. Kaplan–Meier curves, plotted from the Kaplan–Meier estimates of the survival probabilities, are used to give a general picture of such situations. Methods. The weighted log-rank test has been developed through different weight functions, each of which gives the test power in specific situations. In this work, we propose a new weight function comprising all numbers at risk, i.e., the overall number at risk and the separate numbers at risk in the groups under study, to detect late differences between survival curves. Results. According to the simulation studies, the new test is a good alternative, ranking after the FH (0, 1) test, in detecting late differences, and it outperformed all tests in the case of small samples and heavy censoring rates. The new test kept the same strength when applied to real data, where it was among the most powerful tests or even outperformed all other tests under consideration. Conclusion. As the new test remains strong in the case of small samples and heavy censoring rates, it may be a better choice when the aim is to detect late differences between survival curves.
Citations: 0
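For readers unfamiliar with the FH (0, 1) benchmark named in the abstract above, the following is a minimal sketch of the Fleming-Harrington weighted log-rank statistic with weight S(t-)^rho * (1 - S(t-))^gamma, where S is the pooled Kaplan–Meier estimate. It does not implement the authors' new weight function, which the abstract does not specify; the function name and the toy data are hypothetical.

```python
import math

def fh_weighted_logrank(times, events, groups, rho=0.0, gamma=1.0):
    """Z statistic of the Fleming-Harrington G(rho, gamma) weighted log-rank test.

    times: follow-up times; events: 1 = event, 0 = censored; groups: 0 or 1.
    """
    data = list(zip(times, events, groups))
    event_times = sorted({t for t, e, _ in data if e == 1})
    surv = 1.0  # pooled Kaplan-Meier estimate just before the current time
    score, variance = 0.0, 0.0
    for t in event_times:
        n = sum(1 for u, _, _ in data if u >= t)               # at risk overall
        n1 = sum(1 for u, _, g in data if u >= t and g == 1)   # at risk in group 1
        d = sum(1 for u, e, _ in data if u == t and e == 1)    # events overall
        d1 = sum(1 for u, e, g in data if u == t and e == 1 and g == 1)
        w = surv ** rho * (1.0 - surv) ** gamma                # late-emphasis weight
        score += w * (d1 - d * n1 / n)                         # observed minus expected
        if n > 1:
            variance += w * w * d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
        surv *= 1.0 - d / n                                    # update pooled KM
    return score / math.sqrt(variance) if variance > 0 else 0.0

# Hypothetical toy data: 10 subjects in two groups.
times = [2, 3, 5, 7, 8, 9, 11, 12, 14, 15]
events = [1, 1, 0, 1, 1, 1, 1, 0, 1, 1]
groups = [0, 1, 0, 1, 1, 0, 1, 0, 0, 0]
z = fh_weighted_logrank(times, events, groups)
print(z)
```

Setting `rho=0, gamma=0` gives every event time weight 1, which recovers the ordinary log-rank statistic; `gamma=1` gives early event times almost no weight, which is why FH (0, 1) is sensitive to late differences.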
Using ORRT Models for Mean Estimation under Nonresponse and Measurement Errors in Stratified Successive Sampling
IF 1.1 Pub Date : 2023-02-16 DOI: 10.1155/2023/1340068
M. Choudhary, S. P. Kour, Sunil Kumar, C. Bouza, Agustín Santiago
In a sample survey, collecting information on a sensitive variable is difficult, which may cause nonresponse and measurement errors. As a result, the estimates can be biased and their variability may increase. To overcome this difficulty, we propose an estimator of a sensitive variable that uses auxiliary information in the simultaneous presence of nonresponse and measurement errors. The properties of the proposed estimator have been studied, and the results have been compared with those of the usual complete response estimator. The theoretical results have been verified through a simulation study using an artificial population and two real-life applications. Based on the outcomes for the proposed estimator, a suitable recommendation has been made to survey statisticians for real-life application.
Citations: 0
Fitting the Distribution of Linear Combinations of t-Variables with More than 2 Degrees of Freedom
IF 1.1 Pub Date : 2023-02-01 DOI: 10.1155/2023/9967290
O. L. Alcaraz López, E. M. García Fernández, M. Latva-aho
The linear combination of Student’s t random variables (RVs) appears in many statistical applications. Unfortunately, the Student’s t distribution is not closed under convolution; thus, deriving an exact and general distribution for the linear combination of $K$ Student’s t RVs is infeasible, which motivates a fitting/approximation approach. Here, we focus on the scenario where the only constraint is that the number of degrees of freedom of each t-RV is greater than two. Notice that since the odd moments/cumulants of the Student’s t distribution are zero and the even moments/cumulants do not exist when their order is greater than the number of degrees of freedom, it becomes impossible to use conventional approaches based on moments/cumulants of order one or higher than two. To circumvent this issue, we propose fitting such a distribution to that of a scaled Student’s t RV by exploiting the second moment together with either the first absolute moment or the characteristic function (CF). For the fitting based on the absolute moment, we start from the case of the linear combination of $K = 2$ Student’s t RVs and then generalize to $K \geq 2$ through a simple iterative procedure. Meanwhile, the CF-based fitting is direct, but its accuracy (measured in terms of the Bhattacharyya distance) depends on the CF parameter configuration, for which we propose a simple but accurate approach. We numerically show that the CF-based fitting usually outperforms the absolute moment-based fitting and that both the scale and number of degrees of freedom of the fitting distribution increase almost linearly with $K$.
Citations: 0
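The moment-matching idea in the abstract above (second moment plus first absolute moment) can be sketched as follows. This is not the authors' procedure: it only shows how a scaled Student's t RV could be recovered from a given pair of moments, using the standard closed forms E[T^2] = nu / (nu - 2) and E|T| = 2 sqrt(nu) Gamma((nu + 1) / 2) / (sqrt(pi) (nu - 1) Gamma(nu / 2)). The first absolute moment of the linear combination itself must be supplied separately, and all function names are hypothetical.

```python
import math

def abs_moment_std_t(nu):
    """E|T| for a standard Student's t RV with nu > 1 degrees of freedom."""
    log_ratio = math.lgamma((nu + 1) / 2) - math.lgamma(nu / 2)
    return 2 * math.sqrt(nu / math.pi) * math.exp(log_ratio) / (nu - 1)

def second_moment_combination(coeffs, dofs):
    """E[X^2] for X = sum(a_k * T_k), independent standard t RVs with nu_k > 2."""
    return sum(a * a * nu / (nu - 2) for a, nu in zip(coeffs, dofs))

def fit_scaled_t(m1, m2, lo=2.0 + 1e-9, hi=1e4):
    """Match E|X| = m1 and E[X^2] = m2 with X = scale * T_nu; return (scale, nu)."""
    def ratio(nu):
        # m1^2 / m2 for a scaled t depends on nu only (the scale cancels).
        return abs_moment_std_t(nu) ** 2 * (nu - 2) / nu
    target = m1 * m1 / m2
    for _ in range(200):  # bisection, assuming ratio(nu) increases with nu
        mid = 0.5 * (lo + hi)
        if ratio(mid) < target:
            lo = mid
        else:
            hi = mid
    nu = 0.5 * (lo + hi)
    return m1 / abs_moment_std_t(nu), nu

# Round trip on a known scaled t: scale 2, nu = 5.
scale_hat, nu_hat = fit_scaled_t(2 * abs_moment_std_t(5), 4 * 5 / 3)
print(scale_hat, nu_hat)
```

Because the ratio m1^2 / m2 is free of the scale, the degrees of freedom can be solved for first and the scale read off afterwards.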
An Improved Measurement Error Model for Analyzing Unreplicated Method Comparison Data under Asymmetric Heavy-Tailed Distributions
IF 1.1 Pub Date : 2022-12-15 DOI: 10.1155/2022/3453912
Jeevana Duwarahan, Lakshika S. Nawarathna
Method comparison studies mainly focus on determining whether two methods of measuring a continuous variable agree well enough to be used interchangeably. Typically, a standard mixed-effects model, which assumes normality for both random effects and errors, is used to model method comparison data. However, these assumptions are frequently violated in practice because of skewness and heavy tails. In particular, the biases of the methods may vary with the extent of measurement. Thus, we propose a methodology for method comparison data that deals with these issues in the context of the measurement error model (MEM), assuming a skew-t (ST) distribution for the true covariates and a centered Student’s t (cT) distribution for the errors with known error variances, named STcT-MEM. An expectation conditional maximization (ECM) algorithm is used to compute the maximum likelihood (ML) estimates. A simulation study is performed to validate the proposed methodology. The methodology is illustrated by analyzing gold particle data and is then compared with the standard measurement error model (SMEM). The likelihood ratio (LR) test is used to identify the most appropriate of these models. In addition, the total deviation index (TDI) and concordance correlation coefficient (CCC) are used to check the agreement between the methods. The findings suggest that our proposed framework for analyzing unreplicated method comparison data with asymmetry and heavy tails works effectively for moderate and large samples.
Citations: 0
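As a small illustration of one agreement measure named in the abstract above, here is a sketch of the sample concordance correlation coefficient in Lin's form, 2 s_xy / (s_x^2 + s_y^2 + (mean_x - mean_y)^2), with 1/n moments. It is a plain empirical version, not the model-based CCC that would be derived from the fitted STcT-MEM; the function name and the data are hypothetical.

```python
def concordance_correlation(x, y):
    """Sample concordance correlation coefficient between paired measurements."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    var_x = sum((a - mean_x) ** 2 for a in x) / n
    var_y = sum((b - mean_y) ** 2 for b in y) / n
    cov_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / n
    # A location shift between the methods enters the denominator and lowers the CCC.
    return 2 * cov_xy / (var_x + var_y + (mean_x - mean_y) ** 2)

method_a = [1.0, 2.0, 3.0]
method_b = [2.0, 3.0, 4.0]  # same readings shifted by a constant bias of 1
print(concordance_correlation(method_a, method_b))
```

Unlike the Pearson correlation, which equals 1 for the shifted pair above, the CCC penalizes the constant bias between the two methods.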
On Random Dynamical Systems Generated by White Noise Time Change of Deterministic Dynamical Systems
IF 1.1 Pub Date : 2022-12-09 DOI: 10.1155/2022/3881486
M. Hmissi, F. Mokchaha
In this paper, we apply the random time change by the real white noise to deterministic dynamical systems. We prove that the obtained random dynamical systems are solutions of some stochastic differential equations whenever the deterministic dynamical systems are solutions of ordinary differential equations.
Citations: 0
Random Forests in Count Data Modelling: An Analysis of the Influence of Data Features and Overdispersion on Regression Performance
IF 1.1 Pub Date : 2022-12-01 DOI: 10.1155/2022/2833537
C. A. Mushagalusa, A. B. Fandohan, R. G. Glèlè Kakaï
Machine learning algorithms, especially random forests (RFs), have become an integral part of modern scientific methodology and represent an efficient alternative to conventional parametric algorithms. This study aimed to assess the influence of data features and overdispersion on RF regression performance. We assessed the effect of the types of predictors (100, 75, 50, and 20% continuous, and 100% categorical), the number of predictors (p = 8, 16, and 24), and the sample size (N = 50, 250, and 1250) on RF parameter settings. We also compared RF performance to that of classical generalized linear models (Poisson, negative binomial, and zero-inflated Poisson) and of the linear model applied to log-transformed data. Two real datasets were analysed to demonstrate the usefulness of RF for modelling overdispersed data. Goodness-of-fit statistics such as root mean square error (RMSE) and bias were used to determine RF accuracy and validity. Results revealed that the number of variables randomly selected at each split, the proportion of samples used to train the model, the minimal number of samples within each terminal node, and RF regression performance are not influenced by the sample size or by the number and type of predictors. However, the ratio of observations to the number of predictors affects the stability of the best RF parameters. RF performs well for all types of covariates and different levels of dispersion. The magnitude of dispersion does not significantly influence RF predictive validity. In contrast, its predictive accuracy is significantly influenced by the magnitude of dispersion in the response variable, conditional on the explanatory variables. RF performed almost as well as the models of the classical Poisson family in the presence of overdispersion. Given RF’s advantages, it is an appropriate statistical alternative for count data.
Citations: 0
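Two quantities mentioned in the abstract above, overdispersion and RMSE, can be made concrete with a short sketch. It uses the marginal variance-to-mean ratio as a rough dispersion check (a Poisson variable has ratio 1), whereas the study considers dispersion conditional on the predictors; the function names and the counts are hypothetical.

```python
import math
import statistics

def dispersion_index(counts):
    """Sample variance divided by sample mean; values well above 1 suggest overdispersion."""
    return statistics.variance(counts) / statistics.mean(counts)

def rmse(observed, predicted):
    """Root mean square error between observed and predicted values."""
    squared = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared) / len(squared))

clumped = [0, 0, 0, 10, 0, 0, 0, 10]  # many zeros and a few large counts
regular = [2, 2, 3, 1, 2, 2]          # counts tightly grouped around the mean
print(dispersion_index(clumped), dispersion_index(regular))
print(rmse([1, 2, 3], [1, 2, 5]))
```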
Mathematical Modeling of Concentration Risk under the Default Risk Charge Using Probability and Statistics Theory
IF 1.1 Pub Date : 2022-11-01 DOI: 10.1155/2022/3063505
Badreddine Slime
In the Fundamental Review of the Trading Book (FRTB), the latest regulation on minimum capital requirements for market risk, one of the major changes is the replacement of the Incremental Risk Charge (IRC) with the Default Risk Charge (DRC). The DRC measures only default risk and does not consider rating migration risk. The second change in this approach is that the DRC now includes equity assets, contrary to the IRC. This paper studies DRC modeling under the Internal Model Approach (IMA) and the regulatory conditions that every DRC component must respect. The FRTB defines the DRC as a Value at Risk (VaR) over a one-year horizon at the 99.9% quantile. We use multifactor adjustment to measure the DRC and compare it with the Monte Carlo model to assess how well the approach fits. We then define concentration in the DRC and propose two methods to quantify the concentration risk: the Ad Hoc and Add-On methods. Finally, we study the behavior of the DRC with respect to the concentration risk.
Citations: 1
Extreme Value Distributions: An Overview of Estimation and Simulation
IF 1.1 Pub Date : 2022-10-19 DOI: 10.1155/2022/5449751
Bashir Ahmed Albashir Abdulali, Mohd Aftar Abu Bakar, K. Ibrahim, N. M. Ariff
The generalized extreme value distribution (GEVD) and related extreme value distributions (EVDs) are widely applied in air pollution, telecommunications, operational risk management, finance, insurance, materials science, economics, hydrology, and many other fields that deal with extreme events. EVDs arise as the limiting distributions of the maxima and minima of many random observations drawn from the same arbitrary distribution, which makes them an important tool for forecasting future extreme events. Reliable parameter estimates are therefore essential, and the estimation method must be chosen with care. This study gives an overview of three parameter estimation methods, compared on goodness-of-fit statistics and root mean square error (RMSE) using simulated observations from the EVD and GEVD. The methods investigated are the method of moments (MOM), maximum likelihood estimation (MLE), and maximum product of spacings (MPS). Our findings indicate that MPS performed better in terms of mean square error (MSE), while its goodness-of-fit statistics were similar to those of MLE.
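To make two of the estimators concrete, here is a minimal sketch that fits a GEV distribution to one simulated sample by MLE and by MPS. It assumes SciPy's `genextreme`, whose shape parameter `c` has the opposite sign to the usual GEV shape ξ. The true parameters, the sample size, the starting values, and the helper names (`moment_start`, `fit_mle`, `fit_mps`) are illustrative choices, not taken from the paper.

```python
import numpy as np
from scipy import optimize, stats


def moment_start(x):
    """Gumbel-style moment estimates as starting values (the shape start is arbitrary)."""
    scale = np.std(x, ddof=1) * np.sqrt(6.0) / np.pi
    loc = np.mean(x) - 0.5772 * scale
    return np.array([-0.05, loc, scale])


def fit_mle(x):
    """Maximum likelihood via SciPy's generic fit, started from the moment values."""
    c0, loc0, scale0 = moment_start(x)
    return np.array(stats.genextreme.fit(x, c0, loc=loc0, scale=scale0))


def neg_mean_log_spacing(params, x_sorted):
    """MPS objective: minus the mean log of the spacings between fitted CDF values."""
    c, loc, scale = params
    if scale <= 0:
        return np.inf
    cdf = stats.genextreme.cdf(x_sorted, c, loc=loc, scale=scale)
    spacings = np.diff(np.concatenate(([0.0], cdf, [1.0])))
    return -np.mean(np.log(np.clip(spacings, 1e-300, None)))  # clip guards log(0)


def fit_mps(x):
    """Maximum product of spacings by derivative-free minimization."""
    result = optimize.minimize(neg_mean_log_spacing, moment_start(x),
                               args=(np.sort(x),), method="Nelder-Mead")
    return result.x


# SciPy convention: c = -xi, so c = -0.1 corresponds to a mildly heavy upper tail.
true_params = np.array([-0.1, 0.0, 1.0])
sample = stats.genextreme.rvs(*true_params, size=500, random_state=1)
mle = fit_mle(sample)
mps = fit_mps(sample)
print("MLE:", mle)
print("MPS:", mps)
```

Repeating the fit over many simulated samples and averaging the squared errors against `true_params` would give an RMSE comparison of the kind the abstract describes.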
Citations: 3
NetDA: An R Package for Network-Based Discriminant Analysis Subject to Multilabel Classes
IF 1.1 Pub Date : 2022-09-27 DOI: 10.1155/2022/1041752
Li‐Pang Chen
In this paper, we introduce the R package NetDA, which handles multiclass classification while accommodating network structures among the predictors. To capture the network structure, we use Gaussian graphical models to characterize the dependence among the predictors and estimate the precision matrix directly. The estimated precision matrix is then plugged into linear and quadratic discriminant functions. NetDA is available on CRAN, and its functions are demonstrated in a vignette in the online documentation.
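The idea of plugging an estimated precision matrix into a linear discriminant function can be sketched as follows. This is not NetDA's interface (NetDA is an R package); it is a Python stand-in that assumes scikit-learn's `GraphicalLasso` as the sparse precision estimator, and the function names, the penalty `alpha`, and the simulated data are hypothetical.

```python
import numpy as np
from sklearn.covariance import GraphicalLasso


def fit_network_lda(X, y, alpha=0.05):
    """Class means, priors, and a pooled sparse precision matrix (graphical lasso)."""
    classes = np.unique(y)
    means = np.array([X[y == k].mean(axis=0) for k in classes])
    priors = np.array([(y == k).mean() for k in classes])
    # Remove each observation's class mean so the precision matrix is pooled.
    centered = X - means[np.searchsorted(classes, y)]
    theta = GraphicalLasso(alpha=alpha).fit(centered).precision_
    return classes, means, priors, theta


def predict_network_lda(model, X):
    """Linear discriminant: x' Theta mu_k - 0.5 mu_k' Theta mu_k + log(pi_k)."""
    classes, means, priors, theta = model
    linear = X @ theta @ means.T
    const = -0.5 * np.einsum("kp,pq,kq->k", means, theta, means) + np.log(priors)
    return classes[np.argmax(linear + const, axis=1)]


# Simulated three-class data with well-separated means (illustrative only).
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(loc=3.0 * k, size=(100, 5)) for k in range(3)])
y = np.repeat(np.arange(3), 100)

model = fit_network_lda(X, y)
accuracy = float(np.mean(predict_network_lda(model, X) == y))
print(accuracy)
```

A quadratic variant would estimate one precision matrix per class instead of a pooled one and add the corresponding log-determinant term to each class score.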
Citations: 2