
Journal of Probability and Statistics: Latest Publications

A Cost of Misclassification Adjustment Approach for Estimating Optimal Cut-Off Point for Classification
IF 1.1 Pub Date: 2024-05-15 DOI: 10.1155/2024/8082372
O.-A. Ampomah, R. Minkah, G. Kallah-Dagadu, E. N. N. Nortey
Classification is one of the main areas of machine learning, where the target variable is usually categorical with at least two levels. This study focuses on deducing an optimal cut-off point for continuous outcomes (e.g., predicted probabilities) resulting from binary classifiers. To achieve this aim, the study modified univariate discriminant functions by incorporating the costs of the misclassification errors involved. By doing so, we can systematically shift the cut-off point within its measurement range until the optimal point is obtained. Extensive simulation studies were conducted to compare the performance of the proposed method with existing classification methods under the binary logistic and Bayesian quantile regression frameworks. The simulation results indicate that logistic regression models incorporating the proposed method outperform the existing ordinary logistic regression and Bayesian regression models. We illustrate the proposed method with a practical dataset from the finance industry that assesses default status in home equity.
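The cost-adjusted cut-off idea can be sketched as a grid search that minimizes total misclassification cost over candidate thresholds. This is an illustrative reconstruction, not the authors' modified discriminant functions; the function name, cost weights, and toy data are assumptions.

```python
import numpy as np

def optimal_cutoff(y_true, p_pred, cost_fp=1.0, cost_fn=1.0, grid_size=101):
    """Sweep candidate cut-offs over [0, 1] and return the one that
    minimizes total misclassification cost (cost_fp per false positive,
    cost_fn per false negative)."""
    cutoffs = np.linspace(0.0, 1.0, grid_size)
    costs = []
    for c in cutoffs:
        y_hat = (p_pred >= c).astype(int)
        fp = np.sum((y_hat == 1) & (y_true == 0))
        fn = np.sum((y_hat == 0) & (y_true == 1))
        costs.append(cost_fp * fp + cost_fn * fn)
    return cutoffs[int(np.argmin(costs))]

# Toy example: predicted probabilities well separated around 0.5.
rng = np.random.default_rng(0)
y = np.array([0] * 50 + [1] * 50)
p = np.concatenate([rng.uniform(0.0, 0.4, 50), rng.uniform(0.6, 1.0, 50)])
c_star = optimal_cutoff(y, p, cost_fp=1.0, cost_fn=5.0)
```

Raising `cost_fn` relative to `cost_fp` pulls the chosen cut-off downward, trading false positives for fewer costly false negatives.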
Citations: 0
Flexible Lévy-Based Models for Time Series of Count Data with Zero-Inflation, Overdispersion, and Heavy Tails
IF 1.1 Pub Date: 2023-11-30 DOI: 10.1155/2023/1780404
Confort Kollie, Philip Ngare, B. Malenje
The explosion of time series count data with diverse characteristics and features in recent years has led to a proliferation of new analysis models and methods. Significant efforts have been devoted to achieving the flexibility needed to handle complex dependence structures, capture multiple distributional characteristics simultaneously, and address nonstationary patterns such as trends, seasonality, or change points. However, handling these features jointly remains a challenge in the context of long-range dependence. The Lévy-based modeling framework offers a promising tool to meet the requirements of modern data analysis. It enables the modeling of both short-range and long-range serial correlation structures by selecting the kernel set accordingly and accommodates various marginal distributions within the class of infinitely divisible laws. We propose an extension of the basic stationary framework to capture additional marginal properties, such as heavy-tailedness, in both short-term and long-term dependencies, as well as overdispersion and zero inflation in simultaneous modeling. Statistical inference is based on composite pairwise likelihood. The model’s flexibility is illustrated through applications to rainfall data in Guinea from 2008 to 2023, and the number of NSF awards granted to academic institutions. The proposed model demonstrates remarkable flexibility and versatility, capable of simultaneously capturing overdispersion, zero inflation, and heavy-tailedness in count time series data.
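The marginal features the model targets can be illustrated by simulating a count series that is simultaneously serially correlated, overdispersed, and zero-inflated. This is a minimal sketch using a latent AR(1) log-intensity with a Poisson-gamma (negative binomial) mixture, not the paper's Lévy-based construction; all names and parameter values are assumptions.

```python
import numpy as np

def simulate_zinb_series(n=500, pi_zero=0.3, r=2.0, phi=0.8, seed=1):
    """Simulate a count series with the three features at once:
    - serial correlation: latent log-intensity follows an AR(1);
    - overdispersion: negative binomial via a Poisson-gamma mixture;
    - zero-inflation: structural zeros mixed in with probability pi_zero."""
    rng = np.random.default_rng(seed)
    log_lam = np.zeros(n)
    for t in range(1, n):
        log_lam[t] = phi * log_lam[t - 1] + rng.normal(0.0, 0.3)
    lam = np.exp(log_lam)
    gamma_mix = rng.gamma(shape=r, scale=lam / r)  # E = lam, extra variance
    counts = rng.poisson(gamma_mix)
    zeros = rng.random(n) < pi_zero  # structural zeros
    counts[zeros] = 0
    return counts

y = simulate_zinb_series()
# Diagnostic: variance exceeding the mean indicates overdispersion.
```

A Poisson benchmark would have variance equal to the mean; the mixture plus the varying intensity pushes the variance well above it.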
Citations: 0
Exponentially Generated Modified Chen Distribution with Applications to Lifetime Dataset
IF 1.1 Pub Date: 2023-11-21 DOI: 10.1155/2023/4458562
Awopeju Kabiru Abidemi, A. A. Abiodun
In this paper, the exponentially generated system was used to modify a two-parameter Chen distribution into a four-parameter distribution with better performance. The property of a complete probability distribution function was used to verify the completeness of the resulting distribution, which shows that the distribution is a proper probability distribution function. A simulation study involving varying sample sizes was used to ascertain the asymptotic property of the new distribution. Small and large sample sizes were considered, showing that the estimates approach the true values as the sample size increases. A lifetime dataset was used for model comparison, which shows the superiority of the exponentially generated modified Chen distribution over some existing distributions. It is therefore recommended to use the four-parameter Chen distribution in place of the well-known two-parameter Chen distribution.
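The baseline two-parameter Chen distribution, the building block the four-parameter extension modifies, can be sampled by inverse transform. This is a sketch of the classical Chen CDF only; the exponentially generated extension itself is not reproduced here, and the parameter values are assumptions.

```python
import math
import random

def chen_cdf(x, lam, beta):
    """CDF of the two-parameter Chen distribution:
    F(x) = 1 - exp(lam * (1 - exp(x**beta))), x > 0."""
    return 1.0 - math.exp(lam * (1.0 - math.exp(x ** beta)))

def chen_sample(lam, beta, rng):
    """Inverse-transform sampling: solve F(x) = u for x, giving
    x = (log(1 - log(1 - u) / lam)) ** (1 / beta)."""
    u = rng.random()
    return (math.log(1.0 - math.log(1.0 - u) / lam)) ** (1.0 / beta)

rng = random.Random(42)
xs = [chen_sample(0.5, 0.7, rng) for _ in range(10000)]
med = sorted(xs)[5000]  # empirical median; F(med) should be near 0.5
```

The round trip through the CDF at the empirical median is a quick sanity check that the inverse was derived correctly.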
Citations: 0
Bayesian Estimation of the Stress-Strength Reliability Based on Generalized Order Statistics for Pareto Distribution
Pub Date: 2023-11-13 DOI: 10.1155/2023/8648261
Zahra Karimi Ezmareh, Gholamhossein Yari
The aim of this paper is to obtain a Bayesian estimator of stress-strength reliability based on generalized order statistics for the Pareto distribution. The dependence of the Pareto distribution’s support on the parameter complicates the calculations. Hence, in the literature, one of the parameters is assumed to be known. In this paper, for the first time, both parameters of the Pareto distribution are treated as unknown. In computing the Bayesian confidence interval for reliability based on generalized order statistics, the posterior distribution has a complex form that cannot be sampled by conventional methods. To solve this problem, we propose an acceptance-rejection algorithm to generate a sample from the posterior distribution. We also propose a particular case of this model and obtain the classical and Bayesian estimators for this particular case. In this case, to obtain the Bayesian estimator of stress-strength reliability, we propose a change-of-variable method. Then, these confidence intervals are compared by simulation. Finally, a practical example of this study is provided.
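A generic acceptance-rejection sampler of the kind proposed for the intractable posterior can be sketched as follows. It is illustrated on a simple unnormalized Beta(2, 2) kernel rather than the paper's posterior; the function names and envelope constant are assumptions.

```python
import random

def accept_reject(target, proposal_sample, proposal_pdf, M, n, rng):
    """Generic acceptance-rejection: draw x from the proposal and accept
    it with probability target(x) / (M * proposal_pdf(x)). Requires the
    envelope condition target(x) <= M * proposal_pdf(x) for all x."""
    out = []
    while len(out) < n:
        x = proposal_sample(rng)
        if rng.random() * M * proposal_pdf(x) <= target(x):
            out.append(x)
    return out

# Example target: unnormalized Beta(2, 2) kernel x * (1 - x) on (0, 1),
# with a Uniform(0, 1) proposal and envelope M = 0.25 (the kernel's maximum).
target = lambda x: x * (1.0 - x)
rng = random.Random(7)
draws = accept_reject(target, lambda r: r.random(), lambda x: 1.0, 0.25, 20000, rng)
mean = sum(draws) / len(draws)
```

Note the target only needs to be known up to a normalizing constant, which is exactly the situation with the complex posterior described above.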
Citations: 0
Monitoring Changes in Clustering Solutions: A Review of Models and Applications
Pub Date: 2023-11-03 DOI: 10.1155/2023/7493623
Muhammad Atif, Muhammad Shafiq, Muhammad Farooq, Gohar Ayub, Friedrich Leisch, Muhammad Ilyas
This article comprehensively reviews the applications and algorithms used for monitoring the evolution of clustering solutions in data streams. The clustering technique is an unsupervised learning problem that involves the identification of natural subgroups in a large dataset. In contrast to supervised learning models, clustering is a data mining technique that retrieves the hidden pattern in the input dataset. The clustering solution reflects the mechanism that leads to a high level of similarity between the items. A few applications include pattern recognition, knowledge discovery, and market segmentation. However, many modern-day applications generate streaming or temporal datasets over time, where the pattern is not stationary and may change over time. In the context of this article, change detection is the process of identifying differences in the cluster solutions obtained from streaming datasets at consecutive time points. In this paper, we briefly review the models/algorithms introduced in the literature to monitor clusters’ evolution in data streams. Monitoring the changes in clustering solutions in streaming datasets plays a vital role in policy-making and future prediction. Of course, it has a wide range of applications that cannot be covered in a single study, but some of the most common are highlighted in this article.
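One simple way to quantify change between cluster solutions at consecutive time points is a pairwise agreement score such as the Rand index. This is an illustrative sketch of the general idea, not a specific method from the reviewed literature; the labels are toy data.

```python
from itertools import combinations

def rand_index(labels_a, labels_b):
    """Rand index between two clusterings of the same items: the fraction
    of item pairs on which the two solutions agree (both place the pair in
    the same cluster, or both place it in different clusters)."""
    pairs = list(combinations(range(len(labels_a)), 2))
    agree = sum(
        (labels_a[i] == labels_a[j]) == (labels_b[i] == labels_b[j])
        for i, j in pairs
    )
    return agree / len(pairs)

# Cluster solutions at two consecutive time points: one item migrates
# from the first cluster to the second.
t0 = [0, 0, 0, 1, 1, 1]
t1 = [0, 0, 1, 1, 1, 1]
score = rand_index(t0, t1)  # a score below 1.0 signals the solution changed
```

Tracking such a score over a stream gives a crude change-detection signal: a sharp drop flags a time point where the clustering structure shifted.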
Citations: 0
Fitting Time Series Models to Fisheries Data to Ascertain Age
Pub Date: 2023-10-07 DOI: 10.1155/2023/9991872
Kathleen S. Kirch, Norou Diawara, Cynthia M. Jones
The ability of government agencies to assign accurate ages to fish is important to fisheries management. Accurate ageing allows the most reliable age-based models to be used to support sustainability and maximize economic benefit. Assigning age relies on validating putative annual marks by evaluating accretional material laid down in patterns in fish ear bones, typically by marginal increment analysis. These patterns often take the shape of a sawtooth wave, with an abrupt drop in accretion each year forming an annual band, and are typically validated qualitatively. Researchers have shown keen interest in modeling marginal increments to verify that the marks do, in fact, occur yearly. However, finding the best model to predict this sawtooth wave pattern has been challenging. We propose three new applications of time series models to validate the yearly sawtooth wave pattern in the data: autoregressive integrated moving average (ARIMA), unobserved component, and copula. These methods are expected to enable the identification of yearly patterns in accretion. ARIMA and unobserved components account for the dependence of observations and error, while copula incorporates a variety of marginal distributions and dependence structures. The unobserved component model produced the best results (AIC: −123.7, MSE: 0.00626), followed by the time series model (AIC: −117.292, MSE: 0.0081), and then the copula model (AIC: −96.62, Kendall’s tau: −0.5503). The unobserved component model performed best due to the completeness of the dataset. In conclusion, all three models are effective tools to validate yearly accretional patterns in fish ear bones despite their differences in constraints and assumptions.
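The role of the annual cycle can be illustrated by seasonally differencing a noisy sawtooth series at the yearly lag, the "I" step a seasonal ARIMA would apply. This is a toy sketch with assumed monthly sampling and parameter values, not the paper's fitted models.

```python
import random

rng = random.Random(3)
period = 12  # assumed monthly sampling with an annual sawtooth cycle
n = period * 10
# Marginal increment as a sawtooth: rises through the year, then drops
# abruptly when the annual band forms, plus measurement noise.
series = [(t % period) / period + rng.gauss(0.0, 0.05) for t in range(n)]

# Seasonal differencing at the annual lag removes the deterministic
# yearly sawtooth, leaving only noise for the AR/MA terms to model.
diffed = [series[t] - series[t - period] for t in range(period, n)]

def variance(xs):
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)

v_raw, v_diff = variance(series), variance(diffed)
```

The large variance drop after differencing at lag 12 is evidence of a yearly cycle, which is the kind of validation the marginal increment analysis seeks.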
Citations: 0
Clustering Analysis of Multivariate Data: A Weighted Spatial Ranks-Based Approach
Pub Date: 2023-09-30 DOI: 10.1155/2023/8849404
Mohammed H. Baragilly, Hend Gabr, Brian H. Willis
Determining the right number of clusters without any prior information is a core problem in cluster analysis. In this paper, we propose a nonparametric clustering method based on different weighted spatial rank (WSR) functions. The main idea behind WSR is to define a dissimilarity measure locally based on a localized version of multivariate ranks. We consider a nonparametric Gaussian kernel weight function. We compare the performance of the method with other standard techniques and assess its misclassification rate. The method is completely data-driven, robust against distributional assumptions, and accurate for the purpose of intuitive visualization, and it can be used both to determine the number of clusters and to assign each observation to its cluster.
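The spatial rank function underlying the method can be sketched as the average of unit vectors from the sample points to the evaluation point. This is a minimal unweighted version; the paper's weighted variant with Gaussian kernel weights is not reproduced, and the names and test points are assumptions.

```python
import numpy as np

def spatial_rank(x, data):
    """Spatial rank of point x with respect to a multivariate sample:
    the average of unit vectors from each sample point to x. Its norm is
    near 0 at the spatial median and approaches 1 far from the data."""
    diffs = x - data
    norms = np.linalg.norm(diffs, axis=1)
    keep = norms > 0  # skip x itself if it happens to be in the sample
    return (diffs[keep] / norms[keep][:, None]).mean(axis=0)

rng = np.random.default_rng(5)
cloud = rng.normal(size=(200, 2))  # sample centered at the origin
center_rank = np.linalg.norm(spatial_rank(np.zeros(2), cloud))
far_rank = np.linalg.norm(spatial_rank(np.array([10.0, 10.0]), cloud))
```

The norm of the spatial rank thus acts as a center-outward "depth" score, which is what makes rank-based dissimilarities robust to distributional assumptions.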
Citations: 0
A New Type 1 Alpha Power Family of Distributions and Modeling Data with Correlation, Overdispersion, and Zero-Inflation in the Health Data Sets
IF 1.1 Pub Date: 2023-08-31 DOI: 10.1155/2023/6611108
Getachew Tekle, R. Roozegar, Zubair Ahmad
In the recent era, the introduction of new families of distributions has received great attention due to the limitations of the classical univariate distributions. This study introduces a novel family of distributions called a new type 1 alpha power family of distributions. Based on the novel family, a special model called a new type 1 alpha power Weibull model is studied in depth. The new model has very interesting patterns and is very flexible. Thus, it can model real data with increasing, decreasing, parabola-down, and bathtub failure rate patterns. Its applicability is studied by applying it to health sector data and the time-to-recovery of breast cancer patients, and its performance is compared to seven well-known models. Based on the model comparison, it is the best model to fit health-related data with no exceptional features. Furthermore, the popular models for data with exceptional features such as correlation, overdispersion, and zero-inflation in aggregate are explored with applications to epileptic seizure data. Sometimes, these features are beyond the probability distribution models. Hence, this study has implemented eight possible models separately for these data, and they are compared based on standard techniques. Accordingly, the zero-inflated Poisson-normal-gamma model, which includes random effects in the linear predictor to handle the three features simultaneously, has shown its supremacy over the others and is the best model to fit health-related data with these features.
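For orientation, the classical (type 1) alpha power transform of a baseline CDF takes the form G(x) = (alpha^F(x) - 1) / (alpha - 1). A sketch with a Weibull baseline follows; the paper's "new" variant modifies this construction and is not reproduced here, and the parameter values are assumptions.

```python
import math

def alpha_power_cdf(F_x, alpha):
    """Classical alpha power transform of a baseline CDF value F(x):
    G(x) = (alpha**F(x) - 1) / (alpha - 1), for alpha > 0, alpha != 1.
    (The paper's 'new type 1' family modifies this construction.)"""
    return (alpha ** F_x - 1.0) / (alpha - 1.0)

def weibull_cdf(x, k, lam):
    """Baseline Weibull CDF with shape k and scale lam."""
    return 1.0 - math.exp(-((x / lam) ** k))

# The transform preserves the CDF endpoints: G -> 0 at x = 0 and
# G -> 1 as x grows, so the result is again a proper distribution.
g0 = alpha_power_cdf(weibull_cdf(0.0, 1.5, 2.0), alpha=3.0)
g_large = alpha_power_cdf(weibull_cdf(50.0, 1.5, 2.0), alpha=3.0)
```

The extra parameter alpha reshapes the hazard of the Weibull baseline, which is how such families produce bathtub and parabola-down failure rate patterns.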
Citations: 0
Applications of Robust Methods in Spatial Analysis
IF 1.1 Pub Date: 2023-05-10 DOI: 10.1155/2023/1328265
S. Selvaratnam
Spatial data analysis provides valuable information to governments as well as companies. The rapid improvement of modern technology with geographic information systems (GIS) has led to the collection and storage of more spatial data. We developed algorithms to choose optimal locations from the permanent locations in a space for efficient spatial data analysis. Distances between neighboring permanent locations need not be equispaced. Robust and sequential methods were used to develop the algorithms for design construction. The constructed designs are robust against misspecified regression responses and variance/covariance structures of responses. The proposed method can be extended in future work to image analysis, including 3-dimensional image analysis.
{"title":"Applications of Robust Methods in Spatial Analysis","authors":"S. Selvaratnam","doi":"10.1155/2023/1328265","DOIUrl":"https://doi.org/10.1155/2023/1328265","url":null,"abstract":"Spatial data analysis provides valuable information to the government as well as companies. The rapid improvement of modern technology with a geographic information system (GIS) can lead to the collection and storage of more spatial data. We developed algorithms to choose optimal locations from those permanently in a space for an efficient spatial data analysis. Distances between neighboring permanent locations are not necessary to be equispaced distances. Robust and sequential methods were used to develop algorithms for design construction. The constructed designs are robust against misspecified regression responses and variance/covariance structures of responses. The proposed method can be extended for future works of image analysis which includes 3 dimensional image analysis.","PeriodicalId":44760,"journal":{"name":"Journal of Probability and Statistics","volume":null,"pages":null},"PeriodicalIF":1.1,"publicationDate":"2023-05-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"46327721","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Hybrid Model for Stock Market Volatility
IF 1.1 Pub Date : 2023-04-25 DOI: 10.1155/2023/6124649
Kofi Agyarko, N. K. Frempong, E. N. Wiah
Empirical evidence suggests that traditional GARCH-type models are unable to estimate the volatility of financial markets accurately. To improve their accuracy, this study proposes a hybrid model, BSGARCH (1, 1), that combines the flexibility of B-splines with the GARCH (1, 1) model. The lagged residuals from the GARCH (1, 1) model are fitted with a B-spline estimator, and the fit is added to the GARCH (1, 1) output. The proposed BSGARCH (1, 1) model was applied to simulated data and to two real financial time series (NASDAQ 100 and S&P 500). The outcomes were then compared with those of the GARCH (1, 1), EGARCH (1, 1), GJR-GARCH (1, 1), and APARCH (1, 1) models under different error distributions (ED), using the mean absolute percentage error (MAPE), the root mean square error (RMSE), Theil's inequality coefficient (TIC), and QLIKE. On these performance metrics, the proposed BSGARCH (1, 1) model outperforms the traditional GARCH-type models considered in the study, and thus it can be used for estimating the volatility of stock markets.
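The first stage of such a hybrid is the standard GARCH(1,1) variance recursion, sigma2_t = omega + alpha * r_{t-1}^2 + beta * sigma2_{t-1}. The sketch below implements that recursion and adds a crude moving-average correction built from lagged variance residuals as a stand-in for the paper's B-spline fit; the parameter values, the window, and the stand-in smoother are assumptions, not the published estimator.

```python
def garch11_variance(returns, omega, alpha, beta):
    """Standard GARCH(1,1) recursion:
    sigma2_t = omega + alpha * r_{t-1}^2 + beta * sigma2_{t-1}."""
    n = len(returns)
    sigma2 = [sum(r * r for r in returns) / n]  # initialise at the sample variance
    for t in range(1, n):
        sigma2.append(omega + alpha * returns[t - 1] ** 2 + beta * sigma2[t - 1])
    return sigma2

def hybrid_variance(returns, omega, alpha, beta, window=5):
    """GARCH(1,1) plus a smoothed correction from lagged variance residuals
    (a crude moving-average stand-in for the paper's B-spline estimator)."""
    sigma2 = garch11_variance(returns, omega, alpha, beta)
    resid = [r * r - s for r, s in zip(returns, sigma2)]  # variance residuals
    out = []
    for t, s in enumerate(sigma2):
        past = resid[max(0, t - window):t]          # only lagged residuals
        corr = sum(past) / len(past) if past else 0.0
        out.append(max(s + corr, 1e-12))            # keep the variance positive
    return out
```

The division of labour matches the abstract: the parametric recursion captures volatility clustering, and the correction term absorbs whatever systematic structure is left in the lagged residuals.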
{"title":"Hybrid Model for Stock Market Volatility","authors":"Kofi Agyarko, N. K. Frempong, E. N. Wiah","doi":"10.1155/2023/6124649","DOIUrl":"https://doi.org/10.1155/2023/6124649","url":null,"abstract":"Empirical evidence suggests that the traditional GARCH-type models are unable to accurately estimate the volatility of financial markets. To improve on the accuracy of the traditional GARCH-type models, a hybrid model (BSGARCH (1, 1)) that combines the flexibility of B-splines with the GARCH (1, 1) model has been proposed in the study. The lagged residuals from the GARCH (1, 1) model are fitted with a B-spline estimator and added to the results produced from the GARCH (1, 1) model. The proposed BSGARCH (1, 1) model was applied to simulated data and two real financial time series data (NASDAQ 100 and S&P 500). The outcome was then compared to the outcomes of the GARCH (1, 1), EGARCH (1, 1), GJR-GARCH (1, 1), and APARCH (1, 1) with different error distributions (ED) using the mean absolute percentage error (MAPE), the root mean square error (RMSE), Theil’s inequality coefficient (TIC) and QLIKE. It was concluded that the proposed BSGARCH (1, 1) model outperforms the traditional GARCH-type models that were considered in the study based on the performance metrics, and thus, it can be used for estimating volatility of stock markets.","PeriodicalId":44760,"journal":{"name":"Journal of Probability and Statistics","volume":null,"pages":null},"PeriodicalIF":1.1,"publicationDate":"2023-04-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"47200080","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0