Stats: Latest Articles

Bidirectional f-Divergence-Based Deep Generative Method for Imputing Missing Values in Time-Series Data.
IF 0.9 Q4 MATHEMATICS, INTERDISCIPLINARY APPLICATIONS Pub Date : 2025-03-01 Epub Date: 2025-01-14 DOI: 10.3390/stats8010007
Wen-Shan Liu, Tong Si, Aldas Kriauciunas, Marcus Snell, Haijun Gong

Imputing missing values in high-dimensional time-series data remains a significant challenge in statistics and machine learning. Although various methods have been proposed in recent years, many struggle with limitations and reduced accuracy, particularly when the missing rate is high. In this work, we present a novel f-divergence-based bidirectional generative adversarial imputation network, tf-BiGAIN, designed to address these challenges in time-series data imputation. Unlike traditional imputation methods, tf-BiGAIN employs a generative model to synthesize missing values without relying on distributional assumptions. The imputation process is achieved by training two neural networks, implemented using bidirectional modified gated recurrent units, with f-divergence serving as the objective function to guide optimization. Compared to existing deep learning-based methods, tf-BiGAIN introduces two key innovations. First, the use of f-divergence provides a flexible and adaptable framework for optimizing the model across diverse imputation tasks, enhancing its versatility. Second, the use of bidirectional gated recurrent units allows the model to leverage both forward and backward temporal information. This bidirectional approach enables the model to effectively capture dependencies from both past and future observations, enhancing its imputation accuracy and robustness. We applied tf-BiGAIN to analyze two real-world time-series datasets, demonstrating its superior performance in imputing missing values and outperforming existing methods in terms of accuracy and robustness.
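The abstract does not say which f-divergences tf-BiGAIN uses, so the following is only a minimal sketch of the general definition, D_f(P || Q) = sum_i q_i f(p_i / q_i), for discrete distributions. The generator table, the function name `f_divergence`, and the toy distributions `p` and `q` are illustrative assumptions, not the authors' code.

```python
import math

def f_divergence(p, q, f):
    """D_f(P || Q) = sum_i q_i * f(p_i / q_i) for discrete P, Q with all q_i > 0."""
    return sum(qi * f(pi / qi) for pi, qi in zip(p, q))

# Standard convex generators with f(1) = 0 (illustrative selection).
# Note: reverse_kl requires all p_i > 0.
GENERATORS = {
    "kl": lambda t: t * math.log(t) if t > 0 else 0.0,
    "reverse_kl": lambda t: -math.log(t),
    "pearson_chi2": lambda t: (t - 1.0) ** 2,
    "squared_hellinger": lambda t: (math.sqrt(t) - 1.0) ** 2,
    "total_variation": lambda t: 0.5 * abs(t - 1.0),
}

# Hypothetical toy distributions over three outcomes.
p = [0.5, 0.3, 0.2]
q = [0.4, 0.4, 0.2]
divergences = {name: f_divergence(p, q, f) for name, f in GENERATORS.items()}
```

Swapping the generator changes the objective without changing the rest of the computation, which is the kind of flexibility the abstract attributes to the f-divergence framework.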

Stats, vol. 8, no. 1. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11793919/pdf/
Citations: 0
Investigating Risk Factors for Racial Disparity in E-Cigarette Use with PATH Study.
IF 0.9 Q4 MATHEMATICS, INTERDISCIPLINARY APPLICATIONS Pub Date : 2024-09-01 Epub Date: 2024-06-21 DOI: 10.3390/stats7030037
Amy Liu, Kennedy Dorsey, Almetra Granger, Ty-Runet Bryant, Tung-Sung Tseng, Michael Celestin, Qingzhao Yu

Background: Previous research has identified differences in e-cigarette use and socioeconomic factors between different racial groups. However, there is little research examining specific risk factors contributing to the racial differences.

Objective: This study sought to identify racial disparities in e-cigarette use and to determine risk factors that help explain these differences.

Methods: We used Wave 5 (2018-2019) of the Adult Population Assessment of Tobacco and Health (PATH) Study. First, we computed descriptive statistics of e-cigarette use across our risk factor variables. Next, we used multiple logistic regression to estimate the risk effects while adjusting for all covariates. Finally, we conducted a mediation analysis to determine whether identified factors showed evidence of influencing the association between race and e-cigarette use. All analyses were performed in R or SAS. The R package mma was used for the mediation analysis.

Results: Between Hispanic and non-Hispanic White populations, our potential risk factors collectively explain 17.5% of the racial difference; former cigarette smoking explains 7.6%, receiving e-cigarette advertising explains 2.6%, and perception of e-cigarette harm explains 27.8%. Between non-Hispanic Black and non-Hispanic White populations, former cigarette smoking, receiving e-cigarette advertising, and perception of e-cigarette harm explain 5.2%, 1.8%, and 6.8% of the racial difference, respectively. E-cigarette use is most prevalent in the non-Hispanic White population compared to non-Hispanic Black and Hispanic populations, which may be explained by former cigarette smoking, exposure to e-cigarette advertising, and e-cigarette harm perception.

Conclusions: These findings suggest that racial differences in e-cigarette use may be reduced by increasing knowledge of the dangers associated with e-cigarette use and reducing exposure to e-cigarette advertisements. This comprehensive analysis of risk factors can be used to significantly guide smoking cessation efforts and address potential health burden disparities arising from differences in e-cigarette usage.

Stats, vol. 7, no. 3, pp. 613-626. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11756910/pdf/
Citations: 0
Precise Tensor Product Smoothing via Spectral Splines
Q4 MATHEMATICS, INTERDISCIPLINARY APPLICATIONS Pub Date : 2024-01-10 DOI: 10.3390/stats7010003
Nathaniel E. Helwig
Tensor product smoothers are frequently used to include interaction effects in multiple nonparametric regression models. Current implementations of tensor product smoothers either require using approximate penalties, such as those typically used in generalized additive models, or costly parameterizations, such as those used in smoothing spline analysis of variance models. In this paper, I propose a computationally efficient and theoretically precise approach for tensor product smoothing. Specifically, I propose a spectral representation of a univariate smoothing spline basis, and I develop an efficient approach for building tensor product smooths from marginal spectral spline representations. The developed theory suggests that current tensor product smoothing methods could be improved by incorporating the proposed tensor product spectral smoothers. Simulation results demonstrate that the proposed approach can outperform popular tensor product smoothing implementations, which supports the theoretical results developed in the paper.
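As a rough illustration of how a tensor product basis is assembled from marginal bases, the sketch below forms the row-wise Kronecker product of two design matrices. It uses plain polynomial bases as a stand-in; the spectral spline representation proposed in the paper is not reproduced here, and the names `poly_basis` and `row_tensor` are assumptions made for this example.

```python
def poly_basis(x, degree):
    """Marginal design matrix with columns 1, x, ..., x**degree (stand-in basis)."""
    return [[xi ** d for d in range(degree + 1)] for xi in x]

def row_tensor(A, B):
    """Row-wise Kronecker product: row i is every product A[i][j] * B[i][k]."""
    if len(A) != len(B):
        raise ValueError("marginal bases must have the same number of rows")
    return [[a * b for a in row_a for b in row_b] for row_a, row_b in zip(A, B)]

# Hypothetical predictor values for three observations.
X1 = poly_basis([0.0, 0.5, 1.0], degree=2)   # 3 columns
X2 = poly_basis([2.0, 3.0, 4.0], degree=1)   # 2 columns
T = row_tensor(X1, X2)                       # 3 x 6 tensor product design matrix
```

The interaction basis has as many columns as the product of the marginal column counts, which is why the choice of marginal representation and penalty matters for computational cost.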
Citations: 0
Predicting Random Walks and a Data-Splitting Prediction Region
Q4 MATHEMATICS, INTERDISCIPLINARY APPLICATIONS Pub Date : 2024-01-08 DOI: 10.3390/stats7010002
Mulubrhan G. Haile, Lingling Zhang, David J. Olive
Perhaps the first nonparametric, asymptotically optimal prediction intervals are provided for univariate random walks, with applications to renewal processes. Perhaps the first nonparametric prediction regions are introduced for vector-valued random walks. This paper further derives nonparametric data-splitting prediction regions, which are underpinned by very simple theory. Some of the prediction regions can be used when the data distribution does not have first moments, and some can be used for high-dimensional data, where the number of predictors is larger than the sample size. The prediction regions can make use of many estimators of multivariate location and dispersion.
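One simple way to picture a nonparametric prediction interval for a univariate random walk Y_t = Y_{t-1} + e_t with i.i.d. increments is to shift empirical quantiles of the observed increments by the last observation. The sketch below does exactly that; it is an assumption-laden illustration, not the asymptotically optimal construction derived in the paper, and `random_walk_interval` and the toy series are hypothetical.

```python
import math

def random_walk_interval(y, alpha=0.1):
    """One-step-ahead interval Y_n + [lower, upper] empirical increment quantiles."""
    increments = sorted(b - a for a, b in zip(y, y[1:]))
    m = len(increments)
    if m < 2:
        raise ValueError("need at least three observations")
    lo_idx = max(0, math.floor(m * alpha / 2))
    hi_idx = min(m - 1, math.ceil(m * (1 - alpha / 2)) - 1)
    return y[-1] + increments[lo_idx], y[-1] + increments[hi_idx]

# Hypothetical observed path with ten increments.
y = [0, 1, 3, 2, 5, 4, 4, 7, 6, 8, 11]
interval = random_walk_interval(y, alpha=0.5)
```

Because only order statistics of the increments are used, the interval does not require the increment distribution to have finite moments.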
Citations: 0
The Mediating Impact of Innovation Types in the Relationship between Innovation Use Theory and Market Performance
Q4 MATHEMATICS, INTERDISCIPLINARY APPLICATIONS Pub Date : 2023-12-30 DOI: 10.3390/stats7010001
Shieh-Liang Chen, Kuo-Liang Chen
The ultimate goal of innovation is to improve performance. But if people’s needs and uses are ignored, innovation will only be a formality. In the past, research on innovation mostly focused on technology, processes, business models, services, and organizations. The measurement of innovation focuses on capabilities, processes, results, and methods, but there has always been a lack of pre-innovation measurements and tools. This study is the first to combine the innovation use theory proposed by Christensen et al. with innovation types, and it uses measurement focused on the early stage of innovation to predict post-innovation performance. This study collected 590 valid samples and used SPSS and the four-step BK method to conduct regression analysis and mediation tests. The empirical results yielded the following: (1) a confirmed model and scale of the innovation use theory; (2) that three constructs of innovation use theory have an impact on market performance; and (3) that innovation types acting as mediators will improve market performance. This study establishes an academic model of the innovation use theory to provide a clear scale tool for subsequent research. In practice, it can first measure the direction of innovation and performance prediction, providing managers with a reference when developing new products and applying market strategies.
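Assuming "BK" refers to the Baron and Kenny procedure, the regressions behind the four steps can be sketched with ordinary least squares as follows. The data are invented for illustration, the study itself used SPSS, and the helper names are assumptions; significance tests are omitted for brevity.

```python
def mean(v):
    return sum(v) / len(v)

def simple_slope(x, y):
    """OLS slope of y on x (with intercept)."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    return sxy / sxx

def two_predictor_slopes(x, m, y):
    """OLS slopes of y on (x, m) with intercept, via centered normal equations."""
    mx, mm, my = mean(x), mean(m), mean(y)
    sxx = sum((a - mx) ** 2 for a in x)
    smm = sum((b - mm) ** 2 for b in m)
    sxm = sum((a - mx) * (b - mm) for a, b in zip(x, m))
    sxy = sum((a - mx) * (c - my) for a, c in zip(x, y))
    smy = sum((b - mm) * (c - my) for b, c in zip(m, y))
    det = sxx * smm - sxm ** 2
    return (smm * sxy - sxm * smy) / det, (sxx * smy - sxm * sxy) / det

# Hypothetical data: predictor x, mediator m, outcome y.
x = [1, 2, 3, 4, 5, 6, 7, 8]
m = [2.1, 3.9, 6.2, 7.8, 10.1, 12.2, 13.8, 16.1]
y = [3.7, 6.8, 10.9, 13.6, 17.7, 21.2, 24.3, 28.1]

c = simple_slope(x, y)                      # step 1: total effect of x on y
a = simple_slope(x, m)                      # step 2: effect of x on the mediator
c_prime, b = two_predictor_slopes(x, m, y)  # step 3: b is the mediator effect given x
indirect = a * b                            # step 4: compare c_prime with c
```

In linear OLS the total effect decomposes exactly as c = c_prime + a * b, so a direct effect c_prime that is smaller in magnitude than c is what the fourth step reads as evidence of mediation.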
Citations: 0
Jump-Robust Realized-GARCH-MIDAS-X Estimators for Bitcoin and Ethereum Volatility Indices
Q4 MATHEMATICS, INTERDISCIPLINARY APPLICATIONS Pub Date : 2023-12-12 DOI: 10.3390/stats6040082
Julien Chevallier, Bilel Sanhaji
In this paper, we conducted an empirical investigation of the realized volatility of cryptocurrencies using an econometric approach. This work’s two main characteristics are: (i) the realized volatility to be forecast filters jumps, and (ii) the benefit of using various historical/implied volatility indices from brokers as exogenous variables was explicitly considered. We feature a jump-robust extension of the REGARCH-MIDAS-X model incorporating realized beta GARCH processes and MIDAS filters with monthly, daily, and hourly components. First, we estimated six jump-robust estimators of realized volatility for Bitcoin and Ethereum that were retained as the dependent variable. Second, we inserted ten Bitcoin and Ethereum volatility indices gathered from various exchanges as exogenous variables, one at a time. Third, we explored their forecasting ability based on the MSE and QLIKE statistics. Our sample spanned the period from May 2018 to January 2023. The main result is that the best predictors among the volatility indices for Bitcoin and Ethereum are those derived from 30-day implied volatility. The significance of the findings could mostly be attributable to the ability of our new model to incorporate financial and technological variables directly into the specification of the Bitcoin and Ethereum volatility dynamics.
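The two forecast-evaluation statistics named in the abstract can be sketched as below. The abstract does not give the exact formulas, so this assumes the common normalized QLIKE form, r/h - log(r/h) - 1, where r is the realized variance proxy and h the forecast; the function names and the numbers are illustrative only.

```python
import math

def mse(realized, forecast):
    """Mean squared error between realized and forecast variances."""
    return sum((r - h) ** 2 for r, h in zip(realized, forecast)) / len(realized)

def qlike(realized, forecast):
    """Normalized QLIKE loss (assumed form); zero when forecasts equal realizations."""
    return sum(r / h - math.log(r / h) - 1.0
               for r, h in zip(realized, forecast)) / len(realized)

# Hypothetical realized variances and two competing forecasts.
realized = [0.5, 2.0, 1.0]
forecast_a = [0.6, 1.8, 1.1]
forecast_b = [1.0, 1.0, 1.0]
losses = {
    "a": (mse(realized, forecast_a), qlike(realized, forecast_a)),
    "b": (mse(realized, forecast_b), qlike(realized, forecast_b)),
}
```

Unlike MSE, this QLIKE form depends only on the ratio r/h, so it is less dominated by a few periods of extreme volatility.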
Citations: 0
Revisiting the Large n (Sample Size) Problem: How to Avert Spurious Significance Results
Q4 MATHEMATICS, INTERDISCIPLINARY APPLICATIONS Pub Date : 2023-12-05 DOI: 10.3390/stats6040081
Aris Spanos
Although large data sets are generally viewed as advantageous for their ability to provide more precise and reliable evidence, it is often overlooked that these benefits are contingent upon certain conditions being met. The primary condition is the approximate validity (statistical adequacy) of the probabilistic assumptions comprising the statistical model Mθ(x) applied to the data. In the case of a statistically adequate Mθ(x) and a given significance level α, as n increases, the power of a test increases, and the p-value decreases due to the inherent trade-off between type I and type II error probabilities in frequentist testing. This trade-off raises concerns about the reliability of declaring ‘statistical significance’ based on conventional significance levels when n is exceptionally large. To address this issue, the author proposes that a principled approach, in the form of post-data severity (SEV) evaluation, be employed. The SEV evaluation represents a post-data error probability that converts unduly data-specific ‘accept/reject H0 results’ into evidence either supporting or contradicting inferential claims regarding the parameters of interest. This approach offers a more nuanced and robust perspective in navigating the challenges posed by the large n problem.
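To make the large n issue concrete, the sketch below works through the simplest textbook setting: a one-sided test of H0: μ ≤ μ0 for a normal mean with known σ. After a rejection, the severity of the claim μ > μ1 is taken as Φ(√n (x̄ - μ1)/σ), which is the usual form for this simple case. The numbers and function names are hypothetical, and the paper's own examples may differ.

```python
import math
from statistics import NormalDist

PHI = NormalDist().cdf  # standard normal CDF

def p_value(xbar, mu0, sigma, n):
    """p-value for H0: mu <= mu0 against H1: mu > mu0 with known sigma."""
    return 1.0 - PHI(math.sqrt(n) * (xbar - mu0) / sigma)

def severity_greater(xbar, mu1, sigma, n):
    """Post-data severity of the claim mu > mu1 after rejecting H0."""
    return PHI(math.sqrt(n) * (xbar - mu1) / sigma)

# Hypothetical data: a tiny observed effect with a very large sample.
xbar, mu0, sigma, n = 0.01, 0.0, 1.0, 100_000
p = p_value(xbar, mu0, sigma, n)                     # statistically significant
sev_at_null = severity_greater(xbar, mu0, sigma, n)  # claim mu > 0
sev_at_002 = severity_greater(xbar, 0.02, sigma, n)  # claim mu > 0.02
```

Here the test rejects at conventional levels, yet the claim that μ exceeds 0.02 has very low severity, so the data warrant only a tiny discrepancy from the null.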
Citations: 0
Process Monitoring Using Truncated Gamma Distribution
Q4 MATHEMATICS, INTERDISCIPLINARY APPLICATIONS Pub Date : 2023-12-01 DOI: 10.3390/stats6040080
Sajid Ali, Shayaan Rajput, Ismail Shah, Hassan Houmani
The time-between-events idea is commonly used for monitoring high-quality processes. This study aims to rapidly detect upward or downward shifts in the process mean using one-sided exponentially weighted moving average (EWMA) charts based on a truncated gamma distribution. The use of the truncation method helps to improve the sensitivity of the proposed chart. The performance of the proposed chart with known and estimated parameters is analyzed by using the run length properties, including the average run length (ARL) and standard deviation run length (SDRL), through extensive Monte Carlo simulation. The numerical results show that the proposed scheme is more sensitive than the existing ones. Finally, the chart is implemented in real-world situations to highlight the significance of the proposed chart.
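A Monte Carlo estimate of ARL and SDRL for an upper one-sided EWMA chart might look like the sketch below. Everything specific here is an assumption made for illustration: the reset form Z_t = max(target, λX_t + (1 - λ)Z_{t-1}), the rejection sampler for an upper-truncated gamma, and all parameter values. The paper's exact chart design and control limits are not given in the abstract.

```python
import random
import statistics

def truncated_gamma(rng, shape, scale, upper):
    """Draw from a gamma distribution truncated above at `upper` (rejection sampling)."""
    while True:
        x = rng.gammavariate(shape, scale)
        if x <= upper:
            return x

def run_length(rng, shape, scale, upper, target, lam, ucl, max_steps=100_000):
    """Number of samples until the upper one-sided EWMA statistic exceeds ucl."""
    z = target
    for t in range(1, max_steps + 1):
        x = truncated_gamma(rng, shape, scale, upper)
        z = max(target, lam * x + (1.0 - lam) * z)  # reset at the target value
        if z > ucl:
            return t
    return max_steps

def arl_sdrl(scale, reps=500, seed=1, shape=2.0, upper=10.0,
             target=2.0, lam=0.2, ucl=3.2):
    """Monte Carlo estimates of the average and standard deviation of run length."""
    rng = random.Random(seed)
    runs = [run_length(rng, shape, scale, upper, target, lam, ucl)
            for _ in range(reps)]
    return statistics.mean(runs), statistics.stdev(runs)

arl_in_control, sdrl_in_control = arl_sdrl(scale=1.0)  # no shift
arl_shifted, sdrl_shifted = arl_sdrl(scale=1.5)        # upward mean shift
```

A sensitive chart keeps the in-control ARL long while making the shifted ARL short; a downward chart would mirror this with a min reset and a lower limit.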
Citations: 0
Social Response and Measles Dynamics
Q4 MATHEMATICS, INTERDISCIPLINARY APPLICATIONS Pub Date : 2023-11-29 DOI: 10.3390/stats6040079
A. Adebanji, Franz Aschl, Ednah Chepkemoi Chumo, Emmanuel Odame Owiredu, Johannes Müller, Tukae Mbegalo
Measles remains one of the leading causes of death among young children globally, even though a safe and cost-effective vaccine is available. Vaccine hesitancy and social response to vaccination continue to undermine efforts to eradicate measles. In this study, we consider data about measles vaccination and measles prevalence in Germany for the years 2008–2012 in 345 districts. In the first part of the paper, we show that the probability of a local outbreak does not significantly depend on the vaccination coverage, but—if an outbreak does take place—the scale of the outbreak depends significantly on the vaccination coverage. Additionally, we show that the willingness to be vaccinated is significantly increased by local outbreaks, with a delay of about one year. In the second part of the paper, we consider a deterministic delay model to investigate the consequences of the statistical findings on the dynamics of the infection. Here, we find that the delay might induce oscillations if the vaccination coverage is rather low and the social response to an outbreak is sufficiently strong. The relevance of our findings is discussed at the end of the paper.
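The abstract does not state the equations of the delay model, so the sketch below uses a standard textbook stand-in, the delayed logistic (Hutchinson) equation x'(t) = r x(t) (1 - x(t - τ)), to show how a delayed response can produce oscillations once the feedback is strong enough (the equilibrium loses stability when rτ exceeds π/2). The integrator is a plain Euler scheme with a constant history; all names and parameter values are illustrative assumptions.

```python
def simulate_delayed_logistic(r, tau=1.0, x0=0.5, dt=0.01, t_end=100.0):
    """Euler integration of x'(t) = r * x(t) * (1 - x(t - tau)), constant history x0."""
    lag = int(round(tau / dt))
    xs = [x0] * (lag + 1)              # values on [-tau, 0]
    for _ in range(int(round(t_end / dt))):
        x_now = xs[-1]
        x_delayed = xs[-1 - lag]       # state one delay earlier
        xs.append(x_now + dt * r * x_now * (1.0 - x_delayed))
    return xs[lag:]                    # trajectory from t = 0 onwards

def late_amplitude(xs, frac=0.2):
    """Peak-to-trough range over the final `frac` of the trajectory."""
    tail = xs[int(len(xs) * (1.0 - frac)):]
    return max(tail) - min(tail)

weak = late_amplitude(simulate_delayed_logistic(r=0.5))    # r * tau below pi/2
strong = late_amplitude(simulate_delayed_logistic(r=2.0))  # r * tau above pi/2
```

With weak feedback the trajectory settles to the equilibrium, while strong feedback combined with the same delay sustains oscillations, which is the qualitative effect the abstract describes for a sufficiently strong social response.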
Citations: 0
The Logistic Burr XII Distribution: Properties and Applications to Income Data
Q4 MATHEMATICS, INTERDISCIPLINARY APPLICATIONS Pub Date : 2023-11-21 DOI: 10.3390/stats6040078
R. Guerra, Fernando A. Peña-Ramírez, G. Cordeiro
We define and study the four-parameter logistic Burr XII distribution. It is obtained by using the three-parameter Burr XII distribution as the baseline in the logistic-X family; it may be a useful alternative for modeling income distributions and could be applied in other areas. We show that the new distribution can have decreasing and upside-down-bathtub hazard functions and that its density function is an infinite linear combination of Burr XII densities. Some mathematical properties of the proposed model are determined, such as the quantile function, ordinary and incomplete moments, and generating function. We also obtain the maximum likelihood estimators of the model parameters and perform a Monte Carlo simulation study. Further, we present a parametric regression model based on the introduced distribution as an alternative to the location-scale regression model. The potential of the new distribution is illustrated by means of two applications to income data sets.
Citations: 0