The distribution of power-related random variables (and their use in clinical trials)
Pub Date: 2024-09-19 | DOI: 10.1007/s00362-024-01599-1 | Statistical Papers
Francesco Mariani, Fulvio De Santis, Stefania Gubbiotti
In the hybrid Bayesian-frequentist approach to hypothesis testing, the power function, i.e. the probability of rejecting the null hypothesis, is a random variable, and a pre-experimental evaluation of the study is commonly carried out through the so-called probability of success (PoS). PoS is usually defined as the expected value of the random power, which is not necessarily a representative summary of the entire power distribution. Here, we consider the main definitions of PoS and investigate the power-related random variables that induce them. We provide general expressions for their cumulative distribution and probability density functions, as well as closed-form expressions when the test statistic is, at least asymptotically, normal. The analysis of these distributions highlights discrepancies among the main definitions of PoS, leading us to prefer the one based on the utility function of the test. We illustrate our idea through an example and an application to clinical trials, a framework where PoS is commonly employed.
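To make the distinction between the random power and its summaries concrete, here is a minimal R sketch (not taken from the paper) that simulates the power of a one-sided Z-test under a normal design prior on the effect; the sample size, prior parameters and target power threshold are illustrative assumptions.

set.seed(1)
n     <- 50            # planned sample size (assumed)
sigma <- 1             # known standard deviation (assumed)
alpha <- 0.025         # one-sided significance level
z_a   <- qnorm(1 - alpha)

theta <- rnorm(1e5, mean = 0.3, sd = 0.2)      # design prior on the effect (assumed)

power <- pnorm(sqrt(n) * theta / sigma - z_a)  # random power of the one-sided Z-test

mean(power)        # PoS as the expected random power
median(power)      # an alternative summary of the same distribution
mean(power > 0.8)  # probability that the power exceeds a target value (assumed target)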
{"title":"The distribution of power-related random variables (and their use in clinical trials)","authors":"Francesco Mariani, Fulvio De Santis, Stefania Gubbiotti","doi":"10.1007/s00362-024-01599-1","DOIUrl":"https://doi.org/10.1007/s00362-024-01599-1","url":null,"abstract":"<p>In the hybrid Bayesian-frequentist approach to hypotheses tests, the power function, i.e. the probability of rejecting the null hypothesis, is a random variable and a pre-experimental evaluation of the study is commonly carried out through the so-called probability of success (PoS). PoS is usually defined as the expected value of the random power that is not necessarily a well-representative summary of the entire distribution. Here, we consider the main definitions of PoS and investigate the power related random variables that induce them. We provide general expressions for their cumulative distribution and probability density functions, as well as closed-form expressions when the test statistic is, at least asymptotically, normal. The analysis of such distributions highlights discrepancies in the main definitions of PoS, leading us to prefer the one based on the utility function of the test. We illustrate our idea through an example and an application to clinical trials, which is a framework where PoS is commonly employed.</p>","PeriodicalId":51166,"journal":{"name":"Statistical Papers","volume":"26 1","pages":""},"PeriodicalIF":1.3,"publicationDate":"2024-09-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142268708","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
The cost of sequential adaptation and the lower bound for mean squared error
Pub Date: 2024-09-17 | DOI: 10.1007/s00362-024-01565-x | Statistical Papers
Sergey Tarima, Nancy Flournoy
Informative interim adaptations lead to random sample sizes. The random sample size becomes a component of the sufficient statistic, and estimation based solely on the observed samples or on the likelihood function does not use all available statistical evidence. The total Fisher information (FI) is decomposed into the design FI and a conditional-on-design FI. The FI left unspent by a design's informative interim adaptation decomposes further into a weighted linear combination of FIs conditional on the stopping decisions. These components are then used to determine a new, lower bound on the mean squared error (MSE) in post-adaptation estimation, because the Cramér–Rao lower bound (1945, 1946) and its sequential version suggested by Wolfowitz (Ann Math Stat 18(2):215–230, 1947) for non-informative stopping are not applicable to post-informative-adaptation estimation. We also show that the newly proposed lower bound on the MSE is attained by the maximum likelihood estimator in designs with informative adaptations when the data come from a one-parameter exponential family. Theoretical results are illustrated with simple normal samples collected according to a two-stage design with the possibility of early stopping.
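The information decomposition and the new bound are not reproduced here; the following R sketch, under an assumed stopping rule and assumed sample sizes, only illustrates the setting of the last sentence: a two-stage normal design with possible early stopping, for which the post-adaptation bias and MSE of the naive maximum likelihood estimator (the sample mean) can be simulated.

set.seed(1)
mu <- 0.2; n1 <- 20; n2 <- 20
stop_crit <- qnorm(0.975) / sqrt(n1)     # assumed early-stopping threshold for the stage-1 mean

one_trial <- function() {
  x1 <- rnorm(n1, mu)
  if (mean(x1) > stop_crit) {            # informative early stopping
    c(est = mean(x1), n = n1)
  } else {
    x2 <- rnorm(n2, mu)
    c(est = mean(c(x1, x2)), n = n1 + n2)
  }
}

res <- replicate(1e5, one_trial())
mean(res["est", ]) - mu                  # bias of the MLE after the adaptation
mean((res["est", ] - mu)^2)              # MSE, to compare with a fixed-design variance of 1/n
table(res["n", ]) / ncol(res)            # distribution of the random sample size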
{"title":"The cost of sequential adaptation and the lower bound for mean squared error","authors":"Sergey Tarima, Nancy Flournoy","doi":"10.1007/s00362-024-01565-x","DOIUrl":"https://doi.org/10.1007/s00362-024-01565-x","url":null,"abstract":"<p>Informative interim adaptations lead to random sample sizes. The random sample size becomes a component of the sufficient statistic and estimation based solely on observed samples or on the likelihood function does not use all available statistical evidence. The total Fisher Information (FI) is decomposed into the design FI and a conditional-on-design FI. The FI unspent by a design’s informative interim adaptation decomposes further into a weighted linear combination of FIs conditional-on-stopping decisions. Then, these components are used to determine the new lower mean squared error (MSE) in post-adaptation estimation because the Cramer–Rao lower bound (1945, 1946) and its sequential version suggested by Wolfowitz (Ann Math Stat 18(2):215–230, 1947) for non-informative stopping are not applicable to post-informative-adaptation estimation. In addition, we also show that the new proposed lower boundary on the MSE is reached by the maximum likelihood estimators in designs with informative adaptations when data are coming from one-parameter exponential family. Theoretical results are illustrated with simple normal samples collected according to a two-stage design with a possibility of early stopping.</p>","PeriodicalId":51166,"journal":{"name":"Statistical Papers","volume":"207 1","pages":""},"PeriodicalIF":1.3,"publicationDate":"2024-09-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142268706","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Nested strong orthogonal arrays
Pub Date: 2024-09-16 | DOI: 10.1007/s00362-024-01609-2 | Statistical Papers
Chunwei Zheng, Wenlong Li, Jian-Feng Yang
Nested space-filling designs are popular for conducting multiple computer experiments with different levels of accuracy. Strong orthogonal arrays (SOAs) are a special type of space-filling design that possesses attractive low-dimensional stratifications. Combining these two kinds of designs, we propose a new type of design called a nested strong orthogonal array. Such a design is a special nested space-filling design consisting of two layers, a large SOA and a small SOA, which enjoy different strengths, with the small one nested in the large one. The proposed construction method is easy to use and can accommodate a larger number of columns, and the resulting designs possess better two-dimensional stratifications than existing nested space-filling designs. The construction is based on regular second-order saturated designs and nonregular designs. Comparisons with existing nested space-filling designs are given to demonstrate the usefulness of the proposed designs.
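The SOA construction itself is not reproduced; as a generic illustration of what "two-dimensional stratification" measures, the R sketch below counts, for every pair of columns of a design matrix, how unevenly the points fall into a g x g grid (a value of 0 means the grid cells are perfectly balanced). The grid size and the random matrix used as input are assumptions.

stratification_gap <- function(D, g = 2) {
  # scale each column to (0, 1) by ranks, then count points in a g x g grid
  D01 <- apply(D, 2, function(x) (rank(x, ties.method = "first") - 0.5) / length(x))
  combn(ncol(D01), 2, function(ij) {
    tab <- table(cut(D01[, ij[1]], g), cut(D01[, ij[2]], g))
    max(tab) - min(tab)   # 0 means the g x g stratification is perfectly balanced
  })
}

set.seed(1)
D <- matrix(runif(64 * 4), ncol = 4)  # random stand-in for a space-filling design
stratification_gap(D, g = 4)          # one gap value per pair of columns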
{"title":"Nested strong orthogonal arrays","authors":"Chunwei Zheng, Wenlong Li, Jian-Feng Yang","doi":"10.1007/s00362-024-01609-2","DOIUrl":"https://doi.org/10.1007/s00362-024-01609-2","url":null,"abstract":"<p>Nested space-filling designs are popular for conducting multiple computer experiments with different levels of accuracy. Strong orthogonal arrays (SOAs) are a special type of space-filling designs which possess attractive low-dimensional stratifications. Combining these two kinds of designs, we propose a new type of design called a nested strong orthogonal array. Such a design is a special nested space-filling design that consists of two layers, i.e., the large SOA and the small SOA, where they enjoy different strengths, and the small one is nested in the large one. The proposed construction method is easy to use, capable of accommodating a larger number of columns, and the resulting designs possess better stratifications than the existing nested space-filling designs in two dimensions. The construction method is based on regular second order saturated designs and nonregular designs. Some comparisons with the existing nested space-filling designs are given to show the usefulness of the proposed designs.</p>","PeriodicalId":51166,"journal":{"name":"Statistical Papers","volume":"16 1","pages":""},"PeriodicalIF":1.3,"publicationDate":"2024-09-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142251181","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Tests for time-varying coefficient spatial autoregressive panel data model with fixed effects
Pub Date: 2024-09-14 | DOI: 10.1007/s00362-024-01607-4 | Statistical Papers
Lingling Tian, Yunan Su, Chuanhua Wei
As an extension of the spatial autoregressive panel data model and the time-varying coefficient panel data model, the time-varying coefficient spatial autoregressive panel data model is useful in the analysis of spatial panel data. While research has addressed the estimation problem for this model, less attention has been given to hypothesis testing. This paper studies two tests for this semiparametric spatial panel data model: one concerns the existence of the spatial lag term, and the other determines whether some time-varying coefficients are in fact constant. We employ the profile generalized likelihood ratio test procedure to construct the corresponding test statistics, and a residual-based bootstrap procedure is used to derive the p-values of the tests. Simulations are conducted to evaluate the performance of the proposed tests; the results show that the proposed methods have good finite-sample properties. Finally, we apply the proposed tests to provincial carbon emission data from China. Our findings suggest that the partially linear time-varying coefficient spatial autoregressive panel data model provides a better fit for the carbon emission data.
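The profile generalized likelihood ratio statistic for the spatial model is not reproduced here; the R sketch below only illustrates the general residual-based bootstrap recipe for a test p-value, using an ordinary linear regression and a likelihood-ratio-type statistic as stand-ins. The data-generating model, the statistic and the number of bootstrap replicates are assumptions.

set.seed(1)
n <- 100; x1 <- rnorm(n); x2 <- rnorm(n)
y <- 1 + 0.5 * x1 + rnorm(n)                 # data generated under H0: x2 has no effect

stat <- function(y) {
  f0 <- lm(y ~ x1); f1 <- lm(y ~ x1 + x2)
  as.numeric(2 * (logLik(f1) - logLik(f0)))  # likelihood-ratio-type statistic
}

t_obs <- stat(y)
f0    <- lm(y ~ x1)                          # fit under the null model
r0    <- residuals(f0); fit0 <- fitted(f0)

B <- 999
t_boot <- replicate(B, stat(fit0 + sample(r0, n, replace = TRUE)))  # resample residuals
(p_value <- (1 + sum(t_boot >= t_obs)) / (B + 1))                   # bootstrap p-value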
{"title":"Tests for time-varying coefficient spatial autoregressive panel data model with fixed effects","authors":"Lingling Tian, Yunan Su, Chuanhua Wei","doi":"10.1007/s00362-024-01607-4","DOIUrl":"https://doi.org/10.1007/s00362-024-01607-4","url":null,"abstract":"<p>As an extension of the spatial autoregressive panel data model and the time-varying coefficient panel data model, the time-varying coefficient spatial autoregressive panel data model is useful in analysis of spatial panel data. While research has addressed the estimation problem of this model, less attention has been given to hypotheses tests. This paper studies two tests for this semiparametric spatial panel data model. One considers the existence of the spatial lag term, and the other determines whether some time-varying coefficients are constants. We employ the profile generalized likelihood ratio test procedure to construct the corresponding test statistic, and the residual-based bootstrap procedure is used to derive the p-value of the tests. Some simulations are conducted to evaluate the performance of the proposed test method, the results show that the proposed methods have good finite sample properties. Finally, we apply the proposed test methods to the provincial carbon emission data of China. Our findings suggest that the partially linear time-varying coefficients spatial autoregressive panel data model provides a better fit for the carbon emission data.</p>","PeriodicalId":51166,"journal":{"name":"Statistical Papers","volume":"167 1","pages":""},"PeriodicalIF":1.3,"publicationDate":"2024-09-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142251182","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
On the consistency of supervised learning with missing values
Pub Date: 2024-09-12 | DOI: 10.1007/s00362-024-01550-4 | Statistical Papers
Julie Josse, Jacob M. Chen, Nicolas Prost, Gaël Varoquaux, Erwan Scornet
In many application settings, data have missing entries, which makes subsequent analyses challenging. An abundant literature addresses missing values in an inferential framework, aiming to estimate parameters and their variance from incomplete tables. Here, we consider supervised-learning settings: predicting a target when missing values appear in both training and test data. We first rewrite classic missing-values results for this setting. We then show the consistency of two approaches, test-time multiple imputation and single imputation, in prediction. A striking result is that the widely used method of imputing with a constant prior to learning is consistent when missing values are not informative. This contrasts with inferential settings, where mean imputation is frowned upon because it distorts the distribution of the data. The consistency of such a popular, simple approach is important in practice. Finally, to contrast procedures based on imputation prior to learning with procedures that optimize the handling of missing values for prediction, we consider decision trees. Indeed, decision trees are among the few methods that can tackle empirical risk minimization with missing values, owing to their ability to handle the half-discrete nature of incomplete variables. After empirically comparing different missing-values strategies in trees, we recommend the “missing incorporated in attribute” method, as it can handle both non-informative and informative missing values.
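As a minimal illustration of the final recommendation (not the paper's experiments), the R sketch below fits a regression tree after imputing a feature with an out-of-range constant and adding an explicit missingness indicator, a simple way to approximate the "missing incorporated in attribute" idea with a standard tree learner. The simulated data, the constant 999 and the use of rpart are assumptions.

library(rpart)

set.seed(1)
n <- 500
x <- rnorm(n)
y <- 2 * x + rnorm(n)
x[rbinom(n, 1, 0.3) == 1] <- NA            # about 30% of the feature is missing

# Constant imputation plus an explicit missingness indicator: an out-of-range
# constant lets the tree send missing values to their own side of a split.
dat <- data.frame(
  y,
  x_imp  = ifelse(is.na(x), 999, x),
  x_miss = as.integer(is.na(x))
)

fit <- rpart(y ~ x_imp + x_miss, data = dat)
printcp(fit)                               # splits can use the imputed value or the indicator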
{"title":"On the consistency of supervised learning with missing values","authors":"Julie Josse, Jacob M. Chen, Nicolas Prost, Gaël Varoquaux, Erwan Scornet","doi":"10.1007/s00362-024-01550-4","DOIUrl":"https://doi.org/10.1007/s00362-024-01550-4","url":null,"abstract":"<p>In many application settings, data have missing entries, which makes subsequent analyses challenging. An abundant literature addresses missing values in an inferential framework, aiming at estimating parameters and their variance from incomplete tables. Here, we consider supervised-learning settings: predicting a target when missing values appear in both training and test data. We first rewrite classic missing values results for this setting. We then show the consistency of two approaches, test-time multiple imputation and single imputation in prediction. A striking result is that the widely-used method of imputing with a constant prior to learning is consistent when missing values are not informative. This contrasts with inferential settings where mean imputation is frowned upon as it distorts the distribution of the data. The consistency of such a popular simple approach is important in practice. Finally, to contrast procedures based on imputation prior to learning with procedures that optimize the missing-value handling for prediction, we consider decision trees. Indeed, decision trees are among the few methods that can tackle empirical risk minimization with missing values, due to their ability to handle the half-discrete nature of incomplete variables. After comparing empirically different missing values strategies in trees, we recommend using the “missing incorporated in attribute” method as it can handle both non-informative and informative missing values.</p>","PeriodicalId":51166,"journal":{"name":"Statistical Papers","volume":"15 1","pages":""},"PeriodicalIF":1.3,"publicationDate":"2024-09-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142201029","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Maximum likelihood estimation for left-truncated log-logistic distributions with a given truncation point
Pub Date: 2024-09-10 | DOI: 10.1007/s00362-024-01603-8 | Statistical Papers
Markus Kreer, Ayşe Kızılersü, Jake Guscott, Lukas Christopher Schmitz, Anthony W. Thomas
For a sample \(X_1, X_2, \ldots, X_N\) of independent identically distributed copies of a log-logistically distributed random variable \(X\), maximum likelihood estimation is analysed in detail when a left-truncation point \(x_L > 0\) is introduced. Due to scaling properties it is sufficient to investigate the case \(x_L = 1\). Here the corresponding maximum likelihood equations for a normalised sample (i.e. a sample divided by \(x_L\)) do not always possess a solution. A simple criterion guarantees the existence of a solution: let \(\mathbb{E}(\cdot)\) denote the expectation induced by the normalised sample and let \(\beta_0 = \mathbb{E}(\ln X)^{-1}\) denote the inverse of the expected logarithm of the sampled random variable \(X\) (which is greater than \(x_L = 1\)). If \(\beta_0\) is larger than a certain positive number \(\beta_C\), then a solution of the maximum likelihood equations exists. Here \(\beta_C\) is the unique solution of the moment equation \(\mathbb{E}(X^{-\beta_C}) = \tfrac{1}{2}\). When a solution exists, a profile likelihood function can be constructed and the optimisation problem reduces to one dimension, leading to a robust numerical algorithm. When the maximum likelihood equations do not admit a solution for certain data samples, it is shown that the Pareto distribution is the \(L^1\)-limit of the degenerate left-truncated log-logistic distribution, where \(L^1(\mathbb{R}^+)\) is the usual Banach space of functions whose absolute value is Lebesgue-integrable. A large-sample analysis showing consistency and asymptotic normality complements our analysis. Finally, two applications to real-world data are presented.
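A small R sketch of the existence criterion stated above, under simulated data (an assumption): after normalising the sample by the truncation point, compute \(\beta_0 = \mathbb{E}(\ln X)^{-1}\) and the root \(\beta_C\) of the empirical moment equation \(\mathbb{E}(X^{-\beta_C}) = \tfrac{1}{2}\), and check whether \(\beta_0 > \beta_C\). The profile-likelihood optimisation itself is not reproduced.

set.seed(1)
u <- runif(2000)
x <- (u / (1 - u))^(1 / 1.5)   # log-logistic draws, shape 1.5, scale 1 (assumed)
x <- x[x > 1]                  # left-truncate at x_L = 1; sample is already normalised

beta0  <- 1 / mean(log(x))                 # beta_0 = E(ln X)^{-1} under the sample
moment <- function(b) mean(x^(-b)) - 0.5   # empirical version of E(X^{-b}) = 1/2
betaC  <- uniroot(moment, c(1e-6, 50))$root

c(beta0 = beta0, betaC = betaC)
beta0 > betaC                  # existence criterion: a solution of the ML equations exists if TRUE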
{"title":"Maximum likelihood estimation for left-truncated log-logistic distributions with a given truncation point","authors":"Markus Kreer, Ayşe Kızılersü, Jake Guscott, Lukas Christopher Schmitz, Anthony W. Thomas","doi":"10.1007/s00362-024-01603-8","DOIUrl":"https://doi.org/10.1007/s00362-024-01603-8","url":null,"abstract":"<p>For a sample <span>(X_1, X_2,ldots X_N)</span> of independent identically distributed copies of a log-logistically distributed random variable <i>X</i> the maximum likelihood estimation is analysed in detail if a left-truncation point <span>(x_L>0)</span> is introduced. Due to scaling properties it is sufficient to investigate the case <span>(x_L=1)</span>. Here the corresponding maximum likelihood equations for a normalised sample (i.e. a sample divided by <span>(x_L)</span>) do not always possess a solution. A simple criterion guarantees the existence of a solution: Let <span>(mathbb {E}(cdot ))</span> denote the expectation induced by the normalised sample and denote by <span>(beta _0=mathbb {E}(ln {X})^{-1})</span>, the inverse value of expectation of the logarithm of the sampled random variable <i>X</i> (which is greater than <span>(x_L=1)</span>). If this value <span>(beta _0)</span> is bigger than a certain positive number <span>(beta _C)</span> then a solution of the maximum likelihood equation exists. Here the number <span>(beta _C)</span> is the unique solution of a moment equation,<span>(mathbb {E}(X^{-beta _C})=frac{1}{2})</span>. In the case of existence a profile likelihood function can be constructed and the optimisation problem is reduced to one dimension leading to a robust numerical algorithm. When the maximum likelihood equations do not admit a solution for certain data samples, it is shown that the Pareto distribution is the <span>(L^1)</span>-limit of the degenerated left-truncated log-logistic distribution, where <span>(L^1(mathbb {R}^+))</span> is the usual Banach space of functions whose absolute value is Lebesgue-integrable. A large sample analysis showing consistency and asymptotic normality complements our analysis. Finally, two applications to real world data are presented.</p>","PeriodicalId":51166,"journal":{"name":"Statistical Papers","volume":"4 1","pages":""},"PeriodicalIF":1.3,"publicationDate":"2024-09-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142201030","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Confidence bounds for compound Poisson process
Pub Date: 2024-09-05 | DOI: 10.1007/s00362-024-01604-7 | Statistical Papers
Marek Skarupski, Qinhao Wu
The compound Poisson process (CPP) is a common mathematical model for describing many phenomena in medicine, reliability theory and risk theory. However, in the case of low-frequency phenomena, we are often unable to collect a sufficiently large database to conduct the analysis. In this article, we focus on methods for determining confidence intervals for the rate of the CPP when the sample size is small. Based on the properties of the process parameter estimators, we propose a new method for constructing such intervals and compare it with other known approaches. In numerical simulations, we use synthetic data from several continuous and discrete distributions. The case of a CPP whose rewards come from an exponential distribution is discussed separately. Recommendations on how to use each method to obtain a more precise confidence interval are given. All simulations were performed in R version 4.2.1.
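The proposed interval construction is not reproduced; for context, the R sketch below simulates one realisation of a compound Poisson process with exponential rewards and computes two standard baselines for the rate: the large-sample Wald interval and the exact Poisson (chi-square) interval based on the observed number of jumps. All parameter values, and the choice of baselines, are assumptions.

set.seed(1)
lambda <- 2; t_end <- 5; mean_reward <- 1.5   # assumed rate, horizon and reward mean

N <- rpois(1, lambda * t_end)                 # number of jumps observed on [0, t_end]
S <- sum(rexp(N, rate = 1 / mean_reward))     # accumulated exponential rewards

lam_hat <- N / t_end
lam_hat + c(-1, 1) * qnorm(0.975) * sqrt(lam_hat / t_end)          # Wald interval for lambda
c(qchisq(0.025, 2 * N), qchisq(0.975, 2 * (N + 1))) / (2 * t_end)  # exact Poisson interval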
{"title":"Confidence bounds for compound Poisson process","authors":"Marek Skarupski, Qinhao Wu","doi":"10.1007/s00362-024-01604-7","DOIUrl":"https://doi.org/10.1007/s00362-024-01604-7","url":null,"abstract":"<p>The compound Poisson process (CPP) is a common mathematical model for describing many phenomena in medicine, reliability theory and risk theory. However, in the case of low-frequency phenomena, we are often unable to collect a sufficiently large database to conduct analysis. In this article, we focused on methods for determining confidence intervals for the rate of the CPP when the sample size is small. Based on the properties of process parameter estimators, we proposed a new method for constructing such intervals and compared it with other known approaches. In numerical simulations, we used synthetic data from several continuous and discrete distributions. The case of CPP, in which rewards came from exponential distribution, was discussed separately. The recommendation of how to use each method to have a more precise confidence interval is given. All simulations were performed in R version 4.2.1.</p>","PeriodicalId":51166,"journal":{"name":"Statistical Papers","volume":"17 1","pages":""},"PeriodicalIF":1.3,"publicationDate":"2024-09-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142201031","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Confidence intervals for overall response rate difference in the sequential parallel comparison design
Pub Date: 2024-08-31 | DOI: 10.1007/s00362-024-01606-5 | Statistical Papers
Guogen Shan, Xinlin Lu, Yahui Zhang, Samuel S. Wu
High placebo response can significantly reduce the treatment effect in a parallel randomized trial. To address this challenge, several approaches have been developed, including the sequential parallel comparison design (SPCD), which has been shown to increase statistical power compared to the traditional randomized trial. A linear combination of the response rate differences from the two phases of the SPCD is commonly used to measure the overall treatment effect size. The traditional approach to calculating the confidence interval for the overall rate difference is based on the delta method using the variance–covariance matrix of all outcomes. As outcomes from a multinomial distribution are correlated, we suggest utilizing a constrained variance–covariance matrix in the delta method. Having observed anti-conservative coverage of the asymptotic intervals, we further propose using importance sampling to develop accurate intervals. Simulation studies show that the accurate intervals have better coverage probabilities than the others, while their interval widths are similar. Two real trials in the treatment of major depressive disorder are used to illustrate the application of the proposed intervals.
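For orientation only, the generic delta-method form behind such an interval is sketched below; the weight \(w\) and the notation are assumptions rather than the paper's definitions, and \(\widehat{\Sigma}\) stands for the variance–covariance matrix of the estimated multinomial cell probabilities (constrained or not).

\[
\hat{\delta} = w\,\hat{\Delta}_1 + (1-w)\,\hat{\Delta}_2, \qquad
\widehat{\operatorname{Var}}(\hat{\delta}) \approx \nabla g(\hat{p})^{\top}\, \widehat{\Sigma}\, \nabla g(\hat{p}), \qquad
\hat{\delta} \pm z_{1-\alpha/2}\sqrt{\widehat{\operatorname{Var}}(\hat{\delta})},
\]

where \(\hat{\Delta}_1\) and \(\hat{\Delta}_2\) are the estimated response rate differences of the two phases and \(g\) maps the cell probabilities \(\hat{p}\) to the overall effect \(\delta\).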
{"title":"Confidence intervals for overall response rate difference in the sequential parallel comparison design","authors":"Guogen Shan, Xinlin Lu, Yahui Zhang, Samuel S. Wu","doi":"10.1007/s00362-024-01606-5","DOIUrl":"https://doi.org/10.1007/s00362-024-01606-5","url":null,"abstract":"<p>High placebo responses could significantly reduce the treatment effect in a parallel randomized trial. To combat that challenge, several approaches were developed, including the sequential parallel comparison design (SPCD) that was shown to increase the statistical power as compared to the traditional randomized trial. A linear combination of the response rate differences from two phases per the SPCD is commonly used to measure the overall treatment effect size. The traditional approach to calculate the confidence interval for the overall rate difference is based on the delta method using the variance–covariance matrix of all outcomes. As outcomes from a multinomial distribution are correlated, we suggest utilizing a constrained variance–covariance matrix in the delta method. In the observation of anti-conservative coverages from asymptotic intervals, we further propose using importance sampling to develop accurate intervals. Simulation studies show that accurate intervals have better coverage probabilities than others and the interval width of accurate intervals is similar to the interval width of others. Two real trials to treat major depressive disorder are used to illustrate the application of the proposed intervals.</p>","PeriodicalId":51166,"journal":{"name":"Statistical Papers","volume":"39 1","pages":""},"PeriodicalIF":1.3,"publicationDate":"2024-08-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142201036","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Bayesian and frequentist inference derived from the maximum entropy principle with applications to propagating uncertainty about statistical methods
Pub Date: 2024-08-27 | DOI: 10.1007/s00362-024-01597-3 | Statistical Papers
David R. Bickel
Using statistical methods to analyze data requires treating the data set as randomly generated from a probability distribution that is unknown but idealized according to a mathematical model consisting of constraints, that is, assumptions about the distribution. Since the choice of such a model is up to the scientist, there is an understandable bias toward choosing models that make scientific conclusions appear more certain than they really are. There is a similar bias in the scientist's choice between Bayesian and frequentist methods. This article provides tools to mitigate both of those biases on the basis of a principle of information theory. It is found that the same principle unifies Bayesianism with the fiducial version of frequentism. The principle arguably overcomes not only the main objections against fiducial inference but also the main Bayesian objection against the use of confidence intervals.
{"title":"Bayesian and frequentist inference derived from the maximum entropy principle with applications to propagating uncertainty about statistical methods","authors":"David R. Bickel","doi":"10.1007/s00362-024-01597-3","DOIUrl":"https://doi.org/10.1007/s00362-024-01597-3","url":null,"abstract":"<p>Using statistical methods to analyze data requires considering the data set to be randomly generated from a probability distribution that is unknown but idealized according to a mathematical model consisting of constraints, assumptions about the distribution. Since the choice of such a model is up to the scientist, there is an understandable bias toward choosing models that make scientific conclusions appear more certain than they really are. There is a similar bias in the scientist’s choice of whether to use Bayesian or frequentist methods. This article provides tools to mitigate both of those biases on the basis of a principle of information theory. It is found that the same principle unifies Bayesianism with the fiducial version of frequentism. The principle arguably overcomes not only the main objections against fiducial inference but also the main Bayesian objection against the use of confidence intervals.</p>","PeriodicalId":51166,"journal":{"name":"Statistical Papers","volume":"46 1","pages":""},"PeriodicalIF":1.3,"publicationDate":"2024-08-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142201032","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Reduced bias estimation of the log odds ratio
Pub Date: 2024-08-26 | DOI: 10.1007/s00362-024-01593-7 | Statistical Papers
Asma Saleh
Analysis of binary matched-pairs data is problematic due to possibly infinite maximum likelihood estimates of the log odds ratio and potentially biased estimates, especially for small samples. We propose a penalised version of the log-likelihood function, based on adjusted responses, which always results in a finite estimator of the log odds ratio. The probability limit of the adjusted log-likelihood estimator is derived, and it is shown that in certain settings the maximum likelihood, conditional and modified profile log-likelihood estimators drop out as special cases of the former estimator. We apply indirect inference to the adjusted log-likelihood estimator. A complete enumeration study shows that the indirect inference estimator is competitive in terms of bias and variance with the maximum likelihood, conditional, modified profile log-likelihood and Firth's penalised log-likelihood estimators.
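The proposed penalty and the indirect-inference step are not reproduced; as background for why a finite estimator matters, the R sketch below computes the classical conditional estimate of the log odds ratio for matched pairs, which depends only on the discordant pair counts and is infinite when one of them is zero, together with a generic 0.5-adjusted version. The counts are hypothetical and the simple adjustment shown is not the paper's penalty.

# Discordant pair counts from a hypothetical matched-pairs table
b  <- 7    # pairs with treated success, control failure
c_ <- 2    # pairs with treated failure, control success

log(b / c_)                                  # conditional ML estimate; infinite if b or c_ is 0
log_or_adj <- log((b + 0.5) / (c_ + 0.5))    # generic 0.5 adjustment, always finite
se_adj     <- sqrt(1 / (b + 0.5) + 1 / (c_ + 0.5))
log_or_adj + c(-1, 1) * qnorm(0.975) * se_adj  # approximate 95% interval on the adjusted scale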
{"title":"Reduced bias estimation of the log odds ratio","authors":"Asma Saleh","doi":"10.1007/s00362-024-01593-7","DOIUrl":"https://doi.org/10.1007/s00362-024-01593-7","url":null,"abstract":"<p>Analysis of binary matched pairs data is problematic due to infinite maximum likelihood estimates of the log odds ratio and potentially biased estimates, especially for small samples. We propose a penalised version of the log-likelihood function based on adjusted responses which always results in a finite estimator of the log odds ratio. The probability limit of the adjusted log-likelihood estimator is derived and it is shown that in certain settings the maximum likelihood, conditional and modified profile log-likelihood estimators drop out as special cases of the former estimator. We implement indirect inference to the adjusted log-likelihood estimator. It is shown, through a complete enumeration study, that the indirect inference estimator is competitive in terms of bias and variance in comparison to the maximum likelihood, conditional, modified profile log-likelihood and Firth’s penalised log-likelihood estimators.</p>","PeriodicalId":51166,"journal":{"name":"Statistical Papers","volume":"6 1","pages":""},"PeriodicalIF":1.3,"publicationDate":"2024-08-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142201034","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}