A discussion of ‘A selective review on calibration information from similar studies’
Pub Date: 2022-08-26 | DOI: 10.1080/24754269.2022.2077903
Jiahua Chen
Being a long-time friend of Dr. Qin and having served as a supervisor of Drs. Li and Liu, I am as proud as the authors of the richness and breadth of this paper. It helps me play catch-up and shames me into working hard rather than hardly working. As a discussant, I wished to offer some additional insight on this research topic, but that proved a very difficult task: I must congratulate the authors for covering a vast territory and leaving no room for it. Instead, I raise two minor technical issues which might be of interest to some fellow researchers.
{"title":"A discussion of ‘A selective review on calibration information from similar studies’","authors":"Jiahua Chen","doi":"10.1080/24754269.2022.2077903","DOIUrl":"https://doi.org/10.1080/24754269.2022.2077903","url":null,"abstract":"Being a long-time friend of Dr. Qin and served as a supervisor of Drs. Li and Liu, I am as proud as authors of the richness of the content as well as the broadness of this paper. It helps me to play catch up and shames me to work hard rather than hardly work. As a discussant, I wish to come up with some additional insight on this research topic but this is deemed a very difficult task. I should congratulate the authors for covering a vast territory and leave no room for that. Instead, I raise two not so important technical issues which might be of interest to some fellow researchers.","PeriodicalId":22070,"journal":{"name":"Statistical Theory and Related Fields","volume":"6 1","pages":"201 - 203"},"PeriodicalIF":0.5,"publicationDate":"2022-08-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"43180850","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Rejoinder on “A selective review of statistical methods using calibration information from similar studies”
Pub Date: 2022-08-26 | DOI: 10.1080/24754269.2022.2111059
J. Qin, Yukun Liu, Pengfei Li
We thank Professor Jun Shao for organizing this interesting discussion. We also thank the six discussants for many insightful comments and suggestions. Assembling data from different sources has become a very popular topic. In our review paper, we have mainly discussed integration methods for the case where internal and external data share a common distribution, though the external data may not contain information on some of the variables collected in the internal study. Indeed, the common distribution assumption is very strong in practical applications. Owing to technological advances, data collection is getting much easier, for example via iPhones, satellite images, etc. As such data are not obtained by well-designed probability sampling, they inevitably may not represent the general population; as a consequence, systematic bias probably exists. In the survey sampling literature, how to combine survey sampling data with non-probability sampling data has also become very popular (Chen et al., 2020). Without bias correction, most existing methods may produce biased results if the common distribution assumption is violated, so one has to carefully assess potential bias before data integration. Before we respond to the reviewers' common concern about heterogeneity among different studies, we first outline the possible distributional shifts in each data source. In the machine learning literature, the concepts of covariate shift, label shift, and transfer learning have been widely used (Quiñonero-Candela et al., 2009). We briefly describe these concepts in terms of joint or conditional densities. Covariate shift: Let Y and X be, respectively, the outcome and a vector of covariates in statistics terminology, or a label variable and a vector of features in machine learning language. Suppose we have two data sets:
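For concreteness, a minimal sketch of the standard formulation (our notation, not necessarily the authors'): suppose the two data sets are drawn from joint densities $f_1(x, y)$ and $f_2(x, y)$. Covariate shift assumes
\[
f_1(x, y) = f_1(x)\, f(y \mid x), \qquad f_2(x, y) = f_2(x)\, f(y \mid x),
\]
so the marginal distribution of $X$ may differ between the two data sets while the conditional density $f(y \mid x)$ is shared. Label shift reverses the roles: the marginal of $Y$ differs while $f(x \mid y)$ is shared.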
{"title":"Rejoinder on “A selective review of statistical methods using calibration information from similar studies”","authors":"J. Qin, Yukun Liu, Pengfei Li","doi":"10.1080/24754269.2022.2111059","DOIUrl":"https://doi.org/10.1080/24754269.2022.2111059","url":null,"abstract":"We thank Professor Jun Shao for organizing this interesting discussion. We also thank the six discussants formany insightful comments and suggestions. Assembling data from different sources has been becoming a very popular topic nowadays. In our review paper, we have mainly discussed many integration methods when internal data and external data share a common distribution, though the external data may not have information for some underlying variables collected in the internal study. Indeed the common distribution assumption is very strong in practical applications. Due to the technology advance, the collection of data is gettingmuch easier, for example, by using i-phone, satellite image, etc. As those collected data are not obtained by well-designed probability sampling, inevitably, they may not represent the general population. As a consequence, there probably exists a systematic bias. In the survey sampling literature, how to combine survey sampling data with non probability sampling data has also got very popular (Chen et al., 2020). Without bias correction, most existing methods may produce biased results if the common distribution assumption is violated. One has to be careful to assess the impartiality before data integration. Before we respond to the common concern by the reviewers on the heterogeneity among different studies, we first outline the possible distributional shifts or changes in each source data. In themachine learning literature, the concepts of covariate shift, label shift, and transfer learning have been widely used (QuiñoneroCandela et al., 2009). We briefly highlight those concepts in terms of statistical joint density or conditional density. Covariate shift: Let Y and X be, respectively, the outcome and a vector of covariates in Statistic terminology, or a label variable and a vector of features in Machine Learning Languish. Suppose we have two data-sets:","PeriodicalId":22070,"journal":{"name":"Statistical Theory and Related Fields","volume":"6 1","pages":"204 - 207"},"PeriodicalIF":0.5,"publicationDate":"2022-08-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"48324852","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Variable selection in finite mixture of median regression models using skew-normal distribution
Pub Date: 2022-08-06 | DOI: 10.1080/24754269.2022.2107974
X. Zeng, Yuanyuan Ju, Liucang Wu
A regression model with skew-normal errors provides a useful extension of traditional normal regression models when the data involve asymmetric outcomes. Moreover, data that arise from a heterogeneous population can be efficiently analysed by a finite mixture of regression models. These observations motivate us to propose a novel finite mixture of median regression models based on a mixture of skew-normal distributions to explore asymmetric data from several subpopulations. With an appropriate choice of the tuning parameters, we establish the theoretical properties of the proposed procedure, including the consistency of the variable selection method and the oracle property in estimation. A nonparametric clustering method is applied to select the number of components, and an efficient EM algorithm for the numerical computations is developed. Simulation studies and a real data set are used to illustrate the performance of the proposed methodologies.
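For orientation, a generic form of such a model (a sketch in our notation; the paper's exact parameterization may differ) writes the conditional density of the response given covariates $\mathbf{x}$ as a $K$-component mixture of skew-normal components,
\[
f(y \mid \mathbf{x}) = \sum_{k=1}^{K} \pi_k \, \mathrm{SN}\big(y;\ \mu_k(\mathbf{x}),\ \sigma_k^2,\ \lambda_k\big),
\qquad
\mathrm{SN}(y; \mu, \sigma^2, \lambda) = \frac{2}{\sigma}\,\phi\!\left(\frac{y-\mu}{\sigma}\right)\Phi\!\left(\lambda\,\frac{y-\mu}{\sigma}\right),
\]
where $\phi$ and $\Phi$ are the standard normal density and distribution function, the component locations $\mu_k(\mathbf{x})$ are shifted so that the median of component $k$ equals $\mathbf{x}^\top\boldsymbol{\beta}_k$ (median rather than mean regression), and penalties on the $\boldsymbol{\beta}_k$ drive the variable selection.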
{"title":"Variable selection in finite mixture of median regression models using skew-normal distribution","authors":"X. Zeng, Yuanyuan Ju, Liucang Wu","doi":"10.1080/24754269.2022.2107974","DOIUrl":"https://doi.org/10.1080/24754269.2022.2107974","url":null,"abstract":"A regression model with skew-normal errors provides a useful extension for traditional normal regression models when the data involve asymmetric outcomes. Moreover, data that arise from a heterogeneous population can be efficiently analysed by a finite mixture of regression models. These observations motivate us to propose a novel finite mixture of median regression model based on a mixture of the skew-normal distributions to explore asymmetrical data from several subpopulations. With the appropriate choice of the tuning parameters, we establish the theoretical properties of the proposed procedure, including consistency for variable selection method and the oracle property in estimation. A productive nonparametric clustering method is applied to select the number of components, and an efficient EM algorithm for numerical computations is developed. Simulation studies and a real data set are used to illustrate the performance of the proposed methodologies.","PeriodicalId":22070,"journal":{"name":"Statistical Theory and Related Fields","volume":"7 1","pages":"30 - 48"},"PeriodicalIF":0.5,"publicationDate":"2022-08-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"47462879","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Posterior propriety of an objective prior for generalized hierarchical normal linear models
Pub Date: 2022-07-30 | DOI: 10.1080/24754269.2021.1978206
Cong Lin, Dongchu Sun, Chengyuan Song
ABSTRACT Bayesian hierarchical models have been widely used in modern statistical applications. To deal with data having complex structures, we propose a generalized hierarchical normal linear (GHNL) model which accommodates arbitrarily many levels, usual design matrices and ‘vanilla’ covariance matrices. Objective hyperpriors can be employed for the GHNL model to express ignorance or match frequentist properties, yet the common objective Bayesian approaches are infeasible or fraught with danger in hierarchical modelling. To tackle this issue, [Berger, J., Sun, D., & Song, C. (2020b). An objective prior for hyperparameters in normal hierarchical models. Journal of Multivariate Analysis, 178, 104606. https://doi.org/10.1016/j.jmva.2020.104606] proposed a particular objective prior and investigated its properties comprehensively. Posterior propriety is important in the choice of priors to guarantee the convergence of MCMC samplers. James Berger conjectured that the resulting posterior is proper for a hierarchical normal model with arbitrarily many levels; however, a rigorous proof had not been given. In this paper, we complete this story and provide user-friendly guidance. One main contribution of this paper is a new technique for deriving an elaborate upper bound on the integrated likelihood; another is a unified approach to checking posterior propriety for linear models. An efficient Gibbs sampling method is also introduced and outperforms other sampling approaches considerably.
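As a rough sketch of the model class (our notation; see the paper for the precise covariance structure), a hierarchical normal linear model with $L$ levels stacks conditional normal layers:
\[
\mathbf{y} \mid \boldsymbol{\theta}_1 \sim N\!\big(X_1 \boldsymbol{\theta}_1,\ \Sigma_1\big), \qquad
\boldsymbol{\theta}_i \mid \boldsymbol{\theta}_{i+1} \sim N\!\big(X_{i+1} \boldsymbol{\theta}_{i+1},\ \Sigma_{i+1}\big), \quad i = 1, \dots, L-1,
\]
with a prior on the top-level parameter $\boldsymbol{\theta}_L$ and (possibly improper) objective hyperpriors on the covariance parameters. Posterior propriety then amounts to verifying that the likelihood, integrated against all of these priors, is finite.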
{"title":"Posterior propriety of an objective prior for generalized hierarchical normal linear models","authors":"Cong Lin, Dongchu Sun, Chengyuan Song","doi":"10.1080/24754269.2021.1978206","DOIUrl":"https://doi.org/10.1080/24754269.2021.1978206","url":null,"abstract":"ABSTRACT Bayesian Hierarchical models has been widely used in modern statistical application. To deal with the data having complex structures, we propose a generalized hierarchical normal linear (GHNL) model which accommodates arbitrarily many levels, usual design matrices and ‘vanilla’ covariance matrices. Objective hyperpriors can be employed for the GHNL model to express ignorance or match frequentist properties, yet the common objective Bayesian approaches are infeasible or fraught with danger in hierarchical modelling. To tackle this issue, [Berger, J., Sun, D., & Song, C. (2020b). An objective prior for hyperparameters in normal hierarchical models. Journal of Multivariate Analysis, 178, 104606. https://doi.org/10.1016/j.jmva.2020.104606] proposed a particular objective prior and investigated its properties comprehensively. Posterior propriety is important for the choice of priors to guarantee the convergence of MCMC samplers. James Berger conjectured that the resulting posterior is proper for a hierarchical normal model with arbitrarily many levels, a rigorous proof of which was not given, however. In this paper, we complete this story and provide an user-friendly guidance. One main contribution of this paper is to propose a new technique for deriving an elaborate upper bound on the integrated likelihood, but also one unified approach to checking the posterior propriety for linear models. An efficient Gibbs sampling method is also introduced and outperforms other sampling approaches considerably.","PeriodicalId":22070,"journal":{"name":"Statistical Theory and Related Fields","volume":"17 1","pages":"309 - 326"},"PeriodicalIF":0.5,"publicationDate":"2022-07-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"41289512","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Moderate deviation principle for stochastic reaction-diffusion systems with multiplicative noise and non-Lipschitz reaction
Pub Date: 2022-06-27 | DOI: 10.1080/24754269.2021.1963183
Juan Yang
ABSTRACT In this article, we obtain a central limit theorem and prove a moderate deviation principle for stochastic reaction-diffusion systems with multiplicative noise and a non-Lipschitz reaction term.
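For context, such systems are typically written in the generic form (our sketch; the paper's precise assumptions and scalings may differ)
\[
\partial_t u^{\varepsilon}(t, x) = \Delta u^{\varepsilon}(t, x) + f\big(u^{\varepsilon}(t, x)\big) + \sqrt{\varepsilon}\,\sigma\big(u^{\varepsilon}(t, x)\big)\,\dot{W}(t, x),
\]
where $f$ is the (possibly non-Lipschitz) reaction term, $\sigma$ the multiplicative noise coefficient and $W$ a space-time noise. A moderate deviation principle concerns the deviations $(u^{\varepsilon} - u^{0})/(\sqrt{\varepsilon}\, h(\varepsilon))$ for scalings with $h(\varepsilon) \to \infty$ and $\sqrt{\varepsilon}\, h(\varepsilon) \to 0$, a regime between the central limit theorem and large deviations.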
{"title":"Moderate deviation principle for stochastic reaction-diffusion systems with multiplicative noise and non-Lipschitz reaction","authors":"Juan Yang","doi":"10.1080/24754269.2021.1963183","DOIUrl":"https://doi.org/10.1080/24754269.2021.1963183","url":null,"abstract":"ABSTRACT In this article, we obtain a central limit theorem and prove a moderate deviation principle for stochastic reaction-diffusion systems with multiplicative noise and non-Lipschitz reaction term.","PeriodicalId":22070,"journal":{"name":"Statistical Theory and Related Fields","volume":"6 1","pages":"299 - 308"},"PeriodicalIF":0.5,"publicationDate":"2022-06-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"44846401","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A discussion on “A selective review of statistical methods using calibration information from similar studies”
Pub Date: 2022-06-10 | DOI: 10.1080/24754269.2022.2084930
Lingzhi Zhou, P. Song
It is our pleasure to have the opportunity to comment on this fine work, in which the authors present a comprehensive review of empirical likelihood (EL) methods for integrative data analysis. The paper focuses on a unified methodological framework based on EL and estimating equations (EE) that sequentially combines summary information from individual data batches to obtain estimation and inference comparable to those obtained by the EL method utilizing all individual-level data. The latter is sometimes referred to as oracle estimation and inference in the setting of massively distributed data batches. An obvious strength of this review paper is its detailed account of the theoretical properties behind the efficiency gains obtained from auxiliary information. The authors consider a typical data integration situation in which individual-level data from the Kth data batch are combined with certain ‘good’ summary information from the previous K−1 data batches. While appreciating the theoretical strengths of the paper, we notice a few interesting aspects that are worth some discussion.

Distributed data structures: In practice, both the individual data batch sizes and the number of data batches may be rather heterogeneous, requiring different theory and algorithms in the data analysis. Such heterogeneity in distributed data structures is not well aligned with the methodological framework reviewed in the paper. One important practical scenario is that the number of data batches tends to infinity. Such a setting may arise from distributed data collected from millions of mobile device users, or from electronic health record (EHR) data sources distributed across thousands of hospitals. In the presence of massively distributed data batches, a natural question pertains to the trade-off between data communication efficiency and analytic approximation accuracy. Although one-round data communication is popular in this type of integrative data analysis, multiple rounds of data communication may also be viable when implemented on high-performance computing clusters. Our experience suggests that sacrificing flexibility in data communication (e.g., being limited to one-round communication in the Hadoop paradigm), although computationally fast, may pay a substantial price in approximation accuracy, leading to potentially accumulated estimation bias as the number of data batches increases. This estimation bias is a technical challenge in nonlinear models, due to the approximations invoked to linearize both the estimation procedure and the numerical search algorithm. On the other hand, relaxing the restrictions on data communication, such as the operations within the lambda architecture, can help reduce the approximation error and lower the estimation bias; clearly, this requires more computational resources. This important issue was investigated by Zhou et al. (2022), who studied the relevant asymptotics.
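As a concrete illustration of the one-round combination discussed above, here is a minimal sketch (hypothetical code, not from the paper under discussion) of the fixed-effect, inverse-variance rule for pooling per-batch estimates with their covariance matrices:

```python
import numpy as np

def combine_batches(estimates, covariances):
    """One-round, fixed-effect (inverse-variance) combination of
    per-batch estimates theta_hat_k with covariance matrices V_k."""
    precisions = [np.linalg.inv(V) for V in covariances]
    total_precision = sum(precisions)
    combined_cov = np.linalg.inv(total_precision)
    combined_est = combined_cov @ sum(P @ t for P, t in zip(precisions, estimates))
    return combined_est, combined_cov

# Example: three batches estimating a two-dimensional parameter.
rng = np.random.default_rng(0)
theta = np.array([1.0, -0.5])
ests = [theta + rng.normal(scale=0.1, size=2) for _ in range(3)]
covs = [0.01 * np.eye(2) for _ in range(3)]
print(combine_batches(ests, covs))
```

For linear models this combination is exact; for nonlinear models each per-batch estimate comes from a linearized local solve, which is precisely where the accumulated approximation bias discussed above can enter.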
{"title":"A discussion on “A selective review of statistical methods using calibration information from similar studies”","authors":"Lingzhi Zhou, P. Song","doi":"10.1080/24754269.2022.2084930","DOIUrl":"https://doi.org/10.1080/24754269.2022.2084930","url":null,"abstract":"It is our pleasure to have an opportunity of making comments on this fine work in that the authors present a comprehensive review on empirical likelihood (EL) methods for integrative data analyses. This paper focuses on a unified methodological framework based on EL and estimating equations (EE) to sequentially combine summary information from individual data batches to obtain desirable estimation and inference comparable to those obtained by the EL method utilizing all individual-level data. The latter is sometimes referred to as an oracle estimation and inference in the setting of massively distributed data batches. An obvious strength of this review paper concerns the detailed theoretical properties in connection to the improved estimation efficiency through the utility of auxiliary information. In this paper, the authors consider a typical data integration situation where individual-level data from the Kth data batch is combined with certain ‘good’ summary information from the previous K−1 data batches. While appreciating the theoretical strengths in this paper, we notice a few interesting aspects that are worth some discussions. Distributed data structures: In practice, both individual data batch size and the number of data batches may appear rather heterogeneous, requiring different theory and algorithms in the data analysis. Such heterogeneity in distributed data structures is not well aligned with the methodological framework reviewed in the paper. One important practical scenario is that the number of data batches tends to infinity. Such setting may arise from distributed data collected from millions of mobile device users, or from electronic health records (EHR) data sources distributed across thousands of hospitals. In the presence of massively distributed data batches, a natural question pertains to a trade-off between data communication efficiency and analytic approximation accuracy. Although oneround data communication is popular in this type of integrative data analysis, multiple rounds of data communication may be also viable in the implementation via high-performance computing clusters. Our experience suggests that sacrifice in the flexibility of data communication (e.g., limited to one-round communication in the Hadoop paradigm), although enjoys computational speed, may pay a substantial price on the loss of approximation accuracy, leading to potentially accumulated estimation bias when the number of data batches increases. This issue of estimation bias is a technical challenge in nonlinear models due to the invocation of approximations to linearize both estimation procedure and numerical search algorithm. On the other hand, relaxing the restrictions on data communication, such as the operations within the lambda architecture, can help reduce the approximation error and lower estimation bias. Clearly, the latter requires more computational resources. This important issue was investigated by Zhou et al. 
(2022) that studied asympt","PeriodicalId":22070,"journal":{"name":"Statistical Theory and Related Fields","volume":"6 1","pages":"196 - 198"},"PeriodicalIF":0.5,"publicationDate":"2022-06-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"42466102","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A discussion on “A selective review of statistical methods using calibration information from similar studies” by Qin, Liu and Li
Pub Date: 2022-06-10 | DOI: 10.1080/24754269.2022.2084929
Peisong Han
We congratulate Qin, Liu and Li (QLL) on a thoughtful and much-needed review of many interesting methods for combining information from similar studies. We appreciate being given the opportunity to contribute a discussion. QLL cover a variety of different settings and methods. Building on that, we provide a brief review of some additional relevant literature, with a focus on methods that deal with population heterogeneity, since different studies most likely sample different populations, and whether information can be combined depends, among many other things, on how similar those populations are. We will follow the setting of QLL, although most of the methods apply more broadly.
{"title":"A discussion on “A selective review of statistical methods using calibration information from similar studies” by Qin, Liu and Li","authors":"Peisong Han","doi":"10.1080/24754269.2022.2084929","DOIUrl":"https://doi.org/10.1080/24754269.2022.2084929","url":null,"abstract":"We Qin, Liu and Li (QLL) on a thoughtful and much needed review of many interesting methods for combining information from similar studies. We appreciate being given the opportunity to make a discussion. QLL cover a variety of different settings and methods. Based on that, we will provide a brief review on some additional relevant literature with a focus on methods that deal with population heterogeneity, since it is most likely that different studies sample different and whether information be combined depends on how similar those among many other To the we will follow the setting in of QLL, most of methods more broadly applied.","PeriodicalId":22070,"journal":{"name":"Statistical Theory and Related Fields","volume":"6 1","pages":"193 - 195"},"PeriodicalIF":0.5,"publicationDate":"2022-06-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"48494981","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Discussion of “A selective review of statistical methods using calibration information from similar studies” and some remarks on data integration
Pub Date: 2022-05-19 | DOI: 10.1080/24754269.2022.2075083
J. Lawless
Qin, Liu and Li (henceforth QLL) review methods for combining information using empirical likelihood and related approaches; many of these ideas originated in the earlier work of Jing Qin. I thank the authors for their review, and for the opportunity to contribute to its discussion. I have little to say about the technical aspects, which are well established, but will comment briefly on broader aspects of data integration and their implications for methods like those in the article. I will focus on settings where there is a response variable Y and covariates X, Z, and assume the target of inference is either the distribution f(y | x, z) of Y given X, Z or the ‘marginal’ distribution f_m(y | x) of Y given X. In health research, Y might represent (time to) the occurrence of some specific event, and X, Z covariates, exposures or interventions. The distribution f(y | x, z) is important for individual-level decisions; in settings where X represents interventions, f_m(y | x) is relevant in randomized trials and comparative effectiveness research. The authors consider two main topics in data integration: (i) the use of external auxiliary data to augment the analysis of a specific ‘internal’ study, and (ii) the combination of data from separate studies with a view to inference for common parameters. They focus on settings where ...
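The two targets mentioned above are linked by the usual identity
\[
f_m(y \mid x) = \int f(y \mid x, z)\, f(z \mid x)\, dz,
\]
which makes explicit that the marginal target depends on the covariate distribution $f(z \mid x)$, so combining studies for $f_m$ requires attention to differences in that distribution across studies.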
{"title":"Discussion of “A selective review of statistical methods using calibration information from similar studies” and some remarks on data integration","authors":"J. Lawless","doi":"10.1080/24754269.2022.2075083","DOIUrl":"https://doi.org/10.1080/24754269.2022.2075083","url":null,"abstract":"Qin, Liu and Li (henceforth QLL) review methods for combining information using empirical likelihood and related approaches; many of these ideas originated in the earlier work of Jing Qin. I thank the authors for their review, and for the opportunity to contribute to its discussion. I have little to say about technical aspects, which are well established but will comment briefly on broader aspects of data integration, and implications for methods like those in the article. I will focus on settings where there is a response variable Y and covariates X , Z and assume the target of inference is either the distribution f ( y | x , z ) of Y given X , Z or the ‘marginal’ distribution f m ( y | x ) of Y given X . In health research Y might represent (time to) the occurrence of some specific event, and X , Z covariates, exposures or interventions. The distribution f ( y | x , z ) is important for individual-level decisions; in settings where X represents interventions f m ( y | x ) is relevant in randomized trials and comparative effectiveness research. The authors consider two main topics in data integration: (i) the use of external auxiliary data to augment the analysis of a specific ‘internal’ study, and (ii) the combination of data from separate studies with a view to for common parameters or They focus on where,","PeriodicalId":22070,"journal":{"name":"Statistical Theory and Related Fields","volume":"6 1","pages":"191 - 192"},"PeriodicalIF":0.5,"publicationDate":"2022-05-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"47416322","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Discussion of ‘A selective review of statistical methods using calibration information from similar studies’
Pub Date: 2022-05-15 | DOI: 10.1080/24754269.2022.2075082
J. Ning
Combining information from similar studies has attracted substantial attention and is becoming increasingly important for assembling quality evidence in comparative effectiveness research. To my knowledge, this is the first paper to systematically review classical and up-to-date approaches, such as meta-analysis, empirical likelihood (EL), renewable estimation and incremental inference, for incorporating information from multiple sources. This review paper succinctly presents both basic and advanced issues and will greatly benefit researchers interested in this field. Because of the wide array of related methods, the paper consists of cohesive but relatively independent sections. Although it is a review paper, the focus and contents are quite different from those of the original papers. For example, an optimal combination of two estimators from two independent studies is derived by two methods from different perspectives: a linear combination with the smallest asymptotic variance, and maximum likelihood. Another example is how to select a more efficient way to synthesize auxiliary information from other studies. In Section 5 of the review paper, two different sets of constraints, one of which involves the parameter of interest while the other does not, are presented and compared in terms of efficiency improvement. Both statistical intuition and theoretical justification are provided, which help readers find a better way to combine aggregate information for improved efficiency in practice. Such insightful discussions are not easily found elsewhere. The paper also nicely derives the conclusion that, similar to parametric-likelihood-based meta-analysis, the calibration methods (e.g., EL and the generalized method of moments (GMM)) based on aggregate information suffer no efficiency loss compared to the corresponding methods using all individual data. Such deep insight into these methods greatly promotes their use for information calibration, since obtaining individual-level data is always challenging. As stated in the title, this review paper mainly focuses on statistical methods using calibration information from similar studies. One crucial assumption of these methods is homogeneity between the cohort with individual data (e.g., the target cohort) and the similar studies (e.g., external sources). When the calibration information from the external sources is not comparable with that of the target cohort, such calibration methods may result in severe estimation bias and misleading conclusions (Chen et al., 2021; Huang et al., 2016). One way to address this issue is to test comparability by comparing calibration information between the target cohort and the external sources before combining such information.
Using the setup in Section 4 of the review paper as an example, assume that the auxiliary information from the external sources is the mean of Y by subgroups (e.g., subgroups determined by covariates such as age and sex).
Statistical Theory and Related Fields, 6(1), 199–200.
Bayesian penalized model for classification and selection of functional predictors using longitudinal MRI data from ADNI
Pub Date: 2022-05-09 | DOI: 10.1080/24754269.2022.2064611
Asish Banik, T. Maiti, Andrew R. Bender
ABSTRACT The main goal of this paper is to employ longitudinal trajectories of a large number of sub-regional brain volumetric MRI measures as statistical predictors for Alzheimer's disease (AD) classification. We use logistic regression in a Bayesian framework that includes many functional predictors. Direct sampling of the regression coefficients from the Bayesian logistic model is difficult because of its complicated likelihood function. In high-dimensional scenarios, predictor selection is paramount and is achieved by introducing spike-and-slab priors, non-local priors, or horseshoe priors. We seek to avoid the complicated Metropolis-Hastings approach and to develop an easily implementable Gibbs sampler. In addition, the Bayesian estimation provides proper estimates of the model parameters, which are also useful for inference. Another advantage of working with logistic regression is that it yields the log-odds of AD relative to normal control based on the selected longitudinal predictors, rather than simply classifying patients based on cross-sectional estimates. Ultimately, however, we combine approaches and use a probability threshold to classify individual patients. We employ 49 functional predictors consisting of volumetric estimates of brain sub-regions, chosen for their established clinical significance. Moreover, the use of spike-and-slab priors ensures that many redundant predictors are dropped from the model.
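As a schematic of the kind of model described (our notation; the paper's exact priors may differ, and each functional predictor is reduced here to a scalar summary $x_{ij}$), a Bayesian logistic model with spike-and-slab selection over the 49 predictors can be written
\[
\operatorname{logit} P(y_i = 1 \mid \mathbf{x}_i) = \alpha + \sum_{j=1}^{49} \gamma_j \beta_j x_{ij},
\qquad
\beta_j \sim N(0, \tau^2), \qquad \gamma_j \sim \mathrm{Bernoulli}(p),
\]
where $y_i = 1$ indicates AD, $\gamma_j = 0$ drops predictor $j$ from the model, and the fitted linear predictor is exactly the log-odds of AD relative to normal control.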
{"title":"Bayesian penalized model for classification and selection of functional predictors using longitudinal MRI data from ADNI","authors":"Asish Banik, T. Maiti, Andrew R. Bender","doi":"10.1080/24754269.2022.2064611","DOIUrl":"https://doi.org/10.1080/24754269.2022.2064611","url":null,"abstract":"ABSTRACT The main goal of this paper is to employ longitudinal trajectories in a significant number of sub-regional brain volumetric MRI data as statistical predictors for Alzheimer's disease (AD) classification. We use logistic regression in a Bayesian framework that includes many functional predictors. The direct sampling of regression coefficients from the Bayesian logistic model is difficult due to its complicated likelihood function. In high-dimensional scenarios, the selection of predictors is paramount with the introduction of either spike-and-slab priors, non-local priors, or Horseshoe priors. We seek to avoid the complicated Metropolis-Hastings approach and to develop an easily implementable Gibbs sampler. In addition, the Bayesian estimation provides proper estimates of the model parameters, which are also useful for building inference. Another advantage of working with logistic regression is that it calculates the log of odds of relative risk for AD compared to normal control based on the selected longitudinal predictors, rather than simply classifying patients based on cross-sectional estimates. Ultimately, however, we combine approaches and use a probability threshold to classify individual patients. We employ 49 functional predictors consisting of volumetric estimates of brain sub-regions, chosen for their established clinical significance. Moreover, the use of spike-and-slab priors ensures that many redundant predictors are dropped from the model.","PeriodicalId":22070,"journal":{"name":"Statistical Theory and Related Fields","volume":"6 1","pages":"327 - 343"},"PeriodicalIF":0.5,"publicationDate":"2022-05-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"41643341","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}