
arXiv: Methodology - Latest Publications

Statistical Inference on the Cure Time
Pub Date : 2020-09-28 DOI: 10.6342/NTU201904221
Yueh Wang, Hung Hung
In population-based cancer survival analysis, net survival is an important measure for governments to assess health care programs. It has long been observed that net survival reaches a plateau after long-term follow-up, a phenomenon known as ``statistical cure'', and several methods have been proposed to model it. The cure time can be used to evaluate the duration of a health care program for a specific patient population and can help clinicians explain prognosis to patients, so it is an important health care index. However, existing methods assume the cure time to be infinite, which makes inference on the cure time awkward. In this dissertation, we define a more general concept of statistical cure via conditional survival; under this definition, the cure time is well defined. We develop cure time model methodologies and study their properties through simulation. In the data analysis, cure times are estimated for 22 major cancers in Taiwan, and colorectal cancer is used as an example to conduct statistical inference via a cure time model with the covariates sex, age group, and stage. The dissertation thus provides a methodology for estimating the cure time, which can contribute to public health policy making.
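To make the plateau idea concrete, here is a minimal numpy sketch that estimates a Kaplan-Meier curve and reports the earliest time at which conditional survival over a fixed window stays above 1 - eps. The window, threshold, and plateau rule are illustrative assumptions, not the cure time model developed in the dissertation.

```python
import numpy as np

def kaplan_meier(time, event):
    """Kaplan-Meier survival estimates S(t) at the sorted observed times."""
    order = np.argsort(time)
    time, event = time[order], event[order]
    at_risk = len(time) - np.arange(len(time))      # subjects at risk just before each time
    surv = np.cumprod(1.0 - event / at_risk)
    return time, surv

def cure_time_estimate(time, event, window=1.0, eps=0.01):
    """Illustrative plateau rule (an assumption, not the dissertation's estimator):
    earliest t with conditional survival S(t + window) / S(t) >= 1 - eps."""
    t, s = kaplan_meier(time, event)
    horizon = t[-1] - window                         # require a full window of follow-up
    for ti, si in zip(t[t <= horizon], s[t <= horizon]):
        s_later = s[t <= ti + window][-1]            # S at the last observed time <= ti + window
        if si > 0 and s_later / si >= 1.0 - eps:
            return ti
    return np.inf                                    # no plateau detected within follow-up

rng = np.random.default_rng(0)
follow_up = rng.exponential(2.0, 300)                # toy follow-up times
died = rng.binomial(1, 0.7, 300)                     # toy event indicators
print(cure_time_estimate(follow_up, died))
```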
Citations: 0
Parsimonious Feature Extraction Methods: Extending Robust Probabilistic Projections with Generalized Skew-t
Pub Date : 2020-09-24 DOI: 10.2139/ssrn.3678383
Dorota Toczydlowska, G. Peters, P. Shevchenko
We propose a novel generalisation of the Student-t Probabilistic Principal Component methodology which: (1) accounts for an asymmetric distribution of the observation data; (2) provides a framework for grouped and generalised multiple-degree-of-freedom structures, offering a more flexible approach to modelling groups of marginal tail dependence in the observation data; and (3) separates the tail effect of the error terms and factors. The new feature extraction methods are derived in an incomplete-data setting to efficiently handle missing values in the observation vector. We discuss various special cases of the algorithm that result from simplified assumptions on the process generating the data. The applicability of the new framework is illustrated on a data set consisting of the cryptocurrencies with the highest market capitalisation.
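For orientation, the sketch below fits the classical Gaussian probabilistic PCA of Tipping and Bishop in closed form; it is the symmetric, light-tailed special case that the proposed skew-t framework relaxes, and it handles neither missing data nor the grouped degrees of freedom described above.

```python
import numpy as np

def ppca_ml(X, q):
    """Closed-form ML fit of Gaussian probabilistic PCA (Tipping & Bishop),
    shown only as the special case that the skew-t extension generalizes."""
    n, d = X.shape
    Xc = X - X.mean(axis=0)
    S = Xc.T @ Xc / n                        # sample covariance
    evals, evecs = np.linalg.eigh(S)         # ascending eigenvalues
    evals, evecs = evals[::-1], evecs[:, ::-1]
    sigma2 = evals[q:].mean()                # average of the discarded eigenvalues
    W = evecs[:, :q] * np.sqrt(np.maximum(evals[:q] - sigma2, 0.0))
    return W, sigma2                         # loadings and isotropic noise variance

rng = np.random.default_rng(1)
X = rng.standard_normal((500, 2)) @ rng.standard_normal((2, 8)) \
    + 0.1 * rng.standard_normal((500, 8))
W, s2 = ppca_ml(X, q=2)
print(W.shape, s2)
```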
Citations: 1
An Introduction to Proximal Causal Learning
Pub Date : 2020-09-23 DOI: 10.1101/2020.09.21.20198762
E. T. Tchetgen, Andrew Ying, Yifan Cui, Xu Shi, Wang Miao
A standard assumption for causal inference from observational data is that one has measured a sufficiently rich set of covariates to ensure that, within covariate strata, subjects are exchangeable across observed treatment values. Skepticism about the exchangeability assumption in observational studies is often warranted because it hinges on investigators' ability to accurately measure covariates capturing all potential sources of confounding. Realistically, confounding mechanisms can rarely, if ever, be learned with certainty from measured covariates. One can therefore only ever hope that covariate measurements are at best proxies of the true underlying confounding mechanisms operating in an observational study, thus invalidating causal claims made on the basis of standard exchangeability conditions. Causal learning from proxies is a challenging inverse problem which has to date remained unresolved. In this paper, we introduce a formal potential outcome framework for proximal causal learning which, while explicitly acknowledging covariate measurements as imperfect proxies of confounding mechanisms, offers an opportunity to learn about causal effects in settings where exchangeability on the basis of measured covariates fails. Sufficient conditions for nonparametric identification are given, leading to the proximal g-formula and a corresponding proximal g-computation algorithm for estimation. These may be viewed as generalizations of Robins' foundational g-formula and g-computation algorithm, which account explicitly for bias due to unmeasured confounding. Both point treatment and time-varying treatment settings are considered, and an application of proximal g-computation of causal effects is given for illustration.
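As a toy illustration of the proxy idea, the sketch below simulates a linear point-treatment setting with an unmeasured confounder U, a treatment-side proxy Z, and an outcome-side proxy W, and recovers the causal effect with a two-stage regression in the spirit of proximal causal inference. The data-generating values and variable names are made up, and this linear shortcut is not the paper's general proximal g-computation algorithm.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 20_000
U = rng.standard_normal(n)                        # unmeasured confounder
Z = U + rng.standard_normal(n)                    # treatment-side proxy of U
W = U + rng.standard_normal(n)                    # outcome-side proxy of U
A = 0.8 * U + rng.standard_normal(n)              # treatment, confounded by U
Y = 2.0 * A + 1.5 * U + rng.standard_normal(n)    # true causal effect of A is 2.0

def ols(y, X):
    """Least-squares coefficients with an intercept prepended."""
    X1 = np.column_stack([np.ones(len(y)), X])
    return np.linalg.lstsq(X1, y, rcond=None)[0]

# Naive regression of Y on A alone is biased by the unmeasured confounder.
print("naive slope:", ols(Y, A[:, None])[1])

# Stage 1: project the outcome proxy W on (A, Z); Stage 2: regress Y on (A, W_hat).
b1 = ols(W, np.column_stack([A, Z]))
W_hat = b1[0] + np.column_stack([A, Z]) @ b1[1:]
b2 = ols(Y, np.column_stack([A, W_hat]))
print("proximal two-stage slope:", b2[1])         # close to 2.0
```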
Citations: 116
Bayesian Causal Inference in Probit Graphical Models
Pub Date : 2020-09-10 DOI: 10.1214/21-BA1260
F. Castelletti, G. Consonni
We consider a binary response which is potentially affected by a set of continuous variables. Of special interest is the causal effect on the response due to an intervention on a specific variable. The latter can be meaningfully determined on the basis of observational data through suitable assumptions on the data-generating mechanism. In particular, we assume that the joint distribution obeys the conditional independencies (Markov properties) inherent in a Directed Acyclic Graph (DAG), and the DAG is given a causal interpretation through the notion of interventional distribution. We propose a DAG-probit model in which the response is generated by discretization, through a random threshold, of a continuous latent variable, and the latter, jointly with the remaining continuous variables, has a distribution belonging to a zero-mean Gaussian model whose covariance matrix is constrained to satisfy the Markov properties of the DAG. Our model leads to a natural definition of causal effect conditional on a given DAG. Since the DAG which generates the observations is unknown, we present an efficient MCMC algorithm whose target is the posterior distribution on the space of DAGs, the Cholesky parameters of the concentration matrix, and the threshold linking the response to the latent variable. Our end result is a Bayesian Model Averaging estimate of the causal effect which incorporates both parameter and model uncertainty. The methodology is assessed using simulation experiments and applied to a gene expression data set originating from breast cancer stem cells.
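A small simulation can show the interventional logic the model builds on: a known linear Gaussian DAG feeds a latent variable that is thresholded into a binary response, and the effect of an intervention is obtained by replacing a structural equation. The graph, coefficients, and threshold below are invented for illustration; the paper's contribution, jointly learning the DAG, Cholesky parameters, and threshold by MCMC, is not shown.

```python
import numpy as np

rng = np.random.default_rng(3)
theta = 0.5                                    # probit threshold on the latent response

def simulate(n, do_x1=None):
    """Draw from a toy linear Gaussian DAG X1 -> X2 -> latent -> binary Y.
    Passing do_x1 replaces X1's structural equation (an intervention)."""
    x1 = np.full(n, do_x1, dtype=float) if do_x1 is not None else rng.standard_normal(n)
    x2 = 0.7 * x1 + rng.standard_normal(n)
    latent = 1.2 * x2 + rng.standard_normal(n)
    y = (latent > theta).astype(int)
    return x1, x2, y

# Causal effect of do(X1 = 1) versus do(X1 = 0) on P(Y = 1), by Monte Carlo.
_, _, y1 = simulate(200_000, do_x1=1.0)
_, _, y0 = simulate(200_000, do_x1=0.0)
print("interventional effect:", y1.mean() - y0.mean())
```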
Citations: 4
Ensemble Riemannian Data Assimilation over the Wasserstein Space
Pub Date : 2020-09-07 DOI: 10.5194/NPG-2021-11
S. Tamang, A. Ebtehaj, P. V. van Leeuwen, Dongmian Zou, Gilad Lerman
In this paper, we present an ensemble data assimilation paradigm over a Riemannian manifold equipped with the Wasserstein metric. Unlike the Eulerian penalization of error in Euclidean space, the Wasserstein metric can capture the translation and difference between the shapes of square-integrable probability distributions of the background state and observations, enabling one to formally penalize geophysical biases in state space with non-Gaussian distributions. The new approach is applied to dissipative and chaotic evolutionary dynamics, and its potential advantages and limitations are highlighted in comparison with classic variational and filtering data assimilation approaches under systematic and random errors.
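To make the metric concrete, the snippet below compares two ensembles that differ by a bias using SciPy's order-1 Wasserstein distance; for equal-sized samples it reduces to the mean absolute difference of sorted values. This illustrates only the distance itself, not the assimilation scheme proposed in the paper.

```python
import numpy as np
from scipy.stats import wasserstein_distance

rng = np.random.default_rng(4)
background = rng.normal(loc=0.0, scale=1.0, size=1_000)      # forecast ensemble
observations = rng.normal(loc=2.0, scale=1.0, size=1_000)    # shifted (biased) observations

# Order-1 Wasserstein distance between the two empirical distributions.
print("W1:", wasserstein_distance(background, observations))

# Equivalent computation for equal-sized samples: mean absolute quantile difference.
print("manual W1:", np.mean(np.abs(np.sort(background) - np.sort(observations))))
```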
Citations: 4
Evaluating Catchment Models as Multiple Working Hypotheses: on the Role of Error Metrics, Parameter Sampling, Model Structure, and Data Information Content
Pub Date : 2020-09-01 DOI: 10.1002/essoar.10504066.1
S. Khatami, T. Peterson, M. Peel, A. Western
To evaluate models as hypotheses, we developed the method of Flux Mapping to construct a hypothesis space based on dominant runoff-generating mechanisms. Acceptable model runs, defined as total simulated flow with similar (and minimal) model error, are mapped to the hypothesis space given their simulated runoff components. In each modeling case, the hypothesis space is the result of an interplay of factors: model structure and parameterization, chosen error metric, and data information content. The aim of this study is to disentangle the role of each factor in model evaluation. We used two model structures (SACRAMENTO and SIMHYD), two parameter sampling approaches (Latin Hypercube Sampling of the parameter space and guided search of the solution space), three widely used error metrics (Nash-Sutcliffe Efficiency - NSE, Kling-Gupta Efficiency skill score - KGEss, and Willmott refined Index of Agreement - WIA), and hydrological data from a large sample of Australian catchments. First, we characterized how the three error metrics behave under different error types and magnitudes, independent of any modeling. We then conducted a series of controlled experiments to unpack the role of each factor in runoff generation hypotheses. We show that KGEss is a more reliable metric than NSE and WIA for model evaluation. We further demonstrate that changing only the error metric, while other factors remain constant, can change the model solution space and hence vary model performance, parameter sampling sufficiency, and/or the flux map. We show how unreliable error metrics and insufficient parameter sampling impair model-based inferences, particularly runoff generation hypotheses.
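For reference, here is a compact implementation of the error metrics named above. NSE and KGE follow their standard definitions; the KGEss benchmark (mean-flow benchmark, KGE = 1 - sqrt(2)) and the refined Willmott index are written in their commonly published forms and may be parameterized differently by the authors, so treat those two as assumptions.

```python
import numpy as np

def nse(obs, sim):
    """Nash-Sutcliffe Efficiency."""
    return 1.0 - np.sum((sim - obs) ** 2) / np.sum((obs - obs.mean()) ** 2)

def kge(obs, sim):
    """Kling-Gupta Efficiency (2009 form)."""
    r = np.corrcoef(obs, sim)[0, 1]
    alpha = sim.std() / obs.std()            # variability ratio
    beta = sim.mean() / obs.mean()           # bias ratio
    return 1.0 - np.sqrt((r - 1) ** 2 + (alpha - 1) ** 2 + (beta - 1) ** 2)

def kge_skill_score(obs, sim):
    """KGE skill score against the mean-flow benchmark (assumed reference)."""
    bench = 1.0 - np.sqrt(2.0)
    return (kge(obs, sim) - bench) / (1.0 - bench)

def willmott_refined(obs, sim):
    """Willmott et al. (2012) refined index of agreement (assumed form)."""
    a = np.sum(np.abs(sim - obs))
    b = 2.0 * np.sum(np.abs(obs - obs.mean()))
    return 1.0 - a / b if a <= b else b / a - 1.0

obs = np.array([1.2, 3.4, 2.8, 5.1, 4.0, 2.2])
sim = np.array([1.0, 3.0, 3.1, 4.6, 4.4, 2.0])
print(nse(obs, sim), kge(obs, sim), kge_skill_score(obs, sim), willmott_refined(obs, sim))
```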
Citations: 6
A Cramér–von Mises Test of Uniformity on the Hypersphere
Pub Date : 2020-08-25 DOI: 10.1007/978-3-030-69944-4_12
Eduardo Garc'ia-Portugu'es, Paula Navarro-Esteban, J. A. Cuesta-Albertos
{"title":"A Cramér–von Mises Test of Uniformity on the Hypersphere","authors":"Eduardo Garc'ia-Portugu'es, Paula Navarro-Esteban, J. A. Cuesta-Albertos","doi":"10.1007/978-3-030-69944-4_12","DOIUrl":"https://doi.org/10.1007/978-3-030-69944-4_12","url":null,"abstract":"","PeriodicalId":186390,"journal":{"name":"arXiv: Methodology","volume":"355 14‐15","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-08-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"113956083","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 3
A Maximin $\Phi_{p}$-Efficient Design for Multivariate GLM
Pub Date : 2020-08-14 DOI: 10.5705/ss.202020.0278
Yiou Li, Lulu Kang, Xinwei Deng
Experimental designs for a generalized linear model (GLM) often depend on the specification of the model, including the link function, the predictors, and unknown parameters, such as the regression coefficients. To deal with uncertainties of these model specifications, it is important to construct optimal designs with high efficiency under such uncertainties. Existing methods such as Bayesian experimental designs often use prior distributions of model specifications to incorporate model uncertainties into the design criterion. Alternatively, one can obtain the design by optimizing the worst-case design efficiency with respect to uncertainties of model specifications. In this work, we propose a new Maximin $\Phi_p$-Efficient (or Mm-$\Phi_p$ for short) design which aims at maximizing the minimum $\Phi_p$-efficiency under model uncertainties. Based on the theoretical properties of the proposed criterion, we develop an efficient algorithm with sound convergence properties to construct the Mm-$\Phi_p$ design. The performance of the proposed Mm-$\Phi_p$ design is assessed through several numerical examples.
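The following brute-force sketch conveys the maximin idea for a one-covariate logistic model: each candidate design is scored by its worst-case $\Phi_p$ performance over a small set of plausible parameter values, normalized by the best candidate for each parameter value, and the design with the best worst case is kept. The candidate grid, the choice p = 1, and the parameter set are illustrative assumptions; the paper's algorithm and theoretical guarantees are not reproduced.

```python
import numpy as np
from itertools import combinations

def info_matrix(points, theta):
    """Fisher information of a logistic regression with f(x) = (1, x),
    equal design weights on the support points."""
    M = np.zeros((2, 2))
    for x in points:
        f = np.array([1.0, x])
        p = 1.0 / (1.0 + np.exp(-(theta[0] + theta[1] * x)))
        M += p * (1 - p) * np.outer(f, f) / len(points)
    return M

def phi_p(M, p=1):
    """Kiefer's Phi_p criterion (p = 1 is the A-criterion); smaller is better."""
    eig = np.linalg.eigvalsh(M)
    return np.mean(eig ** (-p)) ** (1.0 / p)

# Candidate designs: all equally weighted 3-point subsets of a coarse grid.
grid = np.linspace(-3, 3, 13)
candidates = list(combinations(grid, 3))
thetas = [(0.0, 0.5), (0.0, 1.0), (0.5, 1.5)]      # plausible parameter values (assumed)

# Locally best criterion value per theta, used to turn criteria into efficiencies.
best = {t: min(phi_p(info_matrix(c, t)) for c in candidates) for t in thetas}

def worst_case_efficiency(c):
    return min(best[t] / phi_p(info_matrix(c, t)) for t in thetas)

maximin_design = max(candidates, key=worst_case_efficiency)
print(maximin_design, round(worst_case_efficiency(maximin_design), 3))
```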
Citations: 1
A Bayesian Approach to Spherical Factor Analysis for Binary Data
Pub Date : 2020-08-12 DOI: 10.2139/ssrn.3672055
Xingchen Yu, Abel Rodríguez
Factor models are widely used across diverse areas of application for purposes that include dimensionality reduction, covariance estimation, and feature engineering. Traditional factor models can be seen as an instance of linear embedding methods that project multivariate observations onto a lower-dimensional Euclidean latent space. This paper discusses a new class of geometric embedding models for multivariate binary data in which the embedding space corresponds to a spherical manifold of potentially unknown dimension. The resulting models include traditional factor models as a special case but provide additional flexibility. Furthermore, unlike other techniques for geometric embedding, the models are easy to interpret, and the uncertainty associated with the latent features can be properly quantified. These advantages are illustrated using both simulation studies and real data on voting records from the U.S. Senate.
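As a generative illustration of the model class, the sketch below draws latent positions on the unit sphere, combines them with item loadings through a probit link, and samples binary responses. The dimensions and the uniform-on-the-sphere choice are assumptions made here for illustration; the Bayesian inference over the embedding and its dimension is not shown.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(6)
n, d, q = 200, 30, 3                         # subjects, binary items, latent dimension

# Latent positions uniform on the unit sphere S^{q-1}: normalize Gaussian draws.
Z = rng.standard_normal((n, q))
Z /= np.linalg.norm(Z, axis=1, keepdims=True)

Lam = rng.standard_normal((d, q))            # item loadings
alpha = rng.standard_normal(d)               # item intercepts

probs = norm.cdf(alpha + Z @ Lam.T)          # probit link on the linear predictor
Y = rng.binomial(1, probs)                   # n x d binary data matrix
print(Y.shape, Y.mean())
```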
Citations: 0
Hypothesis tests for structured rank correlation matrices
Pub Date : 2020-07-19 DOI: 10.1080/01621459.2022.2096619
S. Perreault, J. Nešlehová, T. Duchesne
Joint modeling of a large number of variables often requires dimension reduction strategies that lead to structural assumptions of the underlying correlation matrix, such as equal pair-wise correlations within subsets of variables. The underlying correlation matrix is thus of interest for both model specification and model validation. In this paper, we develop tests of the hypothesis that the entries of the Kendall rank correlation matrix are linear combinations of a smaller number of parameters. The asymptotic behaviour of the proposed test statistics is investigated both when the dimension is fixed and when it grows with the sample size. We pay special attention to the restricted hypothesis of partial exchangeability, which contains full exchangeability as a special case. We show that under partial exchangeability, the test statistics and their large-sample distributions simplify, which leads to computational advantages and better performance of the tests. We propose various scalable numerical strategies for implementation of the proposed procedures, investigate their finite sample behaviour through simulations, and demonstrate their use on a real dataset of mean sea levels at various geographical locations.
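To fix ideas, the snippet below computes the matrix of pairwise Kendall rank correlations and its projection onto an assumed two-block exchangeable structure, in which off-diagonal entries within a block and entries between blocks each share a single value. It constructs only the structured estimate and a crude discrepancy, not the test statistics or their asymptotic distributions studied in the paper.

```python
import numpy as np
from scipy.stats import kendalltau

rng = np.random.default_rng(7)
n, d = 300, 6
cov = 0.4 * np.ones((d, d)) + 0.6 * np.eye(d)        # equicorrelated toy data
X = rng.multivariate_normal(np.zeros(d), cov, size=n)

# Matrix of pairwise Kendall rank correlations.
tau = np.eye(d)
for i in range(d):
    for j in range(i + 1, d):
        tau[i, j] = tau[j, i] = kendalltau(X[:, i], X[:, j])[0]

# Assumed structure: two exchangeable blocks {0,1,2} and {3,4,5}.
blocks = [np.array([0, 1, 2]), np.array([3, 4, 5])]
structured = np.eye(d)
for a in blocks:
    for b in blocks:
        sub = tau[np.ix_(a, b)]
        if a is b:
            fit = np.full_like(sub, sub[~np.eye(len(a), dtype=bool)].mean())
            np.fill_diagonal(fit, 1.0)
        else:
            fit = np.full_like(sub, sub.mean())
        structured[np.ix_(a, b)] = fit

print(np.round(tau, 2))
print("max deviation from the structured fit:", round(np.abs(tau - structured).max(), 3))
```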
Citations: 4