
arXiv - MATH - Statistics Theory: Latest Papers

Convergence of opinions
Pub Date : 2023-12-04 DOI: arxiv-2312.02033
Vladimir Vovk
This paper establishes a game-theoretic version of the classical Blackwell-Dubins result. We consider two forecasters who at each step issue probability forecasts for the infinite future. Our result says that either at least one of the two forecasters will be discredited or their forecasts will converge in total variation.
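As a rough illustration of the distance mentioned in the abstract, the sketch below computes the total variation distance between two probability forecasts. It assumes a finite outcome space, which is a simplification: the paper concerns forecasts for the infinite future. The function name and the sample forecasts are hypothetical.

```python
def total_variation(p, q):
    """Total variation distance between two discrete distributions.

    Assumes p and q are probability vectors over the same finite
    outcome space (an illustrative simplification).
    """
    if len(p) != len(q):
        raise ValueError("forecasts must cover the same outcomes")
    # For discrete distributions, TV equals half the L1 distance.
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))


# Two hypothetical forecasters issuing forecasts over three outcomes.
forecast_a = [0.5, 0.3, 0.2]
forecast_b = [0.4, 0.4, 0.2]
tv = total_variation(forecast_a, forecast_b)
print(round(tv, 6))
```

Convergence in total variation means this quantity tends to zero as the forecasters observe more data.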
Citations: 0
Cone Ranking for Multi-Criteria Decision Making
Pub Date : 2023-12-04 DOI: arxiv-2312.03006
Andreas H Hamel, Daniel Kostner
Recently introduced cone distribution functions from statistics are turned into multi-criteria decision making (MCDM) tools. It is demonstrated that this procedure can be considered as an upgrade of the weighted sum scalarization insofar as it absorbs a whole collection of weighted sum scalarizations at once instead of fixing a particular one in advance. Moreover, situations are characterized in which different types of rank reversal occur, and it is explained why this might even be useful for analyzing the ranking procedure. A few examples will be discussed and a potential application in machine learning is outlined.
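To make the baseline concrete, here is a minimal sketch of plain weighted sum scalarization, the method the abstract says cone ranking upgrades. It is not the cone distribution function approach itself. The alternatives, their scores, and the two weight vectors are made up, and larger values are assumed to be better on every criterion.

```python
def weighted_sum_ranking(alternatives, weights):
    """Rank alternatives by a single weighted sum of their criteria.

    alternatives: dict mapping a name to a tuple of criterion values
    weights: one non-negative weight per criterion, fixed in advance
    """
    scores = {
        name: sum(w * v for w, v in zip(weights, values))
        for name, values in alternatives.items()
    }
    # Best alternative first.
    return sorted(scores, key=scores.get, reverse=True)


# Hypothetical alternatives scored on two criteria.
alternatives = {"A": (9.0, 2.0), "B": (5.0, 5.0), "C": (1.0, 8.0)}

ranking_1 = weighted_sum_ranking(alternatives, (0.8, 0.2))
ranking_2 = weighted_sum_ranking(alternatives, (0.2, 0.8))
print(ranking_1)  # ['A', 'B', 'C']
print(ranking_2)  # ['C', 'B', 'A']
```

The two weight vectors produce opposite orderings, which shows why committing to one particular scalarization in advance can be restrictive.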
Citations: 0
A unified framework for covariate adjustment under stratified randomization
Pub Date : 2023-12-03 DOI: arxiv-2312.01266
Fuyi Tu, Wei Ma, Hanzhong Liu
Randomization, as a key technique in clinical trials, can eliminate sources of bias and produce comparable treatment groups. In randomized experiments, the treatment effect is a parameter of general interest. Researchers have explored the validity of using linear models to estimate the treatment effect and perform covariate adjustment and thus improve the estimation efficiency. However, the relationship between covariates and outcomes is not necessarily linear, and is often intricate. Advances in statistical theory and related computer technology allow us to use nonparametric and machine learning methods to better estimate the relationship between covariates and outcomes and thus obtain further efficiency gains. However, theoretical studies on how to draw valid inferences when using nonparametric and machine learning methods under stratified randomization are yet to be conducted. In this paper, we discuss a unified framework for covariate adjustment and corresponding statistical inference under stratified randomization and present a detailed proof of the validity of using local linear kernel-weighted least squares regression for covariate adjustment in treatment effect estimators as a special case. In the case of high-dimensional data, we additionally propose an algorithm for statistical inference using machine learning methods under stratified randomization, which makes use of sample splitting to alleviate the requirements on the asymptotic properties of machine learning methods. Finally, we compare the performances of treatment effect estimators using different machine learning methods by considering various data generation scenarios, to guide practical research.
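For orientation, the sketch below shows the standard unadjusted stratified difference-in-means estimator of the treatment effect, which is the usual starting point that covariate adjustment aims to improve on. It is not the adjusted estimator proposed in the paper. The function name and the data are hypothetical, and every stratum is assumed to contain at least one treated and one control unit.

```python
from collections import defaultdict


def stratified_difference_in_means(strata, treatment, outcome):
    """Weighted average of within-stratum differences in mean outcome.

    strata: stratum label per unit
    treatment: 1 for treated, 0 for control
    outcome: observed outcome per unit
    """
    groups = defaultdict(lambda: {0: [], 1: []})
    for s, z, y in zip(strata, treatment, outcome):
        groups[s][z].append(y)

    n = len(outcome)
    estimate = 0.0
    for arms in groups.values():
        n_s = len(arms[0]) + len(arms[1])
        mean_treated = sum(arms[1]) / len(arms[1])
        mean_control = sum(arms[0]) / len(arms[0])
        # Each stratum contributes in proportion to its share of units.
        estimate += (n_s / n) * (mean_treated - mean_control)
    return estimate


# Hypothetical trial with two strata.
strata = ["a", "a", "a", "a", "b", "b"]
treatment = [1, 1, 0, 0, 1, 0]
outcome = [3.0, 5.0, 1.0, 3.0, 10.0, 4.0]
estimate = stratified_difference_in_means(strata, treatment, outcome)
print(round(estimate, 4))
```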
Citations: 0
On the admissibility of Horvitz-Thompson estimator for estimating causal effects under network interference
Pub Date : 2023-12-02 DOI: arxiv-2312.01234
Vishesh Karwa, Edoardo M. Airoldi
The Horvitz-Thompson (H-T) estimator is widely used for estimating various types of average treatment effects under network interference. We systematically investigate the optimality properties of the H-T estimator under network interference, by embedding it in the class of all linear estimators. In particular, we show that in the presence of any kind of network interference, the H-T estimator is inadmissible in the class of all linear estimators when using a completely randomized and a Bernoulli design. We also show that the H-T estimator becomes admissible under certain restricted randomization schemes termed "fixed exposure designs". We give examples of such fixed exposure designs. It is well known that the H-T estimator is unbiased when correct weights are specified. Here, we derive the weights for unbiased estimation of various causal effects, and illustrate how they depend not only on the design, but more importantly, on the assumed form of interference (which in many real-world situations is unknown at the design stage), and the causal effect of interest.
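The following sketch shows the textbook H-T estimator of an average treatment effect in the simplest setting: a Bernoulli design with a known treatment probability and no interference between units. That no-interference assumption is made here only for illustration. As the abstract notes, under network interference the correct weights also depend on the assumed form of interference. The function name and data are hypothetical.

```python
def horvitz_thompson_ate(outcomes, treated, p_treat):
    """H-T estimate of the average treatment effect.

    Assumes each unit is treated independently with known probability
    p_treat and that one unit's treatment does not affect another's
    outcome (no interference).
    """
    n = len(outcomes)
    total = 0.0
    for y, z in zip(outcomes, treated):
        if z:
            # Treated units are weighted by 1 / P(treated).
            total += y / p_treat
        else:
            # Control units are weighted by 1 / P(control).
            total -= y / (1.0 - p_treat)
    return total / n


# Hypothetical experiment with four units.
outcomes = [4.0, 2.0, 6.0, 2.0]
treated = [1, 0, 1, 0]
ate_hat = horvitz_thompson_ate(outcomes, treated, p_treat=0.5)
print(ate_hat)  # 3.0
```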
Citations: 0
Bagged Regularized $k$-Distances for Anomaly Detection
Pub Date : 2023-12-02 DOI: arxiv-2312.01046
Yuchao Cai, Yuheng Ma, Hanfang Yang, Hanyuan Hang
We consider the paradigm of unsupervised anomaly detection, which involves the identification of anomalies within a dataset in the absence of labeled examples. Though distance-based methods are top-performing for unsupervised anomaly detection, they suffer heavily from the sensitivity to the choice of the number of the nearest neighbors. In this paper, we propose a new distance-based algorithm called bagged regularized $k$-distances for anomaly detection (BRDAD) converting the unsupervised anomaly detection problem into a convex optimization problem. Our BRDAD algorithm selects the weights by minimizing the surrogate risk, i.e., the finite sample bound of the empirical risk of the bagged weighted $k$-distances for density estimation (BWDDE). This approach enables us to successfully address the sensitivity challenge of the hyperparameter choice in distance-based algorithms. Moreover, when dealing with large-scale datasets, the efficiency issues can be addressed by the incorporated bagging technique in our BRDAD algorithm. On the theoretical side, we establish fast convergence rates of the AUC regret of our algorithm and demonstrate that the bagging technique significantly reduces the computational complexity. On the practical side, we conduct numerical experiments on anomaly detection benchmarks to illustrate the insensitivity of parameter selection of our algorithm compared with other state-of-the-art distance-based methods. Moreover, promising improvements are brought by applying the bagging technique in our algorithm on real-world datasets.
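As background for the distance-based methods the abstract refers to, the sketch below scores each point by its plain $k$-distance, i.e. the distance to its $k$-th nearest neighbor. This is the basic building block only, not the BRDAD algorithm: there is no weighting, regularization, or bagging. The function name and the data points are hypothetical.

```python
import math


def k_distance_scores(points, k):
    """Anomaly score per point: distance to its k-th nearest neighbor.

    A larger score suggests the point lies in a sparser region.
    """
    if not 1 <= k < len(points):
        raise ValueError("k must satisfy 1 <= k < number of points")
    scores = []
    for i, p in enumerate(points):
        # Distances from p to every other point, nearest first.
        dists = sorted(
            math.dist(p, q) for j, q in enumerate(points) if j != i
        )
        scores.append(dists[k - 1])
    return scores


# Four clustered points and one far-away point.
points = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10)]
scores = k_distance_scores(points, k=2)
most_anomalous = scores.index(max(scores))
print(most_anomalous)  # 4
```

The score depends directly on the choice of `k`, which is the sensitivity the abstract says the proposed method is designed to address.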
Citations: 0
Identification and Inference for Synthetic Controls with Confounding
Pub Date : 2023-12-01 DOI: arxiv-2312.00955
Guido W. Imbens, Davide Viviano
This paper studies inference on treatment effects in panel data settings with unobserved confounding. We model outcome variables through a factor model with random factors and loadings. Such factors and loadings may act as unobserved confounders: when the treatment is implemented depends on time-varying factors, and who receives the treatment depends on unit-level confounders. We study the identification of treatment effects and illustrate the presence of a trade-off between time and unit-level confounding. We provide asymptotic results for inference for several Synthetic Control estimators and show that different sources of randomness should be considered for inference, depending on the nature of confounding. We conclude with a comparison of Synthetic Control estimators with alternatives for factor models.
Citations: 0
Inference on common trends in functional time series
Pub Date : 2023-12-01 DOI: arxiv-2312.00590
Morten Ørregaard Nielsen, Won-Ki Seo, Dakyung Seong
This paper studies statistical inference on unit roots and cointegration for time series in a Hilbert space. We develop statistical inference on the number of common stochastic trends that are embedded in the time series, i.e., the dimension of the nonstationary subspace. We also consider hypotheses on the nonstationary subspace itself. The Hilbert space can be of an arbitrarily large dimension, and our methods remain asymptotically valid even when the time series of interest takes values in a subspace of possibly unknown dimension. This has wide applicability in practice; for example, in the case of cointegrated vector time series of finite dimension, in a high-dimensional factor model that includes a finite number of nonstationary factors, in the case of cointegrated curve-valued (or function-valued) time series, and nonstationary dynamic functional factor models. We include two empirical illustrations to the term structure of interest rates and labor market indices, respectively.
Citations: 0
Multiple Testing of Linear Forms for Noisy Matrix Completion
Pub Date : 2023-12-01 DOI: arxiv-2312.00305
Wanteng Ma, Lilun Du, Dong Xia, Ming Yuan
Many important tasks of large-scale recommender systems can be naturally cast as testing multiple linear forms for noisy matrix completion. These problems, however, present unique challenges because of the subtle bias-and-variance tradeoff and the intricate dependence among the estimated entries induced by the low-rank structure. In this paper, we develop a general approach to overcome these difficulties by introducing new statistics for individual tests with sharp asymptotics both marginally and jointly, and utilizing them to control the false discovery rate (FDR) via a data splitting and symmetric aggregation scheme. We show that valid FDR control can be achieved with guaranteed power under nearly optimal sample size requirements using the proposed methodology. Extensive numerical simulations and real data examples are also presented to further illustrate its practical merits.
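The sketch below illustrates one generic way that symmetry can be used to control the FDR: if test statistics are symmetric around zero under the null, the count of large negative values estimates the number of false positives among large positive values. This is an assumption-laden illustration of that general idea, not the statistics or the aggregation scheme of the paper. The function name, the statistics, and the target level are hypothetical.

```python
def symmetric_fdr_select(stats, q):
    """Select hypotheses using a symmetry-based FDP estimate.

    stats: one statistic per hypothesis, assumed symmetric around zero
           under the null and large and positive under the alternative
    q: target false discovery rate level
    """
    candidates = sorted({abs(w) for w in stats if w != 0})
    for t in candidates:
        negatives = sum(1 for w in stats if w <= -t)
        positives = sum(1 for w in stats if w >= t)
        # Estimated false discovery proportion at threshold t.
        if negatives / max(positives, 1) <= q:
            return [j for j, w in enumerate(stats) if w >= t]
    return []  # no threshold meets the target level


# Hypothetical statistics for ten hypotheses.
stats = [5.1, 4.2, -0.3, 0.4, 3.8, -0.5, 0.2, 6.0, -0.1, 4.9]
selected = symmetric_fdr_select(stats, q=0.2)
print(selected)  # [0, 1, 3, 4, 7, 9]
```

The procedure returns the smallest threshold whose estimated false discovery proportion is at most `q`, so it rejects as many hypotheses as the estimate allows.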
Citations: 0
Revisiting the two-sample location shift model with a log-concavity assumption
Pub Date : 2023-11-30 DOI: arxiv-2311.18277
Ridhiman Saha, Priyam Das, Nilanjana Laha
In this paper, we consider the two-sample location shift model, a classic semiparametric model introduced by Stein (1956). This model is known for its adaptive nature, enabling nonparametric estimation with full parametric efficiency. Existing nonparametric estimators of the location shift often depend on external tuning parameters, which restricts their practical applicability (Van der Vaart and Wellner, 2021). We demonstrate that introducing an additional assumption of log-concavity on the underlying density can alleviate the need for tuning parameters. We propose a one step estimator for location shift estimation, utilizing log-concave density estimation techniques to facilitate tuning-free estimation of the efficient influence function. While we employ a truncated version of the one step estimator for theoretical adaptivity, our simulations indicate that the one step estimators perform best with zero truncation, eliminating the need for tuning during practical implementation.
Citations: 0
Asymptotic Efficiency for Fractional Brownian Motion with general noise
Pub Date : 2023-11-30 DOI: arxiv-2311.18669
Grégoire Szymanski, Tetsuya Takabatake
We investigate the Local Asymptotic Property for fractional Brownian models based on discrete observations contaminated by a Gaussian moving average process. We consider both situations of low- and high-frequency observations in a unified setup and we show that the convergence rate $n^{1/2} (\nu_n \Delta_n^{-H})^{-1/(2H+2K+1)}$ is optimal for estimating the Hurst index $H$, where $\nu_n$ is the noise intensity, $\Delta_n$ is the sampling frequency and $K$ is the moving average order. We also derive asymptotically efficient variances and we build an estimator achieving this convergence rate and variance. This theoretical analysis is backed up by a comprehensive numerical analysis of the estimation procedure that illustrates in particular its effectiveness for finite samples.
Citations: 0