Robust estimation has played an important role in statistics and machine learning. However, its applications to functional linear regression are still under-developed. In this paper, we focus on Huber's loss with a diverging robustness parameter, which was previously used in parametric models. Compared to other robust methods such as median regression, the distinction is that the proposed method aims to estimate the conditional mean robustly, instead of the conditional median. We only require a $(1+\kappa)$-th moment assumption ($\kappa>0$) on the noise distribution, and the established error bounds match the optimal rate in the least-squares case as soon as $\kappa \ge 1$. We establish a convergence rate in probability when the functional predictor has a finite fourth moment, and a finite-sample bound with exponential tail when the functional predictor is Gaussian, in terms of both prediction error and $L^2$ error. The results also extend to the case of functional estimation in a reproducing kernel Hilbert space (RKHS).
"Functional Adaptive Huber Linear Regression", Ling Peng, Xiaohui Liu, Heng Lian. arXiv:2409.11053, arXiv - STAT - Statistics Theory, published 2024-09-17.
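The abstract does not spell out the estimator, but the general shape of Huber regression with a robustness parameter that diverges with the sample size can be sketched as follows. This is a minimal gradient-descent illustration, not the paper's functional procedure; the tuning rule for `tau` is a placeholder assumption.

```python
import numpy as np

def huber_grad(r, tau):
    # Derivative of Huber's loss in the residual: identity on [-tau, tau],
    # clipped to +/- tau outside.
    return np.where(np.abs(r) <= tau, r, tau * np.sign(r))

def huber_regression(X, y, tau, lr=0.1, n_iter=500):
    # Plain gradient descent on the empirical Huber risk; illustrative only.
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        r = y - X @ beta
        beta += lr * X.T @ huber_grad(r, tau) / len(y)
    return beta

rng = np.random.default_rng(0)
n, p = 2000, 3
X = rng.normal(size=(n, p))
beta_true = np.array([1.0, -2.0, 0.5])
noise = rng.standard_t(df=2.5, size=n)      # heavy tails: only low moments finite
y = X @ beta_true + noise
# Placeholder diverging-tau rule (an assumption, not the paper's choice):
tau = np.std(y) * (n / np.log(n)) ** 0.25
beta_hat = huber_regression(X, y, tau)
```

Letting `tau` grow with $n$ is what keeps the estimator targeting the conditional mean rather than the median, while still truncating the influence of heavy-tailed noise.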
Marina Anagnostopoulou-Merkouri, R. A. Bailey, Peter J. Cameron
Let $G$ be a transitive permutation group on $\Omega$. The $G$-invariant partitions form a sublattice of the lattice of all partitions of $\Omega$, having the further property that all its elements are uniform (that is, have all parts of the same size). If, in addition, all the equivalence relations defining the partitions commute, then the relations form an \emph{orthogonal block structure}, a concept from statistics; in this case the lattice is modular. If it is distributive, then we have a \emph{poset block structure}, whose automorphism group is a \emph{generalised wreath product}. We examine permutation groups with these properties, which we call the \emph{OB property} and \emph{PB property} respectively, and in particular investigate when direct and wreath products of groups with these properties also have these properties. A famous theorem on permutation groups asserts that a transitive imprimitive group $G$ is embeddable in the wreath product of two factors obtained from the group (the group induced on a block by its setwise stabiliser, and the group induced on the set of blocks by $G$). We extend this theorem to groups with the PB property, embedding them into generalised wreath products. We show that the map from posets to generalised wreath products preserves intersections and inclusions. We have included background and historical material on these concepts.
"Permutation groups, partition lattices and block structures". arXiv:2409.10461, arXiv - STAT - Statistics Theory, published 2024-09-16.
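The opening claim, that every $G$-invariant partition of $\Omega$ is uniform when $G$ is transitive, can be checked by brute force on a small example. The sketch below enumerates all 203 set partitions of $\mathbb{Z}_6$ and keeps those invariant under the cyclic shift, which generates the transitive group $C_6$; exactly the four coset partitions of the subgroups survive, and all are uniform.

```python
from itertools import combinations

def partitions(s):
    # Generate all set partitions of the tuple s (Bell(|s|) of them).
    if not s:
        yield []
        return
    first, rest = s[0], s[1:]
    for k in range(len(rest) + 1):
        for subset in combinations(rest, k):
            block = frozenset((first,) + subset)
            remaining = tuple(x for x in rest if x not in subset)
            for tail in partitions(remaining):
                yield [block] + tail

def invariant(partition, g):
    # A partition is invariant under g if g maps every block onto a block.
    blocks = set(partition)
    return all(frozenset(g[x] for x in b) in blocks for b in blocks)

omega = tuple(range(6))
g = {i: (i + 1) % 6 for i in omega}      # cyclic shift generating C6
inv = [p for p in partitions(omega) if invariant(p, g)]
uniform = all(len({len(b) for b in p}) == 1 for p in inv)
```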
Quantitative measurement of ageing across systems and components is crucial for accurately assessing reliability and predicting failure probabilities. This measurement supports effective maintenance scheduling, performance optimisation, and cost management. Examining the ageing characteristics of a system that operates beyond a specified time $t > 0$ yields valuable insights. This paper introduces a novel metric for ageing, termed the Variance Residual Life Ageing Intensity (VRLAI) function, and explores its properties across various probability distributions. Additionally, we characterise the closure properties of the two ageing classes defined by the VRLAI function. We propose a new ordering, called the Variance Residual Life Ageing Intensity (VRLAI) ordering, and discuss its various properties. Furthermore, we examine the closure of the VRLAI order under coherent systems.
"Variance Residual Life Ageing Intensity Function", Ashutosh Singh. arXiv:2409.10591, arXiv - STAT - Statistics Theory, published 2024-09-16.
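The abstract does not reproduce the VRLAI definition, but the variance residual life $\mathrm{Var}(X - t \mid X > t)$ on which it is built can be computed numerically from the survival function, using the standard moment identities for residual life. A sketch; for an exponential lifetime it is constant in $t$, by memorylessness.

```python
import numpy as np

def variance_residual_life(surv, t, upper=40.0, n_grid=100_000):
    # VRL(t) = Var(X - t | X > t), from the survival function S, via
    # E[(X-t)^m | X > t] = integral_t^inf m (u-t)^(m-1) S(u) du / S(t), m = 1, 2.
    u = np.linspace(t, upper, n_grid)
    s = surv(u)
    trap = lambda f: np.sum((f[1:] + f[:-1]) / 2.0 * np.diff(u))  # trapezoid rule
    m1 = trap(s) / surv(t)
    m2 = trap(2.0 * (u - t) * s) / surv(t)
    return m2 - m1**2

lam = 2.0
S = lambda u: np.exp(-lam * u)
v0 = variance_residual_life(S, 0.0)
v3 = variance_residual_life(S, 3.0)   # memorylessness: both equal 1/lam**2 = 0.25
```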
Xin Bing, Florentina Bunea, Jonathan Niles-Weed, Marten Wegkamp
Mixed multinomial logits are discrete mixtures introduced several decades ago to model the probability of choosing an attribute from $p$ possible candidates, in heterogeneous populations. The model has recently attracted attention in the AI literature, under the name softmax mixtures, where it is routinely used in the final layer of a neural network to map a large number $p$ of vectors in $\mathbb{R}^L$ to a probability vector. Despite its wide applicability and empirical success, statistically optimal estimators of the mixture parameters, obtained via algorithms whose running time scales polynomially in $L$, are not known. This paper provides a solution to this problem for contemporary applications, such as large language models, in which the mixture has a large number $p$ of support points, and the size $N$ of the sample observed from the mixture is also large. Our proposed estimator combines two classical estimators, obtained respectively via a method of moments (MoM) and the expectation-maximization (EM) algorithm. Although both estimator types have been studied, from a theoretical perspective, for Gaussian mixtures, no similar results exist for softmax mixtures for either procedure. We develop a new MoM parameter estimator based on latent moment estimation that is tailored to our model, and provide the first theoretical analysis for a MoM-based procedure in softmax mixtures. Although consistent, MoM for softmax mixtures can exhibit poor numerical performance, as observed in other mixture models. Nevertheless, as MoM is provably in a neighborhood of the target, it can be used as a warm start for any iterative algorithm. We study the EM algorithm in detail, and provide its first theoretical analysis for softmax mixtures. Our final proposal for parameter estimation is the EM algorithm with a MoM warm start.
"Learning large softmax mixtures with warm start EM". arXiv:2409.09903, arXiv - STAT - Statistics Theory, published 2024-09-16.
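As a generic illustration of the moment-based warm start followed by EM, here is the pattern in its simplest setting: a one-dimensional, equal-weight, two-component Gaussian location mixture with known unit variance. This is emphatically not the softmax-mixture estimator of the paper, only the MoM-then-EM recipe.

```python
import numpy as np

def mom_start(x, sigma=1.0):
    # Method of moments for an equal-weight two-Gaussian mixture, known sigma:
    # E[X] = (mu1 + mu2)/2 and Var[X] = sigma^2 + (mu1 - mu2)^2 / 4.
    m, v = x.mean(), x.var()
    d = np.sqrt(max(v - sigma**2, 0.0))
    return m - d, m + d

def em(x, mu1, mu2, sigma=1.0, n_iter=50):
    for _ in range(n_iter):
        # E-step: responsibility of the second component.
        l1 = np.exp(-(x - mu1) ** 2 / (2 * sigma**2))
        l2 = np.exp(-(x - mu2) ** 2 / (2 * sigma**2))
        g = l2 / (l1 + l2)
        # M-step: responsibility-weighted means.
        mu1 = np.sum((1 - g) * x) / np.sum(1 - g)
        mu2 = np.sum(g * x) / np.sum(g)
    return mu1, mu2

rng = np.random.default_rng(1)
x = np.concatenate([rng.normal(-2, 1, 5000), rng.normal(2, 1, 5000)])
mu1, mu2 = em(x, *mom_start(x))   # MoM lands near (-2, 2); EM refines locally
```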
Gennaro Auricchio, Paolo Giudici, Giuseppe Toscani
Measuring the degree of inequality expressed by a multivariate statistical distribution is a challenging problem, which appears in many fields of science and engineering. In this paper, we propose to extend the well-known univariate Gini coefficient to multivariate distributions, while maintaining most of its properties. Our extension is based on the application of whitening processes that possess the property of scale stability.
"Extending the Gini Index to Higher Dimensions via Whitening Processes". arXiv:2409.10119, arXiv - STAT - Statistics Theory, published 2024-09-16.
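For reference, the univariate Gini coefficient has a closed form in terms of order statistics, $G = \sum_i (2i - n - 1)\,x_{(i)} / (n^2 \bar{x})$. The multivariate step below (whiten the data, then take the Gini index of the Euclidean norms) is a hypothetical sketch of the whitening idea, not the paper's exact construction.

```python
import numpy as np

def gini(x):
    # Univariate Gini: G = sum_i (2i - n - 1) x_(i) / (n^2 * mean(x)).
    x = np.sort(np.asarray(x, dtype=float))
    n = len(x)
    i = np.arange(1, n + 1)
    return np.sum((2 * i - n - 1) * x) / (n**2 * x.mean())

def whitened_gini(X):
    # Hypothetical multivariate sketch (NOT the paper's definition):
    # ZCA-whiten the data, then take the Gini index of the Euclidean norms.
    Xc = X - X.mean(axis=0)
    evals, evecs = np.linalg.eigh(np.cov(Xc, rowvar=False))
    W = evecs @ np.diag(evals**-0.5) @ evecs.T   # ZCA whitening matrix
    return gini(np.linalg.norm(Xc @ W, axis=1))
```

Whitening makes the statistic invariant to invertible affine rescalings of the coordinates, which is one natural reading of the "scale stability" requirement.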
Data dispersed across multiple files are commonly integrated through probabilistic linkage methods, where even minimal error rates in record matching can significantly contaminate subsequent statistical analyses. In regression problems, we examine scenarios where the identifiers of predictors or responses are subject to an unknown permutation, challenging the assumption of correspondence. Many emerging approaches in the literature focus on sparsely permuted data, where only a small subset of pairs ($k \ll n$) is affected by the permutation, treating these permuted entries as outliers to restore the original correspondence and obtain consistent estimates of the regression parameters. In this article, we complement the existing literature by introducing a novel generalized robust Bayesian formulation of the problem. We develop an efficient posterior sampling scheme by adapting the fractional posterior framework and addressing key computational bottlenecks via careful use of discrete optimal transport and sampling in the space of binary matrices with fixed margins. Further, we establish new posterior contraction results within this framework, providing theoretical guarantees for our approach. The utility of the proposed framework is demonstrated via extensive numerical experiments.
"Learning with Sparsely Permuted Data: A Robust Bayesian Approach", Abhisek Chakraborty, Saptati Datta. arXiv:2409.10678, arXiv - STAT - Statistics Theory, published 2024-09-16.
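The "permuted entries as outliers" viewpoint can be illustrated with a crude refit-and-discard heuristic (hypothetical; the paper's method is Bayesian and far more principled): fit least squares, drop the $k$ largest-residual rows, and refit.

```python
import numpy as np

def refit_discard(X, y, k, n_rounds=5):
    # Hypothetical heuristic, not the paper's method: alternately fit least
    # squares and drop the k largest-residual rows (the suspected permuted ones).
    keep = np.ones(len(y), dtype=bool)
    beta = np.zeros(X.shape[1])
    for _ in range(n_rounds):
        beta, *_ = np.linalg.lstsq(X[keep], y[keep], rcond=None)
        r = np.abs(y - X @ beta)
        thresh = np.partition(r, len(y) - k - 1)[len(y) - k - 1]
        keep = r <= thresh
    return beta

rng = np.random.default_rng(2)
n, k = 500, 20
X = rng.normal(size=(n, 2))
beta_true = np.array([3.0, -1.0])
y = X @ beta_true + 0.1 * rng.normal(size=n)
perm = rng.choice(n, k, replace=False)
y[perm] = y[rng.permutation(perm)]   # sparsely permute k responses
beta_hat = refit_discard(X, y, k)
```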
In this article, we consider the complete independence test of high-dimensional data. Based on the Chatterjee coefficient, we pioneer the development of a quadratic test and an extreme value test, which possess good testing performance for oscillatory data, and establish the corresponding large-sample properties under both null and alternative hypotheses. To overcome the shortcomings of the quadratic statistic and the extreme value statistic, we propose a testing method, termed the power enhancement test, obtained by adding a screening statistic to the quadratic statistic. The proposed method does not reduce the testing power under dense alternative hypotheses, but can enhance the power significantly under sparse alternative hypotheses. Three synthetic data examples and two real data examples are further used to illustrate the performance of our proposed methods.
"Consistent complete independence test in high dimensions based on Chatterjee correlation coefficient", Liqi Xia, Ruiyuan Cao, Jiang Du, Jun Dai. arXiv:2409.10315, arXiv - STAT - Statistics Theory, published 2024-09-16.
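For reference, Chatterjee's coefficient has a simple closed form in the no-ties case: sort the pairs by $x$, let $r_i$ be the ranks of the corresponding $y$ values, and set $\xi_n = 1 - 3\sum_i |r_{i+1} - r_i| / (n^2 - 1)$. A sketch showing its strength on oscillatory signals:

```python
import numpy as np

def chatterjee_xi(x, y):
    # Chatterjee's rank correlation xi_n (no-ties version):
    # sort by x, rank the y's, xi = 1 - 3 * sum|r_{i+1} - r_i| / (n^2 - 1).
    n = len(x)
    order = np.argsort(x, kind="stable")
    r = np.argsort(np.argsort(y[order])) + 1   # ranks of y, in x-order
    return 1 - 3 * np.sum(np.abs(np.diff(r))) / (n**2 - 1)

rng = np.random.default_rng(3)
x = rng.uniform(-1, 1, 4000)
xi_dep = chatterjee_xi(x, np.cos(8 * np.pi * x))        # oscillatory, functional
xi_indep = chatterjee_xi(x, rng.uniform(size=4000))     # independent noise
```

Unlike Pearson or Spearman correlation, $\xi_n$ tends to 1 for any measurable functional relationship, including highly oscillatory ones, and to 0 under independence.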
Fueled by the ever-increasing need for statistics that guarantee the privacy of their training sets, this article studies the centrally-private estimation of Sobolev-smooth probability densities over the hypercube in dimension $d$. The contributions of this article are two-fold. Firstly, it generalizes the one-dimensional results of (Lalanne et al., 2023) to non-integer levels of smoothness and to a high-dimensional setting, which is important for two reasons: it is more suited to modern learning tasks, and it allows understanding the relations between privacy, dimensionality and smoothness, a central question in differential privacy. Secondly, this article presents a private estimation strategy that is data-driven (usually referred to as adaptive in statistics), in order to privately choose an estimator that achieves a good bias-variance trade-off among a finite family of private projection estimators, without prior knowledge of the ground-truth smoothness $\beta$. This is achieved by adapting the Lepskii method for private selection, adding a new penalization term that makes the estimation privacy-aware.
"Privately Learning Smooth Distributions on the Hypercube by Projections", Clément Lalanne (TSE-R), Sébastien Gadat (TSE-R, IUF). arXiv:2409.10083, arXiv - STAT - Statistics Theory, published 2024-09-16.
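The flavor of a centrally-private projection estimator can be sketched in one dimension with the cosine basis on $[0,1]$ and per-coefficient Laplace noise under basic composition. This is a hedged illustration only; the paper's estimator, basis, calibration, and the private Lepskii selection step all differ.

```python
import numpy as np

def private_projection_density(x, eps, m, rng):
    # Hedged sketch (NOT the paper's construction): estimate the first m
    # cosine-basis coefficients of the density on [0,1] and privatize each
    # with Laplace noise, splitting eps over the m queries (basic composition).
    n = len(x)
    coefs = []
    for j in range(1, m + 1):
        phi = np.sqrt(2) * np.cos(np.pi * j * x)   # basis function, sup-norm sqrt(2)
        sens = 2 * np.sqrt(2) / n                  # L1 sensitivity of the mean
        coefs.append(phi.mean() + rng.laplace(scale=sens * m / eps))
    def density(t):
        f = np.ones_like(t)                        # constant term integrates to 1
        for j, c in enumerate(coefs, start=1):
            f += c * np.sqrt(2) * np.cos(np.pi * j * t)
        return f
    return density

rng = np.random.default_rng(4)
x = rng.beta(2, 2, size=20_000)
f_hat = private_projection_density(x, eps=1.0, m=8, rng=rng)
```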
Ashutosh Singh, Ishapathik Das, Asok Kumar Nanda, Sumen Sen
The ageing intensity function is a powerful analytical tool that provides valuable insights into the ageing process across diverse domains such as reliability engineering, actuarial science, and healthcare. Its applications continue to expand as researchers delve deeper into understanding the complex dynamics of ageing and its implications for society. One common approach to defining the ageing intensity function is through the hazard rate or failure rate function, extensively explored in scholarly literature. Equally significant to the hazard rate function is the mean residual life function, which plays a crucial role in analyzing the ageing patterns exhibited by units or components. This article introduces the mean residual life ageing intensity (MRLAI) function to delve into component ageing behaviours across various distributions. Additionally, we scrutinize the closure properties of the MRLAI function across different reliability operations. Furthermore, a new order termed the mean residual life ageing intensity order is defined to analyze the ageing behaviour of a system, and the closure property of this order under various reliability operations is discussed.
"Mean Residual Life Ageing Intensity Function". arXiv:2409.10456, arXiv - STAT - Statistics Theory, published 2024-09-16.
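The hazard-rate route to the ageing intensity mentioned in the abstract is commonly written $L(t) = t\,h(t)/H(t)$, with $h$ the hazard rate and $H$ the cumulative hazard. For a Weibull distribution this is constant and equals the shape parameter; a quick sanity check:

```python
def ageing_intensity(h, H, t):
    # Hazard-based ageing intensity: L(t) = t * h(t) / H(t).
    return t * h(t) / H(t)

# Weibull with shape k and scale 1: h(t) = k t^(k-1), H(t) = t^k, so L(t) = k.
k = 2.5
h = lambda t: k * t ** (k - 1)
H = lambda t: t**k
vals = [ageing_intensity(h, H, t) for t in (0.5, 1.0, 4.0)]
```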
Ellen Graham, Marco Carone, Andrea Rotnitzky (University of Washington)
We address the goal of conducting inference about a smooth finite-dimensional parameter by utilizing individual-level data from various independent sources. Recent advancements have led to the development of a comprehensive theory capable of handling scenarios where different data sources align with, possibly distinct subsets of, conditional distributions of a single factorization of the joint target distribution. While this theory proves effective in many significant contexts, it falls short in certain common data fusion problems, such as two-sample instrumental variable analysis, settings that integrate data from epidemiological studies with diverse designs (e.g., prospective cohorts and retrospective case-control studies), and studies with variables prone to measurement error that are supplemented by validation studies. In this paper, we extend the aforementioned comprehensive theory to allow for the fusion of individual-level data from sources aligned with conditional distributions that do not correspond to a single factorization of the target distribution. Assuming conditional and marginal distribution alignments, we provide universal results that characterize the class of all influence functions of regular asymptotically linear estimators and the efficient influence function of any pathwise differentiable parameter, irrespective of the number of data sources, the specific parameter of interest, or the statistical model for the target distribution. This theory paves the way for machine-learning debiased, semiparametric efficient estimation.
"Towards a Unified Theory for Semiparametric Data Fusion with Individual-Level Data". arXiv:2409.09973, arXiv - STAT - Statistics Theory, published 2024-09-16.