
Latest publications in arXiv - STAT - Methodology

Robust Elicitable Functionals
Pub Date : 2024-09-06 DOI: arxiv-2409.04412
Kathleen E. Miao, Silvana M. Pesenti
Elicitable functionals and (strictly) consistent scoring functions are of interest due to their utility in determining (uniquely) optimal forecasts, and thus the ability to effectively backtest predictions. However, in practice, assuming that a distribution is correctly specified is too strong a belief to reliably hold. To remediate this, we incorporate a notion of statistical robustness into the framework of elicitable functionals, meaning that our robust functional accounts for "small" misspecifications of a baseline distribution. Specifically, we propose a robustified version of elicitable functionals by using the Kullback-Leibler divergence to quantify potential misspecifications from a baseline distribution. We show that the robust elicitable functionals (REFs) admit unique solutions lying at the boundary of the uncertainty region. Since every elicitable functional possesses infinitely many scoring functions, we propose the class of b-homogeneous strictly consistent scoring functions, for which the robust functionals maintain desirable statistical properties. We show the applicability of the REF in two examples: in the reinsurance setting and in robust regression problems.
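The paper's REF construction is not reproduced here, but the basic ingredient — a worst case over a Kullback-Leibler ball around a baseline distribution — can be illustrated with the standard dual formula for the KL-robust mean, $\sup_{KL(Q\|P)\le\varepsilon} E_Q[X] = \inf_{\lambda>0}\,\lambda \log E_P[e^{X/\lambda}] + \lambda\varepsilon$. A minimal numerical sketch (my own illustration, not the authors' estimator):

```python
import numpy as np
from scipy.optimize import minimize_scalar

def kl_worst_case_mean(x, eps):
    """Upper bound on E_Q[X] over all Q with KL(Q || P_n) <= eps,
    where P_n is the empirical distribution of the sample x."""
    def dual(lam):
        m = x.max()  # log-sum-exp shift for numerical stability
        return lam * (np.log(np.mean(np.exp((x - m) / lam))) + eps) + m
    res = minimize_scalar(dual, bounds=(1e-6, 1e3), method="bounded")
    return res.fun

rng = np.random.default_rng(0)
x = rng.normal(size=1000)
# the robust value grows with the radius eps of the uncertainty region
print(x.mean(), kl_worst_case_mean(x, 0.05), kl_worst_case_mean(x, 0.5))
```

As the abstract suggests for REFs generally, the adversarial distribution attaining this value sits on the boundary of the KL ball (here, an exponential tilting of the baseline).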
Citations: 0
Leveraging Machine Learning for Official Statistics: A Statistical Manifesto
Pub Date : 2024-09-06 DOI: arxiv-2409.04365
Marco Puts, David Salgado, Piet Daas
It is important for official statistics production to apply ML with statistical rigor, as it presents both opportunities and challenges. Although machine learning has enjoyed rapid technological advances in recent years, its application does not possess the methodological robustness necessary to produce high-quality statistical results. In order to account for all sources of error in machine learning models, the Total Machine Learning Error (TMLE) is presented as a framework analogous to the Total Survey Error Model used in survey methodology. As a means of ensuring that ML models are both internally and externally valid, the TMLE model addresses issues such as representativeness and measurement errors. Several case studies are presented, illustrating the importance of applying more rigor to the application of machine learning in official statistics.
Citations: 0
Modelling multivariate spatio-temporal data with identifiable variational autoencoders
Pub Date : 2024-09-06 DOI: arxiv-2409.04162
Mika Sipilä, Claudia Cappello, Sandra De Iaco, Klaus Nordhausen, Sara Taskinen
Modelling multivariate spatio-temporal data with complex dependency structures is a challenging task, but it can be simplified by assuming that the original variables are generated from independent latent components. If these components are found, they can be modelled univariately. Blind source separation aims to recover the latent components by estimating the unmixing transformation based on the observed data only. Current methods for spatio-temporal blind source separation are restricted to linear unmixing, and nonlinear variants have not been implemented. In this paper, we extend the identifiable variational autoencoder to the nonlinear nonstationary spatio-temporal blind source separation setting and demonstrate its performance using comprehensive simulation studies. Additionally, we introduce two alternative methods for latent dimension estimation, which is a crucial task in obtaining the correct latent representation. Finally, we illustrate the proposed methods using a meteorological application, in which we estimate the latent dimension and the latent components, interpret the components, and show how nonstationarity can be accounted for and prediction accuracy improved by using the proposed nonlinear blind source separation method as a preprocessing step.
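The paper's nonlinear iVAE is not reproduced here; as a point of reference, the *linear* blind source separation that it generalizes can be sketched with FastICA, which estimates the unmixing matrix from the observed mixtures alone (a toy illustration, not the authors' method):

```python
import numpy as np
from sklearn.decomposition import FastICA

rng = np.random.default_rng(1)
t = np.linspace(0, 8, 2000)
S = np.c_[np.sin(3 * t), np.sign(np.sin(5 * t))]  # independent latent sources
A = np.array([[1.0, 0.5], [0.4, 1.0]])            # unknown mixing matrix
X = S @ A.T                                       # observed mixtures only

S_hat = FastICA(n_components=2, random_state=0).fit_transform(X)
# each recovered component should correlate strongly with one true source
C = np.abs(np.corrcoef(S.T, S_hat.T))[:2, 2:]
print(C.max(axis=1))
```

Once the components are recovered, each can be modelled univariately, as the abstract notes; the paper's contribution is to make this step work under a nonlinear, nonstationary mixing.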
Citations: 0
Average Causal Effect Estimation in DAGs with Hidden Variables: Extensions of Back-Door and Front-Door Criteria
Pub Date : 2024-09-06 DOI: arxiv-2409.03962
Anna Guo, Razieh Nabi
The identification theory for causal effects in directed acyclic graphs (DAGs) with hidden variables is well-developed, but methods for estimating and inferring functionals beyond the g-formula remain limited. Previous studies have proposed semiparametric estimators for identifiable functionals in a broad class of DAGs with hidden variables. While demonstrating double robustness in some models, existing estimators face challenges, particularly with density estimation and numerical integration for continuous variables, and their estimates may fall outside the parameter space of the target estimand. Their asymptotic properties are also underexplored, especially when using flexible statistical and machine learning models for nuisance estimation. This study addresses these challenges by introducing novel one-step corrected plug-in and targeted minimum loss-based estimators of causal effects for a class of DAGs that extend classical back-door and front-door criteria (known as the treatment primal fixability criterion in prior literature). These estimators leverage machine learning to minimize modeling assumptions while ensuring key statistical properties such as asymptotic linearity, double robustness, efficiency, and staying within the bounds of the target parameter space. We establish conditions for nuisance functional estimates in terms of $L_2(P)$-norms to achieve root-n consistent causal effect estimates. To facilitate practical application, we have developed the flexCausal package in R.
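The paper's estimators for general hidden-variable DAGs are not reproduced here; as a minimal sketch of the classical back-door special case they extend, a one-step corrected (AIPW-style) estimate of the average treatment effect combines outcome-regression and propensity nuisances so that the correction term delivers double robustness (a toy illustration with simple nuisance models standing in for ML learners):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression, LinearRegression

rng = np.random.default_rng(2)
n = 5000
Z = rng.normal(size=(n, 2))                         # observed back-door confounders
p = 1 / (1 + np.exp(-(Z[:, 0] - 0.5 * Z[:, 1])))    # true propensity
A = rng.binomial(1, p)
Y = 2.0 * A + Z @ np.array([1.0, -1.0]) + rng.normal(size=n)  # true ATE = 2

pi = LogisticRegression().fit(Z, A).predict_proba(Z)[:, 1]    # propensity nuisance
mu1 = LinearRegression().fit(Z[A == 1], Y[A == 1]).predict(Z) # outcome nuisances
mu0 = LinearRegression().fit(Z[A == 0], Y[A == 0]).predict(Z)
# plug-in plus the efficient-influence-function correction term
ate = np.mean(mu1 - mu0 + A / pi * (Y - mu1) - (1 - A) / (1 - pi) * (Y - mu0))
print(ate)
```

Note that this raw one-step estimator can leave the target parameter space in bounded-outcome problems, which is one motivation the abstract gives for the targeted minimum loss-based alternative.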
Citations: 0
Incorporating external data for analyzing randomized clinical trials: A transfer learning approach
Pub Date : 2024-09-06 DOI: arxiv-2409.04126
Yujia Gu, Hanzhong Liu, Wei Ma
Randomized clinical trials are the gold standard for analyzing treatment effects, but high costs and ethical concerns can limit recruitment, potentially leading to invalid inferences. Incorporating external trial data with similar characteristics into the analysis using transfer learning appears promising for addressing these issues. In this paper, we present a formal framework for applying transfer learning to the analysis of clinical trials, considering three key perspectives: transfer algorithm, theoretical foundation, and inference method. For the algorithm, we adopt a parameter-based transfer learning approach to enhance the lasso-adjusted stratum-specific estimator developed for estimating treatment effects. A key component in constructing the transfer learning estimator is deriving the regression coefficient estimates within each stratum, accounting for the bias between source and target data. To provide a theoretical foundation, we derive the $l_1$ convergence rate for the estimated regression coefficients and establish the asymptotic normality of the transfer learning estimator. Our results show that when external trial data resembles current trial data, the sample size requirements can be reduced compared to using only the current trial data. Finally, we propose a consistent nonparametric variance estimator to facilitate inference. Numerical studies demonstrate the effectiveness and robustness of our proposed estimator across various scenarios.
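The stratum-specific estimator itself is not reproduced here, but the parameter-based transfer idea the abstract describes — fit coefficients on the large external trial, then lasso-correct the source-target bias on the small current trial — can be sketched as follows (a toy illustration with made-up data and tuning values, not the authors' procedure):

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(3)
p, n_src, n_tgt = 20, 2000, 200
beta = np.zeros(p)
beta[:3] = [1.0, -1.0, 0.5]                      # target coefficients
X_src = rng.normal(size=(n_src, p))
y_src = X_src @ (beta + 0.05) + rng.normal(size=n_src)  # slightly biased source
X_tgt = rng.normal(size=(n_tgt, p))
y_tgt = X_tgt @ beta + rng.normal(size=n_tgt)

b_src = Lasso(alpha=0.05).fit(X_src, y_src).coef_
# correct the transferred estimate using the small target sample via an offset
delta = Lasso(alpha=0.05).fit(X_tgt, y_tgt - X_tgt @ b_src).coef_
b_transfer = b_src + delta
b_target_only = Lasso(alpha=0.05).fit(X_tgt, y_tgt).coef_
print(np.linalg.norm(b_transfer - beta), np.linalg.norm(b_target_only - beta))
```

When the source resembles the target (small bias), the transferred fit borrows the external sample size, which mirrors the abstract's claim that sample size requirements can be reduced.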
Citations: 0
Local times of self-intersection and sample path properties of Volterra Gaussian processes
Pub Date : 2024-09-06 DOI: arxiv-2409.04377
Olga Izyumtseva, Wasiur R. KhudaBukhsh
We study a Volterra Gaussian process of the form $X(t)=\int_0^t K(t,s)\,dW(s)$, where $W$ is a Wiener process and $K$ is a continuous kernel. In dimension one, we prove a law of the iterated logarithm, discuss the existence of local times, and verify a continuous dependence between the local time and the kernel that generates the process. Furthermore, we prove the existence of the Rosen renormalized self-intersection local times for a planar Gaussian Volterra process.
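A Volterra Gaussian process of this form is straightforward to simulate by discretizing the stochastic integral with left-point (Itô) sums; a minimal sketch, using the fractional-Brownian-like kernel $K(t,s)=(t-s)^{H-1/2}$ as an example kernel (my choice, not taken from the paper):

```python
import numpy as np

def volterra_path(K, T=1.0, n=1000, rng=None):
    """Simulate X(t_i) = sum_{j<i} K(t_i, s_j) dW_j on a uniform grid."""
    rng = np.random.default_rng(rng)
    t = np.linspace(0, T, n + 1)
    dW = rng.normal(scale=np.sqrt(T / n), size=n)   # Wiener increments
    X = np.array([np.sum(K(ti, t[:i]) * dW[:i]) for i, ti in enumerate(t)])
    return t, X

H = 0.75
t, X = volterra_path(lambda t_, s: (t_ - s) ** (H - 0.5), rng=4)
print(X[0], X[-1])   # X(0) = 0 by construction
```

For this kernel, $\mathrm{Var}\,X(T)=\int_0^T (T-s)^{2H-1}\,ds = T^{2H}/(2H)$, which gives a quick sanity check on the discretization.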
Citations: 0
Over-parameterized regression methods and their application to semi-supervised learning
Pub Date : 2024-09-06 DOI: arxiv-2409.04001
Katsuyuki Hagiwara
The minimum norm least squares estimator is an estimation strategy for the over-parameterized case and, in machine learning, is known as a helpful tool for understanding the nature of deep learning. In this paper, to apply it in the context of non-parametric regression problems, we establish several methods based on thresholding of SVD (singular value decomposition) components, which are referred to as SVD regression methods. We consider several thresholding schemes: singular-value-based thresholding, hard-thresholding with cross-validation, universal thresholding, and bridge thresholding. Information on output samples is not utilized in the first method, while it is utilized in the others. We then apply them to semi-supervised learning, in which unlabeled input samples are incorporated into kernel functions in a regressor. Experimental results on real data showed that, depending on the dataset, the SVD regression methods are superior to a naive ridge regression method. Unfortunately, there was no clear advantage for the methods utilizing information on output samples. Furthermore, depending on the dataset, incorporating unlabeled input samples into the kernels is found to have certain advantages.
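The first of these ideas — least squares with the small singular values hard-thresholded away, compared against the plain minimum-norm solution — can be sketched in a few lines (a minimal illustration of singular-value-based thresholding; the threshold value here is arbitrary, not one of the paper's data-driven rules):

```python
import numpy as np

def svd_threshold_ls(X, y, tau):
    """Least squares using only SVD components with singular value > tau."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    keep = s > tau                                  # hard-threshold the spectrum
    return Vt[keep].T @ ((U[:, keep].T @ y) / s[keep])

rng = np.random.default_rng(5)
n, p = 50, 100                                      # over-parameterized: p > n
beta = rng.normal(size=p)
X = rng.normal(size=(n, p))
y = X @ beta + 0.1 * rng.normal(size=n)

b_minnorm = np.linalg.pinv(X) @ y                   # minimum-norm least squares
b_thresh = svd_threshold_ls(X, y, tau=5.0)
print(np.linalg.norm(X @ b_minnorm - y), np.linalg.norm(X @ b_thresh - y))
```

With `tau=0` the thresholded solution reduces to the minimum-norm solution, which interpolates the training data exactly when p > n; thresholding trades that interpolation for a smoother fit.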
Citations: 0
The $\infty$-S test via regression quantile affine LASSO
Pub Date : 2024-09-06 DOI: arxiv-2409.04256
Sylvain Sardy, Xiaoyu Ma, Hugo Gaible
The nonparametric sign test dates back to the early 18th century with a data analysis by John Arbuthnot. It is an alternative to Gosset's more recent $t$-test for consistent differences between two sets of observations. Fisher's $F$-test is a generalization of the $t$-test to linear regression and linear null hypotheses. Only the sign test is robust to non-Gaussianity. Gutenbrunner et al. [1993] derived a version of the sign test for linear null hypotheses in the spirit of the $F$-test, which requires the difficult estimation of the sparsity function. We propose instead a new sign test, called the $\infty$-S test, via the convex analysis of a point estimator that thresholds the estimate towards the null hypothesis of the test.
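The paper's $\infty$-S test is not reproduced here; for orientation, the classical sign test it starts from is simply a binomial test on the signs of paired differences — under the null of zero median, the number of positive signs is Binomial(n, 1/2):

```python
import numpy as np
from scipy.stats import binomtest

rng = np.random.default_rng(6)
d = rng.normal(loc=1.0, size=100)   # paired differences with a shifted median
n_pos = int(np.sum(d > 0))
p = binomtest(n_pos, n=d.size, p=0.5).pvalue  # two-sided sign test p-value
print(n_pos, p)
```

Because only the signs enter, the test is unaffected by heavy tails or any monotone transformation of the differences, which is the robustness to non-Gaussianity the abstract highlights.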
Citations: 0
Fitting the Discrete Swept Skeletal Representation to Slabular Objects
Pub Date : 2024-09-06 DOI: arxiv-2409.04079
Mohsen Taheri, Stephen M. Pizer, Jörn Schulz
Statistical shape analysis of slabular objects such as groups of hippocampi is highly useful for medical researchers, as it can aid the diagnosis and understanding of diseases. This work proposes a novel object representation based on locally parameterized discrete swept skeletal structures, and discusses model fitting and analysis of such representations. The model fitting procedure is based on boundary division and surface flattening. The quality of the model fit is evaluated based on the symmetry and tidiness of the skeletal structure as well as the volume of the implied boundary. The power of the method is demonstrated by visual inspection and statistical analysis of a synthetic and an actual data set, in comparison with an available skeletal representation.
Citations: 0
A tutorial on panel data analysis using partially observed Markov processes via the R package panelPomp
Pub Date : 2024-09-05 DOI: arxiv-2409.03876
Carles Breto, Jesse Wheeler, Aaron A. King, Edward L. Ionides
The R package panelPomp supports analysis of panel data via a general class of partially observed Markov process models (PanelPOMP). This package tutorial describes how the mathematical concept of a PanelPOMP is represented in the software and demonstrates typical use cases of panelPomp. Monte Carlo methods used for POMP models require adaptation for PanelPOMP models due to the higher dimensionality of panel data. The package takes advantage of recent advances for PanelPOMP, including an iterated filtering algorithm, Monte Carlo adjusted profile methodology, and block optimization methodology, to assist with the large parameter spaces that can arise with panel models. In addition, tools for manipulation of models and data are provided that take advantage of the panel structure.
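panelPomp itself is an R package and its API is not reproduced here; as a language-agnostic sketch of the core Monte Carlo idea it builds on, a bootstrap particle filter evaluates the likelihood of one panel unit of a toy POMP (latent Gaussian AR(1) state, Gaussian measurements — my toy model, not from the tutorial):

```python
import numpy as np

def bootstrap_filter(y, n_particles=500, phi=0.8, sig_x=1.0, sig_y=0.5, rng=None):
    """Log-likelihood estimate for a linear-Gaussian toy POMP via SMC."""
    rng = np.random.default_rng(rng)
    x = rng.normal(scale=sig_x, size=n_particles)
    loglik = 0.0
    for yt in y:
        x = phi * x + rng.normal(scale=sig_x, size=n_particles)  # propagate
        w = np.exp(-0.5 * ((yt - x) / sig_y) ** 2) / sig_y       # measurement weight
        loglik += np.log(w.mean()) - 0.5 * np.log(2 * np.pi)
        x = rng.choice(x, size=n_particles, p=w / w.sum())       # resample
    return loglik

rng = np.random.default_rng(7)
x, ys = 0.0, []
for _ in range(50):
    x = 0.8 * x + rng.normal()
    ys.append(x + 0.5 * rng.normal())
print(bootstrap_filter(np.array(ys), rng=8))
```

For a PanelPOMP, the unit-specific log-likelihoods of independent panel units add up, and it is this high-dimensional shared/unit-specific parameter space that the package's iterated filtering and block optimization tools address.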
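panelPomp itself is an R package, so the following is only a hedged, language-agnostic sketch of the Monte Carlo machinery the abstract refers to: a bootstrap particle filter evaluates the log-likelihood of one panel unit under an assumed toy model (a latent Gaussian AR(1) with noisy observations, not anything from the package), and per-unit log-likelihoods are summed across independent units, which is the structural assumption a PanelPOMP exploits.

```python
import numpy as np

# Sketch of a bootstrap particle filter for one unit of a panel POMP model.
# Toy model (assumed for illustration): X_t = phi*X_{t-1} + process noise,
# Y_t = X_t + Gaussian measurement noise with s.d. sig_m.

def unit_loglik(ys, phi=0.8, sig_p=1.0, sig_m=0.5, n_particles=2000, seed=0):
    rng = np.random.default_rng(seed)
    x = rng.normal(0.0, 1.0, n_particles)  # initial particle cloud
    loglik = 0.0
    for y in ys:
        # Propagate each particle through the latent dynamics.
        x = phi * x + rng.normal(0.0, sig_p, n_particles)
        # Gaussian measurement density of y given each particle.
        w = np.exp(-0.5 * ((y - x) / sig_m) ** 2) / (sig_m * np.sqrt(2 * np.pi))
        loglik += np.log(w.mean())  # conditional log-likelihood estimate
        # Multinomial resampling proportional to the weights.
        x = rng.choice(x, size=n_particles, p=w / w.sum())
    return loglik

# Panel log-likelihood: units are independent given their parameters,
# so the panel log-likelihood is the sum of per-unit log-likelihoods.
panel = [np.array([0.1, 0.4, -0.2]), np.array([1.0, 0.8])]
total = sum(unit_loglik(ys, seed=u) for u, ys in enumerate(panel))
print(np.isfinite(total))  # True
```

The higher dimensionality the abstract mentions shows up here as one filter per unit; the package's iterated filtering and block optimization methods address estimating shared and unit-specific parameters across many such filters, which this sketch does not attempt.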
{"title":"A tutorial on panel data analysis using partially observed Markov processes via the R package panelPomp","authors":"Carles Breto, Jesse Wheeler, Aaron A. King, Edward L. Ionides","doi":"arxiv-2409.03876","DOIUrl":"https://doi.org/arxiv-2409.03876","url":null,"abstract":"The R package panelPomp supports analysis of panel data via a general class\u0000of partially observed Markov process models (PanelPOMP). This package tutorial\u0000describes how the mathematical concept of a PanelPOMP is represented in the\u0000software and demonstrates typical use-cases of panelPomp. Monte Carlo methods\u0000used for POMP models require adaptation for PanelPOMP models due to the higher\u0000dimensionality of panel data. The package takes advantage of recent advances\u0000for PanelPOMP, including an iterated filtering algorithm, Monte Carlo adjusted\u0000profile methodology and block optimization methodology to assist with the large\u0000parameter spaces that can arise with panel models. In addition, tools for\u0000manipulation of models and data are provided that take advantage of the panel\u0000structure.","PeriodicalId":501425,"journal":{"name":"arXiv - STAT - Methodology","volume":"9 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-09-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142196677","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0