Elicitable functionals and (strictly) consistent scoring functions are of interest due to their utility in determining (uniquely) optimal forecasts, and thus the ability to effectively backtest predictions. However, in practice, assuming that a distribution is correctly specified is too strong a belief to reliably hold. To remedy this, we incorporate a notion of statistical robustness into the framework of elicitable functionals, meaning that our robust functional accounts for "small" misspecifications of a baseline distribution. Specifically, we propose a robustified version of elicitable functionals, using the Kullback-Leibler divergence to quantify potential misspecifications of a baseline distribution. We show that these robust elicitable functionals (REFs) admit unique solutions lying at the boundary of the uncertainty region. Since every elicitable functional possesses infinitely many scoring functions, we propose the class of b-homogeneous strictly consistent scoring functions, for which the robust functionals maintain desirable statistical properties. We demonstrate the applicability of the REF in two examples: a reinsurance setting and robust regression problems.
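As a reading aid, one plausible way to write the worst-case objective described above is the following (the notation, with scoring function $S$, baseline $F_0$, and KL tolerance $\varepsilon$, is ours and may differ from the paper's):

```latex
% Sketch of a KL-robust elicitable functional: the report z that minimizes the
% worst-case expected score over a KL ball around the baseline F_0.
\[
  R_{\varepsilon}(F_0)
  \;=\;
  \operatorname*{arg\,min}_{z}
  \;\sup_{G \,:\, D_{\mathrm{KL}}(G \,\|\, F_0) \,\le\, \varepsilon}
  \;\mathbb{E}_{Y \sim G}\bigl[\, S(z, Y) \,\bigr].
\]
```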
{"title":"Robust Elicitable Functionals","authors":"Kathleen E. Miao, Silvana M. Pesenti","doi":"arxiv-2409.04412","DOIUrl":"https://doi.org/arxiv-2409.04412","url":null,"abstract":"Elicitable functionals and (strict) consistent scoring functions are of\u0000interest due to their utility of determining (uniquely) optimal forecasts, and\u0000thus the ability to effectively backtest predictions. However, in practice,\u0000assuming that a distribution is correctly specified is too strong a belief to\u0000reliably hold. To remediate this, we incorporate a notion of statistical\u0000robustness into the framework of elicitable functionals, meaning that our\u0000robust functional accounts for \"small\" misspecifications of a baseline\u0000distribution. Specifically, we propose a robustified version of elicitable\u0000functionals by using the Kullback-Leibler divergence to quantify potential\u0000misspecifications from a baseline distribution. We show that the robust\u0000elicitable functionals admit unique solutions lying at the boundary of the\u0000uncertainty region. Since every elicitable functional possesses infinitely many\u0000scoring functions, we propose the class of b-homogeneous strictly consistent\u0000scoring functions, for which the robust functionals maintain desirable\u0000statistical properties. We show the applicability of the REF in two examples:\u0000in the reinsurance setting and in robust regression problems.","PeriodicalId":501425,"journal":{"name":"arXiv - STAT - Methodology","volume":"27 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-09-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142196669","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Applying machine learning (ML) with statistical rigor is important for official statistics production, as ML presents both opportunities and challenges. Although machine learning has enjoyed rapid technological advances in recent years, its application often lacks the methodological robustness necessary to produce high-quality statistical results. To account for all sources of error in machine learning models, the Total Machine Learning Error (TMLE) is presented as a framework analogous to the Total Survey Error Model used in survey methodology. To ensure that ML models are both internally and externally valid, the TMLE framework addresses issues such as representativeness and measurement error. Several case studies are presented, illustrating the importance of applying greater rigor when using machine learning in official statistics.
{"title":"Leveraging Machine Learning for Official Statistics: A Statistical Manifesto","authors":"Marco Puts, David Salgado, Piet Daas","doi":"arxiv-2409.04365","DOIUrl":"https://doi.org/arxiv-2409.04365","url":null,"abstract":"It is important for official statistics production to apply ML with\u0000statistical rigor, as it presents both opportunities and challenges. Although\u0000machine learning has enjoyed rapid technological advances in recent years, its\u0000application does not possess the methodological robustness necessary to produce\u0000high quality statistical results. In order to account for all sources of error\u0000in machine learning models, the Total Machine Learning Error (TMLE) is\u0000presented as a framework analogous to the Total Survey Error Model used in\u0000survey methodology. As a means of ensuring that ML models are both internally\u0000valid as well as externally valid, the TMLE model addresses issues such as\u0000representativeness and measurement errors. There are several case studies\u0000presented, illustrating the importance of applying more rigor to the\u0000application of machine learning in official statistics.","PeriodicalId":501425,"journal":{"name":"arXiv - STAT - Methodology","volume":"192 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-09-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142196675","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Mika Sipilä, Claudia Cappello, Sandra De Iaco, Klaus Nordhausen, Sara Taskinen
Modelling multivariate spatio-temporal data with complex dependency structures is a challenging task, but it can be simplified by assuming that the original variables are generated from independent latent components. If these components are found, they can be modelled univariately. Blind source separation aims to recover the latent components by estimating the unmixing transformation based on the observed data only. Current methods for spatio-temporal blind source separation are restricted to linear unmixing, and nonlinear variants have not been implemented. In this paper, we extend the identifiable variational autoencoder to the nonlinear, nonstationary spatio-temporal blind source separation setting and demonstrate its performance using comprehensive simulation studies. Additionally, we introduce two alternative methods for estimating the latent dimension, a crucial task for obtaining the correct latent representation. Finally, we illustrate the proposed methods using a meteorological application, where we estimate the latent dimension and the latent components, interpret the components, and show how nonstationarity can be accounted for and prediction accuracy improved by using the proposed nonlinear blind source separation method as a preprocessing step.
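To make the model class concrete, here is a minimal, self-contained sketch of an identifiable VAE with an auxiliary variable, using spatio-temporal coordinates as that auxiliary input; the architecture, layer sizes, Gaussian likelihood, and training loop are illustrative assumptions, not the paper's exact specification.

```python
# Minimal identifiable-VAE sketch for nonstationary blind source separation.
import torch
import torch.nn as nn

class IVAE(nn.Module):
    def __init__(self, x_dim, z_dim, u_dim, hidden=64):
        super().__init__()
        # q(z | x, u): approximate posterior over the latent components
        self.encoder = nn.Sequential(
            nn.Linear(x_dim + u_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, 2 * z_dim))
        # p(x | z): nonlinear mixing function (decoder)
        self.decoder = nn.Sequential(
            nn.Linear(z_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, x_dim))
        # p(z | u): conditional prior whose mean/variance depend on the
        # auxiliary variable (here, spatio-temporal coordinates)
        self.prior = nn.Sequential(
            nn.Linear(u_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, 2 * z_dim))

    def forward(self, x, u):
        mu_q, logvar_q = self.encoder(torch.cat([x, u], dim=-1)).chunk(2, dim=-1)
        z = mu_q + torch.randn_like(mu_q) * torch.exp(0.5 * logvar_q)  # reparameterization
        x_hat = self.decoder(z)
        mu_p, logvar_p = self.prior(u).chunk(2, dim=-1)
        # negative ELBO = reconstruction error + KL(q(z|x,u) || p(z|u)), both Gaussian
        rec = ((x - x_hat) ** 2).sum(-1)
        kl = 0.5 * (logvar_p - logvar_q
                    + (logvar_q.exp() + (mu_q - mu_p) ** 2) / logvar_p.exp()
                    - 1).sum(-1)
        return (rec + kl).mean()

# toy usage: x = observed fields, u = (longitude, latitude, time)
x = torch.randn(256, 5)
u = torch.randn(256, 3)
model = IVAE(x_dim=5, z_dim=3, u_dim=3)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
for _ in range(10):
    opt.zero_grad()
    loss = model(x, u)
    loss.backward()
    opt.step()
```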
{"title":"Modelling multivariate spatio-temporal data with identifiable variational autoencoders","authors":"Mika Sipilä, Claudia Cappello, Sandra De Iaco, Klaus Nordhausen, Sara Taskinen","doi":"arxiv-2409.04162","DOIUrl":"https://doi.org/arxiv-2409.04162","url":null,"abstract":"Modelling multivariate spatio-temporal data with complex dependency\u0000structures is a challenging task but can be simplified by assuming that the\u0000original variables are generated from independent latent components. If these\u0000components are found, they can be modelled univariately. Blind source\u0000separation aims to recover the latent components by estimating the unmixing\u0000transformation based on the observed data only. The current methods for\u0000spatio-temporal blind source separation are restricted to linear unmixing, and\u0000nonlinear variants have not been implemented. In this paper, we extend\u0000identifiable variational autoencoder to the nonlinear nonstationary\u0000spatio-temporal blind source separation setting and demonstrate its performance\u0000using comprehensive simulation studies. Additionally, we introduce two\u0000alternative methods for the latent dimension estimation, which is a crucial\u0000task in order to obtain the correct latent representation. Finally, we\u0000illustrate the proposed methods using a meteorological application, where we\u0000estimate the latent dimension and the latent components, interpret the\u0000components, and show how nonstationarity can be accounted and prediction\u0000accuracy can be improved by using the proposed nonlinear blind source\u0000separation method as a preprocessing method.","PeriodicalId":501425,"journal":{"name":"arXiv - STAT - Methodology","volume":"47 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-09-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142196672","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
The identification theory for causal effects in directed acyclic graphs (DAGs) with hidden variables is well-developed, but methods for estimating and inferring functionals beyond the g-formula remain limited. Previous studies have proposed semiparametric estimators for identifiable functionals in a broad class of DAGs with hidden variables. While demonstrating double robustness in some models, existing estimators face challenges, particularly with density estimation and numerical integration for continuous variables, and their estimates may fall outside the parameter space of the target estimand. Their asymptotic properties are also underexplored, especially when flexible statistical and machine learning models are used for nuisance estimation. This study addresses these challenges by introducing novel one-step corrected plug-in and targeted minimum loss-based estimators of causal effects for a class of DAGs that extends the classical back-door and front-door criteria (known as the treatment primal fixability criterion in prior literature). These estimators leverage machine learning to minimize modeling assumptions while ensuring key statistical properties such as asymptotic linearity, double robustness, efficiency, and remaining within the bounds of the target parameter space. We establish conditions on the nuisance functional estimates, in terms of $L_2(P)$-norms, needed to achieve root-$n$ consistent causal effect estimates. To facilitate practical application, we have developed the flexCausal package in R.
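For orientation, the sketch below shows only the familiar back-door special case: the one-step (AIPW) estimator of the average treatment effect, formed as the plug-in plus the empirical mean of its efficient influence function. It is not the paper's general estimator for DAGs with hidden variables, omits cross-fitting, and does not use the flexCausal package; the nuisance models are illustrative choices.

```python
# One-step (AIPW) estimator of E[Y(1)] - E[Y(0)] under the back-door criterion.
import numpy as np
from sklearn.linear_model import LogisticRegression, LinearRegression

def aipw_ate(X, A, Y):
    # nuisance 1: propensity score pi(X) = P(A = 1 | X)
    ps = LogisticRegression(max_iter=1000).fit(X, A).predict_proba(X)[:, 1]
    # nuisance 2: outcome regressions mu_a(X) = E[Y | A = a, X]
    mu1 = LinearRegression().fit(X[A == 1], Y[A == 1]).predict(X)
    mu0 = LinearRegression().fit(X[A == 0], Y[A == 0]).predict(X)
    # plug-in plus the efficient influence function correction
    phi1 = mu1 + A * (Y - mu1) / ps
    phi0 = mu0 + (1 - A) * (Y - mu0) / (1 - ps)
    est = np.mean(phi1 - phi0)
    se = np.std(phi1 - phi0, ddof=1) / np.sqrt(len(Y))  # influence-function-based SE
    return est, se

rng = np.random.default_rng(0)
X = rng.normal(size=(2000, 3))
A = rng.binomial(1, 1 / (1 + np.exp(-X[:, 0])))
Y = 2.0 * A + X @ np.array([1.0, -0.5, 0.3]) + rng.normal(size=2000)
print(aipw_ate(X, A, Y))
```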
{"title":"Average Causal Effect Estimation in DAGs with Hidden Variables: Extensions of Back-Door and Front-Door Criteria","authors":"Anna Guo, Razieh Nabi","doi":"arxiv-2409.03962","DOIUrl":"https://doi.org/arxiv-2409.03962","url":null,"abstract":"The identification theory for causal effects in directed acyclic graphs\u0000(DAGs) with hidden variables is well-developed, but methods for estimating and\u0000inferring functionals beyond the g-formula remain limited. Previous studies\u0000have proposed semiparametric estimators for identifiable functionals in a broad\u0000class of DAGs with hidden variables. While demonstrating double robustness in\u0000some models, existing estimators face challenges, particularly with density\u0000estimation and numerical integration for continuous variables, and their\u0000estimates may fall outside the parameter space of the target estimand. Their\u0000asymptotic properties are also underexplored, especially when using flexible\u0000statistical and machine learning models for nuisance estimation. This study\u0000addresses these challenges by introducing novel one-step corrected plug-in and\u0000targeted minimum loss-based estimators of causal effects for a class of DAGs\u0000that extend classical back-door and front-door criteria (known as the treatment\u0000primal fixability criterion in prior literature). These estimators leverage\u0000machine learning to minimize modeling assumptions while ensuring key\u0000statistical properties such as asymptotic linearity, double robustness,\u0000efficiency, and staying within the bounds of the target parameter space. We\u0000establish conditions for nuisance functional estimates in terms of L2(P)-norms\u0000to achieve root-n consistent causal effect estimates. To facilitate practical\u0000application, we have developed the flexCausal package in R.","PeriodicalId":501425,"journal":{"name":"arXiv - STAT - Methodology","volume":"60 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-09-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142225126","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Randomized clinical trials are the gold standard for analyzing treatment effects, but high costs and ethical concerns can limit recruitment, potentially leading to invalid inferences. Incorporating external trial data with similar characteristics into the analysis using transfer learning appears promising for addressing these issues. In this paper, we present a formal framework for applying transfer learning to the analysis of clinical trials, considering three key perspectives: transfer algorithm, theoretical foundation, and inference method. For the algorithm, we adopt a parameter-based transfer learning approach to enhance the lasso-adjusted stratum-specific estimator developed for estimating treatment effects. A key component in constructing the transfer learning estimator is deriving the regression coefficient estimates within each stratum, accounting for the bias between source and target data. To provide a theoretical foundation, we derive the $l_1$ convergence rate for the estimated regression coefficients and establish the asymptotic normality of the transfer learning estimator. Our results show that when external trial data resembles current trial data, the sample size requirements can be reduced compared to using only the current trial data. Finally, we propose a consistent nonparametric variance estimator to facilitate inference. Numerical studies demonstrate the effectiveness and robustness of our proposed estimator across various scenarios.
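As a rough illustration of the parameter-based transfer idea (within a single stratum), one common recipe fits the regression coefficients on the external trial and then estimates a sparse correction for the source-target bias on the current trial. The sketch below follows that recipe with illustrative tuning parameters; it is not the paper's lasso-adjusted stratum-specific estimator.

```python
# Parameter-based transfer lasso within a single stratum (illustrative sketch).
import numpy as np
from sklearn.linear_model import Lasso

def transfer_lasso(X_src, y_src, X_tgt, y_tgt, alpha_src=0.1, alpha_delta=0.05):
    # step 1: coefficients learned on the external (source) trial
    beta_src = Lasso(alpha=alpha_src, fit_intercept=False).fit(X_src, y_src).coef_
    # step 2: sparse correction for the source/target bias, fit on the target trial
    resid = y_tgt - X_tgt @ beta_src
    delta = Lasso(alpha=alpha_delta, fit_intercept=False).fit(X_tgt, resid).coef_
    return beta_src + delta

rng = np.random.default_rng(1)
beta = np.array([1.0, -1.0, 0.5, 0.0, 0.0])
X_src = rng.normal(size=(500, 5)); y_src = X_src @ (beta + 0.1) + rng.normal(size=500)
X_tgt = rng.normal(size=(80, 5));  y_tgt = X_tgt @ beta + rng.normal(size=80)
print(transfer_lasso(X_src, y_src, X_tgt, y_tgt))
```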
{"title":"Incorporating external data for analyzing randomized clinical trials: A transfer learning approach","authors":"Yujia Gu, Hanzhong Liu, Wei Ma","doi":"arxiv-2409.04126","DOIUrl":"https://doi.org/arxiv-2409.04126","url":null,"abstract":"Randomized clinical trials are the gold standard for analyzing treatment\u0000effects, but high costs and ethical concerns can limit recruitment, potentially\u0000leading to invalid inferences. Incorporating external trial data with similar\u0000characteristics into the analysis using transfer learning appears promising for\u0000addressing these issues. In this paper, we present a formal framework for\u0000applying transfer learning to the analysis of clinical trials, considering\u0000three key perspectives: transfer algorithm, theoretical foundation, and\u0000inference method. For the algorithm, we adopt a parameter-based transfer\u0000learning approach to enhance the lasso-adjusted stratum-specific estimator\u0000developed for estimating treatment effects. A key component in constructing the\u0000transfer learning estimator is deriving the regression coefficient estimates\u0000within each stratum, accounting for the bias between source and target data. To\u0000provide a theoretical foundation, we derive the $l_1$ convergence rate for the\u0000estimated regression coefficients and establish the asymptotic normality of the\u0000transfer learning estimator. Our results show that when external trial data\u0000resembles current trial data, the sample size requirements can be reduced\u0000compared to using only the current trial data. Finally, we propose a consistent\u0000nonparametric variance estimator to facilitate inference. Numerical studies\u0000demonstrate the effectiveness and robustness of our proposed estimator across\u0000various scenarios.","PeriodicalId":501425,"journal":{"name":"arXiv - STAT - Methodology","volume":"42 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-09-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142225127","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
We study a Volterra Gaussian process of the form $X(t)=\int_0^t K(t,s)\,dW(s)$, where $W$ is a Wiener process and $K$ is a continuous kernel. In dimension one, we prove a law of the iterated logarithm, discuss the existence of local times and verify a continuous dependence between the local time and the kernel that generates the process. Furthermore, we prove the existence of the Rosen renormalized self-intersection local times for a planar Gaussian Volterra process.
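A simple way to see such a process numerically is to discretize the stochastic integral on a grid; the following sketch does this with a left-point Euler sum and an arbitrary illustrative kernel (not one taken from the paper).

```python
# Simulate X(t) = \int_0^t K(t, s) dW(s) on a grid via a left-point Euler sum.
import numpy as np

def simulate_volterra(K, T=1.0, n=1000, rng=None):
    rng = np.random.default_rng(rng)
    t = np.linspace(0.0, T, n + 1)
    dW = rng.normal(scale=np.sqrt(T / n), size=n)   # Wiener increments
    X = np.zeros(n + 1)
    for i in range(1, n + 1):
        X[i] = np.sum(K(t[i], t[:i]) * dW[:i])      # sum_{s_j < t_i} K(t_i, s_j) dW_j
    return t, X

# example kernel (fractional-Brownian-like, purely illustrative)
t, X = simulate_volterra(lambda t, s: np.sqrt(np.maximum(t - s, 0.0)))
print(X[-1])
```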
{"title":"Local times of self-intersection and sample path properties of Volterra Gaussian processes","authors":"Olga Izyumtseva, Wasiur R. KhudaBukhsh","doi":"arxiv-2409.04377","DOIUrl":"https://doi.org/arxiv-2409.04377","url":null,"abstract":"We study a Volterra Gaussian process of the form\u0000$X(t)=int^t_0K(t,s)d{W(s)},$ where $W$ is a Wiener process and $K$ is a\u0000continuous kernel. In dimension one, we prove a law of the iterated logarithm,\u0000discuss the existence of local times and verify a continuous dependence between\u0000the local time and the kernel that generates the process. Furthermore, we prove\u0000the existence of the Rosen renormalized self-intersection local times for a\u0000planar Gaussian Volterra process.","PeriodicalId":501425,"journal":{"name":"arXiv - STAT - Methodology","volume":"42 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-09-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142196674","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Minimum-norm least squares is an estimation strategy for over-parameterized settings and, in machine learning, is known as a helpful tool for understanding the nature of deep learning. In this paper, to apply it in the context of non-parametric regression problems, we establish several methods based on thresholding of SVD (singular value decomposition) components, which we refer to as SVD regression methods. We consider singular-value-based thresholding, hard-thresholding with cross-validation, universal thresholding, and bridge thresholding. Information on the output samples is not used in the first method, while it is used in the others. We then apply these methods to semi-supervised learning, in which unlabeled input samples are incorporated into the kernel functions of a regressor. Experimental results on real data show that, depending on the dataset, the SVD regression methods are superior to a naive ridge regression method. Unfortunately, the methods that use information on the output samples show no clear advantage. Furthermore, depending on the dataset, incorporating unlabeled input samples into the kernels is found to offer certain advantages.
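The basic ingredient is easy to sketch: compute the minimum-norm least squares solution through the SVD of the design matrix and zero out components whose singular values fall below a threshold. The threshold rule below is an illustrative choice rather than any of the specific rules (cross-validated, universal, bridge) studied in the paper.

```python
# Minimum-norm least squares with hard thresholding of SVD components.
import numpy as np

def svd_threshold_regression(X, y, tau):
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    keep = s > tau                                       # drop small singular-value directions
    s_inv = np.divide(1.0, s, out=np.zeros_like(s), where=keep)
    # thresholded pseudo-inverse: beta = V diag(s_inv) U^T y
    return Vt.T @ (s_inv * (U.T @ y))

rng = np.random.default_rng(2)
X = rng.normal(size=(50, 200))                           # over-parameterized: p > n
beta_true = np.zeros(200); beta_true[:5] = 1.0
y = X @ beta_true + 0.1 * rng.normal(size=50)
beta_hat = svd_threshold_regression(X, y, tau=10.0)
print(np.linalg.norm(X @ beta_hat - y))
```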
{"title":"Over-parameterized regression methods and their application to semi-supervised learning","authors":"Katsuyuki Hagiwara","doi":"arxiv-2409.04001","DOIUrl":"https://doi.org/arxiv-2409.04001","url":null,"abstract":"The minimum norm least squares is an estimation strategy under an\u0000over-parameterized case and, in machine learning, is known as a helpful tool\u0000for understanding a nature of deep learning. In this paper, to apply it in a\u0000context of non-parametric regression problems, we established several methods\u0000which are based on thresholding of SVD (singular value decomposition)\u0000components, wihch are referred to as SVD regression methods. We considered\u0000several methods that are singular value based thresholding, hard-thresholding\u0000with cross validation, universal thresholding and bridge thresholding.\u0000Information on output samples is not utilized in the first method while it is\u0000utilized in the other methods. We then applied them to semi-supervised\u0000learning, in which unlabeled input samples are incorporated into kernel\u0000functions in a regressor. The experimental results for real data showed that,\u0000depending on the datasets, the SVD regression methods is superior to a naive\u0000ridge regression method. Unfortunately, there were no clear advantage of the\u0000methods utilizing information on output samples. Furthermore, for depending on\u0000datasets, incorporation of unlabeled input samples into kernels is found to\u0000have certain advantages.","PeriodicalId":501425,"journal":{"name":"arXiv - STAT - Methodology","volume":"9 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-09-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142196676","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
The nonparametric sign test dates back to the early 18th century, with a data analysis by John Arbuthnot. It is an alternative to Gosset's more recent $t$-test for detecting consistent differences between two sets of observations. Fisher's $F$-test is a generalization of the $t$-test to linear regression and linear null hypotheses. Only the sign test is robust to non-Gaussianity. Gutenbrunner et al. [1993] derived a version of the sign test for linear null hypotheses in the spirit of the $F$-test, which requires the difficult estimation of the sparsity function. We instead propose a new sign test, called the $\infty$-S test, obtained via the convex analysis of a point estimator that thresholds the estimate towards the null hypothesis of the test.
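For context, the classical sign test that the abstract starts from can be carried out with a binomial test on the signs of paired differences; the sketch below illustrates that baseline test, not the proposed $\infty$-S test.

```python
# Classical sign test on paired differences: under the null of no consistent
# difference, the number of positive differences is Binomial(n, 1/2).
import numpy as np
from scipy.stats import binomtest

rng = np.random.default_rng(3)
before = rng.normal(size=30)
after = before + 0.5 + rng.standard_t(df=2, size=30)   # shift plus heavy-tailed noise
diff = after - before
n_pos = int(np.sum(diff > 0))
n_nonzero = int(np.sum(diff != 0))
print(binomtest(n_pos, n_nonzero, p=0.5, alternative="greater").pvalue)
```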
{"title":"The $infty$-S test via regression quantile affine LASSO","authors":"Sylvain Sardy, Xiaoyu Ma, Hugo Gaible","doi":"arxiv-2409.04256","DOIUrl":"https://doi.org/arxiv-2409.04256","url":null,"abstract":"The nonparametric sign test dates back to the early 18th century with a data\u0000analysis by John Arbuthnot. It is an alternative to Gosset's more recent\u0000$t$-test for consistent differences between two sets of observations. Fisher's\u0000$F$-test is a generalization of the $t$-test to linear regression and linear\u0000null hypotheses. Only the sign test is robust to non-Gaussianity. Gutenbrunner\u0000et al. [1993] derived a version of the sign test for linear null hypotheses in\u0000the spirit of the F-test, which requires the difficult estimation of the\u0000sparsity function. We propose instead a new sign test called $infty$-S test\u0000via the convex analysis of a point estimator that thresholds the estimate\u0000towards the null hypothesis of the test.","PeriodicalId":501425,"journal":{"name":"arXiv - STAT - Methodology","volume":"27 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-09-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142225125","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Statistical shape analysis of slabular objects, such as groups of hippocampi, is highly valuable for medical researchers, as it can aid diagnosis and the understanding of disease. This work proposes a novel object representation based on locally parameterized discrete swept skeletal structures, and discusses model fitting and analysis of such representations. The model fitting procedure is based on boundary division and surface flattening. The quality of the model fit is evaluated based on the symmetry and tidiness of the skeletal structure, as well as the volume of the implied boundary. The power of the method is demonstrated by visual inspection and statistical analysis of a synthetic and a real data set, in comparison with an available skeletal representation.
{"title":"Fitting the Discrete Swept Skeletal Representation to Slabular Objects","authors":"Mohsen Taheri, Stephen M. Pizer, Jörn Schulz","doi":"arxiv-2409.04079","DOIUrl":"https://doi.org/arxiv-2409.04079","url":null,"abstract":"Statistical shape analysis of slabular objects like groups of hippocampi is\u0000highly useful for medical researchers as it can be useful for diagnoses and\u0000understanding diseases. This work proposes a novel object representation based\u0000on locally parameterized discrete swept skeletal structures. Further, model\u0000fitting and analysis of such representations are discussed. The model fitting\u0000procedure is based on boundary division and surface flattening. The quality of\u0000the model fitting is evaluated based on the symmetry and tidiness of the\u0000skeletal structure as well as the volume of the implied boundary. The power of\u0000the method is demonstrated by visual inspection and statistical analysis of a\u0000synthetic and an actual data set in comparison with an available skeletal\u0000representation.","PeriodicalId":501425,"journal":{"name":"arXiv - STAT - Methodology","volume":"67 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-09-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142196678","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Carles Breto, Jesse Wheeler, Aaron A. King, Edward L. Ionides
The R package panelPomp supports analysis of panel data via a general class of partially observed Markov process models (PanelPOMP). This package tutorial describes how the mathematical concept of a PanelPOMP is represented in the software and demonstrates typical use-cases of panelPomp. Monte Carlo methods used for POMP models require adaptation for PanelPOMP models due to the higher dimensionality of panel data. The package takes advantage of recent advances for PanelPOMP, including an iterated filtering algorithm, Monte Carlo adjusted profile methodology and block optimization methodology to assist with the large parameter spaces that can arise with panel models. In addition, tools for manipulation of models and data are provided that take advantage of the panel structure.
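For readers unfamiliar with the model class, a PanelPOMP couples independent POMP units through shared parameters; a schematic statement (in our notation, which may differ from the tutorial's conventions) is:

```latex
% Schematic PanelPOMP structure: U independent partially observed Markov
% process units, with parameters split into shared (phi) and unit-specific
% (psi_u) components.
\[
  X_u(\cdot)\ \text{is a latent Markov process},\qquad
  Y_{u,n} \mid X_u(t_{u,n}) \sim f\bigl(\cdot \mid X_u(t_{u,n});\,\phi,\,\psi_u\bigr),
  \qquad u = 1,\dots,U,
\]
% with the units mutually independent given the parameters.
```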
{"title":"A tutorial on panel data analysis using partially observed Markov processes via the R package panelPomp","authors":"Carles Breto, Jesse Wheeler, Aaron A. King, Edward L. Ionides","doi":"arxiv-2409.03876","DOIUrl":"https://doi.org/arxiv-2409.03876","url":null,"abstract":"The R package panelPomp supports analysis of panel data via a general class\u0000of partially observed Markov process models (PanelPOMP). This package tutorial\u0000describes how the mathematical concept of a PanelPOMP is represented in the\u0000software and demonstrates typical use-cases of panelPomp. Monte Carlo methods\u0000used for POMP models require adaptation for PanelPOMP models due to the higher\u0000dimensionality of panel data. The package takes advantage of recent advances\u0000for PanelPOMP, including an iterated filtering algorithm, Monte Carlo adjusted\u0000profile methodology and block optimization methodology to assist with the large\u0000parameter spaces that can arise with panel models. In addition, tools for\u0000manipulation of models and data are provided that take advantage of the panel\u0000structure.","PeriodicalId":501425,"journal":{"name":"arXiv - STAT - Methodology","volume":"9 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-09-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142196677","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}