
Asta-Advances in Statistical Analysis: Latest Articles

Clustering of extreme values: estimation and application
IF 1.4 | Zone 4 (Mathematics) | Q2 STATISTICS & PROBABILITY | Pub Date: 2023-03-31 | DOI: 10.1007/s10182-023-00474-y
Marta Ferreira

Extreme value theory (EVT) encompasses a set of methods for inferring the risk inherent to various phenomena in economics, finance, actuarial science, environmental science, hydrology, climate science, and various areas of engineering. In many situations the clustering of high values may affect the risk of extreme phenomena occurring. Examples include extreme temperatures that persist over time and result in drought, persistent intense rain leading to floods, and stock markets in successive falls with consequent catastrophic losses. The extremal index is an EVT measure of the degree of clustering of extreme values. In many situations, and under certain conditions, it corresponds to the reciprocal of the average size of high-value clusters. Estimating the extremal index generally entails two sources of uncertainty: the level above which observations are considered high, and the identification of clusters. The literature contains several contributions on the estimation of the extremal index, including methodologies to overcome these sources of uncertainty. In this work we revisit several existing estimators, apply automatic choice methods for both the threshold and the clustering parameter, and compare the performance of the methods. We end with an application to meteorological data.
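To make the two sources of uncertainty concrete, here is a minimal sketch of the classical runs estimator of the extremal index. It is one well-known estimator, not necessarily among those the paper compares; the function name, the toy series, and the values of the threshold `u` and run length `r` are assumptions chosen for illustration.

```python
def runs_extremal_index(x, u, r):
    """Runs estimator: number of clusters divided by number of exceedances.

    u is the threshold (first source of uncertainty) and r is the run
    length used to separate clusters (second source of uncertainty).
    """
    exceed = [i for i, v in enumerate(x) if v > u]
    if not exceed:
        raise ValueError("no observation exceeds the threshold u")
    clusters = 1
    for prev, cur in zip(exceed, exceed[1:]):
        # A new cluster starts when at least r consecutive non-exceedances
        # separate two exceedances, i.e. when cur - prev - 1 >= r.
        if cur - prev > r:
            clusters += 1
    return clusters / len(exceed)


# Hypothetical toy series, not data from the paper.
series = [1, 9, 8, 1, 1, 1, 7, 9, 9, 1, 1, 8]
theta = runs_extremal_index(series, u=5, r=2)  # 3 clusters / 6 exceedances
mean_cluster_size = 1 / theta                  # reciprocal relationship
```

Changing `r` from 2 to 3 merges the last two clusters in this toy series, which shows how sensitive the estimate can be to the clustering parameter.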

Asta-Advances in Statistical Analysis, vol. 108, no. 1, pp. 101-125. Journal Article. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10064624/pdf/
Citations: 0
A spatial semiparametric M-quantile regression for hedonic price modelling
IF 1.4 | Zone 4 (Mathematics) | Q2 STATISTICS & PROBABILITY | Pub Date: 2023-03-30 | DOI: 10.1007/s10182-023-00476-w
Francesco Schirripa Spagnolo, Riccardo Borgoni, Antonella Carcagnì, Alessandra Michelangeli, Nicola Salvati

This paper proposes an M-quantile regression approach to address the heterogeneity of the housing market in a modern European city. We show how M-quantile modelling is a rich and flexible tool for empirical market price data analysis, allowing us to obtain a robust estimation of the hedonic price function whilst accounting for different sources of heterogeneity in market prices. The suggested methodology can generally be used to analyse nonlinear interactions between prices and predictors. In particular, we develop a spatial semiparametric M-quantile model to capture both the potential nonlinear effects of the cultural environment on pricing and spatial trends. In both cases, nonlinearity is introduced into the model using appropriate basis functions. We show how the implicit price associated with the variable that measures cultural amenities can be determined in this semiparametric framework. Our findings show that the effect of several housing attributes and urban amenities differs significantly across the response distribution, suggesting that buyers of lower-priced properties behave differently than buyers of higher-priced properties.

Asta-Advances in Statistical Analysis, vol. 108, no. 1, pp. 159-183. Journal Article. PDF: https://link.springer.com/content/pdf/10.1007/s10182-023-00476-w.pdf
Citations: 0
Robust estimation of fixed effect parameters and variances of linear mixed models: the minimum density power divergence approach
IF 1.4 | Zone 4 (Mathematics) | Q2 STATISTICS & PROBABILITY | Pub Date: 2023-03-29 | DOI: 10.1007/s10182-023-00473-z
Giovanni Saraceno, Abhik Ghosh, Ayanendranath Basu, Claudio Agostinelli

Many real-life data sets can be analyzed using linear mixed models (LMMs). Since these are ordinarily based on normality assumptions, under small deviations from the model the inference can be highly unstable when the associated parameters are estimated by classical methods. On the other hand, the density power divergence (DPD) family, which measures the discrepancy between two probability density functions, has been successfully used to build robust estimators with high stability associated with minimal loss in efficiency. Here, we develop the minimum DPD estimator (MDPDE) for independent but non-identically distributed observations for LMMs according to the variance components model. We prove that the theoretical properties hold, including consistency and asymptotic normality of the estimators. The influence function and sensitivity measures are computed to explore the robustness properties. As a data-based choice of the MDPDE tuning parameter \(\alpha\) is very important, we propose two candidates as “optimal” choices, where optimality is in the sense of choosing the strongest downweighting that is necessary for the particular data set. We conduct a simulation study comparing the proposed MDPDE, for different values of \(\alpha\), with S-estimators, M-estimators and the classical maximum likelihood estimator, considering different levels of contamination. Finally, we illustrate the performance of our proposal on a real-data example.
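The downweighting role of the tuning parameter can be illustrated on a much simpler problem than the LMM setting of the paper. The sketch below computes the minimum DPD estimate of a normal mean with known scale, where the estimating equation reduces to a weighted mean with weights exp(-alpha * (x - mu)^2 / (2 * sigma^2)). The function name, the fixed-point iteration, and the toy data are assumptions made for illustration, not the authors' algorithm.

```python
import math
from statistics import mean, median


def mdpde_normal_mean(x, alpha, sigma=1.0, tol=1e-10, max_iter=500):
    """Minimum DPD estimate of a normal mean with known scale sigma.

    Solves sum_i (x_i - mu) * w_i = 0 with
    w_i = exp(-alpha * (x_i - mu)^2 / (2 * sigma^2))
    by fixed-point iteration, starting from the sample median.
    """
    if alpha <= 0:
        raise ValueError("alpha must be positive")
    mu = median(x)
    for _ in range(max_iter):
        w = [math.exp(-alpha * (v - mu) ** 2 / (2 * sigma ** 2)) for v in x]
        new_mu = sum(wi * v for wi, v in zip(w, x)) / sum(w)
        if abs(new_mu - mu) < tol:
            return new_mu
        mu = new_mu
    return mu


# Hypothetical sample: six values near zero plus one gross outlier.
sample = [-0.5, -0.2, 0.0, 0.1, 0.3, 0.4, 25.0]
robust = mdpde_normal_mean(sample, alpha=0.5)  # outlier weight is ~0
classical = mean(sample)                       # pulled towards the outlier
```

Larger values of alpha downweight distant observations more strongly, at some cost in efficiency when the data are clean, which is why a data-based choice of alpha matters.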

Asta-Advances in Statistical Analysis, vol. 108, no. 1, pp. 127-157. Journal Article. PDF: https://link.springer.com/content/pdf/10.1007/s10182-023-00473-z.pdf
Citations: 0
Lasso-based variable selection methods in text regression: the case of short texts
IF 1.4 | Zone 4 (Mathematics) | Q2 STATISTICS & PROBABILITY | Pub Date: 2023-03-20 | DOI: 10.1007/s10182-023-00472-0
Marzia Freo, Alessandra Luati

Communication through websites is often characterised by short texts made of a few words, such as image captions or tweets. This paper explores the class of supervised learning methods for the analysis of short texts, as an alternative to unsupervised methods, which are widely employed to infer topics from structured texts. The aim is to assess the effectiveness of text data in social sciences when they are used as explanatory variables in regression models. To this end, we compare different variable selection procedures when text regression models are fitted to real, short text data. We discuss the results obtained by several variants of lasso, screening-based methods and randomisation-based models, such as sure independence screening and stability selection, in terms of the number and importance of selected variables, assessed through goodness-of-fit measures, inclusion frequency and model class reliance. Latent Dirichlet allocation results are also considered as a term of comparison. Our perspective is primarily empirical and our starting point is the analysis of two real case studies, though bootstrap replications of each dataset are considered. The first case study aims at explaining price variations based on the information contained in the description of items on sale on e-commerce platforms. The second concerns open questions in surveys on satisfaction ratings. The case studies are different in nature and representative of different kinds of short texts: in one case a concise descriptive text is considered, whereas in the other the text expresses an opinion.
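As a rough illustration of how the lasso selects words in a text regression, the sketch below runs plain coordinate descent on a tiny document-term matrix. It is the basic lasso only, not any of the variants compared in the paper; the vocabulary, the documents, the response values and the penalty `lam` are all hypothetical.

```python
def soft_threshold(z, lam):
    """Shrink z towards zero by lam; values within [-lam, lam] become 0."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0


def lasso_cd(X, y, lam, n_iter=200):
    """Coordinate descent for (1/(2n)) * ||y - X b||^2 + lam * ||b||_1."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    for _ in range(n_iter):
        for j in range(p):
            rho = 0.0  # correlation of column j with the partial residual
            z = 0.0    # squared norm of column j
            for i in range(n):
                others = sum(X[i][k] * beta[k] for k in range(p) if k != j)
                rho += X[i][j] * (y[i] - others)
                z += X[i][j] ** 2
            beta[j] = soft_threshold(rho / n, lam) / (z / n) if z > 0 else 0.0
    return beta


# Hypothetical vocabulary and word counts: each short text contains one word.
vocabulary = ["leather", "new", "the"]
X = [[1, 0, 0], [1, 0, 0], [0, 1, 0], [0, 1, 0], [0, 0, 1], [0, 0, 1]]
y = [10.0, 12.0, 3.0, 5.0, 0.5, -0.5]  # hypothetical response, e.g. a price score
beta = lasso_cd(X, y, lam=0.5)
selected = [w for w, b in zip(vocabulary, beta) if b != 0.0]
```

In this toy design the uninformative word receives a coefficient of exactly zero and drops out of `selected`, which is the variable selection behaviour the abstract refers to.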

Asta-Advances in Statistical Analysis, vol. 108, no. 1, pp. 69-99. Journal Article. PDF: https://link.springer.com/content/pdf/10.1007/s10182-023-00472-0.pdf
Citations: 0
Correction: Bayesian ridge regression for survival data based on a vine copula-based prior
IF 1.4 | Zone 4 (Mathematics) | Q2 STATISTICS & PROBABILITY | Pub Date: 2023-02-14 | DOI: 10.1007/s10182-023-00470-2
Hirofumi Michimae, Takeshi Emura
Asta-Advances in Statistical Analysis, vol. 108, no. 3, p. 703. Journal Article.
Citations: 0
A dynamic causal modeling of the second outbreak of COVID-19 in Italy
IF 1.4 | Zone 4 (Mathematics) | Q2 STATISTICS & PROBABILITY | Pub Date: 2023-02-07 | DOI: 10.1007/s10182-023-00469-9
Massimo Bilancia, Domenico Vitale, Fabio Manca, Paola Perchinunno, Luigi Santacroce

While the vaccination campaign against COVID-19 is having a positive impact, we retrospectively analyze the causal impact of some decisions made by the Italian government on the second outbreak of the SARS-CoV-2 pandemic in Italy, when no vaccine was available. First, we analyze the causal impact of reopenings after the first lockdown in 2020. In addition, we analyze the impact of reopening schools in September 2020. Our results provide an unprecedented opportunity to evaluate the causal relationship between the relaxation of restrictions and the community transmission of a highly contagious respiratory virus that causes severe illness in the absence of prophylactic vaccination programs. We present a purely data-analytic approach based on a Bayesian methodology and discuss possible interpretations of the results obtained and implications for policy makers.

Asta-Advances in Statistical Analysis, vol. 108, no. 1, pp. 1-30. Journal Article. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9904269/pdf/
Citations: 0
Left-truncated health insurance claims data: theoretical review and empirical application
IF 1.4 | Zone 4 (Mathematics) | Q2 STATISTICS & PROBABILITY | Pub Date: 2023-02-02 | DOI: 10.1007/s10182-023-00471-1
Rafael Weißbach, Achim Dörre, Dominik Wied, Gabriele Doblhammer, Anne Fink

From the inventory of the health insurer AOK in 2004, we draw a sample of a quarter million people and follow each person’s health claims continuously until 2013. Our aim is to estimate the effect of a stroke on the dementia onset probability for Germans born in the first half of the 20th century. People deceased before 2004 are randomly left-truncated and, in particular, their number is unknown. Filtrations, modelling the missing data, enable circumventing the unknown number of truncated persons by using a conditional likelihood. Dementia onset after 2013 is a fixed right-censoring event. For each observed health history, Jacod’s formula yields its conditional likelihood contribution. Asymptotic normality of the estimated intensities is derived, related to a sample size definition including the number of truncated people. The standard error results from the asymptotic normality and is easily computable, despite the unknown sample size. The claims data reveal that after a stroke, with time measured in years, the intensity of dementia onset increases from 0.02 to 0.07. Using the independence of the two estimated intensities, a 95% confidence interval for their difference is [0.053, 0.057]. The effect halves when we extend the analysis to an age-inhomogeneous model, but does not change further when we additionally adjust for multi-morbidity.
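The interval for the difference can be unpacked with a short calculation. The sketch below shows the usual Wald interval for the difference of two independent estimates and backs out the standard error implied by the reported interval [0.053, 0.057]. The function names are assumptions, and splitting the implied standard error equally between the two intensities is done only to check the arithmetic, since the abstract does not report the individual standard errors. Note that the interval is centred at 0.055, so the reported intensities 0.02 and 0.07 are presumably rounded.

```python
import math


def wald_ci_difference(est_a, se_a, est_b, se_b, z=1.96):
    """95% Wald interval for est_b - est_a when the estimates are independent."""
    diff = est_b - est_a
    # Independence means the variances add.
    se_diff = math.sqrt(se_a ** 2 + se_b ** 2)
    return diff - z * se_diff, diff + z * se_diff


def implied_se(lower, upper, z=1.96):
    """Standard error implied by a symmetric Wald interval."""
    return (upper - lower) / (2 * z)


# Interval reported in the abstract.
lower, upper = 0.053, 0.057
midpoint = (lower + upper) / 2       # 0.055
se_diff = implied_se(lower, upper)   # about 0.00102

# Round trip under the illustrative assumption of equal standard errors.
se_each = se_diff / math.sqrt(2)
check = wald_ci_difference(0.0, se_each, midpoint, se_each)
```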

Asta-Advances in Statistical Analysis, vol. 108, no. 1, pp. 31-68. Journal Article. PDF: https://link.springer.com/content/pdf/10.1007/s10182-023-00471-1.pdf
Citations: 0
Statistical guarantees for sparse deep learning
IF 1.4 | Zone 4 (Mathematics) | Q2 STATISTICS & PROBABILITY | Pub Date: 2023-01-24 | DOI: 10.1007/s10182-022-00467-3
Johannes Lederer

Neural networks are becoming increasingly popular in applications, but our mathematical understanding of their potential and limitations is still limited. In this paper, we further this understanding by developing statistical guarantees for sparse deep learning. In contrast to previous work, we consider different types of sparsity, such as few active connections, few active nodes, and other norm-based types of sparsity. Moreover, our theories cover important aspects that previous theories have neglected, such as multiple outputs, regularization, and \(\ell_{2}\)-loss. The guarantees have a mild dependence on network widths and depths, which means that they support the application of sparse but wide and deep networks from a statistical perspective. Some of the concepts and tools that we use in our derivations are uncommon in deep learning and, hence, might be of additional interest.

Asta-Advances in Statistical Analysis, vol. 108, no. 2, pp. 231-258. Journal Article. PDF: https://link.springer.com/content/pdf/10.1007/s10182-022-00467-3.pdf
Citations: 0
Addressing non-normality in multivariate analysis using the t-distribution
IF 1.4 | Zone 4 (Mathematics) | Q2 STATISTICS & PROBABILITY | Pub Date: 2023-01-21 | DOI: 10.1007/s10182-022-00468-2
Felipe Osorio, Manuel Galea, Claudio Henríquez, Reinaldo Arellano-Valle

The main aim of this paper is to propose a set of tools for assessing non-normality taking into consideration the class of multivariate t-distributions. Assuming second moment existence, we consider a reparameterized version of the usual t distribution, so that the scale matrix coincides with the covariance matrix of the distribution. We use the local influence procedure and the Kullback–Leibler divergence measure to propose quantitative methods to evaluate deviations from the normality assumption. In addition, the possible non-normality due to the presence of both skewness and heavy tails is also explored. Our findings based on two real datasets are complemented by a simulation study to evaluate the performance of the proposed methodology on finite samples.
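The reparameterization described in the abstract can be written out as follows. This is a sketch in the common degrees-of-freedom notation; the symbols \(\mu\), \(\Lambda\), \(\Sigma\), \(\nu\) and \(p\) are assumptions introduced here, and the paper may use a different but equivalent parameterization.

```latex
% Usual p-variate t density with location \mu, scale matrix \Lambda
% and \nu > 2 degrees of freedom (so that second moments exist):
f(x) \propto |\Lambda|^{-1/2}
  \left[ 1 + \frac{(x-\mu)^{\top} \Lambda^{-1} (x-\mu)}{\nu} \right]^{-(\nu+p)/2},
\qquad
\operatorname{Cov}(X) = \frac{\nu}{\nu-2}\,\Lambda .

% Setting \Sigma = \frac{\nu}{\nu-2}\,\Lambda, i.e. \Lambda^{-1}/\nu = \Sigma^{-1}/(\nu-2):
f(x) \propto |\Sigma|^{-1/2}
  \left[ 1 + \frac{(x-\mu)^{\top} \Sigma^{-1} (x-\mu)}{\nu-2} \right]^{-(\nu+p)/2},
\qquad
\operatorname{Cov}(X) = \Sigma .
```

In the second form the scale matrix is the covariance matrix itself, so it keeps the same interpretation as in the normal model, which is recovered as \(\nu \to \infty\).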

本文的主要目的是提出一套评估非正态性的工具,同时考虑到多元 t 分布的类别。假定第二矩存在,我们考虑了通常 t 分布的重参数化版本,从而使尺度矩阵与分布的协方差矩阵重合。我们使用局部影响程序和库尔贝克-莱布勒发散度量,提出了评估正态性假设偏差的定量方法。此外,我们还探讨了由于偏斜和重尾的存在而可能导致的非正态性。我们基于两个真实数据集的研究结果通过模拟研究得到了补充,以评估所提出的方法在有限样本上的性能。
{"title":"Addressing non-normality in multivariate analysis using the t-distribution","authors":"Felipe Osorio,&nbsp;Manuel Galea,&nbsp;Claudio Henríquez,&nbsp;Reinaldo Arellano-Valle","doi":"10.1007/s10182-022-00468-2","DOIUrl":"10.1007/s10182-022-00468-2","url":null,"abstract":"<div><p>The main aim of this paper is to propose a set of tools for assessing non-normality taking into consideration the class of multivariate <i>t</i>-distributions. Assuming second moment existence, we consider a reparameterized version of the usual <i>t</i> distribution, so that the scale matrix coincides with covariance matrix of the distribution. We use the local influence procedure and the Kullback–Leibler divergence measure to propose quantitative methods to evaluate deviations from the normality assumption. In addition, the possible non-normality due to the presence of both skewness and heavy tails is also explored. Our findings based on two real datasets are complemented by a simulation study to evaluate the performance of the proposed methodology on finite samples.</p></div>","PeriodicalId":55446,"journal":{"name":"Asta-Advances in Statistical Analysis","volume":"107 4","pages":"785 - 813"},"PeriodicalIF":1.4,"publicationDate":"2023-01-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"46365758","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Bayesian ridge regression for survival data based on a vine copula-based prior
IF 1.4 Zone 4 (Mathematics) Q2 STATISTICS & PROBABILITY Pub Date: 2022-12-30 DOI: 10.1007/s10182-022-00466-4
Hirofumi Michimae, Takeshi Emura

Ridge regression estimators can be interpreted as a Bayesian posterior mean (or mode) when the regression coefficients follow a multivariate normal prior. However, the multivariate normal prior may not give efficient posterior estimates for regression coefficients, especially in the presence of interaction terms. In this paper, vine copula-based priors are proposed for Bayesian ridge estimators under the Cox proportional hazards model. The semiparametric Cox models are built on the posterior density under two likelihoods: Cox’s partial likelihood and the full likelihood under the gamma process prior. The simulations show that the full likelihood is generally more efficient and stable for estimating regression coefficients than the partial likelihood. We also show via simulations and a data example that the Archimedean copula priors (the Clayton and Gumbel copulas) are superior to the multivariate normal prior and the Gaussian copula prior.
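
The ridge-as-posterior interpretation in the first sentence can be checked on a toy problem. The sketch below is not the Cox model of the paper: it assumes a single-coefficient linear model y_i ~ N(b * x_i, sigma2) with prior b ~ N(0, tau2), where the posterior mode (which equals the posterior mean in this Gaussian case) should match the ridge estimate with penalty lambda = sigma2 / tau2. All names and data values are made up for illustration.

```python
def ridge_estimate(x, y, lam):
    """Closed-form ridge estimate for one coefficient, no intercept."""
    sxy = sum(xi * yi for xi, yi in zip(x, y))
    sxx = sum(xi * xi for xi in x)
    return sxy / (sxx + lam)


def neg_log_posterior(b, x, y, sigma2, tau2):
    """Negative log posterior, up to an additive constant."""
    rss = sum((yi - b * xi) ** 2 for xi, yi in zip(x, y))
    return rss / (2.0 * sigma2) + b * b / (2.0 * tau2)


def posterior_mode(x, y, sigma2, tau2, lo=-100.0, hi=100.0, iters=200):
    """Minimise the convex negative log posterior by ternary search."""
    for _ in range(iters):
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        f1 = neg_log_posterior(m1, x, y, sigma2, tau2)
        f2 = neg_log_posterior(m2, x, y, sigma2, tau2)
        if f1 < f2:
            hi = m2
        else:
            lo = m1
    return 0.5 * (lo + hi)


# Toy data, illustrative only.
x = [1.0, 2.0, 3.0, 4.0]
y = [1.1, 1.9, 3.2, 3.8]
sigma2, tau2 = 1.0, 0.5
lam = sigma2 / tau2  # ridge penalty implied by the normal prior

b_ridge = ridge_estimate(x, y, lam)
b_mode = posterior_mode(x, y, sigma2, tau2)
print(b_ridge, b_mode)
```

The numerical search is used deliberately instead of the closed-form posterior, so that the agreement with the ridge formula is an actual check and not an algebraic identity restated.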

Hirofumi Michimae, Takeshi Emura. "Bayesian ridge regression for survival data based on a vine copula-based prior." Asta-Advances in Statistical Analysis 107(4), 755-784. Journal Article. Published 2022-12-30. DOI: 10.1007/s10182-022-00466-4
Citations: 0