
Journal of the Royal Statistical Society Series C-Applied Statistics: latest articles

Inference on extended-spectrum beta-lactamase Escherichia coli and Klebsiella pneumoniae data through SMC²
IF 1.6 | Zone 4, Mathematics | Q3 Statistics & Probability | Pub Date: 2022-08-24 | DOI: 10.1093/jrsssc/qlad055
L. Rimella, S. Alderton, M. Sammarro, B. Rowlingson, D. Cocker, N. Feasey, P. Fearnhead, C. Jewell
We propose a novel stochastic model for the spread of antimicrobial-resistant bacteria in a population, together with an efficient algorithm for fitting such a model to sample data. We introduce an individual-based model for the epidemic, with the state of the model determining which individuals are colonised by the bacteria. The transmission rate of the epidemic takes into account individuals’ locations, individuals’ covariates, seasonality, and environmental effects. The state of our model is only partially observed, with data consisting of test results from individuals from a sample of households. Fitting our model to data is challenging due to the large state space of our model. We develop an efficient SMC² algorithm to estimate parameters and compare models for the transmission rate. We implement this algorithm in a computationally efficient manner by using the scale invariance properties of the underlying epidemic model. Our motivating application focuses on the dynamics of community-acquired extended-spectrum beta-lactamase-producing Escherichia coli and Klebsiella pneumoniae, using data collected as part of the Drivers of Resistance in Uganda and Malawi project. We infer the parameters of the model and learn key epidemic quantities such as the effective reproduction number, spatial distribution of prevalence, household cluster dynamics, and seasonality.
Citations: 5
Investigating the association of a sensitive attribute with a random variable using the Christofides generalised randomised response design and Bayesian methods
IF 1.6 | Zone 4, Mathematics | Q3 Statistics & Probability | Pub Date: 2022-08-16 | DOI: 10.1111/rssc.12585
Shen-Ming Lee, Truong-Nhat Le, Phuoc-Loc Tran, Chin-Shang Li

In empirical studies involving sensitive topics, in addition to the problem of estimating the population proportion with a sensitive characteristic, a question arises as to whether or not there is heterogeneity in the distribution of an auxiliary random variable representing the information of subjects collected from a sensitive group and a non-sensitive group. That is, it is of interest to investigate the influence of the sensitive attribute on the auxiliary random variable. Finite mixture models are utilised to evaluate the association. A proposed Bayesian method through data augmentation and Markov chain Monte Carlo is applied to estimate unknown parameters of interest. The deviance information criterion and marginal likelihood are employed to select a suitable model to describe the association of the sensitive characteristic with the auxiliary random variable. Simulation and real data studies are conducted to assess the performance of the proposed methodology and illustrate its applications.

Volume 71, Issue 5, pp. 1471-1502
Citations: 1
Statistical integration of heterogeneous omics data: Probabilistic two-way partial least squares (PO2PLS)
IF 1.6 | Zone 4, Mathematics | Q3 Statistics & Probability | Pub Date: 2022-08-16 | DOI: 10.1111/rssc.12583
Said el Bouhaddani, Hae-Won Uh, Geurt Jongbloed, Jeanine Houwing-Duistermaat

The availability of multi-omics data has revolutionized the life sciences by creating avenues for integrated system-level approaches. Data integration links the information across datasets to better understand the underlying biological processes. However, high dimensionality, correlations and heterogeneity pose statistical and computational challenges. We propose a general framework, probabilistic two-way partial least squares (PO2PLS), that addresses these challenges. PO2PLS models the relationship between two datasets using joint and data-specific latent variables. For maximum likelihood estimation of the parameters, we propose a novel fast EM algorithm and show that the estimator is asymptotically normally distributed. A global test for the relationship between two datasets is proposed, specifically addressing the high dimensionality, and its asymptotic distribution is derived. Notably, several existing data integration methods are special cases of PO2PLS. Via extensive simulations, we show that PO2PLS performs better than alternatives in feature selection and prediction performance. In addition, the asymptotic distribution appears to hold when the sample size is sufficiently large. We illustrate PO2PLS with two examples from commonly used study designs: a large population cohort and a small case–control study. Besides recovering known relationships, PO2PLS also identified novel findings. The methods are implemented in our R-package PO2PLS.

Volume 71, Issue 5, pp. 1451-1470. Open access PDF: https://rss.onlinelibrary.wiley.com/doi/epdf/10.1111/rssc.12583
Citations: 2
Modelling time-varying rankings with autoregressive and score-driven dynamics
IF 1.6 | Zone 4, Mathematics | Q3 Statistics & Probability | Pub Date: 2022-08-02 | DOI: 10.1111/rssc.12584
Vladimír Holý, Jan Zouhar

We develop a new statistical model to analyse time-varying ranking data. The model can be used with a large number of ranked items, accommodates exogenous time-varying covariates and partial rankings, and is estimated via maximum likelihood in a straightforward manner. Rankings are modelled using the Plackett–Luce distribution with time-varying worth parameters that follow a mean-reverting time series process. To capture the dependence of the worth parameters on past rankings, we utilise the conditional score in the fashion of the generalised autoregressive score models. Simulation experiments show that the small-sample properties of the maximum-likelihood estimator improve rapidly with the length of the time series and suggest that statistical inference relying on conventional Hessian-based standard errors is usable even for medium-sized samples. In an empirical study, we apply the model to the results of the Ice Hockey World Championships. We also discuss applications to rankings based on underlying indices, repeated surveys and non-parametric efficiency analysis.
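
As a rough illustration of the Plackett–Luce distribution mentioned in the abstract above, the sketch below computes the log-probability of a single ranking for fixed worth parameters. It is not the authors' implementation: the function name `plackett_luce_loglik` and its arguments are hypothetical, the worths are held static rather than following the time-varying score-driven dynamics of the paper, and partial rankings are assumed to be of the top-k kind (unranked items are treated as placed below all ranked ones).

```python
import math


def plackett_luce_loglik(ranking, worth):
    """Log-probability of a (possibly top-k partial) ranking under Plackett-Luce.

    ranking: item indices ordered from best to worst; may list only the top k items.
    worth:   log-worth parameter of every item (any real number), indexed by item.
    """
    loglik = 0.0
    remaining = list(range(len(worth)))  # items not yet placed in the ranking
    for item in ranking:
        # Probability that `item` is chosen first among the items still available.
        log_norm = math.log(sum(math.exp(worth[j]) for j in remaining))
        loglik += worth[item] - log_norm
        remaining.remove(item)
    return loglik
```

Summing this quantity over the rankings observed at each time point would give a static log-likelihood; the model in the abstract additionally lets the worth parameters evolve over time.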

Volume 71, Issue 5, pp. 1427-1450
Citations: 3
Network Hawkes process models for exploring latent hierarchy in social animal interactions
IF 1.6 | Zone 4, Mathematics | Q3 Statistics & Probability | Pub Date: 2022-07-28 | DOI: 10.1111/rssc.12581
Owen G. Ward, Jing Wu, Tian Zheng, Anna L. Smith, James P. Curley

Group-based social dominance hierarchies are of essential interest in understanding social structure (DeDeo & Hobson, Proceedings of the National Academy of Sciences 118(21), 2021). Recent animal behaviour research studies can record aggressive interactions observed over time. Models that can explore the underlying hierarchy from the observed temporal dynamics in behaviours are therefore crucial. Traditional ranking methods aggregate interactions across time into win/loss counts, equating dynamic interactions with the underlying hierarchy. Although these models have gleaned important behavioural insights from such data, they are limited in addressing many important questions that remain unresolved. In this paper, we take advantage of the observed interactions' timestamps, proposing a series of network point process models with latent ranks. We carefully design these models to incorporate important theories on animal behaviour that account for dynamic patterns observed in the interaction data, including the winner effect, bursting and pair-flip phenomena. Through iteratively constructing and evaluating these models we arrive at the final cohort Markov-modulated Hawkes process (C-MMHP), which best characterizes all aforementioned patterns observed in interaction data. As such, inference on our model components can be readily interpreted in terms of theories on animal behaviours. The probabilistic nature of our model allows us to estimate the uncertainty in our ranking. In particular, our model is able to provide insights into the distribution of power within the hierarchy that forms and the strength of the established hierarchy. We compare all models using simulated and real data. Using statistically developed diagnostic perspectives, we demonstrate that the C-MMHP model outperforms other methods, capturing relevant latent ranking structures that lead to meaningful predictions for real data.
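
To make the idea of a self-exciting point process concrete, here is a minimal sketch of the conditional intensity of a plain univariate Hawkes process. It is only the basic building block, not the C-MMHP model of the abstract: the exponential kernel, the function name `hawkes_intensity` and the parameter names `mu`, `alpha` and `beta` are assumptions made for illustration, and no latent ranks or Markov modulation are included.

```python
import math


def hawkes_intensity(t, event_times, mu, alpha, beta):
    """Conditional intensity lambda(t) = mu + sum_{t_i < t} alpha * exp(-beta * (t - t_i)).

    mu:    constant baseline rate.
    alpha: jump in intensity caused by each past event.
    beta:  exponential decay rate of that excitation.
    """
    excitation = sum(
        alpha * math.exp(-beta * (t - ti)) for ti in event_times if ti < t
    )
    return mu + excitation
```

Each past event temporarily raises the rate of further events, which is one way to represent the bursting behaviour the abstract refers to.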

Volume 71, Issue 5, pp. 1402-1426
Citations: 2
Robust correspondence analysis
IF 1.6 | Zone 4, Mathematics | Q3 Statistics & Probability | Pub Date: 2022-07-27 | DOI: 10.1111/rssc.12580
Marco Riani, Anthony C. Atkinson, Francesca Torti, Aldo Corbellini

Correspondence analysis is a method for the visual display of information from two-way contingency tables. We introduce a robust form of correspondence analysis based on minimum covariance determinant estimation. This leads to the systematic deletion of outlying rows of the table and to plots of greatly increased informativeness. Our examples are trade flows of clothes and consumer evaluations of the perceived properties of cars. The robust method requires that a specified proportion of the data be used in fitting. To accommodate this requirement we provide an algorithm that uses a subset of complete rows and one row partially, both sets of rows being chosen robustly. We prove the convergence of this algorithm.

Volume 71, Issue 5, pp. 1381-1401. Open access PDF: https://rss.onlinelibrary.wiley.com/doi/epdf/10.1111/rssc.12580
Citations: 3
Spatiotemporal ETAS model with a renewal main-shock arrival process
IF 1.6 | Zone 4, Mathematics | Q3 Statistics & Probability | Pub Date: 2022-07-26 | DOI: 10.1111/rssc.12579
Tom Stindl, Feng Chen

We propose a spatiotemporal point process model that enhances the classical Epidemic-Type Aftershock Sequence (ETAS) model. This is achieved with the introduction of a renewal main-shock arrival process and we call this extension the renewal ETAS (RETAS) model. This modification is similar in spirit to the renewal Hawkes (RHawkes) process but the conditional intensity process supports a spatial component. It empowers the main-shock intensity to reset upon the arrival of main-shocks. This allows for heavier clustering of main-shocks than the classical spatiotemporal ETAS model. We introduce a likelihood evaluation algorithm for parameter estimation and provide a novel procedure to evaluate the fitted model's goodness-of-fit (GOF) based on a sequential application of the Rosenblatt transformation. A simulation algorithm for the RETAS model is outlined and used to validate the numerical performance of the likelihood evaluation algorithm and GOF test procedure. We illustrate the proposed model and methods on various earthquake catalogues around the world, each with distinctly different seismic activity. These catalogues demonstrate the RETAS model's additional flexibility in comparison to the classical spatiotemporal ETAS model and emphasize the potential for superior modelling and forecasting of seismicity.
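
For readers unfamiliar with the classical ETAS model that this abstract builds on, the sketch below evaluates a purely temporal ETAS conditional intensity with a constant background rate. It is a simplified illustration under stated assumptions, not the RETAS model: the spatial component is omitted, the background rate `mu` is constant rather than driven by a renewal process, and the function name `etas_intensity` and this particular (unnormalised) Omori-type parameterisation are choices made here for illustration.

```python
import math


def etas_intensity(t, events, mu, K, alpha, c, p, m0):
    """Temporal ETAS intensity with constant background rate.

    lambda(t) = mu + sum_{t_i < t} K * exp(alpha * (m_i - m0)) * (t - t_i + c) ** (-p)

    events: list of (time, magnitude) pairs for past earthquakes.
    m0:     reference (cut-off) magnitude of the catalogue.
    """
    rate = mu
    for ti, mi in events:
        if ti < t:
            productivity = K * math.exp(alpha * (mi - m0))  # larger shocks trigger more
            omori_decay = (t - ti + c) ** (-p)              # power-law decay in time
            rate += productivity * omori_decay
    return rate
```

In the extension described in the abstract, the constant background term would be replaced by a main-shock intensity that resets whenever a main-shock arrives.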

Volume 71, Issue 5, pp. 1356-1380. Open access PDF: https://rss.onlinelibrary.wiley.com/doi/epdf/10.1111/rssc.12579
Citations: 2
Specification analysis for technology use and teenager well-being: Statistical validity and a Bayesian proposal
IF 1.6 | Zone 4, Mathematics | Q3 Statistics & Probability | Pub Date: 2022-07-13 | DOI: 10.1111/rssc.12578
Christoph Semken, David Rossell
A key issue in science is assessing robustness to data analysis choices, while avoiding selective reporting and providing valid inference. Specification Curve Analysis is a tool intended to prevent selective reporting. Alas, when used for inference it can create severe biases and false positives, due to wrongly adjusting for covariates, and mask important treatment effect heterogeneity. As our motivating application, it led an influential study to conclude there is no relevant association between technology use and teenager mental well‐being. We discuss these issues and propose a strategy for valid inference. Bayesian Specification Curve Analysis (BSCA) uses Bayesian Model Averaging to incorporate covariates and heterogeneous effects across treatments, outcomes and subpopulations. BSCA gives significantly different insights into teenager well‐being, revealing that the association with technology differs by device, gender and who assesses well‐being (teenagers or their parents).
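
The Bayesian Model Averaging step named in the abstract above can be illustrated with a small sketch that turns model-fit scores into posterior model weights and averages an effect estimate across models. This is not the BSCA procedure itself: it assumes equal prior model probabilities and uses the common BIC approximation to the marginal likelihood, and the helper names `bma_weights` and `bma_estimate` are hypothetical.

```python
import math


def bma_weights(bics):
    """Approximate posterior model probabilities from BIC values.

    Uses weight_k proportional to exp(-BIC_k / 2), assuming equal prior model probabilities.
    """
    best = min(bics)  # subtract the minimum for numerical stability
    raw = [math.exp(-0.5 * (b - best)) for b in bics]
    total = sum(raw)
    return [w / total for w in raw]


def bma_estimate(estimates, bics):
    """Model-averaged point estimate: sum of per-model estimates times model weights."""
    return sum(w * e for w, e in zip(bma_weights(bics), estimates))
```

Averaging over specifications in this way weights each analysis choice by how well it is supported by the data, rather than treating all specifications as equally plausible.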
{"title":"Specification analysis for technology use and teenager well-being: Statistical validity and a Bayesian proposal","authors":"Christoph Semken,&nbsp;David Rossell","doi":"10.1111/rssc.12578","DOIUrl":"10.1111/rssc.12578","url":null,"abstract":"A key issue in science is assessing robustness to data analysis choices, while avoiding selective reporting and providing valid inference. Specification Curve Analysis is a tool intended to prevent selective reporting. Alas, when used for inference it can create severe biases and false positives, due to wrongly adjusting for covariates, and mask important treatment effect heterogeneity. As our motivating application, it led an influential study to conclude there is no relevant association between technology use and teenager mental well‐being. We discuss these issues and propose a strategy for valid inference. Bayesian Specification Curve Analysis (BSCA) uses Bayesian Model Averaging to incorporate covariates and heterogeneous effects across treatments, outcomes and subpopulations. BSCA gives significantly different insights into teenager well‐being, revealing that the association with technology differs by device, gender and who assesses well‐being (teenagers or their parents).","PeriodicalId":49981,"journal":{"name":"Journal of the Royal Statistical Society Series C-Applied Statistics","volume":"71 5","pages":"1330-1355"},"PeriodicalIF":1.6,"publicationDate":"2022-07-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://rss.onlinelibrary.wiley.com/doi/epdf/10.1111/rssc.12578","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"83476610","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 2
Missing data patterns in runners’ careers: do they matter?
IF 1.6 Zone 4 (Mathematics) Q3 STATISTICS & PROBABILITY Pub Date: 2022-06-25 DOI: 10.1093/jrsssc/qlad009
M. Stival, M. Bernardi, Manuela Cattelan, P. Dellaportas
Predicting the future performance of young runners is an important research issue in experimental sports science and performance analysis. We analyse a dataset with annual seasonal best performances of male middle distance runners for a period of 14 years and provide a modelling framework that accounts for both the fact that each runner has typically run in three distance events (800, 1,500, and 5,000 m) and the presence of periods of no running activity. We propose a latent class matrix-variate state space model and we empirically demonstrate that accounting for missing data patterns in runners’ careers improves the out-of-sample prediction of their performances over time. In particular, we demonstrate that for this analysis, the missing data patterns provide valuable information for the prediction of runners’ performance.
Citations: 1
Heterogeneous graphical model for non-negative and non-Gaussian PM2.5 data
IF 1.6 Zone 4 (Mathematics) Q3 STATISTICS & PROBABILITY Pub Date: 2022-06-22 DOI: 10.1111/rssc.12575
Jiaqi Zhang, Xinyan Fan, Yang Li, Shuangge Ma

Studies on the conditional relationships between PM2.5 concentrations among different regions are of great interest for the joint prevention and control of air pollution. Because of seasonal changes in atmospheric conditions, spatial patterns of PM2.5 may differ throughout the year. Additionally, concentration data are both non-negative and non-Gaussian. These data features pose significant challenges to existing methods. This study proposes a heterogeneous graphical model for non-negative and non-Gaussian data via the score matching loss. The proposed method simultaneously clusters multiple datasets and estimates a graph for variables with complex properties in each cluster. Furthermore, our model involves a network that indicates similarity among datasets, and this network can have additional applications. In simulation studies, the proposed method outperforms competing alternatives in both clustering and edge identification. We also analyse the spatial correlations of PM2.5 concentrations across Taiwan's regions using data obtained in 2019 from 67 air-quality monitoring stations. The 12 months are clustered into four groups: January–March, April, May–September and October–December, and the corresponding graphs have 153, 57, 86 and 167 edges respectively. The results show obvious seasonality, which is consistent with the meteorological literature. Geographically, PM2.5 concentrations are more strongly correlated within the northern regions and within the southern regions of Taiwan. These results can provide valuable information for developing joint air-quality control strategies.

Journal of the Royal Statistical Society Series C-Applied Statistics, Volume 71, Issue 5, pp. 1303-1329. Published 2022-06-22. DOI: 10.1111/rssc.12575
Citations: 1