
Journal of the Royal Statistical Society Series C-Applied Statistics: Latest Articles

Semi-parametric time-to-event modelling of lengths of hospital stays
IF 1.6 | Zone 4 (Mathematics) | Q2 Mathematics | Pub Date: 2022-09-15 | DOI: 10.1111/rssc.12593
Yang Li, Hao Liu, Xiaoshen Wang, Wanzhu Tu

Length of stay (LOS) is an essential metric for the quality of hospital care. Published works on LOS analysis have primarily focused on skewed LOS distributions and the influences of patient diagnostic characteristics. Few authors have considered the events that terminate a hospital stay: Both successful discharge and death could end a hospital stay but with completely different implications. Modelling the time to the first occurrence of discharge or death obscures the true nature of LOS. In this research, we propose a structure that simultaneously models the probabilities of discharge and death. The model has a flexible formulation that accounts for both additive and multiplicative effects of factors influencing the occurrence of death and discharge. We present asymptotic properties of the parameter estimates so that valid inference can be performed for the parametric as well as nonparametric model components. Simulation studies confirmed the good finite-sample performance of the proposed method. As the research is motivated by practical issues encountered in LOS analysis, we analysed data from two real clinical studies to showcase the general applicability of the proposed model.
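To illustrate why discharge and death should be treated as distinct terminating events, the following is a minimal sketch of a standard nonparametric cumulative incidence estimator for competing risks. It is not the semi-parametric model proposed in the paper; the event coding (0 = censored, 1 = discharge, 2 = death) and the function name are assumptions made for illustration.

```python
import numpy as np

def cumulative_incidence(time, event, cause):
    """Nonparametric cumulative incidence of one cause under competing risks.

    time  : length of stay for each patient
    event : 0 = censored, 1 = discharge, 2 = death (hypothetical coding)
    cause : the event code whose cumulative incidence is wanted
    """
    time = np.asarray(time, dtype=float)
    event = np.asarray(event)
    event_times = np.unique(time[event > 0])
    surv = 1.0   # probability of still being in hospital just before t
    cif = 0.0    # running cumulative incidence of `cause`
    out = []
    for t in event_times:
        at_risk = np.sum(time >= t)
        d_cause = np.sum((time == t) & (event == cause))
        d_any = np.sum((time == t) & (event > 0))
        cif += surv * d_cause / at_risk      # increment for this cause only
        surv *= 1.0 - d_any / at_risk        # any event ends the stay
        out.append(cif)
    return event_times, np.array(out)

# Toy data, made up for illustration only.
los = [2, 3, 3, 5, 7, 8]
status = [1, 2, 1, 0, 1, 2]
t_dis, cif_discharge = cumulative_incidence(los, status, cause=1)
t_dth, cif_death = cumulative_incidence(los, status, cause=2)
```

Treating discharge and death as a single composite endpoint would return only the sum of these two curves, which is the loss of information the abstract describes.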

Citations: 0
Utility-based Bayesian personalized treatment selection for advanced breast cancer
IF 1.6 | Zone 4 (Mathematics) | Q2 Mathematics | Pub Date: 2022-09-09 | DOI: 10.1111/rssc.12582
Juhee Lee, Peter F. Thall, Bora Lim, Pavlos Msaouel

A Bayesian method is proposed for personalized treatment selection in settings where data are available from a randomized clinical trial with two or more outcomes. The motivating application is a randomized trial that compared letrozole plus bevacizumab to letrozole alone as first-line therapy for hormone receptor-positive advanced breast cancer. The combination treatment arm had larger median progression-free survival time, but also a higher rate of severe toxicities. This suggests that the risk-benefit trade-off between these two outcomes should play a central role in selecting each patient's treatment, particularly since older patients are less likely to tolerate severe toxicities. To quantify the desirability of each possible outcome combination for an individual patient, we elicited from breast cancer oncologists a utility function that varied with age. The utility was used as an explicit criterion for quantifying risk-benefit trade-offs when making personalized treatment selections. A Bayesian nonparametric multivariate regression model with a dependent Dirichlet process prior was fit to the trial data. Under the fitted model, a new patient's treatment can be selected based on the posterior predictive utility distribution. For the breast cancer trial dataset, the optimal treatment depends on the patient's age, with the combination preferable for patients 70 years or younger and the single agent preferable for patients older than 70.
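The selection rule described above (choose the treatment with the best posterior predictive utility) can be sketched as follows. This is only an illustration under stated assumptions: the utility function, its age dependence, and the toy predictive draws are all hypothetical and are not the elicited utility or the dependent Dirichlet process model from the paper.

```python
import numpy as np

def utility(pfs_months, severe_tox, age):
    """Hypothetical utility: reward progression-free survival and
    penalise severe toxicity more heavily for older patients."""
    tox_penalty = 10.0 + 0.5 * max(age - 50, 0)
    return np.asarray(pfs_months, dtype=float) - tox_penalty * np.asarray(severe_tox)

def select_treatment(pred_draws, age):
    """pred_draws maps treatment name -> (pfs draws, toxicity indicator draws),
    standing in for posterior predictive samples for a new patient."""
    mean_u = {
        trt: float(np.mean(utility(pfs, tox, age)))
        for trt, (pfs, tox) in pred_draws.items()
    }
    best = max(mean_u, key=mean_u.get)
    return best, mean_u

# Toy predictive draws, made up for illustration only.
draws = {
    "combination": (np.full(4, 20.0), np.array([1, 0, 1, 0])),
    "single_agent": (np.full(4, 16.0), np.array([0, 0, 0, 1])),
}
choice_younger, _ = select_treatment(draws, age=50)
choice_older, _ = select_treatment(draws, age=80)
```

Because the toxicity penalty grows with age in this toy utility, the preferred arm can switch as age increases, which is the qualitative behaviour the abstract reports.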

Citations: 4
Measuring diachronic sense change: New models and Monte Carlo methods for Bayesian inference
IF 1.6 | Zone 4 (Mathematics) | Q2 Mathematics | Pub Date: 2022-09-06 | DOI: 10.1111/rssc.12591
Schyan Zafar, Geoff K. Nicholls

In a bag-of-words model, the senses of a word with multiple meanings, for example ‘bank’ (used either in a river-bank or an institution sense), are represented as probability distributions over context words, and sense prevalence is represented as a probability distribution over senses. Both of these may change with time. Modelling and measuring this kind of sense change are challenging due to the typically high-dimensional parameter space and sparse datasets. A recently published corpus of ancient Greek texts contains expert-annotated sense labels for selected target words. Automatic sense-annotation for the word ‘kosmos’ (meaning decoration, order or world) has been used as a test case in recent work with related generative models and Monte Carlo methods. We adapt an existing generative sense change model to develop a simpler model for the main effects of sense and time, and give Markov Chain Monte Carlo methods for Bayesian inference on all these models that are more efficient than existing methods. We carry out automatic sense-annotation of snippets containing ‘kosmos’ using our model, and measure the time-evolution of its three senses and their prevalence. As far as we are aware, ours is the first analysis of this data, within the class of generative models we consider, that quantifies uncertainty and returns credible sets for evolving sense prevalence in good agreement with those given by expert annotation.
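The bag-of-words representation in the first sentence can be made concrete with a small sketch: each sense is a probability distribution over context words, prevalence is a distribution over senses, and Bayes' rule gives the sense probabilities for a snippet. The vocabulary, the numbers, and the names `psi` and `phi` are assumptions for illustration; the paper's time-evolving model and MCMC scheme are not reproduced here.

```python
import numpy as np

vocab = ["river", "water", "money", "loan"]

# psi[k, v] = P(context word v | sense k); each row sums to 1 (made-up values).
psi = np.array([
    [0.45, 0.45, 0.05, 0.05],   # sense 0: river-bank
    [0.05, 0.05, 0.45, 0.45],   # sense 1: institution
])
# phi[k] = prevalence of sense k (made-up values).
phi = np.array([0.3, 0.7])

def sense_posterior(snippet, psi, phi, vocab):
    """Posterior over senses for one snippet, treating context words
    as independent draws given the sense (bag-of-words assumption)."""
    idx = [vocab.index(w) for w in snippet]
    log_post = np.log(phi) + np.log(psi[:, idx]).sum(axis=1)
    log_post -= log_post.max()          # stabilise before exponentiating
    post = np.exp(log_post)
    return post / post.sum()

post = sense_posterior(["river", "water"], psi, phi, vocab)
```

In a diachronic model both `psi` and `phi` would carry a time index, which is what makes the parameter space high-dimensional relative to the sparse data.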

Citations: 1
Environmental Engel curves: A neural network approach
IF 1.6 | Zone 4 (Mathematics) | Q2 Mathematics | Pub Date: 2022-08-31 | DOI: 10.1111/rssc.12588
Tullio Mancini, Hector Calvo-Pardo, Jose Olmo

Environmental Engel curves describe how households' income relates to the pollution associated with the services and goods consumed. This paper estimates these curves with neural networks using the novel dataset constructed in Levinson and O'Brien. We provide further statistical rigor to the empirical analysis by constructing prediction intervals obtained from novel neural network methods such as extra-neural nets and MC dropout. The application of these techniques for five different pollutants allows us to confirm statistically that Environmental Engel curves are upward sloping, have income elasticities smaller than one and shift down, becoming more concave, over time. Importantly, for the last year of the sample, we find an inverted U shape that suggests the existence of a maximum in pollution for medium-to-high levels of household income beyond which pollution flattens or decreases for top income earners.
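As a rough illustration of the MC dropout idea mentioned above, the sketch below keeps dropout active at prediction time, repeats the forward pass many times, and reads off empirical quantiles. The one-hidden-layer network, the weight names, and the default settings are assumptions; the paper's networks and the extra-neural nets method are not reproduced. The band obtained this way reflects variability across dropout masks only, so a full prediction interval would also need an estimate of the noise variance.

```python
import numpy as np

def mc_dropout_interval(x, W1, b1, W2, b2, p_drop=0.1,
                        n_passes=500, alpha=0.05, seed=0):
    """Monte Carlo dropout band for a one-hidden-layer ReLU network.

    x  : (n, d) inputs            W1 : (d, H)   b1 : (H,)
    W2 : (H, 1)                   b2 : (1,)
    Returns the mean prediction and the lower/upper quantiles per input.
    """
    rng = np.random.default_rng(seed)
    preds = np.empty((n_passes, x.shape[0]))
    for s in range(n_passes):
        h = np.maximum(x @ W1 + b1, 0.0)             # hidden ReLU layer
        mask = rng.random(h.shape) >= p_drop         # keep with prob 1 - p_drop
        h = h * mask / (1.0 - p_drop)                # inverted dropout scaling
        preds[s] = (h @ W2 + b2).ravel()
    lo, hi = np.quantile(preds, [alpha / 2, 1 - alpha / 2], axis=0)
    return preds.mean(axis=0), lo, hi

# Untrained, randomly drawn weights, used only to exercise the function.
rng = np.random.default_rng(1)
x = rng.normal(size=(5, 3))
W1, b1 = rng.normal(size=(3, 16)), np.zeros(16)
W2, b2 = rng.normal(size=(16, 1)), np.zeros(1)
mean, lo, hi = mc_dropout_interval(x, W1, b1, W2, b2)
```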

Citations: 0
Non-parametric Bayesian covariate-dependent multivariate functional clustering: An application to time-series data for multiple air pollutants
IF 1.6 | Zone 4 (Mathematics) | Q2 Mathematics | Pub Date: 2022-08-30 | DOI: 10.1111/rssc.12589
Daewon Yang, Taeryon Choi, Eric Lavigne, Yeonseung Chung

Air pollution is a major threat to public health. Understanding the spatial distribution of air pollution concentration is of great interest to government or local authorities, as it informs about target areas for implementing policies for air quality management. Cluster analysis has been popularly used to identify groups of locations with similar profiles of average levels of multiple air pollutants, efficiently summarising the spatial pattern. This study aimed to cluster locations based on the seasonal patterns of multiple air pollutants incorporating the location-specific characteristics such as socio-economic indicators. For this purpose, we proposed a novel non-parametric Bayesian sparse latent factor model for covariate-dependent multivariate functional clustering. Furthermore, we extend this model to conduct clustering with temporal dependency. The proposed methods are illustrated through a simulation study and applied to time-series data for daily mean concentrations of ozone ($\mathrm{O}_3$), nitrogen dioxide ($\mathrm{NO}_2$), and fine particulate matter ($\mathrm{PM}_{2.5}$) collected for 25 cities in Canada in 1986–2015.

Citations: 0
A model-based approach to predict employee compensation components
IF 1.6 | Zone 4 (Mathematics) | Q2 Mathematics | Pub Date: 2022-08-26 | DOI: 10.1111/rssc.12587
Andreea L. Erciulescu, Jean D. Opsomer

The demand for official statistics at fine levels is motivating researchers to explore estimation methods that extend beyond the traditional survey-based estimation. For this work, the challenge originated with the US Bureau of Labor Statistics, which conducts the National Compensation Survey to collect compensation data from a nationwide sample of establishments. The objective is to obtain predictions of the wage and non-wage components of compensation for a large number of employment domains defined by detailed job characteristics. Survey estimates are only available for a small subset of these domains. To address the objective, we developed a bivariate hierarchical Bayes model that jointly predicts the wage and non-wage compensation components for a large number of employment domains defined by detailed job characteristics. We also discuss solutions to some practical challenges encountered in implementing small area estimation methods in large-scale settings, including methods for defining the prediction space, for constructing and selecting the information that serves as model input, and for obtaining stable survey variance and covariance estimates.

Citations: 4
Inference on extended-spectrum beta-lactamase Escherichia coli and Klebsiella pneumoniae data through SMC²
IF 1.6 | Zone 4 (Mathematics) | Q2 Mathematics | Pub Date: 2022-08-24 | DOI: 10.1093/jrsssc/qlad055
L. Rimella, S. Alderton, M. Sammarro, B. Rowlingson, D. Cocker, N. Feasey, P. Fearnhead, C. Jewell
We propose a novel stochastic model for the spread of antimicrobial-resistant bacteria in a population, together with an efficient algorithm for fitting such a model to sample data. We introduce an individual-based model for the epidemic, with the state of the model determining which individuals are colonised by the bacteria. The transmission rate of the epidemic takes into account individuals’ locations and covariates, seasonality, and environmental effects. The state of our model is only partially observed, with data consisting of test results from individuals from a sample of households. Fitting our model to data is challenging due to the large state space of our model. We develop an efficient SMC² algorithm to estimate parameters and compare models for the transmission rate. We implement this algorithm in a computationally efficient manner by using the scale invariance properties of the underlying epidemic model. Our motivating application focuses on the dynamics of community-acquired extended-spectrum beta-lactamase-producing Escherichia coli and Klebsiella pneumoniae, using data collected as part of the Drivers of Resistance in Uganda and Malawi project. We infer the parameters of the model and learn key epidemic quantities such as the effective reproduction number, spatial distribution of prevalence, household cluster dynamics, and seasonality.
Citations: 5
Investigating the association of a sensitive attribute with a random variable using the Christofides generalised randomised response design and Bayesian methods
IF 1.6 | Zone 4 (Mathematics) | Q2 Mathematics | Pub Date: 2022-08-16 | DOI: 10.1111/rssc.12585
Shen-Ming Lee, Truong-Nhat Le, Phuoc-Loc Tran, Chin-Shang Li

In empirical studies involving sensitive topics, in addition to the problem of estimating the population proportion with a sensitive characteristic, a question arises as to whether or not there is heterogeneity in the distribution of an auxiliary random variable representing the information of subjects collected from a sensitive group and a non-sensitive group. That is, it is of interest to investigate the influence of the sensitive attribute on the auxiliary random variable of interest. Finite mixture models are utilised to evaluate the association. A Bayesian method based on data augmentation and Markov chain Monte Carlo is proposed to estimate the unknown parameters of interest. Deviance information criterion and marginal likelihood are employed to select a suitable model to describe the association of the sensitive characteristic with the auxiliary random variable. Simulation and real data studies are conducted to assess the performance of the proposed methodology and to illustrate its applications.
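For readers unfamiliar with randomised response, the sketch below shows a simple moment estimator of the sensitive proportion. It assumes the basic form of the Christofides design as commonly described: each respondent privately draws an integer k from 1..L with known probabilities, then reports L + 1 - k if they carry the sensitive attribute and k otherwise. This description, the function name, and the toy reports are assumptions for illustration; the Bayesian mixture model of the paper is not implemented here.

```python
import numpy as np

def christofides_estimate(reports, p):
    """Moment estimator of the sensitive proportion theta.

    reports : reported integers, one per respondent
    p       : known probabilities of drawing 1..L from the randomising device

    Under the assumed design, E[report] = E[K] + theta * (L + 1 - 2 E[K]),
    so theta is recovered by inverting this linear relation.
    """
    p = np.asarray(p, dtype=float)
    L = len(p)
    k = np.arange(1, L + 1)
    ek = float(k @ p)                      # expected value of the drawn integer
    denom = L + 1 - 2.0 * ek
    if abs(denom) < 1e-12:
        raise ValueError("device is symmetric; theta is not identifiable")
    theta = (float(np.mean(reports)) - ek) / denom
    return min(max(theta, 0.0), 1.0)       # truncate to the unit interval

# Toy reports, made up for illustration only (mean report is 1.7).
reports = [1, 1, 1, 1, 1, 2, 2, 2, 3, 3]
theta_hat = christofides_estimate(reports, p=[0.6, 0.3, 0.1])
```

No individual report reveals group membership, yet the known device probabilities make the population proportion estimable; the finite mixture models in the abstract exploit the same structure to link the attribute to an auxiliary variable.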

Citations: 1
Statistical integration of heterogeneous omics data: Probabilistic two-way partial least squares (PO2PLS)
IF 1.6 | Zone 4, Mathematics | Q2 Mathematics | Pub Date: 2022-08-16 | DOI: 10.1111/rssc.12583
Said el Bouhaddani, Hae-Won Uh, Geurt Jongbloed, Jeanine Houwing-Duistermaat

The availability of multi-omics data has revolutionized the life sciences by creating avenues for integrated system-level approaches. Data integration links the information across datasets to better understand the underlying biological processes. However, high dimensionality, correlations and heterogeneity pose statistical and computational challenges. We propose a general framework, probabilistic two-way partial least squares (PO2PLS), that addresses these challenges. PO2PLS models the relationship between two datasets using joint and data-specific latent variables. For maximum likelihood estimation of the parameters, we propose a novel fast EM algorithm and show that the estimator is asymptotically normally distributed. A global test for the relationship between two datasets is proposed, specifically addressing the high dimensionality, and its asymptotic distribution is derived. Notably, several existing data integration methods are special cases of PO2PLS. Via extensive simulations, we show that PO2PLS performs better than alternatives in feature selection and prediction performance. In addition, the asymptotic distribution appears to hold when the sample size is sufficiently large. We illustrate PO2PLS with two examples from commonly used study designs: a large population cohort and a small case–control study. Besides recovering known relationships, PO2PLS also identified novel findings. The methods are implemented in our R-package PO2PLS.

Citations: 2
Modelling time-varying rankings with autoregressive and score-driven dynamics
IF 1.6 | Zone 4, Mathematics | Q2 Mathematics | Pub Date: 2022-08-02 | DOI: 10.1111/rssc.12584
Vladimír Holý, Jan Zouhar

We develop a new statistical model to analyse time-varying ranking data. The model can be used with a large number of ranked items, accommodates exogenous time-varying covariates and partial rankings, and is estimated via the maximum likelihood in a straightforward manner. Rankings are modelled using the Plackett–Luce distribution with time-varying worth parameters that follow a mean-reverting time series process. To capture the dependence of the worth parameters on past rankings, we utilise the conditional score in the fashion of the generalised autoregressive score models. Simulation experiments show that the small-sample properties of the maximum-likelihood estimator improve rapidly with the length of the time series and suggest that statistical inference relying on conventional Hessian-based standard errors is usable even for medium-sized samples. In an empirical study, we apply the model to the results of the Ice Hockey World Championships. We also discuss applications to rankings based on underlying indices, repeated surveys and non-parametric efficiency analysis.
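
As a rough illustration of the Plackett–Luce distribution mentioned in this abstract, the sketch below computes the log-probability of a single ranking from fixed log-worth parameters. It is not the authors' implementation: the function name `plackett_luce_log_prob` and its arguments are hypothetical, and the time-varying, score-driven dynamics of the worth parameters are omitted entirely. A partial (top-k) ranking is handled by treating unlisted items as ranked below all listed ones.

```python
import math


def plackett_luce_log_prob(ranking, log_worth):
    """Log-probability of a ranking under a static Plackett-Luce model.

    ranking   : item indices ordered from best to worst; may list only the
                top-k items (unlisted items are assumed to rank below them).
    log_worth : log-worth parameter for every item (hypothetical inputs).
    """
    remaining = set(range(len(log_worth)))  # items not yet placed
    total = 0.0
    for item in ranking:
        # Each position is a softmax choice among the items still available.
        log_denom = math.log(sum(math.exp(log_worth[j]) for j in remaining))
        total += log_worth[item] - log_denom
        remaining.remove(item)
    return total


if __name__ == "__main__":
    # Three equally worthy items: each of the 3! full rankings has probability 1/6.
    print(math.exp(plackett_luce_log_prob([0, 1, 2], [0.0, 0.0, 0.0])))
```

In a dynamic model of the kind the abstract describes, `log_worth` would differ from one time period to the next; here it is held fixed to keep the sketch short.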

Citations: 3