
Latest articles: Journal of the Royal Statistical Society Series C-Applied Statistics

Enumeration of regular fractional factorial designs with four-level and two-level factors
IF 1.6 | Zone 4 (Mathematics) | Q3 STATISTICS & PROBABILITY | Pub Date: 2023-03-10 | DOI: 10.1093/jrsssc/qlad031
Alexandre Bohyn, E. Schoen, P. Goos
Designs for screening experiments usually include factors with two levels only. Adding a few four-level factors allows for the inclusion of multi-level categorical factors or quantitative factors with possible quadratic or third-order effects. Three examples motivated us to generate a large catalogue of designs with two-level factors as well as four-level factors. To create the catalogue, we considered three methods. In the first method, we select designs using a search table, and in the second method, we use a procedure that selects candidate designs based on the properties of their projections into fewer factors. The third method is actually a benchmark method, in which we use a general orthogonal array enumeration algorithm. We compare the efficiencies of the new methods for generating complete sets of nonisomorphic designs. Finally, we use the most efficient method to generate a catalogue of designs with up to three four-level factors and up to 20 two-level factors for run sizes 16, 32, 64, and 128. In some cases, a complete enumeration was infeasible. For these cases, we used a bounded enumeration strategy instead. We demonstrate the usefulness of the catalogue by revisiting the motivating examples.
Citations: 0
A spatial stochastic frontier model introducing inefficiency spillovers
IF 1.6 | Zone 4 (Mathematics) | Q3 STATISTICS & PROBABILITY | Pub Date: 2023-02-24 | DOI: 10.1093/jrsssc/qlad012
Federica Galli
This paper develops a spatial Durbin stochastic frontier model for panel data that introduces spillover effects in the determinants of technical efficiency (SDF-STE). The model nests several existing spatial and non-spatial stochastic frontier specifications and is estimated by maximum likelihood. Monte Carlo simulations show that the estimates are unbiased even for small sample sizes and under alternative specifications of the spatial weight matrix. Finally, the model is applied to the Italian accommodation sector. The empirical findings suggest that the SDF-STE model is relevant for capturing labour productivity and knowledge spillover effects.
Citations: 0
Using natural strata when examining unmeasured biases in an observational study of neurological side effects of antibiotics
IF 1.6 | Zone 4 (Mathematics) | Q3 STATISTICS & PROBABILITY | Pub Date: 2023-02-23 | DOI: 10.1093/jrsssc/qlad010
K. Brumberg, Darcy E. Ellis, D. Small, S. Hennessy, P. Rosenbaum
Fluoroquinolones are widely prescribed antibiotics that carry a US Food and Drug Administration warning about possible side-effects on the central and peripheral nervous system. We compare 436,891 patients with sinusitis treated with fluoroquinolones to two control groups treated with azithromycin or amoxicillin. In addition to looking for nervous system complications, we look for evidence of bias using outcomes for which an effect was not anticipated. The comparison uses ‘natural strata’ that form control groups proportional in size to the treated group and balance many covariates beyond those that define the strata. The main technical contribution is a new method for near-optimal construction of natural strata with multiple groups. The online supplementary material contains proofs, details, and information about the R package natstrat and about replication.
Citations: 0
The importance of context in extreme value analysis with application to extreme temperatures in the USA and Greenland
IF 1.6 | Zone 4 (Mathematics) | Q3 STATISTICS & PROBABILITY | Pub Date: 2023-02-16 | DOI: 10.1093/jrsssc/qlad020
D. Clarkson, E. Eastoe, A. Leeson
Statistical extreme value models allow estimation of the frequency, magnitude and spatio-temporal extent of extreme temperature events in the presence of climate change. Unfortunately, the assumptions of many standard methods are not valid for complex environmental data sets, and a realistic statistical model requires appropriate incorporation of scientific context. We examine two case studies in which the application of routine extreme value methods results in inappropriate models and inaccurate predictions. In the first scenario, record-breaking temperatures experienced in the US in the summer of 2021 are found to exceed the maximum feasible temperature predicted from a standard extreme value analysis of pre-2021 data. Incorporating random effects into the standard methods accounts for additional variability in the model parameters, reflecting shifts in unobserved climatic drivers and permitting greater accuracy in return period prediction. The second scenario examines ice surface temperatures in Greenland. The temperature distribution is found to have a poorly defined upper tail, with a spike in observations just below 0°C and an unexpectedly large number of measurements above this value. A Gaussian mixture model fitted to the full range of measurements improves fit and predictive ability in the upper tail compared with traditional extreme value methods.
Citations: 1
Statistical calibration for infinite many future values in linear regression: simultaneous or pointwise tolerance intervals or what else?
IF 1.6 | Zone 4 (Mathematics) | Q3 STATISTICS & PROBABILITY | Pub Date: 2023-02-13 | DOI: 10.1093/jrsssc/qlac004
Yang Han, Yujia Sun, Lingjiao Wang, Wei Liu, F. Bretz
Statistical calibration using regression is a useful statistical tool with many applications. For confidence sets for x-values associated with infinitely many future y-values, there is a consensus in the statistical literature that the constructed confidence sets should guarantee a key property. While it is well known that confidence sets based on simultaneous tolerance intervals (STIs) guarantee this key property conservatively, it is desirable to construct confidence sets that satisfy it exactly. There is also a misconception that confidence sets based on pointwise tolerance intervals (PTIs) guarantee this property. This paper constructs weighted simultaneous tolerance intervals (WSTIs) so that the confidence sets based on them satisfy this property exactly if the x-values of future observations follow a known specific distribution F(⋅). Through the lens of the WSTIs, convincing counterexamples are also provided to demonstrate that confidence sets based on PTIs do not guarantee the key property in general and so should not be used. Applications to real data examples show that WSTIs can produce more accurate calibration intervals than STIs and PTIs.
Citations: 0
On the competitive facility location problem with a Bayesian spatial interaction model
IF 1.6 | Zone 4 (Mathematics) | Q3 STATISTICS & PROBABILITY | Pub Date: 2023-02-09 | DOI: 10.1093/jrsssc/qlad003
Shanaka Perera, Virginia Aglietti, T. Damoulas
The competitive facility location problem arises when businesses plan to enter a new market or expand their presence. We introduce a Bayesian spatial interaction model that provides probabilistic estimates of location-specific revenues and then formulate a mathematical framework to simultaneously identify the location and design of new facilities that maximise revenue. To solve the allocation optimisation problem, we develop a hierarchical search algorithm and associated sampling techniques that explore geographic regions of varying spatial resolution. We demonstrate the approach by producing optimal facility locations and corresponding designs for two large-scale applications in the supermarket and pub sectors of Greater London.
Citations: 0
The determinants of Airbnb prices in New York City: a spatial quantile regression approach
IF 1.6 | Zone 4 (Mathematics) | Q3 STATISTICS & PROBABILITY | Pub Date: 2023-02-08 | DOI: 10.1093/jrsssc/qlad001
M. Bernardi, M. Guidolin
In this paper, we study the price determinants of Airbnb rentals in New York City by developing a new dataset that combines attributes of the property and of the related service with other information available as open data. This dataset is used within a spatial quantile semiparametric regression model that is able to handle the intrinsic heterogeneity of house prices. The results confirm that property and service attributes play a significant role in determining rental prices, while some variables affect prices with different magnitudes and signs depending on the quantile considered.
Citations: 0
Sparse tree-based clustering of microbiome data to characterize microbiome heterogeneity in pancreatic cancer
IF 1.6 | Zone 4 (Mathematics) | Q3 STATISTICS & PROBABILITY | Pub Date: 2023-01-01 | Epub Date: 2023-02-13 | DOI: 10.1093/jrsssc/qlac002
Yushu Shi, Liangliang Zhang, Kim-Anh Do, Robert Jenq, Christine B Peterson

There is a keen interest in characterizing variation in the microbiome across cancer patients, given increasing evidence of its important role in determining treatment outcomes. Here our goal is to discover subgroups of patients with similar microbiome profiles. We propose a novel unsupervised clustering approach in the Bayesian framework that innovates over existing model-based clustering approaches, such as the Dirichlet multinomial mixture model, in three key respects: we incorporate feature selection, learn the appropriate number of clusters from the data, and integrate information on the tree structure relating the observed features. We compare the performance of our proposed method to existing methods on simulated data designed to mimic real microbiome data. We then illustrate results obtained for our motivating data set, a clinical study aimed at characterizing the tumor microbiome of pancreatic cancer patients.

Volume 72, issue 1, pages 20-36. Open access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10077950/pdf/
Citations: 0
Utility-based Bayesian personalized treatment selection for advanced breast cancer
IF 1.6 | Zone 4 (Mathematics) | Q3 STATISTICS & PROBABILITY | Pub Date: 2022-12-22 | DOI: 10.1111/rssc.12606

The Section 2 heading ‘A BNR MODEL’ should be corrected to read as ‘A BAYESIAN NONPARAMETRIC REGRESSION MODEL’.

Volume 71, issue 5, page O1. PDF: https://rss.onlinelibrary.wiley.com/doi/epdf/10.1111/rssc.12606
Citations: 0
Contents of volume 71, 2022
IF 1.6 | Zone 4 (Mathematics) | Q3 STATISTICS & PROBABILITY | Pub Date: 2022-11-18 | DOI: 10.1111/rssc.12605
Volume 71, issue 5, pages 2038-2041. PDF: https://rss.onlinelibrary.wiley.com/doi/epdf/10.1111/rssc.12605
Citations: 0
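Book学术 provides a free academic search service that helps researchers in China and abroad find Chinese- and English-language literature, and aims to offer a convenient, high-quality service.
Contact: info@booksci.cn
Copyright © 2023 Book学术. All rights reserved.
Beijing Public Security Bureau filing No. 11010802042870 | Beijing ICP filing No. 2023020795-1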