Designs for screening experiments usually include factors with two levels only. Adding a few four-level factors allows for the inclusion of multi-level categorical factors or quantitative factors with possible quadratic or third-order effects. Three examples motivated us to generate a large catalogue of designs with two-level factors as well as four-level factors. To create the catalogue, we considered three methods. In the first method, we select designs using a search table, and in the second method, we use a procedure that selects candidate designs based on the properties of their projections into fewer factors. The third method is actually a benchmark method, in which we use a general orthogonal array enumeration algorithm. We compare the efficiencies of the new methods for generating complete sets of nonisomorphic designs. Finally, we use the most efficient method to generate a catalogue of designs with up to three four-level factors and up to 20 two-level factors for run sizes 16, 32, 64, and 128. In some cases, a complete enumeration was infeasible. For these cases, we used a bounded enumeration strategy instead. We demonstrate the usefulness of the catalogue by revisiting the motivating examples.
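As a rough illustration of what a regular design with both four-level and two-level factors can look like, the sketch below assumes the standard replacement construction, in which a four-level factor is formed from a pair of two-level columns. The abstract does not spell out this construction, and the generator `e = a + c + d (mod 2)` and the function names are hypothetical choices made only for this example.

```python
from itertools import product
from collections import Counter


def build_design():
    """Return a 16-run design: one four-level factor and three two-level factors."""
    runs = []
    for a, b, c, d in product((0, 1), repeat=4):
        four_level = 2 * a + b   # levels 0..3 formed from the pair (a, b)
        e = (a + c + d) % 2      # hypothetical generator for an added two-level factor
        runs.append((four_level, c, d, e))
    return runs


def is_orthogonal(runs):
    """Check that every pair of columns shows all level combinations equally often."""
    n_cols = len(runs[0])
    for i in range(n_cols):
        for j in range(i + 1, n_cols):
            counts = Counter((r[i], r[j]) for r in runs)
            levels_i = {r[i] for r in runs}
            levels_j = {r[j] for r in runs}
            # Every combination must occur, and all must occur equally often.
            if len(counts) != len(levels_i) * len(levels_j):
                return False
            if len(set(counts.values())) != 1:
                return False
    return True


design = build_design()
print(len(design), is_orthogonal(design))
```

The orthogonality check is only a basic sanity test; it does not attempt the isomorphism screening that the enumeration methods in the abstract are concerned with.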
{"title":"Enumeration of regular fractional factorial designs with four-level and two-level factors","authors":"Alexandre Bohyn, E. Schoen, P. Goos","doi":"10.1093/jrsssc/qlad031","DOIUrl":"https://doi.org/10.1093/jrsssc/qlad031","PeriodicalId":49981,"journal":{"name":"Journal of the Royal Statistical Society Series C-Applied Statistics","volume":"45 1"},"PeriodicalIF":1.6,"publicationDate":"2023-03-10","publicationTypes":"Journal Article","isOpenAccess":false,"platform":"Semanticscholar","paperid":"90815208","RegionNum":4,"RegionCategory":"Mathematics","Total":0}
This paper develops a spatial Durbin stochastic frontier model for panel data introducing spillover effects in the determinants of technical efficiency (SDF-STE). The model nests several existing spatial and non-spatial stochastic frontier specifications and is estimated using maximum-likelihood techniques. Estimates are shown to be unbiased even for small sample sizes and for alternative specifications of the spatial weight matrix implementing different Monte Carlo simulations. Finally, an application to the Italian accommodation sector is provided. Empirical findings suggest the relevance of the SDF-STE model in capturing labour productivity and knowledge spillover effects.
{"title":"A spatial stochastic frontier model introducing inefficiency spillovers","authors":"Federica Galli","doi":"10.1093/jrsssc/qlad012","DOIUrl":"https://doi.org/10.1093/jrsssc/qlad012","PeriodicalId":49981,"journal":{"name":"Journal of the Royal Statistical Society Series C-Applied Statistics","volume":"18 1"},"PeriodicalIF":1.6,"publicationDate":"2023-02-24","publicationTypes":"Journal Article","isOpenAccess":false,"platform":"Semanticscholar","paperid":"81544168","RegionNum":4,"RegionCategory":"Mathematics","Total":0}
K. Brumberg, Darcy E. Ellis, D. Small, S. Hennessy, P. Rosenbaum
Fluoroquinolones are widely prescribed antibiotics that carry a US Food and Drug Administration warning about possible side-effects on the central and peripheral nervous system. We compare 436,891 patients with sinusitis treated with fluoroquinolones to two control groups treated with azithromycin or amoxicillin. In addition to looking for nervous system complications, we look for evidence of bias using outcomes for which an effect was not anticipated. The comparison uses ‘natural strata’ that form control groups proportional in size to the treated group and balance many covariates beyond those that define the strata. The main technical contribution is a new method for near-optimal construction of natural strata with multiple groups. The online supplementary material contains proofs, details, and information about the R package natstrat and replication.
{"title":"Using natural strata when examining unmeasured biases in an observational study of neurological side effects of antibiotics","authors":"K. Brumberg, Darcy E. Ellis, D. Small, S. Hennessy, P. Rosenbaum","doi":"10.1093/jrsssc/qlad010","DOIUrl":"https://doi.org/10.1093/jrsssc/qlad010","PeriodicalId":49981,"journal":{"name":"Journal of the Royal Statistical Society Series C-Applied Statistics","volume":"1 1"},"PeriodicalIF":1.6,"publicationDate":"2023-02-23","publicationTypes":"Journal Article","isOpenAccess":false,"platform":"Semanticscholar","paperid":"84822538","RegionNum":4,"RegionCategory":"Mathematics","Total":0}
Statistical extreme value models allow estimation of the frequency, magnitude and spatio-temporal extent of extreme temperature events in the presence of climate change. Unfortunately, the assumptions of many standard methods are not valid for complex environmental data sets, with a realistic statistical model requiring appropriate incorporation of scientific context. We examine two case studies in which the application of routine extreme value methods results in inappropriate models and inaccurate predictions. In the first scenario, record-breaking temperatures experienced in the US in the summer of 2021 are found to exceed the maximum feasible temperature predicted from a standard extreme value analysis of pre-2021 data. Incorporating random effects into the standard methods accounts for additional variability in the model parameters, reflecting shifts in unobserved climatic drivers and permitting greater accuracy in return period prediction. The second scenario examines ice surface temperatures in Greenland. The temperature distribution is found to have a poorly defined upper tail, with a spike in observations just below 0 °C and an unexpectedly large number of measurements above this value. A Gaussian mixture model fit to the full range of measurements improves fit and predictive abilities in the upper tail when compared to traditional extreme value methods.
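The notion of a "maximum feasible temperature" can be made concrete with a small sketch. It assumes that the standard analysis fits a generalised extreme value (GEV) distribution with location `mu`, scale `sigma` and shape `xi`; the abstract does not state which model was used, and the parameter values below are hypothetical, not estimates from the paper. When `xi` is negative the fitted distribution has a finite upper endpoint, so any later observation above it has probability zero under the fit.

```python
import math


def gev_upper_endpoint(mu, sigma, xi):
    """Upper limit of the GEV support; finite only when the shape xi is negative."""
    if xi >= 0:
        return math.inf
    return mu - sigma / xi


def gev_return_level(mu, sigma, xi, period):
    """Level exceeded on average once every `period` blocks (for example, years)."""
    y = -math.log(1.0 - 1.0 / period)
    if xi == 0:
        return mu - sigma * math.log(y)   # Gumbel limit of the GEV
    return mu - (sigma / xi) * (1.0 - y ** (-xi))


# Hypothetical fitted parameters, chosen only for illustration.
mu, sigma, xi = 40.0, 2.0, -0.25
endpoint = gev_upper_endpoint(mu, sigma, xi)
level_100 = gev_return_level(mu, sigma, xi, 100)
print(endpoint, level_100)
```

With these illustrative values, a new record above the endpoint would be impossible under the fitted model, which is the kind of conflict the first case study describes.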
{"title":"The importance of context in extreme value analysis with application to extreme temperatures in the USA and Greenland","authors":"D. Clarkson, E. Eastoe, A. Leeson","doi":"10.1093/jrsssc/qlad020","DOIUrl":"https://doi.org/10.1093/jrsssc/qlad020","PeriodicalId":49981,"journal":{"name":"Journal of the Royal Statistical Society Series C-Applied Statistics","volume":"32 1"},"PeriodicalIF":1.6,"publicationDate":"2023-02-16","publicationTypes":"Journal Article","isOpenAccess":false,"platform":"Semanticscholar","paperid":"87796961","RegionNum":4,"RegionCategory":"Mathematics","Total":0}
Yang Han, Yujia Sun, Lingjiao Wang, Wei Liu, F. Bretz
Statistical calibration using regression is a useful statistical tool with many applications. For confidence sets for x-values associated with infinitely many future y-values, there is a consensus in the statistical literature that the confidence sets constructed should guarantee a key property. While it is well known that the confidence sets based on the simultaneous tolerance intervals (STIs) guarantee this key property conservatively, it is desirable to construct confidence sets that satisfy this property exactly. There is also a misconception that the confidence sets based on the pointwise tolerance intervals (PTIs) guarantee this property. This paper constructs the weighted simultaneous tolerance intervals (WSTIs) so that the confidence sets based on the WSTIs satisfy this property exactly if the future observations have x-values distributed according to a known specific distribution F(⋅). Through the lens of the WSTIs, convincing counterexamples are also provided to demonstrate that the confidence sets based on the PTIs do not guarantee the key property in general and so should not be used. The WSTIs are applied to real data examples to show that they can produce more accurate calibration intervals than STIs and PTIs.
{"title":"Statistical calibration for infinite many future values in linear regression: simultaneous or pointwise tolerance intervals or what else?","authors":"Yang Han, Yujia Sun, Lingjiao Wang, Wei Liu, F. Bretz","doi":"10.1093/jrsssc/qlac004","DOIUrl":"https://doi.org/10.1093/jrsssc/qlac004","PeriodicalId":49981,"journal":{"name":"Journal of the Royal Statistical Society Series C-Applied Statistics","volume":"85 1"},"PeriodicalIF":1.6,"publicationDate":"2023-02-13","publicationTypes":"Journal Article","isOpenAccess":false,"platform":"Semanticscholar","paperid":"83880930","RegionNum":4,"RegionCategory":"Mathematics","Total":0}
The competitive facility location problem arises when businesses plan to enter a new market or expand their presence. We introduce a Bayesian spatial interaction model which provides probabilistic estimates on location-specific revenues and then formulate a mathematical framework to simultaneously identify the location and design of new facilities that maximise revenue. To solve the allocation optimisation problem, we develop a hierarchical search algorithm and associated sampling techniques that explore geographic regions of varying spatial resolution. We demonstrate the approach by producing optimal facility locations and corresponding designs for two large-scale applications in the supermarket and pub sectors of Greater London.
{"title":"On the competitive facility location problem with a Bayesian spatial interaction model","authors":"Shanaka Perera, Virginia Aglietti, T. Damoulas","doi":"10.1093/jrsssc/qlad003","DOIUrl":"https://doi.org/10.1093/jrsssc/qlad003","PeriodicalId":49981,"journal":{"name":"Journal of the Royal Statistical Society Series C-Applied Statistics","volume":"32 1"},"PeriodicalIF":1.6,"publicationDate":"2023-02-09","publicationTypes":"Journal Article","isOpenAccess":false,"platform":"Semanticscholar","paperid":"73364515","RegionNum":4,"RegionCategory":"Mathematics","Total":0}
In this paper, we study the price determinants of Airbnb rentals, for the case of New York City, by developing a new dataset, which combines attributes of the property and of the related service, with other information available as open data. This dataset is employed within a spatial quantile semiparametric regression model, able to handle the intrinsic heterogeneity of house prices. The results confirm that property and service attributes play a significant role in determining rental prices, while some variables exert a different impact on prices in magnitude and sign, depending on the quantile considered.
{"title":"The determinants of Airbnb prices in New York City: a spatial quantile regression approach","authors":"M. Bernardi, M. Guidolin","doi":"10.1093/jrsssc/qlad001","DOIUrl":"https://doi.org/10.1093/jrsssc/qlad001","PeriodicalId":49981,"journal":{"name":"Journal of the Royal Statistical Society Series C-Applied Statistics","volume":"112 1"},"PeriodicalIF":1.6,"publicationDate":"2023-02-08","publicationTypes":"Journal Article","isOpenAccess":false,"platform":"Semanticscholar","paperid":"87895558","RegionNum":4,"RegionCategory":"Mathematics","Total":0}
Pub Date: 2023-01-01, Epub Date: 2023-02-13, DOI: 10.1093/jrsssc/qlac002
Yushu Shi, Liangliang Zhang, Kim-Anh Do, Robert Jenq, Christine B Peterson
There is a keen interest in characterizing variation in the microbiome across cancer patients, given increasing evidence of its important role in determining treatment outcomes. Here our goal is to discover subgroups of patients with similar microbiome profiles. We propose a novel unsupervised clustering approach in the Bayesian framework that innovates over existing model-based clustering approaches, such as the Dirichlet multinomial mixture model, in three key respects: we incorporate feature selection, learn the appropriate number of clusters from the data, and integrate information on the tree structure relating the observed features. We compare the performance of our proposed method to existing methods on simulated data designed to mimic real microbiome data. We then illustrate results obtained for our motivating data set, a clinical study aimed at characterizing the tumor microbiome of pancreatic cancer patients.
{"title":"Sparse tree-based clustering of microbiome data to characterize microbiome heterogeneity in pancreatic cancer.","authors":"Yushu Shi, Liangliang Zhang, Kim-Anh Do, Robert Jenq, Christine B Peterson","doi":"10.1093/jrsssc/qlac002","DOIUrl":"10.1093/jrsssc/qlac002","PeriodicalId":49981,"journal":{"name":"Journal of the Royal Statistical Society Series C-Applied Statistics","volume":"72 1","pages":"20-36"},"PeriodicalIF":1.6,"publicationDate":"2023-01-01","publicationTypes":"Journal Article","isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10077950/pdf/","platform":"Semanticscholar","paperid":"9289729","RegionNum":4,"RegionCategory":"Mathematics","PMCID":"OA","Total":0}
The Section 2 heading ‘A BNR MODEL’ should be corrected to read as ‘A BAYESIAN NONPARAMETRIC REGRESSION MODEL’.
{"title":"Utility-based Bayesian personalized treatment selection for advanced breast cancer","doi":"10.1111/rssc.12606","DOIUrl":"https://doi.org/10.1111/rssc.12606","PeriodicalId":49981,"journal":{"name":"Journal of the Royal Statistical Society Series C-Applied Statistics","volume":"71 5","pages":"O1"},"PeriodicalIF":1.6,"publicationDate":"2022-12-22","publicationTypes":"Journal Article","isOpenAccess":false,"openAccessPdf":"https://rss.onlinelibrary.wiley.com/doi/epdf/10.1111/rssc.12606","platform":"Semanticscholar","paperid":"134813229","RegionNum":4,"RegionCategory":"Mathematics","PMCID":"OA","Total":0}
{"title":"Contents of volume 71, 2022","doi":"10.1111/rssc.12605","DOIUrl":"10.1111/rssc.12605","PeriodicalId":49981,"journal":{"name":"Journal of the Royal Statistical Society Series C-Applied Statistics","volume":"71 5","pages":"2038-2041"},"PeriodicalIF":1.6,"publicationDate":"2022-11-18","publicationTypes":"Journal Article","isOpenAccess":false,"openAccessPdf":"https://rss.onlinelibrary.wiley.com/doi/epdf/10.1111/rssc.12605","platform":"Semanticscholar","paperid":"83091174","RegionNum":4,"RegionCategory":"Mathematics","PMCID":"OA","Total":0}