Abstract Partial likelihood, introduced in Cox (1975, Partial likelihood. Biometrika, 62(2), 269–276), formalizes the construction of the inference function developed in Cox (1972, Regression models and life-tables (with discussion). Journal of the Royal Statistical Society Series B, 34(2), 187–220) and referred to there as a conditional likelihood. Partial likelihood can also be viewed as a version of composite likelihood, a different example of which was studied in Cox and Reid (2004, A note on pseudolikelihood constructed from marginal densities. Biometrika, 91(3), 729–737). In this note, I describe the links between partial and composite likelihood, and the connections to profile, marginal, and conditional likelihood. Somewhat tangentially, two recent applications of the Cox proportional hazards model from the medical literature are briefly discussed, as they highlight the model’s ongoing relevance while also raising some more general questions about inference.
On partial likelihood. N Reid. The Journal of the Royal Statistical Society, Series A (Statistics in Society). Published 2024-01-30. https://doi.org/10.1093/jrsssa/qnae008
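As a concrete illustration of the partial likelihood discussed in the abstract above, the following is a minimal sketch of the log partial likelihood for the Cox proportional hazards model, assuming no tied event times. The function name `cox_log_partial_likelihood` and its argument layout are hypothetical choices made for this illustration; they are not taken from the paper.

```python
import math


def cox_log_partial_likelihood(beta, times, events, covariates):
    """Cox log partial likelihood, assuming no tied event times.

    beta       -- list of regression coefficients
    times      -- observed follow-up time for each subject
    events     -- 1 if the subject failed at its time, 0 if censored
    covariates -- one covariate list per subject, same length as beta
    """
    # Linear predictor x_i' beta for each subject
    eta = [sum(b * x for b, x in zip(beta, row)) for row in covariates]

    loglik = 0.0
    for i, (t_i, d_i) in enumerate(zip(times, events)):
        if not d_i:
            # Censored subjects contribute only through the risk sets of others
            continue
        # Risk set: subjects still under observation at time t_i
        risk_total = sum(
            math.exp(eta[j]) for j, t_j in enumerate(times) if t_j >= t_i
        )
        loglik += eta[i] - math.log(risk_total)
    return loglik


if __name__ == "__main__":
    # Toy data, invented purely for illustration
    value = cox_log_partial_likelihood(
        beta=[0.0],
        times=[1.0, 2.0, 3.0],
        events=[1, 1, 1],
        covariates=[[0.5], [1.0], [-1.0]],
    )
    print(value)
```

With `beta` set to zero, every subject in a risk set is equally likely to fail, so each observed event contributes minus the log of the size of its risk set. The baseline hazard never appears in the calculation, which is the feature that makes this inference function "partial".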
Abstract Recently, attention was drawn to the failure of two very large internet-based probability surveys to correctly estimate COVID-19 vaccine uptake in the U.S. in early 2021. Both the Delphi-Facebook COVID-19 Trends and Impact Survey (CTIS) and Census Household Pulse Survey (HPS) overestimated uptake substantially, by 17 and 14 percentage points in May 2021, respectively. These surveys had large numbers of respondents but very low response rates (<10%); thus, nonignorable nonresponse could have had a substantial impact. Specifically, it is plausible that ‘anti-vaccine’ individuals were less likely to participate given the topic (impact of the pandemic on daily life). In this article, we use proxy pattern-mixture models (PPMMs) to estimate the proportion of adults (18+) who received at least one dose of a COVID-19 vaccine, using data from the CTIS and HPS, under a nonignorable nonresponse assumption. Data from the American Community Survey provide the necessary population data for the PPMMs. We compare these estimates to the true benchmark uptake numbers and show that the PPMM could have detected the direction of the bias and provided meaningful bias bounds. We also use the PPMM to estimate vaccine hesitancy, a measure for which we do not have a benchmark truth, and compare these estimates to the direct survey estimates.
Using proxy pattern-mixture models to explain bias in estimates of COVID-19 vaccine uptake from two large surveys. Rebecca R Andridge. The Journal of the Royal Statistical Society, Series A (Statistics in Society). Published 2024-01-24. https://doi.org/10.1093/jrsssa/qnae005
Abstract Many economic studies use shift-share instruments to estimate causal effects. Often, all shares need to fulfil an exclusion restriction, making the identifying assumption strict. This paper proposes to use methods that relax the exclusion restriction by selecting valid shares. I apply the methods to estimate the effect of immigration on wages. The coefficient becomes much lower and often changes sign, which is in line with arguments made in the literature.
Relaxing the exclusion restriction in shift-share instrumental variable estimation. Nicolas Apfel. The Journal of the Royal Statistical Society, Series A (Statistics in Society). Published 2024-01-23. https://doi.org/10.1093/jrsssa/qnad148
Kerollos Nashat Wanis, Aaron L Sarvet, Lan Wen, Jason P Block, Sheryl L Rifas-Shiman, James M Robins, Jessica G Young
Abstract Researchers are often interested in estimating the effect of sustained use of a treatment on a health outcome. However, adherence to strict treatment protocols can be challenging for individuals in practice and, when non-adherence is expected, estimates of the effect of sustained use may not be useful for decision making. As an alternative, more relaxed treatment protocols which allow for periods of time off treatment (i.e. grace periods) have been considered in pragmatic randomized trials and observational studies. In this article, we consider the interpretation, identification, and estimation of treatment strategies which include grace periods. We contrast natural grace period strategies which allow individuals the flexibility to take treatment as they would naturally do, with stochastic grace period strategies in which the investigator specifies the distribution of treatment utilization. We estimate the effect of initiation of a thiazide diuretic or an angiotensin-converting enzyme inhibitor in hypertensive individuals under various strategies which include grace periods.
Grace periods in comparative effectiveness studies of sustained treatments. Kerollos Nashat Wanis, Aaron L Sarvet, Lan Wen, Jason P Block, Sheryl L Rifas-Shiman, James M Robins, Jessica G Young. The Journal of the Royal Statistical Society, Series A (Statistics in Society). Published 2024-01-23. https://doi.org/10.1093/jrsssa/qnae002
Abstract In a recent prominent study, Worobey et al. (2022. The Huanan Seafood Wholesale Market in Wuhan was the early epicenter of the COVID-19 pandemic. Science, 377(6609), 951–959) purported to demonstrate statistically that the Huanan Seafood Wholesale Market was the epicentre of the early COVID-19 epidemic. We show that this statistical conclusion is invalid on two grounds: (a) The assumption that a centroid of early case locations or another simply constructed point is the origin of an epidemic is unproved. (b) A Monte Carlo test used to conclude that no other location than the seafood market can be the origin is flawed. Hence, the question of the origin of the pandemic has not been answered by their statistical analysis.
Statistics did not prove that the Huanan Seafood Wholesale Market was the early epicentre of the COVID-19 pandemic. Dietrich Stoyan, Sung Nok Chiu. The Journal of the Royal Statistical Society, Series A (Statistics in Society). Published 2024-01-16. https://doi.org/10.1093/jrsssa/qnad139
The Big R-Book: From Data Science to Learning Machines and Big Data. The Journal of the Royal Statistical Society, Series A (Statistics in Society). Published 2023-12-23. https://doi.org/10.1093/jrsssa/qnad029
Theo Gasser, 1941–2023. Hans-Georg Müller. The Journal of the Royal Statistical Society, Series A (Statistics in Society). Published 2023-12-14. https://doi.org/10.1093/jrsssa/qnad145
Isaac H Goldstein, Jon Wakefield, Volodymyr M Minin
Abstract Branching-process-inspired models are widely used to estimate the effective reproduction number, a useful summary statistic describing an infectious disease outbreak, using counts of new cases. Case data are a real-time indicator of changes in the reproduction number, but are challenging to work with because cases fluctuate due to factors unrelated to the number of new infections. We develop a new model that incorporates the number of diagnostic tests as a surveillance model covariate. Using simulated data and data from the SARS-CoV-2 pandemic in California, we demonstrate that incorporating tests leads to improved performance over the state of the art.
Incorporating testing volume into estimation of effective reproduction number dynamics. Isaac H Goldstein, Jon Wakefield, Volodymyr M Minin. The Journal of the Royal Statistical Society, Series A (Statistics in Society). Published 2023-12-13. https://doi.org/10.1093/jrsssa/qnad128
Briana J K Stephenson, Stephanie M Wu, Francesca Dominici
Abstract Dietary assessments provide snapshots of population-based dietary habits. Questions remain about how generalisable those snapshots are in national survey data, where certain subgroups are sampled disproportionately. We propose a Bayesian overfitted latent class model to derive dietary patterns, accounting for survey design and sampling variability. Compared to standard approaches, our model showed improved identifiability of the true population pattern and prevalence in simulation. We apply this model to identify the intake patterns of adults living at or below the 130% poverty income level. Five dietary patterns were identified and characterised. Reproducible code and data are made available to encourage further research.
Identifying dietary consumption patterns from survey data: a Bayesian nonparametric latent class model. Briana J K Stephenson, Stephanie M Wu, Francesca Dominici. The Journal of the Royal Statistical Society, Series A (Statistics in Society). Published 2023-12-12. https://doi.org/10.1093/jrsssa/qnad135
Abstract This article employs a Bayesian methodology to predict the results of soccer matches in real-time. Using sequential data of various events throughout the match, we utilise a multinomial probit regression in a novel framework to estimate the time-varying impact of covariates and to forecast the outcome. English Premier League data from eight seasons are used to evaluate the efficacy of our method. Different evaluation metrics establish that the proposed model outperforms potential competitors inspired by existing statistical or machine learning algorithms. Additionally, we apply robustness checks to demonstrate the model’s accuracy across various scenarios.
Real-time forecasting within soccer matches through a Bayesian lens. Chinmay Divekar, Soudeep Deb, Rishideep Roy. The Journal of the Royal Statistical Society, Series A (Statistics in Society). Published 2023-11-18. https://doi.org/10.1093/jrsssa/qnad136