Abstract: Leo Breiman's article "Statistical Modeling: The Two Cultures" was timely and provocative. He advocated for Statisticians to learn about and appreciate a different "culture": an algorithmic approach, as distinct from the familiar, stochastic, data modeling approach to Statistics. While we have appreciated and contributed to the algorithmic approach, we have always had a foot in both camps. Here we advocate for a "melting pot", arguing that both approaches have their virtues, sometimes on the same problem.
{"title":"A Melting Pot","authors":"R. Tibshirani, T. Hastie","doi":"10.1353/obs.2021.0012","DOIUrl":"https://doi.org/10.1353/obs.2021.0012","url":null,"abstract":"Abstract:Leo Breiman's article \"Statistical Modeling: The two cultures\" was timely and provocative. He advocated for Statisticians to learn about and appreciate a different \"culture\": an algorithmic approach, as distinct from the familiar, stochastic, data modeling approach to Statistics. While we have appreciated and contributed to the algorithmic approach, we have always had a foot in both camps. Here we advocate for a \"melting pot\", arguing that both approaches have their virtues, sometimes on the same problem.","PeriodicalId":74335,"journal":{"name":"Observational studies","volume":"7 1","pages":"213 - 215"},"PeriodicalIF":0.0,"publicationDate":"2021-07-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1353/obs.2021.0012","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"44567597","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Abstract: We offer descriptive and normative standards for the principled pursuit of causal inference. These standards address critiques of both the algorithmic and the data modeling cultures identified in Breiman (2001), and provide a fruitful synthesis of both cultures. We contrast the resulting "cautious causal inference" with overly optimistic methods inspired by algorithmic data analysis methods prevalent in machine learning, as well as older approaches to causal modeling that employ overly restrictive parametric models.
{"title":"Causal Modelling: The Two Cultures","authors":"Elizabeth L. Ogburn, I. Shpitser","doi":"10.1353/obs.2021.0006","DOIUrl":"https://doi.org/10.1353/obs.2021.0006","url":null,"abstract":"Abstract:We offer descriptive and normative standards for the principled pursuit of causal inference. These standards address critiques of both the algorithmic and the data modeling cultures identified in (Breiman, 2001), and provide a fruitful synthesis of both cultures. We contrast the resulting \"cautious causal inference\" with overly optimistic methods inspired by algorithmic data analysis methods prevalent in machine learning, as well as older approaches to causal modeling that employ overly restrictive parametric models.","PeriodicalId":74335,"journal":{"name":"Observational studies","volume":"7 1","pages":"179 - 183"},"PeriodicalIF":0.0,"publicationDate":"2021-07-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1353/obs.2021.0006","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"43520306","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Abstract: Leo Breiman's "Statistical Modeling: The Two Cultures" is a treasure for any statistician who engages with real-world problems. I argue that there is a more fundamental dichotomy between the principles of statistical modeling and the techniques for statistical modeling. Focusing entirely on the techniques in statistical education and research can be dangerous. I join Breiman's call for statistics to return to its roots.
{"title":"Statistical Modeling: Returning to its Roots","authors":"Qingyuan Zhao","doi":"10.1353/obs.2021.0014","DOIUrl":"https://doi.org/10.1353/obs.2021.0014","url":null,"abstract":"Abstract:Leo Breiman's \"Statistical Modeling: The Two Cultures\" is a treasure for any statistician who engages with real world problem. I argue that there is a more fundamental dichotomy between the principles of statistical modeling and the techniques for statistical modeling. Focusing entirely on the techniques in statistical education and research can be dangerous. I join Breiman's call for statistics to return to its roots.","PeriodicalId":74335,"journal":{"name":"Observational studies","volume":"7 1","pages":"229 - 234"},"PeriodicalIF":0.0,"publicationDate":"2021-07-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1353/obs.2021.0014","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"42393513","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Nikki L B Freeman, John Sperger, Helal El-Zaatari, Anna R Kahkoska, Minxin Lu, Michael Valancius, Arti V Virkud, Tarek M Zikry, Michael R Kosorok
In the twenty years since Dr. Leo Breiman's incendiary paper Statistical Modeling: The Two Cultures was first published, algorithmic modeling techniques have gone from controversial to commonplace in the statistical community. While the widespread adoption of these methods as part of the contemporary statistician's toolkit is a testament to Dr. Breiman's vision, the number of high-profile failures of algorithmic models suggests that Dr. Breiman's final remark that "the emphasis needs to be on the problem and the data" has been less widely heeded. In the spirit of Dr. Breiman, we detail an emerging research community in statistics - data-driven decision support. We assert that to realize the full potential of decision support, broadly and in the context of precision health, will require a culture of social awareness and accountability, in addition to ongoing attention towards complex technical challenges.
{"title":"Beyond Two Cultures: Cultural Infrastructure for Data-driven Decision Support.","authors":"Nikki L B Freeman, John Sperger, Helal El-Zaatari, Anna R Kahkoska, Minxin Lu, Michael Valancius, Arti V Virkud, Tarek M Zikry, Michael R Kosorok","doi":"10.1353/obs.2021.0024","DOIUrl":"10.1353/obs.2021.0024","url":null,"abstract":"<p><p>In the twenty years since Dr. Leo Breiman's incendiary paper <i>Statistical Modeling: The Two Cultures</i> was first published, algorithmic modeling techniques have gone from controversial to commonplace in the statistical community. While the widespread adoption of these methods as part of the contemporary statistician's toolkit is a testament to Dr. Breiman's vision, the number of high-profile failures of algorithmic models suggests that Dr. Breiman's final remark that \"the emphasis needs to be on the problem and the data\" has been less widely heeded. In the spirit of Dr. Breiman, we detail an emerging research community in statistics - data-driven decision support. We assert that to realize the full potential of decision support, broadly and in the context of precision health, will require a culture of social awareness and accountability, in addition to ongoing attention towards complex technical challenges.</p>","PeriodicalId":74335,"journal":{"name":"Observational studies","volume":"7 1","pages":"77-94"},"PeriodicalIF":0.0,"publicationDate":"2021-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8802367/pdf/nihms-1773096.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"39741992","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
We consider an extension of Leo Breiman's thesis from "Statistical Modeling: The Two Cultures" to include a bifurcation of algorithmic modeling, focusing on parametric regressions, interpretable algorithms, and complex (possibly explainable) algorithms.
{"title":"Considerations Across Three Cultures: Parametric Regressions, Interpretable Algorithms, and Complex Algorithms.","authors":"Ani Eloyan, Sherri Rose","doi":"10.1353/obs.2021.0009","DOIUrl":"10.1353/obs.2021.0009","url":null,"abstract":"<p><p>We consider an extension of Leo Breiman's thesis from \"Statistical Modeling: The Two Cultures\" to include a bifurcation of algorithmic modeling, focusing on parametric regressions, interpretable algorithms, and complex (possibly explainable) algorithms.</p>","PeriodicalId":74335,"journal":{"name":"Observational studies","volume":"7 1","pages":"191-196"},"PeriodicalIF":0.0,"publicationDate":"2021-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8415757/pdf/nihms-1732979.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"39387492","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Causation in Action: Some Remarks Attendant to Re-reading Hill (1965)","authors":"Herbert L. Smith","doi":"10.1353/OBS.2020.0007","DOIUrl":"https://doi.org/10.1353/OBS.2020.0007","url":null,"abstract":"","PeriodicalId":74335,"journal":{"name":"Observational studies","volume":"6 1","pages":"33 - 46"},"PeriodicalIF":0.0,"publicationDate":"2021-06-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1353/OBS.2020.0007","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"45137848","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Abstract: In this paper, we use a two-step approach for heterogeneous subgroup identification with a synthetic data set motivated by the National Study of Learning Mindsets. In the first step, optimal full propensity score matching is used to estimate stratum-specific treatment effects. In the second step, regression trees identify key subgroups based on covariates for which the treatment effect varies. In working with regression trees, we emphasize the role of the cost-complexity tuning parameter, selected through permutation-based Type I error rate studies, in justifying inferential decision-making, which we contrast with graphical and quantitative exploration for future study. Results indicate that the mindset intervention was effective, overall, in improving student achievement. While our exploratory analyses identified XC, C1, and X1 as potential effect modifiers worthy of further study, we find no statistically significant evidence of effect heterogeneity except for urbanicity category XC = 3, and that finding is not robust to the propensity score estimation method.
{"title":"Heterogeneous Subgroup Identification with Observational Data: A Case Study Based on the National Study of Learning Mindsets","authors":"Bryan Keller, Jianshen Chen, Tianyang Zhang","doi":"10.1353/obs.2019.0010","DOIUrl":"https://doi.org/10.1353/obs.2019.0010","url":null,"abstract":"Abstract:In this paper, we use a two-step approach for heterogeneous subgroup identification with a synthetic data set motivated by the National Study of Learning Mindsets. In the first step, optimal full propensity score matching is used to estimate stratum-specific treatment effects. In the second step, regression trees identify key subgroups based on covariates for which the treatment effect varies. In working with regression trees, we emphasize the role of the cost-complexity tuning parameter, selected through permutation-based Type I error rate studies, in justifying inferential decision-making, which we contrast with graphical and quantitative exploration for future study. Results indicate that the mindset intervention was effective, overall, in improving student achievement. While our exploratory analyses identified XC, C1, and X1 as potential effect modifiers worthy of further study, we find no statistically significant evidence of effect heterogeneity with the exception of urbanicity category XC = 3, but the finding is not robust to propensity score estimation method.","PeriodicalId":74335,"journal":{"name":"Observational studies","volume":"5 1","pages":"104 - 93"},"PeriodicalIF":0.0,"publicationDate":"2021-06-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1353/obs.2019.0010","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"41420738","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Comment on Cochran’s “Observational Studies”","authors":"B. Hansen, Adam C. Sales","doi":"10.1353/obs.2015.0017","DOIUrl":"https://doi.org/10.1353/obs.2015.0017","url":null,"abstract":"","PeriodicalId":74335,"journal":{"name":"Observational studies","volume":"1 1","pages":"184 - 193"},"PeriodicalIF":0.0,"publicationDate":"2021-06-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1353/obs.2015.0017","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"48808444","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Abstract: Background: In the binary-outcome framework for causal mediation, the closed-form expressions introduced by Valeri and VanderWeele for the natural direct and indirect effect odds ratios (ORs) are derived from a logistic outcome model by invoking several approximations that hold under the rare-disease assumption. Such ORs are expected to be close to the corresponding effects on the risk ratio (RR) scale based on a log-binomial outcome model; however, new insight indicates that this does not always hold. The objective was to report on mediation results from these two models when the incidence of the outcome was <10%. Methods: Standard (approximate) ORs and RRs were estimated using data on a cohort of asthmatic pregnant women from Québec (Canada) and their babies. Prematurity and low birthweight were the mediator and outcome variables, respectively, and two binary exposure variables were considered: treatment with inhaled corticosteroids and placental abruption. Exact closed-form effects expressed on the OR scale were also derived and estimated using SAS code we provide. A study based on two simulation scenarios was subsequently devised to supplement the substantive findings. Results: Many approximate ORs and RRs estimated from our cohort analyses did not closely agree. Approximate ORs were systematically farther from RRs than exact ORs were, possibly leading to different conclusions regarding the null hypothesis. Exact OR estimates were very close to RR estimates for exposure to inhaled corticosteroids, but less so for placental abruption. The approximate OR estimator was found to exhibit important bias and undercoverage in the simulation scenario that featured a strong mediator-outcome relationship. Conclusions: Logistic and log-binomial outcome models can yield dissimilar binary-binary mediation effects even if the marginal outcome incidence is small. Large discrepancies between approximate ORs and RRs may indicate invalid inference for these ORs. Exact OR estimates can be obtained for validation or to replace RRs if the log-binomial model exhibits convergence problems.
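The arithmetic behind this abstract's conclusion can be illustrated with a small sketch. This is not the authors' SAS code or their mediation estimator. It only shows, with hypothetical probabilities chosen for illustration, why an odds ratio can drift away from a risk ratio inside a mediator stratum even when the outcome is rare marginally. All names and numbers below are assumptions.

```python
def odds(p):
    """Convert a probability into odds."""
    return p / (1.0 - p)


def risk_ratio(p_exposed, p_unexposed):
    """Ratio of outcome risks, exposed versus unexposed."""
    return p_exposed / p_unexposed


def odds_ratio(p_exposed, p_unexposed):
    """Ratio of outcome odds, exposed versus unexposed."""
    return odds(p_exposed) / odds(p_unexposed)


# Hypothetical values (not taken from the paper): the mediator is present
# in 10% of unexposed subjects and is strongly related to the outcome.
prevalence_m1 = 0.10   # P(M = 1)
risk_m0 = 0.01         # P(Y = 1 | M = 0), unexposed
risk_m1 = 0.20         # P(Y = 1 | M = 1), unexposed

# Marginal outcome incidence among the unexposed: well below 10%.
marginal_risk = prevalence_m1 * risk_m1 + (1 - prevalence_m1) * risk_m0

# Suppose exposure doubles the risk within each mediator stratum (RR = 2).
rr_m0 = risk_ratio(2 * risk_m0, risk_m0)
or_m0 = odds_ratio(2 * risk_m0, risk_m0)
rr_m1 = risk_ratio(2 * risk_m1, risk_m1)
or_m1 = odds_ratio(2 * risk_m1, risk_m1)

print(f"marginal risk = {marginal_risk:.3f}")
print(f"M = 0 stratum: RR = {rr_m0:.3f}, OR = {or_m0:.3f}")
print(f"M = 1 stratum: RR = {rr_m1:.3f}, OR = {or_m1:.3f}")
```

In the rare stratum the OR stays close to the RR of 2, while in the stratum where the outcome is common the OR is noticeably larger, even though the marginal incidence is under 10%. Under these assumed numbers, that is the kind of gap that cross-checking logistic against log-binomial results is meant to reveal.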
{"title":"Comparing logistic and log-binomial models for causal mediation analyses of binary mediators and rare binary outcomes: evidence to support cross-checking of mediation results in practice","authors":"Mariia Samoilenko, L. Blais, Geneviève Lefebvre","doi":"10.1353/OBS.2018.0013","DOIUrl":"https://doi.org/10.1353/OBS.2018.0013","url":null,"abstract":"Abstract:BackgroundIn the binary outcome framework to causal mediation, closed-form expressions introduced by Valeri and VanderWeele for the natural direct and indirect effect odds ratios (ORs) are established from a logistic outcome model by invoking several approximations that hold under the rare-disease assumption. Such ORs are expected to be close to corresponding effects on the risk ratio (RR) scale based on a log-binomial outcome model, however new insight indicates that this is not always verified. The objective was to report on mediation results from these two models when the incidence of the outcome was <10%.MethodsStandard (approximate) ORs and RRs were estimated using data on a cohort of asthmatic pregnant women from Québec (Canada) and their babies. Prematurity and low birthweight were the mediator and outcome variables, respectively, and two binary exposure variables were considered: treatment to inhaled corticosteroids and placental abruption. Exact closed-form effects expressed on the OR scale were also derived and estimated using a SAS code we provide. A study based on two simulation scenarios was subsequently devised to supplement on the substantive findings.ResultsMany approximate ORs and RRs estimated from our cohort analyses did not closely agree. Approximate ORs were systematically observed farther from RRs in comparison with exact ORs, possibly leading to different conclusions regarding the null hypothesis. Exact OR estimates were very close to RR estimates for exposure to inhaled corticosteroids, but less so for placental abruption. The approximate OR estimator was found to exhibit important bias and undercoverage in the simulation scenario which featured a strong mediator-outcome relationship.ConclusionsLogistic and log-binomial outcome models can yield dissimilar binary-binary mediation effects even if the outcome incidence is small marginally. Large discrepancies between approximate ORs and RRs may indicate invalid inference for these ORs. Exact OR estimates can be obtained for validation or to replace RRs if the log-binomial model exhibits convergence problems.","PeriodicalId":74335,"journal":{"name":"Observational studies","volume":"4 1","pages":"193 - 216"},"PeriodicalIF":0.0,"publicationDate":"2021-06-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1353/OBS.2018.0013","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"42473431","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}