Efrén Cruz‐Cortés, Fan Yang, E. Juarez-Colunga, Theodore Warsavage, D. Ghosh
Abstract: The discussion paper "Statistical Modeling: the Two Cultures" (Statistical Science, Vol 16, 2001) by the late Leo Breiman sent shockwaves throughout the statistical community and subsequently redirected the efforts of much of the field towards machine learning, high-dimensional analysis and data mining approaches. Here we discuss some implications of this work for causal inference. In particular, we define the concept of comparability, which is fundamental to the ability to draw causal inferences, and reinterpret some concepts in high-dimensional data analysis from this viewpoint. We highlight the need to consider data-adaptive estimands for causal effects with high-dimensional confounders. We also revisit matching and develop some mathematical formalism for matching algorithms.
Efrén Cruz‐Cortés, Fan Yang, E. Juarez-Colunga, Theodore Warsavage, D. Ghosh. Comment on 'Statistical Modelling: the Two Cultures' by Leo Breiman. Observational Studies, published 2021-07-27. https://doi.org/10.1353/obs.2021.0021
Abstract: In a provocative paper 20 years ago, Leo Breiman challenged the statistical culture of his time. Some perceptive comments by David Cox and Brad Efron appeared with it. In this paper I look at this work in the light of modern culture and find much to agree with but also much to disagree with. It's still a pleasure to read.
P. Bickel. Comments on Breiman: Statistical Modelling: The Two Cultures and Commentaries. Observational Studies, published 2021-07-27. https://doi.org/10.1353/obs.2021.0018
Abstract: In this commentary, we assess the cultural fit of Bayesian nonparametrics in light of advances in the field since Breiman's 2001 article. We argue that Bayesian nonparametrics synthesizes desirable elements of the data modeling and algorithmic cultures to yield new insights and methodological improvements. We discuss how these methods have been combined with identification strategies from the causal inference literature to do flexible inference for interpretable target parameters.
Arman Oganisian, J. Roy. Nonparametric Bayes: A Bridge Between Cultures. Observational Studies, published 2021-07-27. https://doi.org/10.1353/obs.2021.0005
Abstract: Breiman led the way in thinking differently about statistics. Many of his iconoclastic ideas have become standard in the data science sphere. This discussion argues for some rebalancing, while gratefully acknowledging his achievements.
D. Banks. Leo Breiman's Challenge: A Retrospective. Observational Studies, published 2021-07-27. https://doi.org/10.1353/obs.2021.0017
Abstract: This note provides a re-assessment of Breiman's contributions to the art of statistical modeling, in light of recent advances in machine learning and causal inference. It highlights the crisp separation between the data-fitting and data-interpretation components of statistical modeling.
J. Pearl. Causally Colored Reflections on Leo Breiman's "Statistical Modeling: The Two Cultures" (2001). Observational Studies, published 2021-07-27. https://doi.org/10.1353/obs.2021.0008
Abstract: Instead of two cultures, the story of the last couple of decades of data science is about the interplay between three different types of reasoning using data. Two of these types of reasoning were well known when Breiman wrote his Two Cultures paper – warranted reasoning (e.g., randomized trials and sampling) and model reasoning (e.g., linear models). Breiman, though he does not appear to have realized it fully, was in fact describing the dynamics arising in a data community that was making progress using the newest, third type of reasoning – outcome reasoning. In this commentary we clarify this dynamic a bit, and suggest some useful language for identifying and differentiating types of problems better suited for outcome reasoning.
M. Baiocchi, J. Rodu. Reasoning Using Data: Two Old Ways and One New. Observational Studies, published 2021-07-27. https://doi.org/10.1353/obs.2021.0016
Abstract: Twenty years after Leo Breiman's wake-up call on the use of data models, I reconsider his concerns, which were heavily influenced by problems in prediction and classification, in light of the much vaster class of problems of estimating effects and (conditional) associations. Viewed from this perspective, one realises that the statistical community's commitment to the use of data models continues to be dominant and problematic, but that algorithmic modelling (machine learning) does not readily provide a satisfactory alternative, by virtue of being almost exclusively focused on prediction and classification. The only successful way forward is to bridge the two cultures. It requires machine learning skills from the algorithmic modelling culture in order to reduce model misspecification bias and to enable pre-specification of the statistical analysis. It moreover requires data modelling skills in order to choose and construct interpretable effect and association measures that target the scientific question; in order to identify those measures from observed data under the considered sampling design by relating to minimal and well-understood assumptions; and finally, in order to reduce regularisation bias and quantify uncertainty in the obtained estimates by relating to asymptotic theory.
S. Vansteelandt. Statistical Modelling in the Age of Data Science. Observational Studies, published 2021-07-27. https://doi.org/10.1353/obs.2021.0013
Abstract: The past two decades have witnessed deep cross-fertilization between the two cultures, statistics (data/generative modeling) and machine learning (algorithmic modeling), which stands in stark contrast to the scene pictured in Breiman's inspiring work. In light of this major confluence, we find it helpful to single out a few salient examples showcasing the impact of each culture on the other, and the research progress arising from them. We conclude by pointing out that the current big data era especially requires joint efforts from both cultures to address common challenges, including decentralized data analysis, privacy, and fairness.
Jianqing Fan, Cong Ma, Kaizheng Wang, Ziwei Zhu. Modern Data Modeling: Cross-Fertilization of the Two Cultures. Observational Studies, published 2021-07-27. https://doi.org/10.1353/obs.2021.0023
Abstract: In his 2001 Statistical Science paper, Leo Breiman called attention to "two cultures" of data analysts, the first associated with computer science and the second with statistics. Breiman saw flaws in the traditionally-oriented statistical culture and advocated the predictively-oriented approach he identified with computer science. Although many of his observations were accurate and useful, Breiman failed to acknowledge the merits of statistical modeling, and he mischaracterized the role of statistics in science. To explain, I discuss machine learning and artificial intelligence; excessive cautiousness in statistics; dangers of statistical modeling; potential accomplishments of statistical modeling; the statistical paradigm; the nature of statistical models; and statistical methods that work well in practice. Everyone who is interested in the use of computer science and statistics in data analysis should grapple with the issues raised by Breiman's article.
R. Kass. The Two Cultures: Statistics and Machine Learning in Science. Observational Studies, published 2021-07-27. https://doi.org/10.1353/obs.2021.0000
Abstract: Since the appearance of Breiman's "Two Cultures" paper in 2001, the term prediction has gained incredible significance in research, practice, society, and humanity. "Two Cultures" led to many useful advancements and surprising discoveries. Having experienced first hand the different cultures in the statistics and machine learning communities that Breiman described so early and clearly, I have since encountered even more differences. I describe additional modeling distinctions and further modeling "cultures". Recognizing these cultures, understanding their reasoning, and comparing and contrasting them opens our eyes to new ways of viewing the world and creates opportunities for innovation and collaboration.
G. Shmueli. Comment on Breiman's "Two Cultures" (2002): From Two Cultures to Multicultural. Observational Studies, published 2021-07-27. https://doi.org/10.1353/obs.2021.0010