数据科学时代的统计建模

Observational studies Pub Date : 2021-07-27 DOI:10.1353/obs.2021.0013

S. Vansteelandt

{"title":"数据科学时代的统计建模","authors":"S. Vansteelandt","doi":"10.1353/obs.2021.0013","DOIUrl":null,"url":null,"abstract":"Abstract:Twenty years after Leo Breiman's wake-up call on the use of data models, I reconsider his concerns, which were heavily influenced by problems in prediction and classification, in light of the much vaster class of problems of estimating effects and (conditional) associations. Viewed from this perspective, one realises that the statistical community's commitment to the use of data models continues to be dominant and problematic, but that algorithmic modelling (machine learning) does not readily provide a satisfactory alternative, by virtue of being almost exclusively focused on prediction and classification. The only successful way forward is to bridge the two cultures. It requires machine learning skills from the algorithmic modelling culture in order to reduce model misspecification bias and to enable pre-specification of the statistical analysis. It moreover requires data modelling skills in order to choose and construct interpretable effect and association measures that target the scientific question; in order to identify those measures from observed data under the considered sampling design by relating to minimal and well-understood assumptions; and finally, in order to reduce regularisation bias and quantify uncertainty in the obtained estimates by relating to asymptotic theory.","PeriodicalId":74335,"journal":{"name":"Observational studies","volume":null,"pages":null},"PeriodicalIF":0.0000,"publicationDate":"2021-07-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"2","resultStr":"{\"title\":\"Statistical Modelling in the Age of Data Science\",\"authors\":\"S. Vansteelandt\",\"doi\":\"10.1353/obs.2021.0013\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Abstract:Twenty years after Leo Breiman's wake-up call on the use of data models, I reconsider his concerns, which were heavily influenced by problems in prediction and classification, in light of the much vaster class of problems of estimating effects and (conditional) associations. Viewed from this perspective, one realises that the statistical community's commitment to the use of data models continues to be dominant and problematic, but that algorithmic modelling (machine learning) does not readily provide a satisfactory alternative, by virtue of being almost exclusively focused on prediction and classification. The only successful way forward is to bridge the two cultures. It requires machine learning skills from the algorithmic modelling culture in order to reduce model misspecification bias and to enable pre-specification of the statistical analysis. It moreover requires data modelling skills in order to choose and construct interpretable effect and association measures that target the scientific question; in order to identify those measures from observed data under the considered sampling design by relating to minimal and well-understood assumptions; and finally, in order to reduce regularisation bias and quantify uncertainty in the obtained estimates by relating to asymptotic theory.\",\"PeriodicalId\":74335,\"journal\":{\"name\":\"Observational studies\",\"volume\":null,\"pages\":null},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2021-07-27\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"2\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Observational studies\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1353/obs.2021.0013\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Observational studies","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1353/obs.2021.0013","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 2

摘要

摘要：在Leo Breiman对数据模型的使用敲响警钟20年后，我重新考虑了他的担忧，这些担忧在很大程度上受到了预测和分类问题的影响，因为估计效应和（条件）关联的问题要多得多。从这个角度来看，人们意识到统计界对使用数据模型的承诺仍然占主导地位，而且存在问题，但算法建模（机器学习）由于几乎完全专注于预测和分类，并不容易提供令人满意的替代方案。唯一成功的前进道路是把两种文化联系起来。它需要来自算法建模文化的机器学习技能，以减少模型错误指定偏差，并实现统计分析的预先指定。此外，它还需要数据建模技能，以便选择和构建针对科学问题的可解释效果和关联度量；以便通过与最小和充分理解的假设相关，从所考虑的抽样设计下的观测数据中确定这些措施；最后，为了减少正则化偏差，并通过与渐近理论相关来量化所获得估计中的不确定性。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文

微信好友朋友圈 QQ好友复制链接

本刊更多论文

Statistical Modelling in the Age of Data Science

Abstract:Twenty years after Leo Breiman's wake-up call on the use of data models, I reconsider his concerns, which were heavily influenced by problems in prediction and classification, in light of the much vaster class of problems of estimating effects and (conditional) associations. Viewed from this perspective, one realises that the statistical community's commitment to the use of data models continues to be dominant and problematic, but that algorithmic modelling (machine learning) does not readily provide a satisfactory alternative, by virtue of being almost exclusively focused on prediction and classification. The only successful way forward is to bridge the two cultures. It requires machine learning skills from the algorithmic modelling culture in order to reduce model misspecification bias and to enable pre-specification of the statistical analysis. It moreover requires data modelling skills in order to choose and construct interpretable effect and association measures that target the scientific question; in order to identify those measures from observed data under the considered sampling design by relating to minimal and well-understood assumptions; and finally, in order to reduce regularisation bias and quantify uncertainty in the obtained estimates by relating to asymptotic theory.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

Observational studies

CiteScore

0.80

自引率

0.00%

发文量