数据科学的一些统计原则

IF 0.8 4区 数学 Q3 STATISTICS & PROBABILITY Australian & New Zealand Journal of Statistics Pub Date : 2021-05-08 DOI:10.1111/anzs.12324
Noel Cressie
{"title":"数据科学的一些统计原则","authors":"Noel Cressie","doi":"10.1111/anzs.12324","DOIUrl":null,"url":null,"abstract":"<div>\n \n <p>In any other circumstance, it might make sense to define the extent of the terrain (Data Science) first, and then locate and describe the landmarks (Principles). But this data revolution we are experiencing defies a cadastral survey. Areas are continually being annexed into Data Science. For example, biometrics was traditionally statistics for agriculture in all its forms but now, in Data Science, it means the study of characteristics that can be used to identify an individual. Examples of non-intrusive measurements include height, weight, fingerprints, retina scan, voice, photograph/video (facial landmarks and facial expressions) and gait. A multivariate analysis of such data would be a complex project for a statistician, but a software engineer might appear to have no trouble with it at all. In any applied-statistics project, the statistician worries about uncertainty and quantifies it by modelling data as realisations generated from a probability space. Another approach to uncertainty quantification is to find similar data sets, and then use the variability of results between these data sets to capture the uncertainty. Both approaches allow ‘error bars’ to be put on estimates obtained from the original data set, although the interpretations are different. A third approach, that concentrates on giving a single answer and gives up on uncertainty quantification, could be considered as Data Engineering, although it has staked a claim in the Data Science terrain. This article presents a few (actually nine) statistical principles for data scientists that have helped me, and continue to help me, when I work on complex interdisciplinary projects.</p>\n </div>","PeriodicalId":55428,"journal":{"name":"Australian & New Zealand Journal of Statistics","volume":null,"pages":null},"PeriodicalIF":0.8000,"publicationDate":"2021-05-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1111/anzs.12324","citationCount":"4","resultStr":"{\"title\":\"A few statistical principles for data science\",\"authors\":\"Noel Cressie\",\"doi\":\"10.1111/anzs.12324\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<div>\\n \\n <p>In any other circumstance, it might make sense to define the extent of the terrain (Data Science) first, and then locate and describe the landmarks (Principles). But this data revolution we are experiencing defies a cadastral survey. Areas are continually being annexed into Data Science. For example, biometrics was traditionally statistics for agriculture in all its forms but now, in Data Science, it means the study of characteristics that can be used to identify an individual. Examples of non-intrusive measurements include height, weight, fingerprints, retina scan, voice, photograph/video (facial landmarks and facial expressions) and gait. A multivariate analysis of such data would be a complex project for a statistician, but a software engineer might appear to have no trouble with it at all. In any applied-statistics project, the statistician worries about uncertainty and quantifies it by modelling data as realisations generated from a probability space. Another approach to uncertainty quantification is to find similar data sets, and then use the variability of results between these data sets to capture the uncertainty. Both approaches allow ‘error bars’ to be put on estimates obtained from the original data set, although the interpretations are different. A third approach, that concentrates on giving a single answer and gives up on uncertainty quantification, could be considered as Data Engineering, although it has staked a claim in the Data Science terrain. This article presents a few (actually nine) statistical principles for data scientists that have helped me, and continue to help me, when I work on complex interdisciplinary projects.</p>\\n </div>\",\"PeriodicalId\":55428,\"journal\":{\"name\":\"Australian & New Zealand Journal of Statistics\",\"volume\":null,\"pages\":null},\"PeriodicalIF\":0.8000,\"publicationDate\":\"2021-05-08\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"https://sci-hub-pdf.com/10.1111/anzs.12324\",\"citationCount\":\"4\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Australian & New Zealand Journal of Statistics\",\"FirstCategoryId\":\"100\",\"ListUrlMain\":\"https://onlinelibrary.wiley.com/doi/10.1111/anzs.12324\",\"RegionNum\":4,\"RegionCategory\":\"数学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q3\",\"JCRName\":\"STATISTICS & PROBABILITY\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Australian & New Zealand Journal of Statistics","FirstCategoryId":"100","ListUrlMain":"https://onlinelibrary.wiley.com/doi/10.1111/anzs.12324","RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q3","JCRName":"STATISTICS & PROBABILITY","Score":null,"Total":0}
引用次数: 4

摘要

在任何其他情况下,首先定义地形的范围(数据科学),然后定位和描述地标(原则)可能是有意义的。但我们正在经历的这场数据革命与地籍调查背道而驰。数据科学领域不断被吞并。例如,生物计量学传统上是各种形式的农业统计,但现在,在数据科学中,它意味着对可用于识别个体的特征的研究。非侵入式测量的例子包括身高、体重、指纹、视网膜扫描、声音、照片/视频(面部标志和面部表情)和步态。对于统计学家来说,对这些数据进行多变量分析将是一个复杂的项目,但软件工程师似乎完全没有问题。在任何应用统计学项目中,统计学家都担心不确定性,并通过将数据建模为从概率空间生成的实现来量化不确定性。不确定性量化的另一种方法是找到相似的数据集,然后利用这些数据集之间结果的可变性来捕捉不确定性。这两种方法都允许在从原始数据集获得的估计值上放置“误差条”,尽管解释不同。第三种方法,专注于给出单一答案,放弃不确定性量化,可以被认为是数据工程,尽管它在数据科学领域占有一席之地。本文为数据科学家提供了一些(实际上是9条)统计原则,当我从事复杂的跨学科项目时,这些原则已经并将继续帮助我。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
查看原文
分享 分享
微信好友 朋友圈 QQ好友 复制链接
本刊更多论文
A few statistical principles for data science

In any other circumstance, it might make sense to define the extent of the terrain (Data Science) first, and then locate and describe the landmarks (Principles). But this data revolution we are experiencing defies a cadastral survey. Areas are continually being annexed into Data Science. For example, biometrics was traditionally statistics for agriculture in all its forms but now, in Data Science, it means the study of characteristics that can be used to identify an individual. Examples of non-intrusive measurements include height, weight, fingerprints, retina scan, voice, photograph/video (facial landmarks and facial expressions) and gait. A multivariate analysis of such data would be a complex project for a statistician, but a software engineer might appear to have no trouble with it at all. In any applied-statistics project, the statistician worries about uncertainty and quantifies it by modelling data as realisations generated from a probability space. Another approach to uncertainty quantification is to find similar data sets, and then use the variability of results between these data sets to capture the uncertainty. Both approaches allow ‘error bars’ to be put on estimates obtained from the original data set, although the interpretations are different. A third approach, that concentrates on giving a single answer and gives up on uncertainty quantification, could be considered as Data Engineering, although it has staked a claim in the Data Science terrain. This article presents a few (actually nine) statistical principles for data scientists that have helped me, and continue to help me, when I work on complex interdisciplinary projects.

求助全文
通过发布文献求助,成功后即可免费获取论文全文。 去求助
来源期刊
Australian & New Zealand Journal of Statistics
Australian & New Zealand Journal of Statistics 数学-统计学与概率论
CiteScore
1.30
自引率
9.10%
发文量
31
审稿时长
>12 weeks
期刊介绍: The Australian & New Zealand Journal of Statistics is an international journal managed jointly by the Statistical Society of Australia and the New Zealand Statistical Association. Its purpose is to report significant and novel contributions in statistics, ranging across articles on statistical theory, methodology, applications and computing. The journal has a particular focus on statistical techniques that can be readily applied to real-world problems, and on application papers with an Australasian emphasis. Outstanding articles submitted to the journal may be selected as Discussion Papers, to be read at a meeting of either the Statistical Society of Australia or the New Zealand Statistical Association. The main body of the journal is divided into three sections. The Theory and Methods Section publishes papers containing original contributions to the theory and methodology of statistics, econometrics and probability, and seeks papers motivated by a real problem and which demonstrate the proposed theory or methodology in that situation. There is a strong preference for papers motivated by, and illustrated with, real data. The Applications Section publishes papers demonstrating applications of statistical techniques to problems faced by users of statistics in the sciences, government and industry. A particular focus is the application of newly developed statistical methodology to real data and the demonstration of better use of established statistical methodology in an area of application. It seeks to aid teachers of statistics by placing statistical methods in context. The Statistical Computing Section publishes papers containing new algorithms, code snippets, or software descriptions (for open source software only) which enhance the field through the application of computing. Preference is given to papers featuring publically available code and/or data, and to those motivated by statistical methods for practical problems.
期刊最新文献
Issue Information Exact samples sizes for clinical trials subject to size and power constraints Examining collinearities Bayesian analysis of multivariate mixed longitudinal ordinal and continuous data Distributional modelling of positively skewed data via the flexible Weibull extension distribution
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
现在去查看 取消
×
提示
确定
0
微信
客服QQ
Book学术公众号 扫码关注我们
反馈
×
意见反馈
请填写您的意见或建议
请填写您的手机或邮箱
已复制链接
已复制链接
快去分享给好友吧!
我知道了
×
扫码分享
扫码分享
Book学术官方微信
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术
文献互助 智能选刊 最新文献 互助须知 联系我们:info@booksci.cn
Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。
Copyright © 2023 Book学术 All rights reserved.
ghs 京公网安备 11010802042870号 京ICP备2023020795号-1