Statistics Surveys: Latest Articles

Discrete variations of the fractional Brownian motion in the presence of outliers and an additive noise

IF 3.3 Q1 STATISTICS & PROBABILITY

Statistics Surveys

Pub Date : 2010-01-01 DOI: 10.1214/09-SS059

S. Achard, Jean‐François Coeurjolly

This paper gives an overview of the problem of estimating the Hurst parameter of a fractional Brownian motion when the data are observed with outliers and/or with an additive noise by using methods based on discrete variations. We show that the classical estimation procedure based on the log-linearity of the variogram of dilated series is made more robust to outliers and/or an additive noise by considering sample quantiles and trimmed means of the squared series or differences of empirical variances. These different procedures are compared and discussed through a large simulation study and are implemented in the R package dvfBm.

Citations: 34
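
To make the variogram idea in the abstract above concrete, here is a minimal Python sketch. It is not the dvfBm implementation. It assumes the simplest filter (plain increments dilated by a lag m), uses ordinary Brownian motion (H = 0.5) as test data because it is easy to simulate, and swaps the empirical mean of squared increments for the sample median as one example of a quantile-based variant. All function names and parameter values are hypothetical.

```python
import math
import random
import statistics


def slope(xs, ys):
    """Ordinary least-squares slope of ys regressed on xs."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = sum((x - mx) ** 2 for x in xs)
    return num / den


def hurst_estimate(path, lags=(1, 2, 4, 8), summary=statistics.fmean):
    """Estimate H as half the log-log slope of a summary of squared increments."""
    log_lags, log_scale = [], []
    for m in lags:
        squared = [(path[i + m] - path[i]) ** 2 for i in range(len(path) - m)]
        log_lags.append(math.log(m))
        log_scale.append(math.log(summary(squared)))
    return slope(log_lags, log_scale) / 2.0


def simulate_brownian_path(n, seed):
    """Cumulative sum of n standard Gaussian steps (the H = 0.5 case)."""
    rng = random.Random(seed)
    path, x = [0.0], 0.0
    for _ in range(n):
        x += rng.gauss(0.0, 1.0)
        path.append(x)
    return path


def add_outliers(path, fraction, size, seed):
    """Return a copy of the path with a fixed shift added at randomly chosen points."""
    rng = random.Random(seed)
    contaminated = list(path)
    for i in rng.sample(range(len(path)), k=int(fraction * len(path))):
        contaminated[i] += size
    return contaminated


if __name__ == "__main__":
    clean = simulate_brownian_path(5000, seed=1)
    dirty = add_outliers(clean, fraction=0.01, size=50.0, seed=2)
    print("clean, mean-based:  ", hurst_estimate(clean))
    print("dirty, mean-based:  ", hurst_estimate(dirty))
    print("dirty, median-based:", hurst_estimate(dirty, summary=statistics.median))
```

The variance of increments at lag m scales as m^(2H), so any scale summary proportional to that variance gives the same log-log slope. This is why the median can stand in for the mean when a few observations are contaminated.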

Wilcoxon-Mann-Whitney or t-test? On assumptions for hypothesis tests and multiple interpretations of decision rules.

IF 11 Q1 STATISTICS & PROBABILITY

Statistics Surveys

Pub Date : 2010-01-01 DOI: 10.1214/09-SS051

Michael P Fay, Michael A Proschan

In a mathematical approach to hypothesis tests, we start with a clearly defined set of hypotheses and choose the test with the best properties for those hypotheses. In practice, we often start with less precise hypotheses. For example, often a researcher wants to know which of two groups generally has the larger responses, and either a t-test or a Wilcoxon-Mann-Whitney (WMW) test could be acceptable. Although both t-tests and WMW tests are usually associated with quite different hypotheses, the decision rule and p-value from either test could be associated with many different sets of assumptions, which we call perspectives. It is useful to have many of the different perspectives to which a decision rule may be applied collected in one place, since each perspective allows a different interpretation of the associated p-value. Here we collect many such perspectives for the two-sample t-test, the WMW test and other related tests. We discuss validity and consistency under each perspective and discuss recommendations between the tests in light of these many different perspectives. Finally, we briefly discuss a decision rule for testing genetic neutrality where knowledge of the many perspectives is vital to the proper interpretation of the decision rule.

Citations: 0
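
As a rough illustration of why the two tests in the abstract above answer different questions, the following Python sketch computes both statistics on made-up data. The Mann-Whitney count estimates how often a response from one group exceeds a response from the other, while the t statistic standardizes the difference in means. Welch's unequal-variance form is an assumption chosen for illustration, and the function names are hypothetical.

```python
import math
import statistics


def mann_whitney_u(x, y):
    """Count pairs (xi, yj) with xi > yj, counting each tie as one half."""
    return sum(
        1.0 if xi > yj else 0.5 if xi == yj else 0.0
        for xi in x
        for yj in y
    )


def welch_t(x, y):
    """Welch's unequal-variance t statistic for the difference in means."""
    se = math.sqrt(statistics.variance(x) / len(x) + statistics.variance(y) / len(y))
    return (statistics.fmean(x) - statistics.fmean(y)) / se


x = [1.1, 2.3, 3.0, 4.8]  # made-up responses, group 1
y = [0.5, 1.0, 2.0, 2.5]  # made-up responses, group 2

u = mann_whitney_u(x, y)
prob_x_greater = u / (len(x) * len(y))  # estimates P(X > Y) + 0.5 * P(X = Y)
t = welch_t(x, y)
print(u, prob_x_greater, t)
```

Turning either statistic into a p-value requires a reference distribution, and that choice is where the different sets of assumptions discussed in the paper come in.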

Navigating Random Forests and related advances in algorithmic modeling

IF 3.3 Q1 STATISTICS & PROBABILITY

Statistics Surveys

Pub Date : 2009-12-01 DOI: 10.1214/07-SS033

David S. Siroky

This article addresses current methodological research on nonparametric Random Forests. It provides a brief intellectual history of Random Forests that covers CART, boosting and bagging methods. It then introduces the primary methods by which researchers can visualize results, the relationships between covariates and responses, and the out-of-bag test set error. In addition, the article considers current research on universal consistency and importance tests in Random Forests. Finally, several uses for Random Forests are discussed, and available software is identified. AMS 2000 subject classifications: 62-02, 62-04, 62G08, 62G09, 62H30, 93E25, 62M99, 62N99.

Citations: 166
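
The out-of-bag test set mentioned in the abstract above exists because a bootstrap sample of size n omits each observation with probability (1 - 1/n)^n, which is close to exp(-1), or about 36.8%. The short Python simulation below checks that fraction. It is only a sketch of the resampling step, not of a Random Forest, and the function name and sample sizes are hypothetical.

```python
import math
import random


def out_of_bag_fraction(n, rng):
    """Draw one bootstrap sample of size n; return the share of indices never drawn."""
    drawn = {rng.randrange(n) for _ in range(n)}
    return 1.0 - len(drawn) / n


rng = random.Random(0)
fractions = [out_of_bag_fraction(1000, rng) for _ in range(50)]
average_oob = sum(fractions) / len(fractions)
print(average_oob, math.exp(-1))
```

Each tree grown on a bootstrap sample can therefore be evaluated on the observations it never saw, with no separate hold-out set.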

A survey of cross-validation procedures for model selection

IF 3.3 Q1 STATISTICS & PROBABILITY

Statistics Surveys

Pub Date : 2009-07-27 DOI: 10.1214/09-SS054

Sylvain Arlot, Alain Celisse

Used to estimate the risk of an estimator or to perform model selection, cross-validation is a widespread strategy because of its simplicity and its apparent universality. Many results exist on the model selection performances of cross-validation procedures. This survey intends to relate these results to the most recent advances of model selection theory, with a particular emphasis on distinguishing empirical statements from rigorous theoretical results. As a conclusion, guidelines are provided for choosing the best cross-validation procedure according to the particular features of the problem in hand.

Citations: 3333
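
A minimal sketch of cross-validation used for model selection, as described in the abstract above: estimate the squared-error risk of each candidate by K-fold cross-validation and keep the candidate with the smallest estimate. The two candidate models, the simulated data, and all names below are assumptions made for illustration and are not taken from the survey.

```python
import random
import statistics


def fit_mean(xs, ys):
    """Constant model: always predict the training mean of y."""
    mu = statistics.fmean(ys)
    return lambda x: mu


def fit_line(xs, ys):
    """Simple linear regression of y on x by least squares."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    a = my - b * mx
    return lambda x: a + b * x


def kfold_risk(fit, xs, ys, k=5, seed=0):
    """Average squared prediction error over k held-out folds."""
    idx = list(range(len(xs)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    total = 0.0
    for fold in folds:
        held_out = set(fold)
        train = [i for i in idx if i not in held_out]
        predict = fit([xs[i] for i in train], [ys[i] for i in train])
        total += sum((ys[i] - predict(xs[i])) ** 2 for i in fold)
    return total / len(xs)


# Simulated data with a genuinely linear signal and unit-variance noise.
rng = random.Random(42)
xs = [rng.uniform(0.0, 10.0) for _ in range(200)]
ys = [2.0 * x + rng.gauss(0.0, 1.0) for x in xs]

risks = {"mean": kfold_risk(fit_mean, xs, ys), "line": kfold_risk(fit_line, xs, ys)}
selected = min(risks, key=risks.get)
print(risks, selected)
```

Using the same seed for both calls means both candidates are scored on identical folds, which keeps the comparison paired.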

Causal inference in statistics: An overview

IF 3.3 Q1 STATISTICS & PROBABILITY

Statistics Surveys

Pub Date : 2009-07-15 DOI: 10.1214/09-SS057

J. Pearl

This review presents empirical researchers with recent advances in causal inference, and stresses the paradigmatic shifts that must be undertaken in moving from traditional statistical analysis to causal analysis of multivariate data. Special emphasis is placed on the assumptions that underlie all causal inferences, the languages used in formulating those assumptions, the conditional nature of all causal and counterfactual claims, and the methods that have been developed for the assessment of such claims. These advances are illustrated using a general theory of causation based on the Structural Causal Model (SCM) described in Pearl (2000a), which subsumes and unifies other approaches to causation, and provides a coherent mathematical foundation for the analysis of causes and counterfactuals. In particular, the paper surveys the development of mathematical tools for inferring (from a combination of data and assumptions) answers to three types of causal queries: (1) queries about the effects of potential interventions (also called "causal effects" or "policy evaluation"), (2) queries about probabilities of counterfactuals (including assessment of "regret," "attribution" or "causes of effects"), and (3) queries about direct and indirect effects (also known as "mediation"). Finally, the paper defines the formal and conceptual relationships between the structural and potential-outcome frameworks and presents tools for a symbiotic analysis that uses the strong features of both.

Citations: 1798

Statistical models: Conventional, penalized and hierarchical likelihood

IF 3.3 Q1 STATISTICS & PROBABILITY

Statistics Surveys

Pub Date : 2009-01-01 DOI: 10.1214/08-SS039

D. Commenges

We give an overview of statistical models and likelihood, together with two of its variants: penalized and hierarchical likelihood. The Kullback-Leibler divergence is referred to repeatedly in the literature, for defining the misspecification risk of a model and for grounding the likelihood and the likelihood cross-validation, which can be used for choosing weights in penalized likelihood. Families of penalized likelihood and particular sieves estimators are shown to be equivalent. The similarity of these likelihoods with a posteriori distributions in a Bayesian approach is considered.

Citations: 12

Recent developments in nonregular fractional factorial designs

IF 3.3 Q1 STATISTICS & PROBABILITY

Statistics Surveys

Pub Date : 2008-10-16 DOI: 10.1214/08-SS040

Hongquan Xu, F. Phoa, W. Wong

Nonregular fractional factorial designs such as Plackett-Burman designs and other orthogonal arrays are widely used in various screening experiments for their run size economy and flexibility. The traditional analysis focuses on main effects only. Hamada and Wu (1992) went beyond the traditional approach and proposed an analysis strategy to demonstrate that some interactions could be entertained and estimated beyond a few significant main effects. Their groundbreaking work stimulated much of the recent developments in design criterion creation, construction and analysis of nonregular designs. This paper reviews important developments in optimality criteria and comparison, including projection properties, generalized resolution, various generalized minimum aberration criteria, optimality results, construction methods and analysis strategies for nonregular designs.

Citations: 79

Sparse sampling: Spatial design for monitoring stream networks

IF 3.3 Q1 STATISTICS & PROBABILITY

Statistics Surveys

Pub Date : 2008-08-29 DOI: 10.1214/07-SS032

Melissa J. Dobbie, B. Henderson, D. Stevens

Spatial designs for monitoring stream networks, especially ephemeral systems, are typically non-standard, 'sparse' and can be very complex, reflecting the complexity of the ecosystem being monitored, the scale of the population, and the competing multiple monitoring objectives. The main purpose of this paper is to present a review of approaches to spatial design to enable informed decisions to be made about developing practical and optimal spatial designs for future monitoring of streams.

Citations: 72

Statistical advances and challenges for analyzing correlated high dimensional SNP data in genomic study for complex diseases

IF 3.3 Q1 STATISTICS & PROBABILITY

Statistics Surveys

Pub Date : 2008-03-28 DOI: 10.1214/07-SS026

Yulan Liang, A. Kelemen

Recent advances of information technology in biomedical sciences and other applied areas have created numerous large diverse data sets with a high dimensional feature space, which provide us a tremendous amount of information and new opportunities for improving the quality of human life. Meanwhile, great challenges are also created driven by the continuous arrival of new data that requires researchers to convert these raw data into scientific knowledge in order to benefit from it. Association studies of complex diseases using SNP data have become more and more popular in biomedical research in recent years. In this paper, we present a review of recent statistical advances and challenges for analyzing correlated high dimensional SNP data in genomic association studies for complex diseases. The review includes both general feature reduction approaches for high dimensional correlated data and more specific approaches for SNPs data, which include unsupervised haplotype mapping, tag SNP selection, and supervised SNPs selection using statistical testing/scoring, statistical modeling and machine learning methods with an emphasis on how to identify interacting loci.

Citations: 46

Least angle and ℓ1 penalized regression: A review

IF 3.3 Q1 STATISTICS & PROBABILITY

Statistics Surveys

Pub Date : 2008-02-07 DOI: 10.1214/08-SS035

T. Hesterberg, Nam-Hee Choi, L. Meier, C. Fraley

Least Angle Regression is a promising technique for variable selection applications, offering a nice alternative to stepwise regression. It provides an explanation for the similar behavior of LASSO (ℓ1-penalized regression) and forward stagewise regression, and provides a fast implementation of both. The idea has caught on rapidly, and sparked a great deal of research interest. In this paper, we give an overview of Least Angle Regression and the current state of related research.

Citations: 289
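
For readers unfamiliar with the ℓ1 penalty named in the title above, the following Python sketch shows its effect in the simplest setting. It is not Least Angle Regression itself. It assumes an orthonormal design, in which case minimizing half the residual sum of squares plus lam times the ℓ1 norm of the coefficients reduces to soft-thresholding each least-squares coefficient. The coefficient values and the penalty level are made up.

```python
def soft_threshold(z, lam):
    """Minimizer of 0.5 * (z - b) ** 2 + lam * abs(b) over b."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0


ols = [3.0, -0.5, 1.2, -2.0]  # made-up least-squares coefficients
lasso = [soft_threshold(z, 1.0) for z in ols]
print(lasso)
```

Coefficients smaller in magnitude than the penalty are set exactly to zero, which is the variable selection behavior the abstract refers to.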
