Stat: Latest Articles

Developing partnerships for academic data science consulting and collaboration units
IF 1.7; Zone 4 (Mathematics); Q3 Statistics & Probability; Pub Date: 2024-01-11; DOI: 10.1002/sta4.644
Marianne Huebner, Laura Bond, Felesia Stukes, Joel Herndon, David J. Edwards, Gina-Maria Pomann
Data science consulting and collaboration units (DSUs) are core infrastructure for research at universities. Activities span data management, study design, data analysis, data visualization, predictive modelling, preparing reports, manuscript writing and advising on statistical methods and may include an experiential or teaching component. Partnerships are needed for a thriving DSU as an active part of the larger university network. Guidance for identifying, developing and managing successful partnerships for DSUs can be summarized in six rules: (1) align with institutional strategic plans, (2) cultivate partnerships that fit your mission, (3) ensure sustainability and prepare for growth, (4) define clear expectations in a partnership agreement, (5) communicate and (6) expect the unexpected. While these rules are not exhaustive, they are derived from experiences in a diverse set of DSUs, which vary by administrative home, mission, staffing and funding model. As examples in this paper illustrate, these rules can be adapted to different organizational models for DSUs. Clear expectations in partnership agreements are essential for high quality and consistent collaborations and address core activities, duration, staffing, cost and evaluation. A DSU is an organizational asset that should involve thoughtful investment if the institution is to gain real value.
Citations: 0
Equivalence testing for multiple groups
IF 1.7; Zone 4 (Mathematics); Q3 Statistics & Probability; Pub Date: 2024-01-10; DOI: 10.1002/sta4.645
Tony Pourmohamad, Herbert K. H. Lee
Testing for equivalence, rather than testing for a difference, is an important component of some scientific studies. While the focus of the existing literature is on comparing two groups for equivalence, real-world applications arise regularly that require testing across more than two groups. This paper reviews the existing approaches for testing across multiple groups and proposes a novel framework for multigroup equivalence testing under a Bayesian paradigm. This approach allows for a more scientifically meaningful definition of the equivalence margin and a more powerful test than the few existing alternatives. This approach also allows a new definition of equivalence based on future differences.
Citations: 0
Iterative estimating equations for disease mapping with spatial zero-inflated Poisson data
IF 1.7; Zone 4 (Mathematics); Q3 Statistics & Probability; Pub Date: 2024-01-01; DOI: 10.1002/sta4.646
Pei-Sheng Lin, Jun Zhu, Feng‐Chang Lin
Spatial epidemiology often involves the analysis of spatial count data with an unusually high proportion of zero observations. While Bayesian hierarchical models perform very well for zero‐inflated data in many situations, a smooth response surface is usually required for the Bayesian methods to converge. However, for infectious disease data with excessive zeros, a Wombling issue with large spatial variation could make the Bayesian methods infeasible. To address this issue, we develop estimating equations associated with disease mapping by including over‐dispersion and spatial noises in a spatial zero‐inflated Poisson model. Asymptotic properties are derived for the parameter estimates. Simulations and data analysis are used to assess and illustrate the proposed method.
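To make the zero-inflated Poisson (ZIP) model named in this abstract concrete, the following minimal Python sketch evaluates the basic, non-spatial ZIP distribution, in which a count is an extra zero with probability `pi` and otherwise a Poisson draw with rate `lam`. The function names `zip_pmf` and `zip_mean_var` and both parameter names are illustrative assumptions. The over-dispersion terms, spatial noise and estimating equations of the paper are not reproduced here.

```python
import math


def zip_pmf(k: int, lam: float, pi: float) -> float:
    """P(Y = k) for a zero-inflated Poisson: extra-zero probability pi, Poisson rate lam."""
    if k < 0:
        return 0.0
    poisson = math.exp(-lam) * lam**k / math.factorial(k)
    if k == 0:
        # A zero can come from the inflation component or from the Poisson component.
        return pi + (1.0 - pi) * poisson
    return (1.0 - pi) * poisson


def zip_mean_var(lam: float, pi: float) -> tuple:
    """Mean and variance of the ZIP distribution (hypothetical helper)."""
    mean = (1.0 - pi) * lam
    var = (1.0 - pi) * lam * (1.0 + pi * lam)
    return mean, var
```

Whenever `pi > 0` and `lam > 0`, the variance exceeds the mean, which is one reason plain Poisson models tend to fit zero-heavy count data poorly.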
Citations: 0
Significance of modes in the torus by topological data analysis
IF 1.7; Zone 4 (Mathematics); Q3 Statistics & Probability; Pub Date: 2023-12-17; DOI: 10.1002/sta4.636
Changjo Yu, Sungkyu Jung, Jisu Kim
This paper addresses the problem of identifying modes or density bumps in multivariate angular or circular data, which have diverse applications in fields like medicine, biology and physics. We focus on the use of topological data analysis and persistent homology for this task. Specifically, we extend the methods for uncertainty quantification in the context of a torus sample space, where circular data lie. To achieve this, we employ two types of density estimators, namely, the von Mises kernel density estimator and the von Mises mixture model, to compute persistent homology, and propose a scale-space view for searching significant bumps in the density. The results of bump hunting are summarised and visualised through a scale-space diagram. Our approach using the mixture model for persistent homology offers advantages over conventional methods, allowing for dendrogram visualisation of components and identification of mode locations. For testing whether a detected mode is really there, we propose several inference tools based on bootstrap resampling and concentration inequalities, establishing their theoretical applicability. Experimental results on SARS-CoV-2 spike glycoprotein torsion angle data demonstrate the effectiveness of our proposed methods in practice.
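As a rough illustration of the von Mises kernel density estimator mentioned in this abstract, the sketch below handles the one-dimensional case (a single circle rather than a torus) and counts density bumps on a grid. The names `bessel_i0`, `von_mises_kde` and `count_modes`, the concentration parameter `kappa` and the grid size are all assumptions made for this example. The grid-based mode count is only a crude stand-in: persistent homology, the mixture model and the bootstrap inference tools of the paper are not implemented.

```python
import math


def bessel_i0(x: float, terms: int = 50) -> float:
    """Modified Bessel function of the first kind, order 0, via its power series."""
    total, term = 0.0, 1.0  # term holds (x/2)^(2k) / (k!)^2, starting at k = 0
    for k in range(terms):
        total += term
        term *= (x / 2.0) ** 2 / (k + 1) ** 2
    return total


def von_mises_kde(theta: float, sample: list, kappa: float) -> float:
    """Kernel density estimate at angle theta (radians) with von Mises kernels."""
    norm = 2.0 * math.pi * bessel_i0(kappa) * len(sample)
    return sum(math.exp(kappa * math.cos(theta - t)) for t in sample) / norm


def count_modes(sample: list, kappa: float, grid_size: int = 360) -> int:
    """Count strict local maxima of the estimate on an evenly spaced circular grid."""
    grid = [2.0 * math.pi * i / grid_size for i in range(grid_size)]
    dens = [von_mises_kde(g, sample, kappa) for g in grid]
    # dens[i - 1] wraps around at i = 0, so the grid is treated as a circle.
    return sum(
        1
        for i in range(grid_size)
        if dens[i] > dens[i - 1] and dens[i] > dens[(i + 1) % grid_size]
    )
```

Here `kappa` plays the role of an inverse bandwidth, so evaluating `count_modes` over a range of `kappa` values gives a very simple analogue of looking at the density across scales.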
Citations: 0
An asymptotically efficient closed-form estimator for the Dirichlet distribution
IF 1.7; Zone 4 (Mathematics); Q3 Statistics & Probability; Pub Date: 2023-12-13; DOI: 10.1002/sta4.640
Jae Ho Chang, Sang Kyu Lee, Hyoung-Moon Kim
The maximum likelihood estimator (MLE) of the Dirichlet distribution is usually obtained by using the Newton–Raphson algorithm. However, in some cases, the computational costs can be burdensome, for example, in real-time processes. Therefore, it is beneficial to develop a closed-form estimator that is as efficient as the MLE for large samples. Here, we suggest an asymptotically efficient closed-form estimator based on the classical large sample theory.
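For readers unfamiliar with closed-form estimation for the Dirichlet distribution, the sketch below shows the classical method-of-moments estimator. It is a swapped-in textbook technique, not the estimator proposed in the paper, and it is generally not asymptotically efficient. It relies on the facts that a Dirichlet component has mean alpha_k / alpha_0 and that the first component has variance m_1 (1 - m_1) / (alpha_0 + 1), where alpha_0 is the sum of the parameters. The function name `dirichlet_moment_estimate` is an assumption for this example.

```python
def dirichlet_moment_estimate(rows: list) -> list:
    """Method-of-moments estimate of Dirichlet parameters from rows that each sum to 1.

    The total concentration alpha_0 is recovered from the mean and variance
    of the first component only.
    """
    n = len(rows)
    k = len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(k)]
    # Population (divide-by-n) variance of the first component.
    var0 = sum((r[0] - means[0]) ** 2 for r in rows) / n
    if var0 <= 0.0:
        raise ValueError("first component has zero variance; estimate undefined")
    concentration = means[0] * (1.0 - means[0]) / var0 - 1.0
    return [m * concentration for m in means]
```

A closed-form estimate of this kind needs no iteration, which is the computational appeal the abstract refers to; it could also serve as a starting value for a Newton–Raphson search.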
Citations: 0
Robust nonparametric estimation of average treatment effects: A propensity score-based varying coefficient approach
IF 1.7; Zone 4 (Mathematics); Q3 Statistics & Probability; Pub Date: 2023-12-12; DOI: 10.1002/sta4.637
Zhaoqing Tian, Peng Wu, Zixin Yang, Dingjiao Cai, Qirui Hu
We present a novel nonparametric approach for estimating average treatment effects (ATEs), addressing a fundamental challenge in causal inference research, both in theory and empirical studies. Our method offers an effective solution to mitigate the instability problem caused by propensity scores close to zero or one, which are commonly encountered in (augmented) inverse probability weighting approaches. Notably, our method is straightforward to implement and does not depend on outcome model specification. We introduce an estimator for ATE and establish its consistency and asymptotic normality through rigorous analysis. To demonstrate the robustness of our method against extreme propensity scores, we conduct an extensive simulation study. Additionally, we apply our proposed methods to estimate the impact of social activity disengagement on cognitive ability using a nationally representative cohort study. Furthermore, we extend our proposed method to estimate the ATE on the treated population.
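The instability that this abstract attributes to inverse probability weighting can be seen in a few lines. The sketch below implements the standard inverse-probability-weighted estimator of the ATE, that is, the baseline the abstract refers to and not the authors' varying coefficient method. The function name `ipw_ate` and the toy numbers in the usage note are assumptions for illustration only.

```python
def ipw_ate(y: list, t: list, e: list) -> float:
    """Inverse-probability-weighted ATE estimate.

    y: outcomes, t: treatment indicators (0 or 1), e: propensity scores in (0, 1).
    """
    n = len(y)
    total = 0.0
    for yi, ti, ei in zip(y, t, e):
        # Treated units are weighted by 1/e, controls by 1/(1 - e).
        total += ti * yi / ei - (1 - ti) * yi / (1.0 - ei)
    return total / n
```

With `y = [3, 1, 4, 2]`, `t = [1, 0, 1, 0]` and all propensity scores equal to 0.5, the estimate is 2.0. Changing only the first score to 0.01 gives that unit a weight of about 100 and moves the estimate to roughly 75.5, which shows how a single propensity score near zero can dominate the result.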
Citations: 0
Observation-driven exponential smoothing
IF 1.7; Zone 4 (Mathematics); Q3 Statistics & Probability; Pub Date: 2023-12-07; DOI: 10.1002/sta4.642
Dimitris Karlis, Xanthi Pedeli, Cristiano Varin
This article presents an approach to forecasting count time series with a form of exponential smoothing built from observation-driven models. The proposed method is easy to implement and simple to interpret. A variant of the approach is also proposed to handle the impact of outliers on the forecast. The performance of the methodology is studied with simulations and illustrated with an analysis of the number of monthly cases of dengue fever observed in Italy for the years 2008–2021. An R package is made available to enable the reader to reproduce the results discussed in the article.
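As background for this abstract, the sketch below shows classical simple exponential smoothing applied to a short count series. It is the textbook recursion, not the observation-driven variant or the outlier-robust version proposed in the article, and it does not use the authors' R package. The function name `ses_forecast`, the smoothing weight `alpha` and the choice to initialise the level at the first observation are assumptions for this example.

```python
def ses_forecast(counts: list, alpha: float) -> float:
    """One-step-ahead forecast from simple exponential smoothing.

    The level starts at the first observation and is then updated as
    level = alpha * y + (1 - alpha) * level for each later observation y.
    """
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must lie in (0, 1]")
    if not counts:
        raise ValueError("counts must be non-empty")
    level = float(counts[0])
    for y in counts[1:]:
        level = alpha * y + (1.0 - alpha) * level
    return level
```

For example, `ses_forecast([2, 4, 6], 0.5)` returns 4.5. The forecast is a weighted average of past counts, so it is generally not an integer; it is read as a forecast of the expected count.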
Citations: 0
A comparative analysis of contractual risks in statistical consulting
IF 1.7; Zone 4 (Mathematics); Q3 Statistics & Probability; Pub Date: 2023-12-07; DOI: 10.1002/sta4.639
David Shilane, Nicole L. Lorenzetti, David K. Kruetter
This study enumerates and compares the risks and rewards of different forms of statistical consulting contracts. We assess three contract models (project-based fees, hourly fees, and retainer agreements) and three planned durations (project-based, time-based, and evergreen contracts). The requirements of time and effort vary considerably for many aspects of consulting work. The risks of statistical consulting contracts include both the general risks of consulting projects and the specialized risks of statistical investigations. We enumerate a number of general risks in the categories of unanticipated developments, revisions and collaboration, and changing scopes of projects. Meanwhile, the specialized statistical risks include issues of study design, data quality, statistical investigation, and communication of statistical issues. Because of these concerns, the specialized risks of statistical investigations add considerably to the general risks of consulting projects. Moreover, these issues can be exacerbated or mitigated by the form of the consulting agreement. With a greater understanding of the risks and benefits of each type of contract, statistical consultants and clients can negotiate contracts that are more beneficial for either or both parties. Through this discussion, we hope to raise awareness of these issues and help to create working conditions with a greater likelihood of a successful project for both statistical consultants and their clients.
Citations: 0
Estimation of the ROC curve and the area under it with complex survey data
IF 1.7; Zone 4 (Mathematics); Q3 Statistics & Probability; Pub Date: 2023-12-04; DOI: 10.1002/sta4.635
Amaia Iparragirre, Irantzu Barrio, Inmaculada Arostegui
Logistic regression models are widely applied in daily practice. Hence, it is necessary to ensure they have an adequate predictive performance, which is usually estimated by means of the receiver operating characteristic (ROC) curve and the area under it (area under the curve [AUC]). Traditional estimators of these parameters are intended for simple random samples and are not appropriate for complex survey data. The goal of this work is to propose new weighted estimators for the ROC curve and AUC based on sampling weights which, in the context of complex survey data, indicate the number of units that each sampled observation represents in the population. The behaviour of the proposed estimators is evaluated and compared with the traditional unweighted ones by means of a simulation study. Finally, weighted and unweighted ROC curve and AUC estimators are applied to real survey data in order to compare the estimates in a real scenario. The results support the use of the weighted estimators proposed in this work in order to obtain unbiased estimates for the ROC curve and AUC of logistic regression models fitted to complex survey data.
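One natural way to bring sampling weights into the AUC is to weight each case-control pair by the product of the two units' weights in the Mann-Whitney form of the AUC. The sketch below does exactly that. It is offered as an assumption about what a weighted estimator can look like, not as a reproduction of the estimators defined in the paper, and the function name `weighted_auc` is hypothetical.

```python
def weighted_auc(scores: list, labels: list, weights: list) -> float:
    """Weighted Mann-Whitney AUC.

    scores: predicted probabilities, labels: 1 for cases and 0 for controls,
    weights: sampling weights (population units represented by each observation).
    """
    cases = [(s, w) for s, l, w in zip(scores, labels, weights) if l == 1]
    controls = [(s, w) for s, l, w in zip(scores, labels, weights) if l == 0]
    if not cases or not controls:
        raise ValueError("need at least one case and one control")
    numerator = 0.0
    for s1, w1 in cases:
        for s0, w0 in controls:
            if s1 > s0:
                numerator += w1 * w0        # correctly ordered pair
            elif s1 == s0:
                numerator += 0.5 * w1 * w0  # tied pair counts as half
    denominator = sum(w for _, w in cases) * sum(w for _, w in controls)
    return numerator / denominator
```

With scores `[0.9, 0.8, 0.3, 0.2]` and labels `[1, 0, 1, 0]`, unit weights give the usual unweighted AUC of 0.75, while weights `[1, 1, 3, 1]` give 0.625 because the poorly ranked case now represents three population units.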
Citations: 0
Non-degenerate U-statistics for data missing completely at random with application to testing independence
IF 1.7; Zone 4 (Mathematics); Q3 Statistics & Probability; Pub Date: 2023-11-27; DOI: 10.1002/sta4.634
Danijel Aleksić, Marija Cuparić, Bojana Milošević
Although the era of digitalization has enabled access to large quantities of data, due to their insufficient structuring, some data are often missing, and sometimes, the percentage of missing data is significant compared to the entire sample. On the other hand, most of the statistical methodology is designed for complete data. Here, we explore the asymptotic properties of non-degenerate U-statistics when the data are missing completely at random and a complete-case approach is utilized. The obtained results are applied to the estimator of Kendall's tau used for testing independence. In this context, the median-based imputation approach is also considered, and asymptotic properties are explored. In addition, both complete-case and median imputation approaches are compared in an extensive Monte Carlo study.
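The complete-case approach described in this abstract can be sketched for Kendall's tau as follows: pairs with a missing coordinate are dropped, and the usual U-statistic is computed on what remains. This is a minimal illustration under the assumption that missing values are encoded as `None`; the function name `kendall_tau_complete_case` is hypothetical, the statistic is the tau-a version (tied pairs contribute zero), and neither the asymptotic theory nor the median imputation alternative is covered.

```python
from itertools import combinations


def kendall_tau_complete_case(x: list, y: list) -> float:
    """Kendall's tau-a computed only on observations where both x and y are present."""
    complete = [(a, b) for a, b in zip(x, y) if a is not None and b is not None]
    n = len(complete)
    if n < 2:
        raise ValueError("need at least two complete observations")
    score = 0
    for (x1, y1), (x2, y2) in combinations(complete, 2):
        product = (x1 - x2) * (y1 - y2)
        if product > 0:
            score += 1   # concordant pair
        elif product < 0:
            score -= 1   # discordant pair
    return score / (n * (n - 1) / 2)
```

For `x = [1, 2, None, 4, 5]` and `y = [2, 1, 3, None, 5]`, three complete observations remain, two of the three pairs are concordant and one is discordant, so the estimate is 1/3.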
Citations: 1